Zend Framework: Zend_Syntax Component Proposal
| Proposed Component Name | Zend_Syntax |
|---|---|
| Developer Notes | http://framework.zend.com/wiki/display/ZFDEV/Zend_Syntax |
| Proposers | My E-mail Address |
| Revision | 1.1 - 16 Feb 2008: Created. (wiki revision: 12) |
1. Overview
Zend_Syntax is a component that provides syntax highlighting.
2. References
- // todo
3. Component Requirements, Constraints, and Acceptance Criteria
- This component will provide a set of classes that can be used to create syntax highlighting for (almost) any syntax grammar.
- This component will ship with classes that provide highlighting for many common languages.
- This component will use a flexible view pattern for markup.
4. Dependencies on Other Framework Components
- Zend_Exception
- Zend_View
5. Theory of Operation
The component is initialized with a Zend_Syntax_Definition adapter.
The component is used by calling the highlight($string_needing_highlighting) member function, which returns the highlighted string.
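The two steps above can be sketched in a few lines. This is a minimal, self-contained illustration and not the proposed implementation: the class names SketchKeywordDefinition and SketchHighlighter, the constructor signature, and the keyword-only matching are assumptions made for the example. Only the highlight($string) call comes from the proposal.

```php
<?php
// Hypothetical stand-in for a syntax definition adapter: it only knows a keyword list.
class SketchKeywordDefinition
{
    private $keywords;

    public function __construct(array $keywords)
    {
        $this->keywords = $keywords;
    }

    public function getKeywords()
    {
        return $this->keywords;
    }
}

// Hypothetical stand-in for the component: initialized with a definition,
// used through a single highlight() call that returns the marked-up string.
class SketchHighlighter
{
    private $definition;

    public function __construct(SketchKeywordDefinition $definition)
    {
        $this->definition = $definition;
    }

    public function highlight($source)
    {
        // Split on runs of non-word characters, keeping the separators so nothing is lost.
        $parts = preg_split('/(\W+)/', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
        $out = '';
        foreach ($parts as $part) {
            $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            if (in_array($part, $this->definition->getKeywords(), true)) {
                $out .= '<span class="keyword">' . $escaped . '</span>';
            } else {
                $out .= $escaped;
            }
        }
        return $out;
    }
}

$highlighter = new SketchHighlighter(new SketchKeywordDefinition(array('if', 'return')));
echo $highlighter->highlight('if ($a < 1) return $a;'), "\n";
```

Swapping in a different definition object changes what gets highlighted without touching the calling code, which is the point of the adapter arrangement.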
6. Milestones / Tasks
- Milestone 1: Prototype a robust, flexible, non-cumbersome parsing and markup process.
- Milestone 2: Finalize naming conventions and revisions based on community input.
- Milestone 3: Working prototype checked into the incubator when requirements are met.
- Milestone 4: Unit tests exist, work, and are checked into SVN.
- Milestone 5: Initial documentation exists.
7. Class Index
- Zend_Syntax_Highlight
- Zend_Syntax_Highlight_Definition
- Zend_Syntax_Highlight_TokenList
- Zend_Syntax_Highlight_Token
- Zend_Syntax_Highlight_State
- Zend_Syntax_Highlight_Keyword
8. Use Cases
| UC-01 |
|---|
usage
Controller Action View
output
9. Class Skeletons
The default view scripts used by syntax definitions will be located in Zend/Syntax/Highlight/Scripts/
| Name | Size | Creator (Last Modifier) | Creation Date | Last Mod Date | Comment |
|---|---|---|---|---|---|
| | 9 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax_State_Chunk class for matching word types (hex, metrics, etc) |
| | 3 kb | Jeremy Giberson | Mar 28, 2008 | Mar 28, 2008 | |
| | 2 kb | Jeremy Giberson (modified by Jeremy Giberson) | Feb 16, 2008 | Mar 28, 2008 | |
| | 4 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax_State_Simple class for matching enclosed content by tags |
| | 0.3 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax_Definition_Keywords class demo definition for simple keyword matching |
| | 5 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax_Definition_TexPad class Demo definition for supporting TextPad syn files |
| | 2 kb | Jeremy Giberson (modified by Jeremy Giberson) | Feb 16, 2008 | Mar 28, 2008 | |
| | 1 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax component class |
| | 1 kb | Jeremy Giberson (modified by Jeremy Giberson) | Feb 16, 2008 | Mar 28, 2008 | |
| | 0.1 kb | Jeremy Giberson | Feb 16, 2008 | Feb 16, 2008 | Zend_Syntax_Exception class |
| | 1 kb | Jeremy Giberson (modified by Jeremy Giberson) | Feb 16, 2008 | Mar 28, 2008 | |
| | 2 kb | Jeremy Giberson (modified by Jeremy Giberson) | Feb 16, 2008 | Mar 28, 2008 | |
Looks like a useful component. I'd like to see more semantic output, however. Maybe the font tags should be replaced by <span class="keyword"> type tags. This way, styles can be applied using stylesheets.
Another concern is the namespace. Syntax can also refer to a syntax checker (lint, tidy). Maybe Zend_Syntax_Highlighter is a better namespace. A third group of welcome functionality has to do with code formatting (tidy again). How would this interact/relate with this component?
BTW, as a fanatic Textpad user, I love the TextPad .syn file support.
Replacing the font tags with span tags is trivial, and I have no problem with it. Either way, I intend to add helper functions to change the tags on the fly. I would also like to create a format file that works in tandem with TextPad .syn files and maps the .syn sections to opening and closing tags and color settings. For example, [keywords1] could be mapped to <font color='blue'></font> and [keywords2] could be mapped to <span color='red'></span>. For the CSS example I presented, the change might look like:
One of my initial goals for Zend_Syntax was to make it the parent component for a slew of syntax-related tools: highlighting, parsing, validation, etc. The first version I created had two extended components, Zend_Syntax_Parse_Csv and Zend_Syntax_Highlight_Csv. Highlight provided syntax highlighting for a CSV-formatted file, and Parse mapped the data to an associative array. Both used the same core set of state definitions for parsing the file.
However, as I extended the package for HTML and other languages, it required more and more tweaks for parsing versus highlighting, and each kept breaking the functionality of the other. In the end, to simplify matters, I built Zend_Syntax with only highlighting support in mind. Given that, I wouldn't have a problem changing the namespace to Zend_Syntax_Highlight, which could open doors for Zend_Syntax_Tidy or Zend_Syntax_Parse. However, I think Tidy is ambiguous and may fit better in the validation components?
Thanks for the comments.
It doesn't seem right to specify formatting with a tokenizer/parser. If I want to use MyFantasticCssParser instead of the TextPad .syn file, I should be able to use the same formatting rules. CSS is more trivial, but a more advanced XML parser, for instance, might be able to load keywords from DTDs and schemas.
Furthermore, by using a <span class="TOKEN_TYPE"> style formatting, you are able to style the resulting HTML using CSS. Maybe a collection of CSS fragments should accompany this module with some sane defaults.
For instance:
can be used to style this output (shortened for clarity):
Regarding Tidy: I frequently use it to just format (indent) (X)HTML and not for validation. This could, however, also be done using PHP's XML extensions, and there may be other uses for 'tidying' source code.
I don't like the idea of including CSS defaults, simply because it clobbers the namespace. If the user is using .keyword in their style sheet, I don't want it conflicting with the Zend_Syntax_Highlight component. The cheap way around this is adding prefixes to the CSS selectors, like ZSH_Keyword, but that seems very inelegant to me.
I came up with the idea of introducing a Zend_Syntax_Highlight_Markup (ZSHM) class. It represents the output markup that will be applied to the syntax. ZSHM will either extend or utilize the Zend_View component and basically use a view script to output marked-up syntax. This comes with some nice advantages:
1) The user will determine what tag mechanism is used for markup, be it span, font, bold, italic, or custom XML tags they beautify with XSLT, etc.
2) The markup object will have a reference to the text it is marking up, which can be used by the script.
3) Thanks to the Zend_View component, the markup will have access to the view helper system. Used in tandem with 2), this can be used for really powerful highlighting, e.g. hyperlinking function calls or included files to relevant anchor tags later in the syntax or even on other pages, such as API documents.
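To make advantages 2) and 3) concrete, here is a small sketch of a markup step that enriches one token type with a link. Everything in it is hypothetical: the function names sketchApiLink and sketchRenderToken, the zsh- class prefix, the 'function' token type, and the example.com base URL are placeholders for illustration, and plain functions stand in for what would be view helpers.

```php
<?php
// Hypothetical helper: wraps a function-name token in a link to its API page.
// The base URL and the helper name are placeholders, not part of the proposal.
function sketchApiLink($tokenText, $baseUrl = 'https://example.com/api/')
{
    $href = $baseUrl . rawurlencode($tokenText);
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($tokenText, ENT_QUOTES, 'UTF-8') . '</a>';
}

// Hypothetical markup step: it sees the token type and text (advantage 2)
// and can call a helper for the types it wants to enrich (advantage 3).
function sketchRenderToken($tokenType, $tokenText)
{
    $body = ($tokenType === 'function')
        ? sketchApiLink($tokenText)
        : htmlspecialchars($tokenText, ENT_QUOTES, 'UTF-8');
    return '<span class="zsh-' . $tokenType . '">' . $body . '</span>';
}

echo sketchRenderToken('function', 'array_map'), "\n";
echo sketchRenderToken('string', '"a<b"'), "\n";
```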
Here are the variables that would be available to the markup script:
1) $token_type
The basic syntax element, which can be used for simplified highlighting. Values include chunk, keyword, comment, string, value (to be solidified when the package is finalized).
2) $token_type_extended
Extended token types, for more elaborate highlighting. Values here depend on the Zend_Syntax_Definition being used. For example, with HTML the values might be "tag", "attribute", "script", etc. For C++ they might include "macro", "function", "compiler_directive", etc.
3) $token_text
The text of the syntax to which this markup is being applied.
Here is a sample script that the markup classes would utilize:
<span class='myCssPrefix-<?=$this->token_type?>'><?=$this->escape($this->token_text)?></span>
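How such a script could be rendered is sketched below. This is an assumption-laden illustration rather than the component's code: SketchMarkupView is a hypothetical stand-in that mimics the general idea of a view object (variables exposed as properties, script included so it can use $this), and the script is written to a temporary file only to keep the example self-contained.

```php
<?php
// Minimal stand-in for a view object: the three markup variables are properties,
// and the script is included inside render() so it can refer to $this.
class SketchMarkupView
{
    public $token_type = '';
    public $token_type_extended = '';
    public $token_text = '';

    public function escape($value)
    {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }

    public function render($scriptPath)
    {
        ob_start();
        include $scriptPath;
        return ob_get_clean();
    }
}

// Write a markup script to a temporary file (a real script would live on disk already).
$script = tempnam(sys_get_temp_dir(), 'zsh');
file_put_contents(
    $script,
    '<span class="myCssPrefix-<?= $this->escape($this->token_type) ?>">'
    . '<?= $this->escape($this->token_text) ?></span>'
);

$view = new SketchMarkupView();
$view->token_type = 'keyword';
$view->token_type_extended = 'tag';
$view->token_text = '<div>';

$html = $view->render($script);
unlink($script);

echo $html, "\n";
```

Escaping the token text matters whenever the highlighted source is itself HTML; otherwise the browser would interpret the tokens instead of displaying them.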
Currently, Zend_Syntax works by figuring out where the different syntax states start and end, and prefixing and suffixing those positions with hard-coded markup tags. With the markup class, the new flow will be:
1) Determine the begin and end positions for each syntax state
2) Build an array of Zend_Syntax_Highlight_Markup objects with the state information (type, type_extended, text)
3) For each Zend_Syntax_Highlight_Markup, append the result of $zshm_obj->render(script.tpl) to the $out string
4) Return $out (finalized highlighted string)
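The four steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the component's code: SketchMarkup and sketchHighlight are hypothetical names, state detection is reduced to a single regular expression for double-quoted strings and # comments, and render() builds the markup inline instead of running a view script.

```php
<?php
// Hypothetical markup object: holds the state information from step 2.
class SketchMarkup
{
    public $type;
    public $typeExtended;
    public $text;

    public function __construct($type, $typeExtended, $text)
    {
        $this->type = $type;
        $this->typeExtended = $typeExtended;
        $this->text = $text;
    }

    // Stand-in for rendering a view script; a real script would be user-supplied.
    public function render()
    {
        $text = htmlspecialchars($this->text, ENT_QUOTES, 'UTF-8');
        if ($this->type === 'chunk') {
            return $text; // plain text between states is emitted unwrapped
        }
        return '<span class="zsh-' . $this->type . '">' . $text . '</span>';
    }
}

function sketchHighlight($source)
{
    // Step 1: find begin/end positions of each state (here: strings and # comments).
    preg_match_all('/"[^"]*"|#[^\n]*/', $source, $matches, PREG_OFFSET_CAPTURE);

    // Step 2: build the list of markup objects, filling gaps with 'chunk' entries.
    $markups = array();
    $cursor = 0;
    foreach ($matches[0] as $match) {
        list($text, $begin) = $match;
        if ($begin > $cursor) {
            $markups[] = new SketchMarkup('chunk', 'chunk', substr($source, $cursor, $begin - $cursor));
        }
        $type = ($text[0] === '#') ? 'comment' : 'string';
        $markups[] = new SketchMarkup($type, $type, $text);
        $cursor = $begin + strlen($text);
    }
    if ($cursor < strlen($source)) {
        $markups[] = new SketchMarkup('chunk', 'chunk', substr($source, $cursor));
    }

    // Step 3: concatenate the rendered markup.
    $out = '';
    foreach ($markups as $markup) {
        $out .= $markup->render();
    }

    // Step 4: return the finished string.
    return $out;
}

echo sketchHighlight('x = "hi" # note'), "\n";
```

Keeping detection (steps 1 and 2) separate from rendering (step 3) is what lets the same token list be rendered with span tags, font tags, or any other script.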
As far as Tidy goes, I'd prefer to see it as its own Zend component with full functionality. If you want to beautify the HTML you are highlighting, you should pass the HTML to the Tidy component first, then to Zend_Syntax_Highlight. This way, Zend_Syntax_Highlight (appropriately) covers highlighting (any) syntax, and Zend_Tidy can properly implement validation, tidying, and all the other fringe features it supports.
I've redone the proposal to reflect the new component name (to open up Zend_Syntax to other components such as parse, clean?, etc.).
I've also retooled the component to make use of Zend_View for rendering markup. This move should make way for some really powerful and easy-to-implement markup features. The current proposal is very bare-bones; several states need to be refactored to fit the new design.
A new feature of the component is that the base definition class can serve directly as a usable syntax definition. In the previous version it had to be extended first for whatever purpose you were going to use it for. The new test case makes use of this feature and builds a custom English grammar syntax highlighter in a relatively simple manner.
As of the initial creation of this proposal, all use cases are functional. All files included in the component in working prototype condition have been uploaded.