|
back to the help index
Highlighting Patterns
Writing Syntax Highlighting Patterns
Patterns are the mechanism by which language syntax highlighting is
implemented in NEdit (see Syntax Highlighting under the heading of Features
for Programming). To create syntax highlighting patterns for a new
language, or to modify existing patterns, select "Recognition Patterns" from
"Syntax Highlighting" sub-section of the "Default Settings" sub-menu of the
"Preferences" menu.
First, a word of caution. As with regular expression matching in general, it
is quite possible to write patterns which are so inefficient that they
essentially lock up the editor as they recursively re-examine the entire
contents of the file thousands of times. With the multiplicity of patterns,
the possibility of a lock-up is significantly increased in syntax
highlighting. When working on highlighting patterns, be sure to save your
work frequently.
NEdit's syntax highlighting is unusual in that it works in real-time (as you
type), and yet is completely programmable using standard regular expression
notation. Other syntax highlighting editors usually fall either into the
category of fully programmable but unable to keep up in real-time, or
real-time but limited programmability. The additional burden that NEdit
places on pattern writers in order to achieve this speed/flexibility mix, is
to force them to state self-imposed limitations on the amount of context that
patterns may examine when re-parsing after a change. While the "Pattern
Context Requirements" heading is near the end of this section, it is not
optional, and must be understood before making any any serious effort at
pattern writing.
In its simplest form, a highlight pattern consists of a regular expression to
match, along with a style representing the font an color for displaying any
text which matches that expression. To bold the word, "highlight", wherever
it appears the text, the regular expression simply would be the word
"highlight". The style (selected from the menu under the heading of
"Highlight Style") determines how the text will be drawn. To bold the text,
either select an existing style, such as "Keyword", which bolds text, or
create a new style and select it under Highlight Style.
The full range of regular expression capabilities can be applied in such a
pattern, with the single caveat that the expression must conclusively match
or not match, within the pre-defined context distance (as discussed below
under Pattern Context Requirements).
To match longer ranges of text, particularly any constructs which exceed the
requested context, you must use a pattern which highlights text between a
starting and ending regular expression match. To do so, select "Highlight
text between starting and ending REs" under "Matching", and enter both a
starting and ending regular expression. For example, to highlight everything
between double quotes, you would enter a double quote character in both the
starting and ending regular expression fields. Patterns with both a
beginning and ending expression span all characters between the two
expressions, including newlines.
Again, the limitation for automatic parsing to operate properly is that both
expressions must match within the context distance stated for the pattern
set.
With the ability to span large distances, comes the responsibility to recover
when things go wrong. Remember that syntax highlighting is called upon to
parse incorrect or incomplete syntax as often as correct syntax. To stop a
pattern short of matching its end expression, you can specify an error
expression, which stops the pattern from gobbling up more than it should.
For example, if the text between double quotes shouldn't contain newlines,
the error expression might be "$". As with both starting and ending
expressions, error expressions must also match within the requested context
distance.
Coloring Sub-Expressions
It is also possible to color areas of text within a regular expression
match. A pattern of this type associates a style with sub-expressions
references of the parent pattern (as used in regular expression substitution
patterns, see the NEdit Help menu item on Regular Expressions).
Sub-expressions of both the starting and ending patterns may be colored. For
example, if the parent pattern has a starting expression "\<", and end
expression "\>", (for highlighting all of the text contained within angle
brackets), a sub-pattern using "&" in both the starting and ending expression
fields could color the brackets differently from the intervening text. A
quick shortcut to typing in pattern names in the Parent Pattern field is to
use the middle mouse button to drag them from the Patterns list.
Hierarchical Patterns
A hierarchical sub-pattern, is identical to a top level pattern, but is
invoked only between the beginning and ending expression matches of its
parent pattern. Like the sub-expression coloring patterns discussed above,
it is associated with a parent pattern using the Parent Pattern field in the
pattern specification. Pattern names can be dragged from the pattern list
with the middle mouse button to the Parent Pattern field.
After the start expression of the parent pattern matches, the syntax
highlighting parser searches for either the parent's end pattern or a
matching sub-pattern. When a sub-pattern matches, control is not returned to
the parent pattern until the entire sub-pattern has been parsed, regardless
of whether the parent's end pattern appears in the text matched by the
sub-pattern.
The most common use for this capability is for coloring sub-structure of
language constructs (smaller patterns embedded in larger patterns).
Hierarchical patterns can also simplify parsing by having sub-patterns "hide"
special syntax from parent patterns, such as special escape sequences or
internal comments.
There is no depth limit in nesting hierarchical sub-patterns, but beyond the
third level of nesting, automatic re-parsing will sometimes have to re-parse
more than the requested context distance to guarantee a correct parse (which
can slow down the maximum rate at which the user can type if large sections
of text are matched only by deeply nested patterns).
While this is obviously not a complete hierarchical language parser it is
still useful in many text coloring situations. As a pattern writer, your
goal is not to completely cover the language syntax, but to generate
colorings that are useful to the programmer. Simpler patterns are usually
more efficient and also more robust when applied to incorrect code.
Deferred (Pass-2) Parsing
NEdit does pattern matching for syntax highlighting in two passes. The first
pass is applied to the entire file when syntax highlighting is first turned
on, and to new ranges of text when they are initially read or pasted in. The
second pass is applied only as needed when text is exposed (scrolled in to
view).
If you have a particularly complex set of patterns, and parsing is beginning
to add a noticeable delay to opening files or operations which change large
regions of text, you can defer some of that parsing from startup time, to
when it is actually needed for viewing the text. Deferred parsing can only
be used with single expression patterns, or begin/end patterns which match
entirely within the requested context distance. To defer the parsing of a
pattern to when the text is exposed, click on the Pass-2 pattern type button
in the highlight patterns dialog.
Sometimes a pattern can't be deferred, not because of context requirements,
but because it must run concurrently with pass-1 (non-deferred) patterns. If
they didn't run concurrently, a pass-1 pattern might incorrectly match some
of the characters which would normally be hidden inside of a sequence matched
by the deferred pattern. For example, C has character constants enclosed in
single quotes. These typically do not cross line boundaries, meaning they
can be parsed entirely within the context distance of the C pattern set and
should be good candidates for deferred parsing. However, they can't be
deferred because they can contain sequences of characters which can trigger
pass-one patterns. Specifically, the sequence, '\"', contains a double quote
character, which would be matched by the string pattern and interpreted as
introducing a string.
Pattern Context Requirements
The context requirements of a pattern set state how much additional text
around any change must be examined to guarantee that the patterns will match
what they are intended to match. Context requirements are a promise by NEdit
to the pattern writer, that the regular expressions in his/her patterns will
be matched against at least <line context> lines and <character context>
characters, around any modified text. Combining line and character
requirements guarantee that both will be met.
Automatic re-parsing happens on EVERY KEYSTROKE, so the amount of context
which must be examined is very critical to typing efficiency. The more
complicated your patterns, the more critical the context becomes. To cover
all of the keywords in a typical language, without affecting the maximum rate
at which users can enter text, you may be limited to just a few lines and/or
a few hundred characters of context.
The default context distance is 1 line, with no minimum character
requirement. There are several benefits to sticking with this default. One
is simply that it is easy to understand and to comply with. Regular
expression notation is designed around single line matching. To span lines
in a regular expression, you must explicitly mention the newline character
"\n", and matches which are restricted to a single line are virtually immune
to lock-ups. Also, if you can code your patterns to work within a single
line of context, without an additional character-range context requirement,
the parser can take advantage the fact that patterns don't cross line
boundaries, and nearly double its efficiency over a one-line and 1-character
context requirement. (In a single line context, you are allowed to match
newlines, but only as the first and/or last character.)
back to the help index
|