[ nedit-Bugs-1760116 ] Negated escape sequences misinterpreted
in character class
Joerg Fischer
jf505 at gmx.de
Fri Jul 27 20:05:12 CEST 2007
> Interestingly, I notice that (?n\W) does not match newlines (my patch
> allows (?n[\W]) to do so, which is rather inconsistent). This is true also
> for \L, \D. Also \y without (?n ) around it will match newline. I believe
> these to be faults. What about you?
Well, what's right and what's wrong? Here is another quote from
NEdit's help:
By default, NEdit regular expressions will NOT match a
newline character for the following regex tokens: dot
(`.'); a negated character class (`[^...]'); and the
following shortcuts for character classes:
`\d', `\D', `\l', `\L', `\s', `\S', `\w', `\W', `\Y'
The matching of newlines can be controlled for the `.'
token, negated character classes, and the `\s' and `\S'
shortcuts by using one of the following parenthetical
constructs:
(?n<regex>)  `.', `[^...]', `\s', `\S' match newlines
(?N<regex>)  `.', `[^...]', `\s', `\S' don't match newlines
`(?N<regex>)' is the default behavior.
This is unusual, since in Perl only the anchors ^ and $ and the any
character . are affected by a m(ultiple) or s(ingle) line modifier.
That is, in Perl \s = [ \t\n\r\f] and \S = [^ \t\n\r\f], so \s matches
newlines (always) and \S does not (never), which is what one would
normally expect. Similarly, for the other escape sequences.
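To make the Perl convention concrete, here is a small sketch in Python, whose re module behaves like Perl for these particular tokens. Python is used only for illustration (it is not NEdit's regex engine), and the helper name matches_newline is made up for this example.

```python
import re

def matches_newline(pattern, flags=0):
    """Return True if `pattern` matches a string consisting of one newline."""
    return re.fullmatch(pattern, "\n", flags) is not None

# Perl-style behaviour: the shorthand classes mean what they say,
# and only `.` is affected by the single-line (DOTALL) modifier.
perl_style = {
    r"\s": matches_newline(r"\s"),                       # whitespace: newline included
    r"\S": matches_newline(r"\S"),                       # non-whitespace: newline excluded
    r"\D": matches_newline(r"\D"),                       # non-digit: newline included
    r"\W": matches_newline(r"\W"),                       # non-word: newline included
    r".": matches_newline(r"."),                         # dot without modifier
    r"(?s).": matches_newline(r".", re.DOTALL),          # dot with single-line modifier
    r"(?s)\S": matches_newline(r"\S", re.DOTALL),        # modifier does not touch \S
}

for token, result in perl_style.items():
    print(f"{token:8} matches newline: {result}")
```

Only the two dot entries differ from each other here; the modifier leaves \s, \S, \D and \W alone, which is the contrast with the NEdit convention quoted above.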
(In NEdit by default \s = [ \t\v\r\f], which includes vertical tabs,
and \S = [^ \t\n\v\r\f], where we needn't list the \n, because by
NEdit's default a negated character class will not match \n anyway.)
Now, we are not in Perl but inside a text editor, and NEdit's help
gives this rationale for the preceding convention:
MATCHING NEWLINES
NEdit regular expressions by default handle the matching
of newlines in a way that should seem natural for most
editing tasks. There are situations, however, that
require finer control over how newlines are matched by
some regular expression tokens.
However, the fact that (?n\S) matches newlines just like (?n\s) seems
somehow upside down. In effect, the (?N<regex>) construct excludes
newlines from all character classes, and (?n<regex>) does just the
opposite, i.e., it even adds newlines to classes where they normally
wouldn't belong.
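The "upside down" effect can be spelled out with explicit character classes. The following is only a model of the behaviour described in the help text, written in Python rather than run through NEdit's engine; the table NEDIT_MODEL and the function model_matches_newline are hypothetical names for this sketch.

```python
import re

# Explicit classes standing in for NEdit's \s and \S under each construct,
# following the help text: (?N...) keeps newlines out of both classes,
# (?n...) lets both classes match newlines.
NEDIT_MODEL = {
    ("N", r"\s"): r"[ \t\v\r\f]",      # default: whitespace without newline
    ("N", r"\S"): r"[^ \t\v\r\f\n]",   # default: negated class, newline excluded
    ("n", r"\s"): r"[ \t\v\r\f\n]",    # (?n\s): newline added
    ("n", r"\S"): r"[^ \t\v\r\f]",     # (?n\S): newline no longer excluded
}

def model_matches_newline(mode, token):
    """Return True if the modelled class matches a single newline."""
    return re.fullmatch(NEDIT_MODEL[(mode, token)], "\n") is not None

for (mode, token) in NEDIT_MODEL:
    print(f"(?{mode}{token}) matches newline: {model_matches_newline(mode, token)}")
```

In this model a newline is matched by both (?n\s) and (?n\S), so under (?n<regex>) the two classes are no longer complements of each other.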
I can't tell whether or not these conventions are natural, or at least
handy. In any case they can't be more than that, because you can
always define a regex that does what you want, if necessary by mixing
(?n<regex>) and (?N<regex>) constructs.
However, these conventions are incompatible with standards (i.e., Perl),
and I vaguely recall that from time to time some folks complain
because NEdit regexes aren't more Perl compatible...
OK, so what to do now? I don't know, really. I think character
classes, or these shorthand notations, should generally match what
they say. This holds also for the NEdit specific \y and \Y which IMO
shouldn't be affected by the (?n<regex>) or (?N<regex>) constructs.
Moreover, \S should never match newlines (I believe), and \D, \L, \W
should match newlines -- at least with the help of (?n<regex>).
Cheers,
Jörg