UTF-8 support not on the road map?

Thomas Orgis thomas-forum at orgis.org
Mon Dec 31 03:21:04 CET 2007


On Sun, 30 Dec 2007 23:57:16 +0100,
Thorsten Haude <yoo at vranx.de> wrote:

> * Jörg Fischer wrote (2007-12-30 23:30):
> >In addition, as next step, the internal encoding could be changed from
> >one byte per character to two byte (fixed-length, eg UCS-2) per
> >character.

I guess this alone will be hard enough to do: supporting the Unicode BMP
with a 16-bit encoding.
Of course that is not full Unicode... one could go 32-bit and encompass
all of Unicode as we know it today, but then, should one be serious and
_really_ support all the funny rules of it? -- I doubt that
this is feasible for nedit while staying efficient.
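
To make the 16-bit versus 32-bit trade-off a bit more concrete, here is
a minimal Python sketch (just an illustration, not anything nedit does;
the function name unit_counts is made up for this mail). It counts how
many storage units the same text needs in each form. A character outside
the BMP needs two 16-bit units, which is exactly what a fixed-length
16-bit buffer cannot represent in one slot.

```python
def unit_counts(text):
    """Count storage units needed for `text` in three encoding forms."""
    return {
        # UTF-8: variable length, 1 to 4 bytes per character
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16: one 16-bit unit inside the BMP, two (a surrogate pair) outside
        "units_16bit": len(text.encode("utf-16-le")) // 2,
        # UTF-32: always one 32-bit unit per character
        "units_32bit": len(text.encode("utf-32-le")) // 4,
    }


# "a" plus U+1D11E (a musical symbol outside the BMP)
print(unit_counts("a\U0001d11e"))
```

For two characters this reports three 16-bit units but only two 32-bit
units, so one slot per character only works at 16 bits as long as
everything stays inside the BMP.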

After all, it's not a text/word processor in the "office" sense.

But I still use it for editing my larger text documents (scientific
stuff) -- in LaTeX!
LaTeX is sort of programming again and nedit has support for that
language (even a macro pack).
While LaTeX itself takes care of the "nasty" things, like correct word
spacing, line breaking and ligatures (not to mention all the math fun),
it does support basic input of non-ASCII characters via UTF-8 nowadays.
I like to use that support since my Linux systems are using UTF-8 locales
(by my own decision) and I'd like to have either plain English or UTF-8
text documents.

Well, my language bias is also in the Latin range, being a native German
speaker, so the Unicode BMP would serve me well with a fixed 16-bit
internal representation (so, ignoring the F in UTF-16). Perhaps that's
different for Zhang Weiwu.

> If we seriously start doing this, we should have a lengthy discussion
> about the best solution anyway.

I fear that after making nedit _completely_ Unicode compliant,
including bidi writing and whatnot, it won't be the efficient
programming tool anymore. I still have the horrifying memory of waiting
for kate (KDE's editor) to finish scrolling down in a bigger text file.
Perhaps that fear is unfounded; perhaps one just has to do it "right"
(and I shouldn't try to use the resulting editor on my older computers).

But doing _something_ about Unicode is needed for nedit. I think that
switching to a 16-bit internal encoding and limiting support to the
Unicode BMP (including an input filter that finds the UTF-8/UTF-16
sequences indicating other planes and warns the user appropriately) is
one viable choice that should hold for the not too distant future.
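
As a rough idea of what such an input filter could look like, here is a
minimal Python sketch. It is an illustration under my own assumptions,
not a proposal for the actual nedit code: the names non_bmp_offsets and
load_as_ucs2 are made up, the input is assumed to be well-formed UTF-8,
and replacing each out-of-range character with U+FFFD is just one
possible policy. In well-formed UTF-8, the bytes 0xF0 to 0xF4 occur only
as the first byte of a four-byte sequence, and those sequences are
exactly the code points beyond the BMP.

```python
from array import array

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER (assumed policy)


def non_bmp_offsets(data):
    """Byte offsets where a 4-byte UTF-8 sequence (non-BMP) starts.

    Assumes `data` is well-formed UTF-8.
    """
    return [i for i, b in enumerate(data) if 0xF0 <= b <= 0xF4]


def load_as_ucs2(data):
    """Decode UTF-8 into fixed 16-bit units, one unit per character.

    Returns (units, replaced): the buffer and the number of characters
    that did not fit and were replaced, so the caller can warn the user.
    Raises UnicodeDecodeError on malformed input.
    """
    units = array("H")  # unsigned 16-bit integers on common platforms
    replaced = 0
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFFFF:
            units.append(REPLACEMENT)
            replaced += 1
        else:
            units.append(cp)
    return units, replaced
```

The offsets from non_bmp_offsets could feed the warning ("characters
outside the BMP at byte ..."), while load_as_ucs2 shows that the rest of
the file still loads into a one-unit-per-character buffer.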

Hm, or perhaps, for the level of support that nedit would offer,
supporting more planes won't be the problem as such.
But actually, I must admit that I'm a bit hazy about how hard the task
really is. Do we have someone here who has actually worked with Unicode's
details and knows what nedit really has to do?

I know I basically want to be able to work with the same
Latin-script languages (German, English, French...) as before (plus some
math symbols just because I can ;-) in UTF-8 documents with nedit.
Right now I use wrapper scripts with iconv to achieve parts of that; once
nedit has native Unicode support, it should work a bit better than this.
Also, my terminal (urxvt) does display Greek, Japanese and whatnot...
it would be nice if my text editor were able to do so, too.
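
For the curious, the wrapper idea looks roughly like the following. This
is a Python sketch of the same round trip rather than my actual iconv
shell scripts, and it rests on assumptions: the function name
edit_utf8_file is made up, ISO-8859-1 is assumed as the one-byte
encoding the editor works in, and the editor command is assumed to block
until the file is closed.

```python
import os
import subprocess
import tempfile

LEGACY = "iso-8859-1"  # assumption: the one-byte encoding the editor expects


def edit_utf8_file(path, editor_cmd):
    """Edit a UTF-8 file with an editor that only understands LEGACY.

    `editor_cmd` is a list such as ["nedit"]; the temporary file name is
    appended to it. Raises UnicodeEncodeError if the file contains
    characters that LEGACY cannot represent.
    """
    with open(path, "rb") as f:
        text = f.read().decode("utf-8")
    legacy_bytes = text.encode(LEGACY)  # fails for anything beyond Latin-1

    # Keep the original extension so the editor can pick a language mode.
    fd, tmp = tempfile.mkstemp(suffix=os.path.splitext(path)[1])
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(legacy_bytes)
        subprocess.run(list(editor_cmd) + [tmp], check=True)
        with open(tmp, "rb") as f:
            edited = f.read().decode(LEGACY)
        with open(path, "wb") as f:
            f.write(edited.encode("utf-8"))
    finally:
        os.remove(tmp)
```

The encode step is where "parts of that" shows: any character outside
the legacy encoding (Greek, Japanese, most math symbols) makes the whole
conversion fail, which is why native support would be better.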


In the hope of not being too long-winded here,

Thomas O.