length()

Tony Balinski ajbj at free.fr
Wed Mar 15 20:08:18 CET 2006


Quoting "A.V.Kuznetsov" <kuzn at umps.mephi.ru>:

> Hi All!
>
> Function length(string) returns length of a string in "symbol
> notation", i.e. length("\t\t\t") returns 3.

It returns the number of (single byte) characters, in fact.

> Macro is needed to calculate length of string in "nedit notation",
> since length of string "\t\t\t" is equal to 3$tab_dist.

You're asking for the column width of the string. All "normal"
characters (isprint/isgraph) have a width of one. Control characters
use up more columns, since their "visual representations" are strings
such as <vt>, <ack> and so on (the names are hard-coded in the source
code). The exceptions here are newline (which has no visual representation)
and tab, whose column width depends on two things - the value of $tab_dist
and the column at which it is found.

> Is it possible to add some keyword to length() arguments to switch
> "notations"? I.e. length("\t") returns 1, but
> length("\t", "keyword") returns value of $tab_dist.

I do think there should be a bit more support for this sort of thing,
but it is, in my experience, rarely required. I don't really know what form
a new built-in function, or an "extended" old one, should take. Of course,
you can write what you need in the nedit macro language, e.g.:

  # width = colwidth(s [, colpos [, tab_dist]])
  # return the width in columns of a string s that starts at
  # column colpos in a line (defaults to zero), using a
  # tab width of tab_dist (defaults to $tab_dist) - it
  # assumes no control characters in the string other than
  # tab, and no newlines.
  define colwidth
    {
    s = $1

    if ($n_args > 1)
      colpos = $2
    else
      colpos = 0

    if ($n_args > 2)
      tab_dist = $3
    else
      tab_dist = $tab_dist

    etab = 0
    last = 0
    col = colpos
    # search for tab sequences
    # (search_string() searches the string s; search() would search the buffer)
    for (tab = search_string(s, "\t+", 0, "regex"); \
         tab >= 0; \
         tab = search_string(s, "\t+", etab, "regex"))
      {
      etab = $search_end
      # add non-tab char widths
      col += tab - last
      # add variable width of first tab in tab sequence
      col += tab_dist - (col % tab_dist)
      # add width of following tabs in tab sequence
      col += (etab - tab - 1) * tab_dist
      # remember where this tab sequence ended for the next pass
      last = etab
      }
    # add width of chars following last tab
    col += length(s) - etab
    # finally, remove our start column position
    return col - colpos
    }

You can also check out the detab/retab macro functions on Niki:
  http://www.nedit.org/niki/index.php/ConvertingTabs

Another issue is i18n: character sets like Unicode provide
"double-width glyphs" for some characters, mainly Asian ones. When we go
that way, we will have to rethink a number of assumptions. Valid text
characters will not necessarily be one byte wide (so the number of
characters will not equal the number of bytes), and more work will be
needed to provide a suitable description of column widths, for
double-width characters at least. This shouldn't be too bad, because
variable-width glyphs are already reasonably well handled by the
nedit text widget - but see what happens when you do column selects
in a help window!
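
To make the byte/character/column distinction concrete, here is a small
Python sketch (again Python only for ease of running; nothing here is NEdit
code). The helper display_width is hypothetical, and its rule of counting
East Asian "wide" and "fullwidth" characters as two columns is a common
convention assumed for the example, not a description of what NEdit does.
Combining characters and ambiguous-width characters are ignored.

```python
import unicodedata


def display_width(s):
    """Approximate column count: wide/fullwidth characters take two columns."""
    width = 0
    for ch in s:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


text = "日本語"
n_chars = len(text)                  # number of characters
n_bytes = len(text.encode("utf-8"))  # number of bytes in UTF-8
n_cols = display_width(text)         # approximate number of columns
```

For this three-character string the three counts all differ (3 characters,
9 bytes in UTF-8, 6 columns), whereas for plain ASCII text they coincide.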

> Alexey Kuznetsov

Hope this helps.

Tony

