Thursday, August 7, 2008

emacs python mode from scratch: stage 3 - some utilities

As I mentioned last time my plan was to do tabbing/indentation next but there was a lot of reliance on the utility section of code so the next step becomes getting these helper functions copied over and understood.

There are about 100 lines in the utility section so let's do half at a time. In the first 50 lines we have the following functions/definitions:

(defsubst python-in-string/comment ()
(defconst python-space-backslash-table
(defun python-skip-comments/blanks (&optional backward)
(defun python-backslash-continuation-line-p ()
(defun python-continuation-line-p ()


  • defsubst is a marcro for inlining a function which seems strange. I wonder if this was just a performance booster. (Is there any other reason to inline code?)
  • syntax-ppss returns info on what the parser state would be at the current position. Below are the states and its clear why nth 8 is the thing we need for determining if we are in a string or comment

    0. The depth in parentheses, counting from 0. Warning: this can be negative if there are more close parens than open parens between the start of the defun and point.
    1. The character position of the start of the innermost parenthetical grouping containing the stopping point; nil if none.
    2. The character position of the start of the last complete subexpression terminated; nil if none.
    3. Non-nil if inside a string. More precisely, this is the character that will terminate the string, or t if a generic string delimiter character should term inate it.
    4. t if inside a comment (of either style), or the comment nesting level if inside a kind of comment that can be nested.
    5. t if point is just after a quote character.
    6. The minimum parenthesis depth encountered during this scan.
    7. What kind of comment is active: nil for a comment of style â?oaâ?? or when not inside a comment, t for a comment of style â?ob,â?? and syntax-table for comment that should be ended by a generic comment delimiter character.
    8. The string or comment start position. While inside a comment, this is the position where the comment began; while inside a string, this is the position where the string began. When outside of strings and comments, this element is nil.
    9. Internal data for continuing the parsing. The meaning of this data is subject to change; it is used if you pass this list as the state argument to another call.

  • syntax-ppss seems like an insanely powerful feature. My respect for the sophistication going on behind the scenes when emacs opens a file for a certain language continues to grow.

python-space-backslash-table and python-skip-comments/blanks

  • This seems to be a "throw away" data structure for overriding the *real* syntax table temporarily
  • It provides a syntax table that redefines "\" as a whitespace class. Presumably this is for allowing movement over otherwise blank continuation lines as if they were whitespace.
  • What this meant wasn't immediately obvious, but it seems to be addressing the type of situation below (which honestly I haven't seen in the wild before, but seems to be legal python code).

    if x and \
    print z

  • For movement we will use forward-comment which moves over up to X comments. It moves backward if arg is negative. The forward motion is straight forward. The backward motion starts by positioning point so that it is at the *start* of the comment (probably to avoid having it forward-comment move it to the start of the comment itself which would probably seem like a noop to the user)


  • This is pretty straight forward. Check if the last char on the previous line is a \ and make sure not in a comment or string
  • syntax-ppss-context is a an undocumented function (but trivial) that checks if the current syntax parsing state is string or comment


  • is an extension of the above function and checks for the case of a continuation char (ie the above function) or if in a matching paren type context that is allowed to span multiple lines.
  • The syntax-ppss-depth function tells you how far nested you are in parens, braces, etc. The interesting part of this function is that if you have unmatched parens then it tries to move up the list and assumes that if you succeed then you must be in something close enough to matching lists after all. I wasn't able come up with a set of braces that tickled this case but otherwise the logic makes sense.

So that is the first half of the utility functions.

I'm still overwhelmed by the amount of complexity required to get a language mode working. I haven't even gotten something seemingly simple like tab support working. I guess this is like a lot of things. When you start to understand a new system, many of the hard things turn out to be trivial and many of the seemingly trivial things turn out to be quite complex.

I had never really browsed through the emacs lisp manual before and I'm finding it is quite helpful and not too hard to navigate.

Eventually I'll have to understand a mode for some other language(s) as well. I'm curious how much python's significant whitespace makes makes things harder. Presumably the default language mode behaviors are tuned for something like lisp and/or C.

I also wonder if it would be possible to use pythons own parser in place of emacs for doing some of the syntax checking activities. That would be an interesting project to pursue. As will be looking at the pymacs module.

Settle down there tiger. One thing at a time....

No comments: