Wednesday, September 17, 2008

emacs python mode from scratch: stage 6 - python-indentation-levels

OK, now let's see if python-indentation-levels works as advertised once it is copied over.

For the following test code block (with cursor at position POINT):


class Foo(object):
    def __init__(self, *args, **kwargs):
        print "hi"
        print 'qwerty' [POINT]

it returns the following list:


((0 . #("class Foo(object):" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 9 (face font-lock-type-face fontified t) 9 18 (fontified t)))
(4 . #("def __init__(self, *args, **kwargs):" 0 3 (face font-lock-keyword-face fontified t) 3 4 (fontified t) 4 12 (face font-lock-function-name-face fontified t) 12 13 (fontified t) 13 17 (face font-lock-keyword-face fontified t) 17 36 (fontified t)))
(8 . #("print \"hi\"" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 10 (face font-lock-string-face fontified t))))

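This is a list of pairs (Lisp cons cells), each pairing an indentation column with the text of the line that column lines up with. The #("..." ...) form is just how Emacs prints a string that carries text properties, in this case the font-lock faces.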

So everything seems to be working correctly; now we just need to dig into exactly how it works.

Basically the whole function is a cond with three cases:


  • statement following a block open statement
  • comment following a comment
  • everything else

Before we go through these three cases, I have to say that the style is very new to me. The "predicate" part of each of the first two branches is a block of code that goes ahead as if the condition it is testing were true. If every step succeeds, the condition holds and the branch sets the indent list appropriately; otherwise cond tries the next case. Running code inside an "if"-style test like this is just an unusual style (to me). But it obviously works, so I'll just adjust my expectations accordingly.
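
To get my head around that style, here is a rough Python analogue. It is my own sketch, not code from python.el: each branch's test is a chain of steps joined by and, a step that fails yields a falsy value and abandons the branch, and the walrus operator stands in for the setq buried inside the elisp test. The helper names (indent_of, prev_nonblank, suggest), the short keyword list and the fixed offset of 4 are all made-up simplifications.

```python
import re


def indent_of(line):
    """Column of the first non-space character."""
    return len(line) - len(line.lstrip(" "))


def prev_nonblank(lines, i):
    # Stand-in for a movement command: None means "couldn't move".
    for j in range(i - 1, -1, -1):
        if lines[j].strip():
            return j
    return None


def suggest(lines, i):
    # Branch 1: the test *is* the work. Each step must succeed (be truthy)
    # for the chain to continue, like (and ...) inside a cond clause.
    if ((j := prev_nonblank(lines, i)) is not None
            and re.match(r"\s*(if|for|while|def|class)\b", lines[j])
            # Walrus plays the role of setq; compare with >= 0 because
            # column 0 is falsy in Python but non-nil in Lisp.
            and (indent := indent_of(lines[j])) >= 0
            # Simplification: trailing comments are not skipped here.
            and lines[j].rstrip().endswith(":")):
        return indent + 4
    # Branch 2: comment following a comment.
    if (i > 0 and lines[i].lstrip().startswith("#")
            and lines[i - 1].lstrip().startswith("#")):
        return indent_of(lines[i - 1])
    # Branch 3: the catch-all, like a final (t ...) clause in a cond.
    return 0
```

One wrinkle the sketch has to work around: a column of 0 is a perfectly good (non-nil) value in Lisp but falsy in Python, hence the >= 0 comparison.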

  • statement following a block open statement

    The logic here is to check that all of the following actions/tests succeed:


    • move to a previous statement
    • check that it is an opening block statement
    • save the value of the indent
    • move to the end of the statement
    • skip comments and blanks
    • make sure we are at a ":"
    • add the fixed indent amount to the indent of the previous block statement

    I don't really understand all the machinations here.

    Intuitively I would have stopped after the first 3 steps.

    Ahh... now I see. The comments are useful here.


    ;; Check we don't have something like:
    ;; if ...: ...

    So if we go to the end of the statement and don't find a ":", we have the one-line block above, the "normal" indenting rule doesn't apply, and cond moves on to the next case.

  • comment following a comment

    This is more straightforward. If the current line is a comment and the previous line is also a comment, then there is only one choice for indent levels: the indent level of the previous line.

  • all other cases.

    This logic doesn't look as bad as I first suspected it would.

    The first thing added to the list of indentation levels is the previous line's indentation, *unless* either the current line starts with a word like "else" and the previous line doesn't start with one of its partners (e.g. the "if" of an if/else pair), or the previous line is a block closer (e.g. return) that doesn't make sense to line up with.

    Next we crawl up one block at a time and collect indentation levels on the way up. We only skip a level if we started with a word like "else" and the block we are examining doesn't match our "start" word.

    If we still have nothing at the end, we put a 0 position on the list for good measure, and then set a couple of global variables (python-indent-list and python-indent-list-length).
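
Putting the three cases together, here is a simplified Python model of the whole function. It is a sketch under my own assumptions, not a port of python.el: lines is a list of source lines, block structure is inferred purely from indentation, there are no continuation lines or multi-line strings, and the keyword tables (OPENERS, CLOSERS, BLOCK_PAIRS) and PYTHON_INDENT are hypothetical stand-ins. It returns plain (column, text) tuples instead of propertized strings and skips setting any globals.

```python
import re

PYTHON_INDENT = 4  # assumed stand-in for the python-indent setting

# Hypothetical, simplified keyword tables (python.el's real lists differ).
OPENERS = {"class", "def", "if", "elif", "else", "for", "while",
           "try", "except", "finally", "with"}
CLOSERS = {"return", "raise", "break", "continue", "pass"}
BLOCK_PAIRS = {"else": {"if", "elif", "for", "while", "try", "except"},
               "elif": {"if", "elif"},
               "except": {"try", "except"},
               "finally": {"try", "except", "else"}}


def indent_of(line):
    return len(line) - len(line.lstrip(" "))


def first_word(line):
    m = re.match(r"\s*([A-Za-z_]+)", line)
    return m.group(1) if m else ""


def is_comment(line):
    return line.lstrip().startswith("#")


def is_statement(line):
    return bool(line.strip()) and not is_comment(line)


def code_part(line):
    # Naive: treats every '#' as a comment start, even inside strings.
    return line.split("#", 1)[0].rstrip()


def previous_statement(lines, i):
    """Index of the nearest statement line above i, or None."""
    for j in range(i - 1, -1, -1):
        if is_statement(lines[j]):
            return j
    return None


def beginning_of_block(lines, j):
    """Index of the enclosing opener of line j (nearest less-indented statement)."""
    for k in range(j - 1, -1, -1):
        if is_statement(lines[k]) and indent_of(lines[k]) < indent_of(lines[j]):
            return k
    return None


def indentation_levels(lines, i):
    """Return [(column, text), ...], outermost first, for line i."""
    prev = previous_statement(lines, i)
    # Case 1: right after a block opener whose statement really ends in ':'.
    if (prev is not None and first_word(lines[prev]) in OPENERS
            and code_part(lines[prev]).endswith(":")):
        return [(indent_of(lines[prev]) + PYTHON_INDENT, "")]
    # Case 2: a comment line directly below another comment line.
    if i > 0 and is_comment(lines[i]) and is_comment(lines[i - 1]):
        return [(indent_of(lines[i - 1]), "")]
    # Case 3: everything else.
    word = first_word(lines[i])
    start = word if word in BLOCK_PAIRS else None
    levels = []
    if prev is not None:
        prev_word = first_word(lines[prev])
        mismatched = start is not None and prev_word not in BLOCK_PAIRS[start]
        if not (mismatched or prev_word in CLOSERS):
            levels.insert(0, (indent_of(lines[prev]), lines[prev].strip()))
        j = beginning_of_block(lines, prev)
        while j is not None:  # crawl up one enclosing block at a time
            if start is None or first_word(lines[j]) in BLOCK_PAIRS[start]:
                levels.insert(0, (indent_of(lines[j]), lines[j].strip()))
            j = beginning_of_block(lines, j)
    return levels or [(0, "")]  # nothing found: fall back to column 0
```

Run on the test block from the top of this post (cursor on the print 'qwerty' line, index 3), it returns [(0, 'class Foo(object):'), (4, 'def __init__(self, *args, **kwargs):'), (8, 'print "hi"')], the same three levels as the real function.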

So that was one of the first "big" functions I've had to work through, and it wasn't too bad. It's amazing how many functions are required to do something as simple-sounding as getting suggested indentation levels.

And of course we *still* can't indent. So the next phase will hopefully use the output of this function to actually move between indentation levels with tab. I would never have guessed that so much magic was going on when the tab key was hit.

I will need to keep in mind that these values are being set globally and see if I can work out why. My guess is that the operation is expensive, and while you stay on a line there is no need to recalculate it. If that's true, I should at some point see some code that recognizes that point has moved to a new line and invalidates the current python-indent-list.
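
If the guess is right, the invalidation might look something like this hypothetical Python sketch: the cached list (standing in for python-indent-list and python-indent-list-length) is recomputed only when the line number changes. The IndentCache class and its compute callback are invented for illustration; I haven't yet seen how python.el actually does this.

```python
class IndentCache:
    """Hypothetical cache: recompute candidate indents only when the line changes."""

    def __init__(self, compute):
        self._compute = compute      # function: line number -> list of columns
        self._line = None            # line the cached list belongs to
        self.indent_list = []        # analogue of python-indent-list
        self.indent_list_length = 0  # analogue of python-indent-list-length
        self.calls = 0               # how many times we actually recomputed

    def levels_for(self, line_no):
        # Point moved to a different line: throw the old list away.
        if line_no != self._line:
            self.indent_list = self._compute(line_no)
            self.indent_list_length = len(self.indent_list)
            self._line = line_no
            self.calls += 1
        return self.indent_list


cache = IndentCache(lambda n: [0, 4, 8])
cache.levels_for(3)
cache.levels_for(3)  # same line: served from the cache, no recomputation
cache.levels_for(4)  # new line: recomputed
```

Keying only on the line number is the simplest choice, but it would go stale if the line were edited without moving point, so real code would probably need another trigger as well.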

We are now 23% of the way through the code (by lines - including comments).
