Saturday, September 27, 2008
100 Pushups: week 5 (finished)
But I did finish it. So, I may suck, but I don't suck as much as people who *haven't* finished week 5 ever. :)
Anyway, on to week 6 now. (For who knows how long.)
Wednesday, September 17, 2008
emacs python mode from scratch: stage 6 - python-indentation-levels
OK now let's see if python-indentation-levels works as advertised when it is copied over.
For the following test code block (with cursor at position POINT):
class Foo(object):
    def __init__(self, *args, **kwargs):
        print "hi"
        print 'qwerty' [POINT]
it returns the following list:
((0 . #("class Foo(object):" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 9 (face font-lock-type-face fontified t) 9 18 (fontified t)))
(4 . #("def __init__(self, *args, **kwargs):" 0 3 (face font-lock-keyword-face fontified t) 3 4 (fontified t) 4 12 (face font-lock-function-name-face fontified t) 12 13 (fontified t) 13 17 (face font-lock-keyword-face fontified t) 17 36 (fontified t)))
(8 . #("print \"hi\"" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 10 (face font-lock-string-face fontified t))))
This is an alist: a list of pairs (cons cells), each pairing an indentation column with the line of code that the column lines up with.
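To make the shape of that value concrete, here is a minimal sketch. The name my-sample-levels is made up for illustration, and the text properties are stripped for readability.

```elisp
;; Hypothetical copy of the returned list, with the text properties stripped.
(defvar my-sample-levels
  '((0 . "class Foo(object):")
    (4 . "def __init__(self, *args, **kwargs):")
    (8 . "print \"hi\""))
  "Illustrative data only; not a variable from python.el.")

;; The candidate indentation columns are the cars of the pairs.
(mapcar #'car my-sample-levels)

;; Look up the line that a given column lines up with.
(cdr (assq 4 my-sample-levels))
```

The first form gives (0 4 8), which are the columns that tab could cycle through on the [POINT] line.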
So everything seems to be functioning correctly; now we just need to delve into how exactly this works.
Basically the whole function is a cond with three cases:
- statement following a block open statement
- comment following a comment
- everything else
Before we go through these three cases I have to say that the style is very new to me. In the first two branches, the "predicate" part is a block of code that acts as if the thing it is testing were true. If the block succeeds, the test is true and the indent list has been set appropriately along the way; otherwise the cond tries the next case. It is just an unusual style (to me) to run code inside an "if" test like this. But it obviously works, so I'll just adjust my expectations accordingly.
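A minimal sketch of that style, assuming nothing from python.el: my-classify-line is a hypothetical function whose first cond test moves point and inspects the buffer as part of deciding whether the branch applies.

```elisp
(defun my-classify-line ()
  "Hypothetical example: the cond test does real work before it answers."
  (save-excursion
    (cond
     ;; The test moves to the previous line and looks at it. If any step
     ;; fails, `and' returns nil and cond falls through to the next branch.
     ((and (zerop (forward-line -1))
           (looking-at "[ \t]*#"))
      'after-comment)
     (t 'other))))
```

The save-excursion wrapper restores point afterwards, so the movement inside the test does not leak out to the caller.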
- statement following a block open statement
The logic here is to check that all of the following actions/tests work:
- move to a previous statement
- check that it is an opening block statement
- save the value of the indent
- move to the end of the statement
- skip comments and blanks
- make sure we are at a ":"
- add the fixed indent amount to the indent of the previous block statement
I don't really understand all the machinations here.
Intuitively I would have stopped after the first 3 steps.
Ahh... now I see. The comments are useful here.
;; Check we don't have something like:
;;   if ...: ...
So if we go to the end of the statement and don't find a ":" we have the above scenario and the "normal" indenting rule won't work.
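The steps above can be sketched in a much-simplified form. This is not the python.el code: my-indent-after-block-opener is a hypothetical name, and it only looks at the physical previous line, ignoring comments, strings and continuation lines.

```elisp
(defun my-indent-after-block-opener (&optional offset)
  "Return a suggested column if the previous line ends with a colon, else nil.
OFFSET defaults to 4.  Hypothetical simplification for illustration."
  (save-excursion
    (when (zerop (forward-line -1))
      ;; Save the indent of the previous line before moving to its end.
      (let ((indent (current-indentation)))
        (end-of-line)
        (skip-chars-backward " \t")
        ;; Only a trailing ":" means the block body starts on the next line.
        (when (eq (char-before) ?:)
          (+ indent (or offset 4)))))))
```

For a previous line like "if x: y" the colon is not at the end, so the function returns nil and some other rule has to supply the indentation.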
- comment following a comment
This is more straightforward. If the current line is a comment and the previous line is also a comment, then there is only one choice for indent levels: the indent level of the previous line.
- all other cases.
This logic doesn't look as bad as I would have at first suspected.
The first thing added to the list of indentation levels will be the previous line's indentation, *if* the previous line is part of a pair like if/else that makes sense to line up with AND it is not a block closer (e.g. return), which doesn't make sense to line up with.
Next we crawl up a block at a time and collect indentation levels on the way up. We only skip a level if we had a word like "else" and the block we are examining doesn't match our "start" word.
Even if we had nothing, we throw a 0 position on the list for good measure and then set a couple of global values (python-indent-list and python-indent-list-length).
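The "crawl up and collect" idea can be sketched without any of the keyword matching. This is a simplification under my own assumptions (my-enclosing-indent-levels is a made-up name, and it works on physical lines rather than statements or blocks).

```elisp
(defun my-enclosing-indent-levels ()
  "Collect indentation columns of enclosing lines above point, smallest first.
Hypothetical simplification: no if/else pairing, no block-closer check."
  (save-excursion
    (let ((levels '())
          (smallest most-positive-fixnum))
      ;; Walk upwards until we reach column 0 or the top of the buffer.
      (while (and (> smallest 0)
                  (zerop (forward-line -1)))
        (unless (looking-at "[ \t]*$")      ; ignore blank lines
          (let ((col (current-indentation)))
            (when (< col smallest)
              (push col levels)
              (setq smallest col)))))
      ;; Column 0 is always offered, even if nothing else was found.
      (if (eq (car levels) 0) levels (cons 0 levels)))))
```

With point on the line after the three-line Foo example above, this returns (0 4 8), the same columns as the real function, though without the matching text attached.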
So that was one of the first "big" functions I've had to work through and it wasn't too bad. It's amazing how many functions were required to do something as simple-sounding as getting suggested indentation levels.
And of course we *still* can't indent. So the next phase will hopefully be using the output of this function to actually navigate using tab. I would never have guessed so much magic was going on when the tab key was hit.
I will need to keep in mind that these values are being set globally and see if I can guess why. As a guess, I'd say it's because this operation is expensive and while you are on a line there is no need to recalculate it. If that's true, I should at some point see some code that recognizes that point has moved to a new line and invalidates the current python-indent-list.
We are now 23% of the way through the code (by lines - including comments).
Tuesday, September 9, 2008
emacs python mode from scratch: stage 5 - more movement methods
So let's continue with the seemingly modest goal of getting tabbing to work.
Presumably my current target is to get python-indent-line working.
From some quick browsing this function has the following important dependencies:
db-python-indent-line
db-python-indent-line-1 ;; which sets global: db-python-indent-list-length
db-python-calculate-indentation (104 lines)
And then looking at python-calculate-indentation we find it requires:
- db-python-indent-string-contents # global var
- db-python-indentation-levels # 59 lines
So I guess this time our target is to get the support pieces in place for python-indentation-levels. That means the variables
- python-indent
- python-block-pairs
and the function(s)
- python-first-word
- python-initial-text
- python-beginning-of-block
- python-end-of-block
- python-indent:
This is pretty straightforward: it sets a customizable variable for the default number of columns per indentation level. Just to be thorough, we need to look up what safe-local-variable means.
From poking around in files.el
;; Safe local variables:
;;
;; For variables defined by major modes, the safety declarations can go into
;; the major mode's file, since that will be loaded before file variables are
;; processed.
;;
;; For variables defined by minor modes, put the safety declarations in the
;; file defining the minor mode after the defcustom/defvar using an autoload
;; cookie, e.g.:
;;
;; ;;;###autoload(put 'variable 'safe-local-variable 'stringp)
;;
;; Otherwise, when Emacs visits a file specifying that local variable, the
;; minor mode file may not be loaded yet.
;;
;; For variables defined in the C source code the declaration should go here:
So basically it's a way to do some simple type checking on a variable.
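A minimal sketch of how the declaration and the check fit together. my-python-indent is a hypothetical stand-in for python-indent, and the 'languages group is just a convenient existing group, not necessarily the one the real mode uses.

```elisp
(defcustom my-python-indent 4
  "Hypothetical stand-in for python-indent: columns per indentation level."
  :group 'languages
  :type 'integer)

;; Declare that any integer is a safe file-local value for this variable.
(put 'my-python-indent 'safe-local-variable 'integerp)

;; Emacs consults the predicate when a file sets the variable locally.
(safe-local-variable-p 'my-python-indent 8)     ; an integer passes
(safe-local-variable-p 'my-python-indent "8")   ; a string does not
```

So a file that sets the variable to an integer in its local variables section is accepted quietly, while any other kind of value makes Emacs ask first.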
- python-block-pairs
This is an alist of python keywords and the keywords that they are the "closers" for.
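As a sketch of what such an alist might look like and how it gets queried: the entries below are my own illustrative guess at sensible pairings, not a copy of the real constant, and both names are hypothetical.

```elisp
(defconst my-block-pairs
  '(("else" "if" "elif" "while" "for" "try" "except")
    ("elif" "if" "elif")
    ("except" "try" "except")
    ("finally" "try"))
  "Hypothetical alist: a closer keyword, then the openers it may line up with.")

(defun my-closer-matches-p (closer opener)
  "Return non-nil if CLOSER may line up with a block started by OPENER."
  ;; String keys need `assoc' (which compares with `equal'), not `assq'.
  (member opener (cdr (assoc closer my-block-pairs))))
```

For example, (my-closer-matches-p "else" "for") is non-nil, while (my-closer-matches-p "finally" "if") is nil.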
- python-first-word
A simple function that moves to the beginning of the code on a line and calls current-word.
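A sketch of that description, under the assumption that "beginning of code" means the first non-whitespace character (my-first-word is a made-up name):

```elisp
(defun my-first-word ()
  "Return the first word on the current line, or nil if there is none."
  (save-excursion
    (back-to-indentation)
    ;; The STRICT argument means: only return a word that point is on,
    ;; rather than hunting for a nearby one.
    (current-word t)))
```

On a line like "    else:" this returns "else", which is exactly the kind of key the block-pairs alist is indexed by.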
- python-initial-text
A function to grab the non-comment code on a line. Interestingly, this function doesn't seem to work as advertised: both my hand-copied subset of the code and the full working original keep the comments at the end of the line.
I will have to keep this in mind when looking at how it is used to see if this will affect the functionality.
- python-beginning-of-block
Move point to beginning of containing block. It starts by moving past any blank space and/or comments and then going to start of current statement.
Then, while point is not on the 0th column and/or the arg passed in is not 0, continue recursively until one of the target conditions is true.
The logic here is a bit of a challenge (for me) to follow. It's a twisty maze of whiles, whens, ands, recursion and throw/catches.
It seems to be doing something like:
Move up a line and keep doing so until either we can't go up any more or we hit the condition where we throw 'done (i.e. the conditions for a new outer block have been met).
This is probably the most "lispy" code so far and seemed excessively weird when I first tried to wrap my brain around it. But it seems halfway sensible to me now.
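The catch/throw loop shape is easier to see in a stripped-down version. This sketch is my own simplification (hypothetical name, physical lines instead of statements, no recursion): it walks up until it finds a line indented less than the starting line.

```elisp
(defun my-up-to-less-indented-line ()
  "Move point up to the nearest line indented less than the current one.
Return t if such a line was found, nil if we ran out of buffer."
  (let ((start (current-indentation)))
    (catch 'done
      (while (zerop (forward-line -1))
        (when (and (not (looking-at "[ \t]*$"))   ; skip blank lines
                   (< (current-indentation) start))
          ;; Jump straight out of the loop with the result.
          (throw 'done t)))
      ;; The while loop ended without a throw: nothing found.
      nil)))
```

The throw is what lets the "success" exit happen from deep inside the loop body, while the fall-through value of the catch form covers the "ran out of lines" case.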
- python-end-of-block
For some reason I was expecting this function to be a simple variation of python-beginning-of-block, but the logic is fairly different. It *does* use many of the same functions, but the way they are combined is not a simple reverse of the above logic.
This function and the above are definitely a bit more sophisticated in their logic and the code is more strange (to me). As an example:
within a let* expression there is the following "variable assignment" which is really not a variable assignment.
(_ (if (db-python-comment-line-p)
       (db-python-skip-comments/blanks t)))
At this point I'm not sure whether to think this is an ugly hack or just to try and get used to this as idiomatic Emacs Lisp. Seems like it would be cleaner just to do this test and function call before the let*, or within a nested let. Hmmm... Can't do it before because we need the value of point before moving, and having a nested let makes the code more complicated. I guess I'm already starting to see this as more of a clever hack than I did at first blush.
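A tiny self-contained sketch of the same trick, assuming nothing beyond core Emacs Lisp (my-count-leading-blanks is a made-up name). The binding named _ exists only so that a side effect can run between two real bindings.

```elisp
(defun my-count-leading-blanks ()
  "Skip whitespace at point and return how many characters were skipped."
  (let* ((orig (point))                      ; value of point *before* moving
         (_ (skip-chars-forward " \t\n"))    ; run only for its side effect
         (moved (- (point) orig)))           ; sees point *after* moving
    moved))
```

Because let* evaluates its bindings in order, orig captures the old position, the dummy binding moves point, and moved can be computed from both without a nested let.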
More strangely, python-beginning-of-block is called recursively and this function is an iteration. I wonder if these were done by different people or in different moods. Or perhaps there is some compelling reason not to do it recursively. Not obvious to me in any case why the difference in style.
Anyway, we are now closer to being able to implement some tab behavior. With any luck the next phase will have us implementing python-indentation-levels.
By lines of code I'm about 20% of the way through.
Sunday, September 7, 2008
100 Pushups: week 5
*sigh*
True, there was a week off because I was sick and another following that for a week vacation. And yes, I did lose a lot of progress during those two weeks. But man, this is a non-trivial challenge.
I really have to wonder how many people have followed this (and exclusively this) plan to victory.
Anyway, as long as each time I go through week 5 I get a little better then I don't feel *too* bad about being relentless. It's when I plateau that I have to start wondering about my sanity.
For what it's worth I can currently do a max of 40 pushups in one set. 100 still seems absurdly far off.
Tuesday, August 26, 2008
Puzzle: Determine the smallest integer
Ten (not necessarily distinct) integers have the property that if all but one of them are added the possible results are: 82, 83, 84, 85, 87, 89, 90, 91, 92. What is the smallest of the integers?
(I'll post the solution and my method in a few days.)
Tuesday, August 19, 2008
emacs python mode from scratch: stage 4 - utilities (the rest)
The rest of the utilities:
(defun python-comment-line-p ()
(defun python-blank-line-p ()
(defun python-beginning-of-string ()
(defun python-open-block-statement-p (&optional bos)
(defun python-close-block-statement-p (&optional bos)
(defun python-outdent-p ()
python-comment-line-p
- Go to the end of the line; if the emacs parser says we are in a comment, go to the beginning of the line's indentation and check if we are looking at a comment start or the end of a line.
- This is weird. Seems like there is no way for this to ever be an end of line.
- comment-start seems to be a symbol required by rx not necessarily a value set in define-derived-mode (which I'm not setting yet)
- As an aside, the regex for start of comment is \s<
- I don't think that line-end is a necessary part of that regular expression, or at least if I take it out it doesn't seem to change the behavior, but I'll just leave it in for now.
- Presumably we are just depending on the comment character set in python-mode-syntax-table
python-blank-line-p
- This is about as straightforward as you get. Go to the beginning of the line and check if you are looking at 0 or more whitespace chars followed by an end of line. \\s- is the ever so strange looking way emacs regular expressions represent whitespace.
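A sketch of that check as described (my-blank-line-p is a hypothetical name; the real function may differ in detail):

```elisp
(defun my-blank-line-p ()
  "Return non-nil if the current line contains only whitespace."
  (save-excursion
    (beginning-of-line)
    ;; \\s- matches any character with whitespace syntax; $ is end of line.
    (looking-at "\\s-*$")))
```

The doubled backslash is only there because the regexp is written inside a Lisp string; the regexp engine itself sees \s-.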
python-beginning-of-string
- Hey, this stuff is starting to look familiar. First we determine what state a parser would be in at the current position. Then, if the parser says we are in a string, we go to the starting point (element 8 of the state list, counting from 0).
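A sketch of that logic using only syntax-ppss (my-beginning-of-string is a made-up name):

```elisp
(defun my-beginning-of-string ()
  "If point is inside a string, move to the string's opening quote.
Return the new position, or nil if point was not in a string."
  (let ((state (syntax-ppss)))
    ;; Element 3 is non-nil inside a string; element 8 is where it started.
    (when (nth 3 state)
      (goto-char (nth 8 state)))))
```

Checking element 3 first matters because element 8 is also set inside comments, and we only want to move for strings.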
python-open-block-statement-p
- So this and python-close-block-statement-p and python-outdent-p call a number of things that aren't already defined, which is strange since you'd think functions described as utilities would be self-contained. The additional functions are:
python-beginning-of-statement and python-previous-statement
So we'll assume these functions do what they claim for now and investigate them more closely down below.
- This logic is pretty straightforward. Go to the beginning of the statement and then use a regular expression match to see if we are at the beginning of a block.
- NOTE: there is an optional parameter that lets you skip the movement to the beginning of the statement. Seems like a simple performance helper.
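A rough sketch of the regular-expression half of this test. The keyword list is my assumption about which words open a block, the function name is hypothetical, and back-to-indentation stands in for the real move to the beginning of the statement.

```elisp
(defun my-open-block-statement-p ()
  "Return non-nil if the current line starts with a block-opening keyword.
Hypothetical simplification: looks at the line, not the full statement."
  (save-excursion
    (back-to-indentation)
    ;; \\_> is symbol-end, so \"def\" matches but \"define\" does not.
    (looking-at
     (concat "\\(?:class\\|def\\|if\\|elif\\|else\\|for\\|while"
             "\\|try\\|except\\|finally\\|with\\)\\_>"))))
```

The symbol-end anchor is the important part: without it, any identifier that merely begins with a keyword would be treated as a block opener.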
python-close-block-statement-p
- This is the exact analog of python-open-block-statement-p, except here we check whether it is something that is for sure the last member of a block.
python-outdent-p
- Check if current line should be "outdented". Again pretty straightforward code. Move to the beginning of the current indentation and check if all of the following are true: looking at (else, finally, except or elif) and not in a comment or string and check that the previous statement is neither a close block nor an open block.
- In other words, if on something like an else statement and the previous statement is not something that requires indentation, you should outdent.
- Seems like this is just sort of a heuristic and not quite accurate. E.g. for an improperly nested "else" following a return, it will say not to outdent. Of course, what *should* you do when the code is invalid?
python-beginning-of-statement
- (this requires python-skip-out so we add that to the list as well)
Here we will move to the beginning of the line and then, if on a continuation of some sort, we'll either check whether we are on a backslash style of continuation and move backward over these, or we will move backward over strings and "skip out" (jump up) from nested brackets.
python-previous-statement
- (this needs python-next-statement so we add that to the list as well)
If it receives a negative argument the caller really wants to go forward, so pass off to python-next-statement. Otherwise, go to the beginning of the statement, skip over comments and blanks, and keep going to the beginning of the statement until the counter is 0 (or we reach the beginning of the buffer).
python-skip-out
- Pop up out of nested brackets to the front by default and to the end if "forward" is set. Additionally, if the "syntax" of point is already available, it can be passed in. For well formed nesting we just call backward-up-list with the depth. For ill-matched brackets we try to go backward up the list over and over until we get an error.
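A sketch of the backward half of that behavior. my-skip-out is a hypothetical name and it leaves out the "forward" and "syntax" arguments.

```elisp
(defun my-skip-out ()
  "Move point to the start of the outermost bracket group containing it.
Return non-nil if point moved."
  (let ((start (point))
        (depth (car (syntax-ppss))))
    (if (> depth 0)
        ;; Well-formed nesting: jump out of all DEPTH levels in one call.
        (backward-up-list depth)
      ;; Depth not usable: climb one level at a time until Emacs signals
      ;; an error because there is nothing left to climb out of.
      (condition-case nil
          (while t (backward-up-list 1))
        (error nil)))
    (/= (point) start)))
```

With point before the c in foo(a, [b, c]) this lands on the opening round bracket, since that is the outermost group.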
python-next-statement
- This is pretty much an exact analogue of python-previous-statement.
There were a surprising number of functions that the utilities themselves depended on. I continue to be amazed by the complexity of making a language-sensitive mode for emacs.
I also noticed that I'm over 10% of the way done. Making progress.
Thursday, August 7, 2008
emacs python mode from scratch: stage 3 - some utilities
There are about 100 lines in the utility section so let's do half at a time. In the first 50 lines we have the following functions/definitions:
(defsubst python-in-string/comment ()
(defconst python-space-backslash-table
(defun python-skip-comments/blanks (&optional backward)
(defun python-backslash-continuation-line-p ()
(defun python-continuation-line-p ()
python-in-string/comment
- defsubst is a macro for defining an inline function, which seems strange. I wonder if this was just a performance booster. (Is there any other reason to inline code?)
- syntax-ppss returns info on what the parser state would be at the current position. Below are the states, and it's clear why nth 8 is the thing we need for determining if we are in a string or comment:
0. The depth in parentheses, counting from 0. Warning: this can be negative if there are more close parens than open parens between the start of the defun and point.
1. The character position of the start of the innermost parenthetical grouping containing the stopping point; nil if none.
2. The character position of the start of the last complete subexpression terminated; nil if none.
3. Non-nil if inside a string. More precisely, this is the character that will terminate the string, or t if a generic string delimiter character should terminate it.
4. t if inside a comment (of either style), or the comment nesting level if inside a kind of comment that can be nested.
5. t if point is just after a quote character.
6. The minimum parenthesis depth encountered during this scan.
7. What kind of comment is active: nil for a comment of style "a" or when not inside a comment, t for a comment of style "b," and syntax-table for a comment that should be ended by a generic comment delimiter character.
8. The string or comment start position. While inside a comment, this is the position where the comment began; while inside a string, this is the position where the string began. When outside of strings and comments, this element is nil.
9. Internal data for continuing the parsing. The meaning of this data is subject to change; it is used if you pass this list as the state argument to another call.
- syntax-ppss seems like an insanely powerful feature. My respect for the sophistication going on behind the scenes when emacs opens a file for a certain language continues to grow.
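Given that list, the string/comment test comes down to looking at element 8. A minimal sketch follows; my-in-string/comment-p is a hypothetical name, and the usage lines build a throwaway syntax table in which # starts a comment, since a plain temporary buffer has no comment syntax of its own.

```elisp
(defsubst my-in-string/comment-p ()
  "Return the start position of the string or comment at point, else nil."
  (nth 8 (syntax-ppss)))

;; Usage: a scratch buffer where # opens a comment and newline closes it.
(with-temp-buffer
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)
    (modify-syntax-entry ?\n ">" table)
    (set-syntax-table table))
  (insert "x = 1  # note")
  (my-in-string/comment-p))
```

Here point ends up inside the comment, so the final form returns 8, the position of the # character.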
python-space-backslash-table and python-skip-comments/blanks
- This seems to be a "throw away" data structure for overriding the *real* syntax table temporarily
- It provides a syntax table that redefines "\" as a whitespace class. Presumably this is for allowing movement over otherwise blank continuation lines as if they were whitespace.
- What this meant wasn't immediately obvious, but it seems to be addressing the type of situation below (which honestly I haven't seen in the wild before, but seems to be legal python code).
if x and \
    \
    \
    \
    \
    y:
    print z
- For movement we will use forward-comment, which moves over up to X comments. It moves backward if arg is negative. The forward motion is straightforward. The backward motion starts by positioning point so that it is at the *start* of the comment (probably to avoid having forward-comment move it to the start of the comment itself, which would probably seem like a noop to the user).
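A sketch of the forward case under stated assumptions: the table below is built from scratch rather than copied from the mode's real syntax table, and both names are made up. The point is only to show backslash being treated as whitespace while forward-comment does the moving.

```elisp
(defvar my-space-backslash-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)     ; # starts a comment
    (modify-syntax-entry ?\n ">" table)    ; newline ends it
    (modify-syntax-entry ?\\ " " table)    ; backslash counts as whitespace
    table)
  "Hypothetical throwaway table used only while skipping.")

(defun my-skip-comments/blanks ()
  "Move forward over whitespace, comments and bare continuation lines."
  (with-syntax-table my-space-backslash-table
    ;; A huge count means: skip as many comments as there are.
    (forward-comment (point-max))))
```

Called just after "and" in the example above, this should leave point in front of y:, having stepped over every backslash-only line on the way.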
python-backslash-continuation-line-p
- This is pretty straightforward. Check if the last char on the previous line is a \ and make sure we are not in a comment or string.
- syntax-ppss-context is an undocumented (but trivial) function that checks if the current syntax parsing state is string or comment.
python-continuation-line-p
- is an extension of the above function and checks for the case of a continuation char (ie the above function) or if in a matching paren type context that is allowed to span multiple lines.
- The syntax-ppss-depth function tells you how far nested you are in parens, braces, etc. The interesting part of this function is that if you have unmatched parens then it tries to move up the list and assumes that if you succeed then you must be in something close enough to matching lists after all. I wasn't able to come up with a set of braces that tickled this case, but otherwise the logic makes sense.
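Both continuation tests can be sketched together. These are simplified stand-ins with made-up names: the string/comment check is done at the backslash itself, element 8 of the parse state is used instead of syntax-ppss-context, and the unmatched-paren recovery is left out.

```elisp
(defun my-backslash-continuation-line-p ()
  "Non-nil if the previous line ends in a backslash outside strings/comments."
  (save-excursion
    (and (zerop (forward-line -1))
         (progn (end-of-line) (eq (char-before) ?\\))
         (not (nth 8 (syntax-ppss))))))

(defun my-continuation-line-p ()
  "Non-nil if the current line continues a statement from the line above."
  (or (my-backslash-continuation-line-p)
      (save-excursion
        (beginning-of-line)
        ;; Element 0 of the parse state is the bracket nesting depth.
        (> (car (syntax-ppss)) 0))))
```

So the second line of a call split after "foo(a," counts as a continuation through the depth test even though there is no backslash anywhere.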
So that is the first half of the utility functions.
I'm still overwhelmed by the amount of complexity required to get a language mode working. I haven't even gotten something seemingly simple like tab support working. I guess this is like a lot of things. When you start to understand a new system, many of the hard things turn out to be trivial and many of the seemingly trivial things turn out to be quite complex.
I had never really browsed through the emacs lisp manual before and I'm finding it is quite helpful and not too hard to navigate.
Eventually I'll have to understand a mode for some other language(s) as well. I'm curious how much python's significant whitespace makes things harder. Presumably the default language mode behaviors are tuned for something like lisp and/or C.
I also wonder if it would be possible to use python's own parser in place of emacs's for doing some of the syntax checking activities. That would be an interesting project to pursue. As will be looking at the pymacs module.
Settle down there tiger. One thing at a time....