Wednesday, October 29, 2008

emacs python mode from scratch: stage 9 - rest of the python indent code

So my idea was to move onto something completely new but wouldn't you know it, I found a couple more indentation related functions that needed dealing with.

The functions/variables in question are:


python-guess-indent (both variable and function with this name)
python-guess-indent
python-beginning-of-defun
python-end-of-defun
python-backspace

And just for fun we'll throw in the key mapping for python-backspace

(define-key map "\177" 'python-backspace)

So let's quickly browse through these functions and see how they work and make sure each works as promised.

  • python-guess-indent (variable)

    Seems weird to me that there can be a variable and function with the same name. I'm guessing this is a due to emacslisp being a lisp-2

    This guess seems to be confirmed by: http://coding.derkeiler.com/Archive/Lisp/comp.lang.lisp/2008-01/msg00683.html

    In any case having this being true means that python mode will be using the following indentation guessing function.

  • python-guess-indent (function)

    This function does what it's name implies.

    Looks for a ":" character and then moves to the beginning of that line. Then it moves to the following line and takes the difference. If the different is >= 2 and <= 8 then it take that as a guess. Otherwise it continues to the next ":".

    If the value it found is different from the default value then it sets it to this new value.

  • python-beginning-of-defun

    Look for a regular expression consisting of space and either def or class.

  • python-end-of-defun

    This is a little trickier than python-beginning-of-defun. Here's another place where python's whitespace makes life hard. Good thing it's pretty to look at.

    First find the beginning of the block we are currently in. If we are at the beginning of a line then but not at the start of a block, move forward until we find a block starting line.

    From here python-end-of-block does the rest of the work.

  • python-backspace

    Look at the list of valid indentation levels for the current line and then back track to the next smallest.

    Also present a message explaining what block you just closed.


I'm now 40% through. Almost starting to seem like I might make it all the way through this puppy.

Tuesday, October 28, 2008

99 problems - python - 13

Run-length encoding of a list (direct solution). Implement the so-called run-length encoding data compression method directly. I.e. don't explicitly create the sublists containing the duplicates, as in problem 9, but only count them. As in problem P11, simplify the result list by replacing the singleton lists (1 X) by X

def last_element(lst):
if not lst:
return None
elif type(lst[-1]) == tuple:
return lst[-1][1]
else:
return lst[-1]

def increment_last(lst,x):
if not lst:
lst.append(x)
elif type(lst[-1]) == tuple:
count = lst[-1][0]+1
else:
count = 2

lst[-1] = (count, x)

def encode_frugal(lst):
result = []
for x in lst:
if x == last_element(result):
increment_last(result,x)
else:
result.append(x)

return result

Here's another one that seems a little more verbose than necessary, but that's what I come up with for a first pass.

Monday, October 27, 2008

99 problems - python - 12

Decode a run-length encoded list. Given a run-length code list generated as specified in problem 11. Construct its uncompressed version.

def decode(encoded):
for x in encoded:
if type(x) == tuple:
for _x in range(x[0]):
yield x[1]
else:
yield x

By now you should know to throw a "list" into the mix if that's the way you swing.

Sunday, October 26, 2008

99 problems - python - 11

Modified run-length encoding. Modify the result of problem 10 in such a way that if an element has no duplicates it is simply copied into the result list. Only elements with duplicates are transferred as (N E) lists.

import itertools
def run_length_encode2(lst):
# could just reuse run_length_encode function from last time
groups = [list(x[1]) for x in itertools.groupby(lst)]
return [len(g) > 1 and (len(g), g[0]) or g[0] for g in groups]

Hmmm.... I use 2.4 at work, but I need to keep up with the times, so here is the more modern way (let's get ternary with it)

import itertools
def _run_length_encode2(lst):
groups = [list(x[1]) for x in itertools.groupby(lst)]
return [(len(g), g[0]) if len(g) > 1 else g[0] for g in groups]

Saturday, October 25, 2008

99 problems - python - 10

Run-length encoding of a list. Use the result of problem P09 to implement the so-called run-length encoding data compression method. Consecutive duplicates of elements are encoded as lists (N E) where N is the number of duplicates of the element E.

import itertools
def run_length_encode(lst):
# could just call "group" from last solution
groups = [list(x[1]) for x in itertools.groupby(lst)]
return [(len(g), g[0]) for g in groups]

99 problems - python - 9

Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists.

import itertools
def group(lst):
return [list(x[1]) for x in itertools.groupby(lst)]

99 problems - python - 8

Eliminate consecutive duplicates of list elements.

import itertools
def squeeze(lst):
return [x[0] for x in itertools.groupby(lst)]

What can I say? I like itertools.

99 problems - python - 7

Flatten a nested list structure.

import types
def flatten_list(lst):
if type(lst) not in (list, tuple, types.GeneratorType):
yield lst
else:
for x in lst:
for _x in flatten_list(x):
yield _x

I'm being a little loose with the definition of a list here but I think this a behavior I would want from a function like this. Call it as list(flatten_list(my_list)) to get an actual list.

Something doesn't sit quite right with me about that nested for statement. Maybe something more elegant will come to me later.

Friday, October 24, 2008

99 problems - python - 6

Find out whether a list is a palindrome. A palindrome can be read forward or backward; e.g. (x a m a x).
def is_palindrome(lst):
return lst == lst[::-1]

The first thing that came to mind was lst == list(reversed(lst)), but that doesn't quite work if the "sequence" you pass in is a string. Besides I hate to pass up an opportunity to use "[::-1]"

I have to admit that using languages like Haskell has made me sensitive to weirdnesses of python that I never really noticed before. For instance the fact that doing list("some string") doesn't return a list of characters but instead a list of strings of length 1. Ugh.

Thursday, October 23, 2008

99 problems - python - 5

Reverse a list.

reversed(my_list)

# or for you purists
list(reversed(my_list))

BTW, sorted and reversed are two of my favorite semi-recent additions to python

99 problems - python - 4

Find the number of elements of a list.

len(my_list)

OK, bear with me, these really do get harder after a while.

Wednesday, October 22, 2008

99 problems - python - 3

Find the K'th element of a list. The first element in the list is number 1.

def kth_item(lst):
return lst[k-1]

99 problems - python - 2

Find the last but one element of a list.

my_list[-2]

I could go on all day like this.

99 problems - python - 1

Find the last element of a list.

my_list[-1]

Talk about your l33t hacking skills. :)

Saturday, October 18, 2008

99 problems - python - intro

There is a fun looking set of problems making the rounds. Here is the haskell flavor of them.

They start off sort of ridiculously easy and then get to pretty damn challenging. I'll see how far I can get with my mad skillz.

Twelve Virtues of Rationality

I have a calendar entry set to remind me once a month to reread the following:

Twelve Virtues of Rationality

I highly recommend it for anyone who values clear thinking and considers pursuing the truth as a good in itself.

(If that sip was to your liking, you can turn on the firehose here)

Wednesday, October 15, 2008

emacs python mode from scratch: stage 8 - python-indent-line

OK, now that we have python-calculate-indentation finished let's see who calls it (and hopefully work our way up to the point that tab cycling actually works).

So here are the call chains for python-calculate-indentation and a couple of related functions.


python-indent-line
python-indent-line-1
python-calculate-indentation
python-block-end-p

python-indent-region
python-indent-line-1
python-calculate-indentation

python-electric-colon
python-calculate-indentation

So our new functions we will bring over this time are:


python-block-end-p
python-indent-line-1
python-indent-line
python-indent-region
python-electric-colon

That's about 90 lines so that's good for one session. In addition to these functions we'll add a in the key mapping for python-electric-colon and the indent assignments within define-derived-mode (indent-line-function, indent-region-function).

Well, I don't know yet know *why*, but I can confirm that tab cycling works as I expected. Hopefully it will be clear as I go through the code what exactly is happening.

As for the electric colon, after experimenting around I was able to figure out a scenario which makes a line of code "outdent" as promised:


if foo:
bar
else:

My first instinct is to try to understand the tab cycling by digging down from the top and seeing how the tab key itself is bound.


TAB (translated from ) runs the command indent-for-tab-command
which is an interactive compiled Lisp function in `indent.el'.
It is bound to TAB.
(indent-for-tab-command &optional arg)

Indent line in proper way for current major mode or insert a tab.
Depending on `tab-always-indent', either insert a tab or indent.
If initial point was within line's indentation, position after
the indentation. Else stay at same point in text.
The function actually called to indent is determined by the value of
`indent-line-function'.

So this points us to indent.el which is where the magic happens and defines why the major mode needs to define indent-line-function and indent-region-function.

My next step was to look for descriptions of this indent functionality in the emacs manual but I quickly got tired of poring over that. So back to looking at the code in the library.

So let's work through the functions above:


python-block-end-p
python-indent-line-1
python-indent-line
python-indent-region
python-electric-colon
  • python-block-end-p

    Check to see if the current line is the last line of an indented block. It's sort of surprising that this functionality hasn't been needed so far.

    It checks that the current line is not a comment and that that either: it's for sure the end of a block (e.g. raise, etc) or the current indentation is less that the previous line's indentation.

    That seems a little loose on the meaning of what I would expect a block-end checker to do. But from looking at how it is used in context I see that it probably is sufficient for it purpose. Basically it is simply being used to determine whether or not to display a message about what block is being closed by the position that the current indenting level. So close enough. Might be confusing if someone tried to use this function more generally.

  • python-indent-line-1

    The comments for this function indicate that this is how indenting would be done if there was no cycling. Which essentially means it will take the last value from the full list of possibilities and just use that.

  • python-indent-line

    So this is essentially the money function for doing tab cycling and now that I know so much more about the work going on underneath the covers it seems comically short. And I certainly wouldn't have really understood just how magical this function is without looking at the gory details myself.

    The basic logic is to check if we are already cycling. If so then we move to the next known position and try to show what code block we are closing if that makes sense.

    If we are just starting a tab cycle then we just call python-indent-line-1 and set up python-indent-index to be ready for cycling.

  • python-indent-region

    So this function is a little strange. According to the emacs manual indent-region-function is used to act as a fast way to indent lines. The default behavior is to just run the indent-line-function on each line. But this code just goes ahead and loops over every line and just runs python-indent-line for each. Nothing obvious in this code to indicate what the advantage of walking over each line by hand would be. A mystery.

  • python-electric-colon

    So this guy is mapped to ":" and usually just inserts a ":" unless we have the case I outlined above. The basic logic is to check if we are at the end of a line, whether we are in an outdentable context (python-outdent-p), not in a string or comment and we are past the point that it is predicted we should be indented to.

    This function is followed by:

    (put 'python-electric-colon 'delete-selection t)

    Which as far as I can tell is related to delsel.el. Apparently delete-selection is used by this mode when you are deleting selected text a la windows behavior (ie delete selected text when you type).


Now that I have tab cycling pretty much wrapped up I have this hope that much of the rest of the code will be a little more straight forward.What can I say, I'm an optimist at heart.

As far as code coverage goes, I've copied over 33% of the code. Wow, I'm a third of the way done.

Wednesday, October 1, 2008

emacs python mode from scratch: stage 7 - python-calculate-indentation

We continue with our task of trying to get tabbing working by copying over python-calculate-indentation and examining how it works.

First thing that is apparent is that we also need these variables to be defined and set:

  • python-indent-string-contents
    • if true, then for doc strings do alignment

  • python-continuation-offset
    • how many spaces to add if indenting within a line following a continuation (ie "\")

  • python-honour-comment-indentation
    • go ahead and align a comment with a previous comment even if it's not itself in a "normal" position

So on a first glance python-calculate-indentation doesn't seem to work as i would have expected. I just get one position back and the value doesn't change on successive runs even if there are various logical candidates. So my guess is that this function returns the best first guess and then a higher level function calling this will changes some global variable so that this function will return the next best and so on until it cycles through.

So let's see what is really going and and how good my guess is.

This function (similarly to python-indentation-levels) is just one big "cond". The various sections are:

  • multi-line string
  • after a backslash or in brackets (ie continuation context)
  • beginning of buffer
  • non-indentable comment
  • the rest

The number of new cases here is a bit of a surprise since I had half expected that the previous function we looked at (python-indentation-levels) already handled the main cases. But I guess not.

  • multi-line string

    If we are currently in a string then do one of two things. If we have set our preferences so that we don't want to indent string contents, then just take the current indentation.

    Otherwise if we are on a line that starts the string, then just keep the current indentation. The actual work horse of this section is to keep walking backward a line at a time until we find a non-blank line and then use that as the indentation level.

  • after backslash or in brackets

    The logic here might be best summarized with pseudo code


    if in bracketed expression
    find first item in list and align with that
    otherwise align with the start statement (plus a "step" per bracketing level)
    else # must be backslash continuation line
    if this is not the first continuation line
    take indent of previous line
    else
    indent one "step" (plus even one more if previous line
    is a opening block)

    Fortunately this function is pretty well documented or it would be a bear to get through. Well, it's still a bear, but it's less of a bear. I wonder if instead of good documentation this would be a good place to create well named smaller functions. My personal style is to aim for functions that are the width of my hand (or two). Not being able to see the function in its entirety on one screen is pretty obnoxious to me.

    Note: needed to remember that "if" has one clause for the then statement (hence we find a progn) but the else can be any number of clauses. Strange asymmetry.

  • beginning of buffer

    No choice here. It's gotta be 0.

  • non-indentable comment

    There is a note here that basically says that they copied this behavior from python-mode.el but that it's not necessarily a good idea to have done so.

    A non-indentable comment seems to be a comment line that has no space-like characters between the comment starter ("#") and the text


    e.g.
    # hello
    #there

    That seems weird. But now I know to look for this behavior when editing.

  • the rest.

    So this is weird. This is where the last function we looked at (python-indentation-levels) will get called. but at this point I'm confused about how how they interact. Any way let's wade through this and see what we can figure out

    There are four main items here


    • move backward over white space if need to
    • call our friend python-indentation-levels
    • prefer indenting of comments with a following statement
    • take the caar (car of the car) of the last thing in python-indent-list


I have to admit at this point that I can mostly grok the code but the actual details are quite daunting. I would be hard pressed to recreate this logic from scratch. Also I'm still completely nonplussed as to how the magic cycling of tabbing options occurs.

Peeking around the original file I see a reference to cycling so I hope things get more clear when I get to that. But, once again we've been through a bunch of code and we still don't have a working tab cycler yet. But I think we are within spitting distance now.

For what it's worth, I'm now about 30% of the way through the code.