lair of the dustbunny: 2008

Sunday, December 21, 2008

99 problems - python - 56

Symmetric binary trees

Let us call a binary tree symmetric if you can draw a vertical line through the root node and then the right subtree is the mirror image of the left subtree. Write a predicate symmetric/1 to check whether a given binary tree is symmetric. Hint: Write a predicate mirror/2 first to check whether one tree is the mirror image of another. We are only interested in the structure, not in the contents of the nodes.


# Example in Haskell:

# *Main> symmetric (Branch 'x' (Branch 'x' Empty Empty) Empty)
# False
# *Main> symmetric (Branch 'x' (Branch 'x' Empty Empty) (Branch 'x' Empty Empty))
# True

def mirror(tree):
    if tree == None:
        return None
    else:
        val, left, right = tree
        return (None, mirror(right), mirror(left))

def symmetric_binary_tree(tree):
    return mirror(mirror(tree)) == mirror(tree)

99 problems - python - 55

Construct completely balanced binary trees

In a completely balanced binary tree, the following property holds for every node: The number of nodes in its left subtree and the number of nodes in its right subtree are almost equal, which means their difference is not greater than one.

Write a function cbal-tree to construct completely balanced binary trees for a given number of nodes. The predicate should generate all solutions via backtracking. Put the letter 'x' as information into all nodes of the tree.


# Example:

# * cbal-tree(4,T). 
# T = t(x, t(x, nil, nil), t(x, nil, t(x, nil, nil))) ;
# T = t(x, t(x, nil, nil), t(x, t(x, nil, nil), nil)) ;
# etc......No
def completely_balanced_trees(node_count):
    if node_count == 0:
        return [None]
    else:
        right_count  = (node_count - 1) / 2
        left_count = (node_count - 1) - right_count
        left_trees = completely_balanced_trees(left_count)
        right_trees = completely_balanced_trees(right_count)

        trees = []
        for left in left_trees:
            for right in right_trees:
                trees.append(("x", left, right))
                if left_count != right_count:
                    trees.append(("x", right, left))

        return trees

OK, I'm using tuples for trees again. So sue me.

Saturday, December 20, 2008

Babylon 5: not so cautious, not so optimistic

So the general rule for our tv watching these days is that we pretty much just watch tv series that are complete and available in their entirety on netflix. It turns out there are only a limited number of these. After a while you start considering things that have been on your "maybe" list for a while. One very big maybe for me has been Babylon 5. It shows up on all of these top 10 best scifi series lists and people rave about it having one of the best multi season story arcs. Ever.

Having heard that the first season is a bit weak and the things really start to get good in season 2 I asked my wife for some patience and and decided to dig in.

And, well, so far it's been worse than I could have imagined. Terrible set designs. Unbelievably bad diaglog. Very long in the tooth special effects. Unimaginitive aliens. Recycled scifi tropes. Wooden acting. And we are only two episodes away from season 2 starting. And I have to admit I'm really intrigued. Is there really any chance that things turn around *so* dramatically?

I'm a total fanboy for Farscape. In my opinion, the first season there was pretty weak (not Babylon 5 weak, but not great). And yet it slowly grew into one of my all time favorite series. And this was a show that had serious problems throughout but had some amazing highs to make up for it. So I do believe that things can turn around for B5 and that I will like it.

But man, this first season has been so extremely painful. I'm afraid my wife will get revenge on me by making me watch Sex in the City. Oh, god, the horror. The horror....

emacs python mode from scratch: stage 18 - imenu

So first of all, what *is* imenu.

... read, read ....

So it partly seems like a one file only form of etags. But if you do M-x imenu-add-menubar-index it will add an "index" menu to the menu bar with a list of all files for easy access. Being a bit of a mouse-ophobe, I'm not so sure how I'd like that. Also you have to hit rescan whenever you change the contents of the file (or set up imenu-auto-rescan).

Seems like another case of having more than one way to do things. I'll probably stick with etags, but for completeness here is a brief overview of the imenu support functions.

So there is really just one function:

python-imenu-create-index

Go to the beginning of the file. look for "def and "class" references (unless in comments or string) and add them to the list.

Hmm, I wonder if something else more wonderful is going on here. Doesn't really seem like functionality worth supporting.

Perhaps if you had it set up to go off a simple mouse motion as described here.


(if (featurep 'xemacs)
     (global-set-key [(shift button3)] 'imenu) ; XEmacs
   (global-set-key [S-mouse-3] 'imenu)) ; GNU Emacs

Also looks like it integrates in with ido to some extent. Maybe there is some goodness from that.

Any way, that's enough on that topic.

Wednesday, December 17, 2008

emacs python mode from scratch: stage 17 - ffap

This is a small bit of code, but it's another little gem that I didn't know exists. There is only one function:

python-module-path

This function uses emacs.modpath to determine the module path for use with ffap-alist

What this gets us is support for M-x find-file-at-point.

It's not *quite* as cool as I first anticipated because the file path finder seems to get confused about what is actually a module, but if point is on a module name on an import line then it will pop you directly over to the python lib directory and open the module. If you already have etags set up to do this then you probably won't be *too* amazed, but this is a nice extra way to accomplish the same thing.

emacs python mode from scratch: stage 16 - completion

This is a collection of functions that uses emacs.py to find a list of likely completions for module.


python-imports
python-find-imports
python-symbol-completions
python-partial-symbol 
python-complete-symbol
python-try-complete

python-imports

A list of python import statements in the buffer
python-find-imports

Populate the above list. search by looking for "^import" or "^from" (skipping those in comments or strings).

Then reverse the list, clean out text properties and change \n to \\n so that output doesn't end up wrong.
python-symbol-completions

Run emacs.complete(symbol, python-imports) and return the list of completions (in another buffer)
python-partial-symbol

Use a regular expression to go backward and thereby find the complete symbol before point. This seems a little odd to me. I seems like it would be easier to just trust the syntax-table that has already been defined for the mode and just do backward-word from the current position (with a save-excursion). As a guess I'd say it's so that it can treat "." as part of the symbol and skip over that to the beginning of a module/class name.
python-complete-symbol

Find the list of completions (using python-symbol-completions) or just scroll the completions window if they've already been found.
python-try-complete

"hippie-expand" version for doing symbol completions. Basically "he-" does all the heavy lifting hereit just needs to be
given the python-symbol-completions and python-partial-symbol functions.

Saturday, December 13, 2008

99 problems - python - 54A

Check whether a given term represents a binary tree

In Prolog or Lisp, one writes a predicate to do this.

Example in Lisp:


* (istree (a (b nil nil) nil))
T
* (istree (a (b nil nil)))
NIL
def is_tree(t):
    if t == None:
        return True
    elif type(t) == tuple and len(t) == 3:
        (val, left, right) = t
        return is_tree(left) and is_tree(right)
    return False

Is using lispy tuple for representing tree cheating?

99 problems - python - 50

Huffman codes.

We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,12),fr(d,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'), hc(b,'101'), hc(c,'100'), hc(d,'111'), hc(e,'1101'), hc(f,'1100')] [hc(a,'01'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:


# stats: [(letter, count)]
# return: [(count, letter OR (left, right)]
def make_tree(stats):
    pool = sorted([(count, letter) for letter, count in stats])

    while len(pool) > 1:
        first = pool.pop(0)
        second = pool.pop(0)

        node = (first[0]+second[0], (first, second))
        pool.append(node)
        pool.sort()

    return pool[0]

def codes_from_tree(node, path, results):
    count, letter_or_trees = node
    letter = left = right = None
    if type(letter_or_trees) == tuple:
        left, right = letter_or_trees
    else:
        letter = letter_or_trees

    if letter:
        results.append((letter, path))
    else:
        codes_from_tree(left, path+"0", results)
        codes_from_tree(right, path+"1", results)

def huffman(stats):
    results = []
    tree = make_tree(stats)
    codes_from_tree(tree, "", results)
    
    #to sort by huffman code value
    #results.sort(key=lambda x: int(x[1], 2))
    results.sort()

    return results

That solution *feels* a little hacky for me, but I can't think of why it bothers me. It might be that the last time I wrote this was in java, and it was a lot more work. Maybe this just seems too easy.

Friday, December 12, 2008

emacs python mode from scratch: stage 15 - info look

Probably the main lesson of my explorations of the python mode is that there is a lot of functionality available that I had never heard of before. info-look is a good case in point.

python-after-info-look

This function essentially let's you hook into the info system to look for documentation.

As far as I can tell the main use is for calling info-lookup-symbol (C-h S) and having the info page for the python entity show up.

You need to make sure the info flavor of python documentation is installed on your system. (And hopefully it takes you less time than it did me to realize it wasn't).

Not much else to say on this.

Monday, December 8, 2008

emacs python mode from scratch: stage 14 - context sensitive help

Here are the relevant variables and function in the context-sensitive help section


VARS:
(defconst db-python-dotty-syntax-table
(defvar view-return-to-alist)
(defvar db-python-imports)

FUNCS:
python-describe-symbol
python-send-receive
python-eldoc-function

db-python-dotty-syntax-table

This creates an altered syntax table that treats "." as "symbol constituent". In other words, "." is treated as part of the variable name rather than as an other type of syntax.
view-return-to-alist

Used by python-describe symbol. Consists of a window and help-return-method
db-python-imports

Used below when calling emacs.* helper functions.
python-describe-symbol

C-c C-f is bound to this and will do a python help(module|func|etc) for thing at point. It dumps out the standard python documentation in another temporary buffer. It uses emacs.py to get the help. Somewhat useful, but i usually have a python prompt open already.
python-send-receive

Convenience function for sending a command to the python interpreter and reading the result.
python-eldoc-function

Use emacs's own documentation reader to populate the message line with brief description of function and args. this is something that I had not noticed before and am glad I came across it while perusing the code.
If you have a python session going and you either run M-x eldoc-mode or you set up a hook such as
```
  (add-hook 'python-mode-hook 'turn-on-eldoc-mode)
```
then this will automatically run the eldoc help facility. Not sure how useful it will be day to day (maybe it's just developer eye candy), but I've seen this sort of thing while running in slime or in the haskell mode and now at least I see where it is coming from. Seems like it only really works with core python libraries. Strangely doesn't seem to know about the functions in the actual file you are currently working in.

Only 4 sections left to walk through. I'm actually pretty eager to be done with this, but I definitely want to finish (however shallow my exploration of this mode is).

Sunday, December 7, 2008

Brains games that make you cry

I think almost as long as I've been a programmer I've tried making little games to test my reflexes and challenge my memory and other cognitive faculties. It's just been a given to me that: (a) you can train your brain in a manner similar to lifting weights or jogging and (b) this is a highly desirable goal.

The problem is that I've never really had enough faith in the games that I created that the effort I put into them (both writing and training with them) would be worth it. Doesn't stop me from getting interested in the idea anew every once in a while. And it seems that there is a general growing interest in these types of games, growing research that they work and (best of all) an increasing number of open source games that are created specifically to help you train your brain.

Of these sorts of games and research results it *seems* that the one with the most credibility is the "n-back task" as described in this wired article and as implemented in Brain Workshop.

Besides the research, one of the things that gives this game credibility is that it is really challenging and not very game like. It really does feel like exercise (in the sense of forcing yourself to go to the gym because you know it is good for you, a necessary evil). And honestly the first few passes through are almost laughably hard. You are trying to maintain a memory of positions and letter values being thrown out you and notice if the position or letter value is the same as n turns ago.

And the funny thing is that even after a couple of days of not really spending too much time on it, I'm definitely getting better at it. It's actually a weird feeling. The first few times through is just chaos and then all of a sudden you start moving from a slight guess that, yes, I think they did say "C" 2 turns ago and are now repeating it, to being really confident. But of course as soon as you get good at 2-back, it graduates you to 3-back.

Now the problem is how do I test if I am indeed getting smarter? I'll let you know if my brain seems like it has kicked it up a notch or two.

Friday, December 5, 2008

emacs python mode from scratch: stage 13 - python inferior mode

My plan was to look at context sensitive help next but it dependeds on a python inferior process so let's get that working next.

Here are the function/variables of relevance:


(defcustom python-python-command "python"
(defcustom python-jython-command "jython"
(defvar python-command python-python-command
(defvar python-buffer nil
(defconst python-compilation-regexp-alist
(defvar inferior-python-mode-map
(defvar inferior-python-mode-syntax-table
(define-derived-mode inferior-python-mode comint-mode "Inferior Python"
(defcustom inferior-python-filter-regexp "\\`\\s-*\\S-?\\S-?\\s-*\\'"
(defun python-input-filter (str)
(defun python-args-to-list (string)
(defvar python-preoutput-result nil
(defvar python-preoutput-leftover nil)
(defvar python-preoutput-skip-next-prompt nil)
(defun python-preoutput-filter (s)
(defun run-python (&optional cmd noshow new)
(defun python-send-command (command)
(defun python-send-region (start end)
(defun python-send-string (string)
(defun python-send-buffer ()
(defun python-send-defun ()
(defun python-switch-to-python (eob-p)
(defun python-send-region-and-go (start end)
(defcustom python-source-modes '(python-mode jython-mode)
(defvar python-prev-dir/file nil
(defun python-load-file (file-name)
(defun python-proc ()
(defun python-set-proc ()

The main mechanism for creating a python inferior process appears to be creating a derived mode based on comint-mode (defined in comint.el). Most of the code then is defining helper functions for making this mode python aware.

So without further ago, here is a brief look at the above functions:

python-python-command
python-jython-command

Variables which allow customization of what command to use for invoking python (jython)
python-command

The actual command (probably one of the above) that will actually be executed by run-python
python-buffer

The python buffer that will be the target of code issued from files in python-mode
python-compilation-regexp-alist

compilation-error-regexp-alist is set to this value. This is set by inferior-python-mode and is used by compile.el. This regular expression basically matches a python exception stacktrace:
```
   Traceback (most recent call last):
     File "foo.py", line 13, in 
       1/0
   ZeroDivisionError: integer division or modulo by zero
```
Honestly I'm a little confused by this since I don't think this regex actually matches the above and I'm not clear what purpose "compile" would need with it.
inferior-python-mode-map

Add a few keys for loading a file and doing pychecker. The key sequence for pychecker seems superfluous here. I never feel like kicking off pychecker while i'm in an interacive session.
inferior-python-mode-syntax-table

Adjust syntax table so that single quotes don't mess up the syntax coloring. "." is the syntax table entry for punctuation-like things.
inferior-python-mode

This is where the python interactive mode is defined. It overrides the default commint-mode. There is a comment here that the "python-mode" type things (keybindings, etc) should be inherited from python mode itself.

The basic customization is to adjust some regular expressions that are used by commint to decide what sort of things get maintained in the history, what the "prompt" looks like (>>>), etc.
inferior-python-filter-regexp

The default is not to save things in the history if they are 3 or less in length. I hadn't noticed this behavior before.

I'm not sure if there is some rhyme or reason why sometimes they use the rx style and sometime they use traditional emacs style regexps (in this case "\\`\\s-*\\S-?\\S-?\\s-*\\'").
python-input-filter

This basically is a wrapper for ignoring inferior-python-filter-regexp
python-args-to-list

This will be used by run-python below to collect args for kicking off a python process. Doesn't work with quoted white space. Don't know emacs lisp well at all but it seems like a complicated way to just tokenize on SPACE and TAB.
python-preoutput-result

Not sure what "_emacs_out" is about, and not sure I care to spend the time tracking this down. (Possibly for emacs.py)
python-preoutput-leftover

related to _emacs_out
python-preoutput-skip-next-prompt

This (as well as previous 2 functions) are used by python-preoutput-filter
python-preoutput-filter

This function appears to clean up the output a bit and prevent spurious >>> ... ... >>> from littering the output. I'm going to punt on this. I'm not especially interested in the logic here but it is interesting passing to see a complex function needed to make the python mode behave in a friendly fashion.
run-python

Start a new python process or use an existing one if available. runs python-command with -i.
python-send-command

A wrapper around python-send-string that is used by python-send-region and python-load-file
python-send-region

Copy a section of code to temporary file and evaluate it. Can't do it directly in the interpreter since empty lines will confuse it
python-send-string

Evaluate a python string in the buffer. This function checks for trailing \n or intermediate \t and throws in an extra \n so that the command is properly terminated in a way that the interpreter is expecting.
python-send-buffer

Wrap python-send-region but use the entire buffer as the region
python-send-defun

Send a region but use beginning-of-defun/end-of-defun to delimit the region
python-switch-to-python

Create a new python process if necessary and switch to it. Giving it an argument makes it go to the end of the buffer
python-send-region-and-go

Combine python-send-region and python-switch-to-python
python-source-modes

A list of mode names so we can automatically tell if the buffer has python code in it
python-prev-dir/file

A cache of the directory and file used by python-load-file so that it has a default value if necessary
python-load-file

Import a python file in its entirety into the python process. Uses emacs.py if the file ends in .py otherwise uses "execfile"
python-proc

Return to or create and move to a python process
python-set-proc

Associate the python-buffer with the current buffer (which is a python process). This is used by things like python-send-region so they know where to send output.

I think unless you understand the details of how comint works (which I don't) it's hard to get a strong feel for how the inferior-python-mode works. As always it's interesting to see all of the crazy details that have to be taken care of to get a simple looking thing like an embedded python interpreter working.

That was a very cursory overview of how the python process inferior mode works (even by my already lax standards). I admit it, I'm getting a little exhausted by this project. But I'm going to keep slogging through and finish doing at least a cursory inspection of every line in python.el.

Another thing that is becoming clear is that, while this process is helping me understand emacs and emacslisp, I have a lot of work to do, and at some point I really need to just sit down and learn emacslisp in a focused way.

Thursday, December 4, 2008

Is it possible that I like sushi merely as a means to confound my wife?

Sometimes I wonder....

Tuesday, December 2, 2008

Choose Two

I was cracking wise with my co-workers the other day and later in the day I related the hi-jinx to my wife. I had proffered the following theory on the relative combination of desirable attributes in your mate.

"Gentleman, you've got cute, smart and nice. And you get to choose two."

My attractive, intelligent wife confirms that this is a sound theory.

Tuesday, November 25, 2008

99 problems - python - 49

Gray codes.

An n-bit Gray code is a sequence of n-bit strings constructed according to certain rules. For example,


n = 1: C(1) = ['0','1'].
n = 2: C(2) = ['00','01','11','10'].
n = 3: C(3) = ['000','001','011','010',´110´,´111´,´101´,´100´].

Find out the construction rules and write a predicate with the following specification:


def gray(n):
    if n == 0:
        yield ""
    else:
        base = list(gray_code(n-1))

        for code in base:
            yield "0"+code

        for code in reversed(base):
            yield "1"+code

Saturday, November 22, 2008

99 problems - python - 48

Truth tables for logical expressions (3).

Generalize problem P47 in such a way that the logical expression may contain any number of logical variables. Define table/2 in a way that table(List,Expr) prints the truth table for the expression Expr, which contains the logical variables enumerated in List.


Example:
* (table (A,B,C) (A and (B or C) equ A and B or A and C))
true true true true
true true fail true
true fail true true
true fail fail true
fail true true true
fail true fail true
fail fail true true
fail fail fail true

def int_to_bool_tuple(x,size):
    result = []
    while x:
        result.insert(0, (x & 1) == 1)
        x >>= 1
    return ([False]*(size-len(result))) + result
    
def table2(var_count, func):
    for x in range(2**var_count):
        bools = int_to_bool_tuple(x,var_count)
        for b in bools:
            print "%-5s" % b,
        print func(*bools)

As I was exploring ideas for this solution I noticed that python 2.6 has a bin() function which gives a string representation of a number as binary. Need to upgrade to 2.6 I guess. Of course if I'm going to upgrade, might as well wait for 3.0.

99 problems - python - 46

Define predicates and/2, or/2, nand/2, nor/2, xor/2, impl/2 and equ/2 (for logical equivalence) which succeed or fail according to the result of their respective operations; e.g. and(A,B) will succeed, if and only if both A and B succeed.

A logical expression in two variables can then be written as in the following example: and(or(A,B),nand(A,B)).

Now, write a predicate table/3 which prints the truth table of a given logical expression in two variables.


def and_(x,y): return x and y
def or_ (x,y): return x or y
def not_(x): return not x
def nand_(x,y): return not (x and y)
def nor_(x,y): return not (x or y)
def xor_(x,y): return x ^ y
def impl_(x,y):return not x or y
def equ_(x,y): return x == y

def bool_table(func):
    inputs = [(x,y) for x in (True,False) for y in (True,False)]
    for x,y in inputs:
        print "%-5s %-5s %-5s" % (x,y,func(x,y))

Note: I'm just going to skip 47 since it's pretty much a noop based on how python already works, and how I solved this problem. Unless I'm missing something subtle.

Friday, November 21, 2008

100 Pushups: week 6 (8th try)

It just occurred to me that I've been doing this crazy 100 pushups thing since at least July. I'm still humorously far from my goal, but damn if I'm still making some progress. For instance this week after doing my various reps (150+ pushups in 20+ increments) I was able to get to 53 which is a new record for me.

But nevertheless it is still pretty funny that I've been doing the 6th week of a 6 week program for 8 weeks now. And it *only* took me 6 weeks or so to finish week 5. But I did finish.

The idea of doing 100 pushups seemed sort of comical to me when I started this bad boy, but now it's my mission. I just hope I finish this decade sometime.

Tuesday, November 18, 2008

Pragmatic Thinking and Learning

I've just started this book and while I think that I will get lots of interesting things out of it, it has already set off some alarm bells with me. The same alarm bell that goes off when I hear someone quote the "fact" that we only use 10% of out brains (well, maybe that's true for you...).

In any case the author seems to stick pretty close to the Betty (Drawing on the Right Side of the Brain) Edwards L/R distinction which I've grown skeptical of over the years. That was a low buzz no real alarm, just be wary.

But next came a reference to the"Unskilled and Unaware" paper. I've always thought that result sounded pretty reasonable, but today I came across this.

Hopefully the book doesn't continue to be full of too many debunked/sketchy studies.

I actually expect I will like the book and recommend it, but I just wish you could get a clear, unambiguous picture of state of the art learning/creativity techniques with minimal "woo".

Sunday, November 16, 2008

99 problems - python - 41

Given a range of integers by its lower and upper limit, print a list of all even numbers and their Goldbach composition.

In most cases, if an even number is written as the sum of two prime numbers, one of them is very small. Very rarely, the primes are both bigger than say 50. Try to find out how many such cases there are in the range 2..3000.


# using goldbach as defined previously
def goldbach_list(start,end,threshold=None):
    start = max([start,4])
    for n in range(start,end+1):
        if n % 2 == 0:
            i,j = goldbach(n)
            if threshold == None or i > threshold:
                print "%d = %d + %d" % (n, i, j)

99 problems - python - 40

Goldbach's conjecture. Goldbach's conjecture says that every positive even number greater than 2 is the sum of two prime numbers. Example: 28 = 5 + 23. It is one of the most famous facts in number theory that has not been proved to be correct in the general case. It has been numerically confirmed up to very large numbers (much larger than we can go with our Prolog system). Write a predicate to find the two prime numbers that sum up to a given even integer.


# using primes_range from previous problem
def goldbach(n):
    assert n % 2 == 0

    for i in primes_range(2, n):
        for j in primes_range(i, n):
            if i + j == n:
                return (i,j)

Saturday, November 15, 2008

99 problems - python - 39

A list of prime numbers


# using is_prime from earlier problem
def primes_range(start,end):
    return [n for n in range(start, end+1) if is_prime(n)]

99 problems - python - 37

Calculate Euler's totient function phi(m) (improved). See problem 34 for the definition of Euler's totient function. If the list of the prime factors of a number m is known in the form of problem 36 then the function phi(m) can be efficiently calculated as follows: Let ((p1 m1) (p2 m2) (p3 m3) ...) be the list of prime factors (and their multiplicities) of a given number m. Then phi(m) can be calculated with the following formula:

phi(m) = (p1 - 1) * p1 ** (m1 - 1) + (p2 - 1) * p2 ** (m2 - 1) + (p3 - 1) * p3 ** (m3 - 1) + ...

Note that a ** b stands for the b'th power of a. Note: Actually, the official problems show this as a sum, but it should be a product.


# using prime_factors_mult from earlier
def totient_phi2(n):
    if n == 1:
        return 1

    def product(lst):
        return reduce(lambda x,y: x*y, lst)
    return product([((p-1)*(p**(m-1)))  for (p, m) in prime_factors_mult(n)])

I'm pretty sure that "reduce" will be leaving python 3. We hardly knew ye. I don't understand that removal. Of course I'll just be adding it right the hell back in. And no one can stop me!

99 problems - python - 36

Determine the prime factors of a given positive integer.

Construct a list containing the prime factors and their multiplicity.

Example:

* (prime-factors-mult 315)
((3 2) (5 1) (7 1))


#using prime_factors from the previous problem
import itertools
def prime_factors_mult(n):
    return [(x, len(list(y))) for x,y in itertools.groupby(prime_factors(n))]

99 problems - python - 35

Determine the prime factors of a given positive integer. Construct a flat list containing the prime factors in ascending order.


def prime_factors(n):
    factor = 2
    while n > 1:
        if n % factor == 0:
            yield factor
            n = n / factor
        else:
            factor += 1

This ended up being simpler and more beautiful than I was expecting. I think I'm falling in love with python all over again. *snif*

99 problems - python - 34

Calculate Euler's totient function phi(m). Euler's so-called totient function phi(m) is defined as the number of positive integers r (1 <= r < m) that are coprime to m. Example: m = 10: r = 1,3,7,9; thus phi(m) = 4. Note the special case: phi(1) = 1.


# using coprime defined previously
def totient_phi(m):
    if m == 1:
        return 1
    return len([r for r in range(1,m) if coprime(r,m)])

emacs python mode from scratch: stage 12 - bicycle repair man

For this session we'll get bicycle repair man support working.

First let's see if we can get brm working by itself and then see how it integrates into emacs.

To start it I ran:


M-x python-setup-brm

It took a little googling and experimentation to figure out how to use this. But I finally figured out that if you select some text then the menus do their thing.

This is a strange little tool. I guess I'm so used to refactoring with global search and replaces that this seems overly constraining. But perhaps it's just unfamiliar. I suspect that if you got used to it then you could learn to like it, but I'm skeptical. One behavior that was very counter intuitive is that you can't use the normal emacs undo to revert the refactoring change. You have to do it via the menus (or presumably the brm console). In any case let's look at the code that runs brm.

Firstly it does an autoload of pymacs.

I've never really played with this before but you can see that it is loaded correctly and working by doing the following in *scratch*:


(pymacs-eval "2 + 2")
4

The second prep step is to get the bicycle repair man module itself loaded:


(autoload 'brm-init "bikemacs")

Which I'm assuming works since I was able to do a refactoring

python-setup-brm

This code is mostly just busy work to set up of menus.

Strangely this code ends with an (error ... ) embedded in an (error ...).
Not sure what that is about.

Interestingly in the comments the author of python.el expresses doubts that brm is useful.
Note to self, I should also look at ropemacs. Looks like rope has more features and if I recall an earlier experiment correctly does autocomplete of python variables/modules as well.

Any way, onward and upward. Next stop, context sensitive help.

99 problems - python - 33

Determine whether two positive integer numbers are coprime. Two numbers are coprime if their greatest common divisor equals 1.

Using gcd as defined in the previous problem:


def coprime(m,n):
    return gcd(m,n) == 1

Wednesday, November 12, 2008

99 problems - python - 32

Determine the greatest common divisor of two positive integer numbers.


def gcd(m,n):
    m,n = sorted([m,n])

    while m != 0:
        m, n = (n % m), m

    return n

Humorously, I resisted using a recursive algorithm. I don't know why, but it just seemed more reasonable to use a straight iterative one. Also I really like how m,n = (n%m), m saves us from having to declare a temporary variable.

emacs python mode from scratch: stage 11 - skeleton

We are actually getting to the point where this mode would actually be useful for developing with.

Next let's get skeleton support working now.

Here are the new functions I've copied over:


python-use-skeletons
python-skeletons
python-mode-abbrev-table
def-python-skeleton
def-python-skeleton if
define-skeleton python-else
def-python-skeleton while
def-python-skeleton for
def-python-skeleton try/except
define-skeleton python-target
def-python-skeleton try/finally
def-python-skeleton def
def-python-skeleton class
python-default-template "if"
python-expand-template

python-use-skeletons

This controls whether or not skeleton functions are added to the python-mode-abbrev-table
python-skeletons

A list of all the skeletons. apparently used when working on compound statements.
python-mode-abbrev-table

This is where the actual abbrevs are stored
def-python-skeleton

A macro for getting a skeletons "registered" in the abbrev system. First it creates a name for the skeleton of the form: python-insert-foo. Then if adds this name to the pyton-mode-abbrev-table. Finally it calls define-skeleton and creates the skeleton.

Each of the following create the appropriate skeleton by name


def-python-skeleton if
def-python-skeleton while
def-python-skeleton for
def-python-skeleton try/except
def-python-skeleton try/finally
def-python-skeleton def
def-python-skeleton class

define-skeleton python-else

else clause used by if and other else-able statements. One reason I would find it hard to get excited about using this particular skeleton/template system is that it seems sort of ridiculous to be prompted after a for loop or a while loop, etc for an else clause. I can count on 1 hand how many times i've ever used an else clause other than part of an if cascade. It seems like an unnecessary burden to make the user consider this case every time they are writing a for loop.
define-skeleton python-target

Helper used by try/except
python-default-template

Variable that defines what template to use by default in python-expand-template
python-expand-template

This function is for allowing a skeleton to be used directly as a key sequence rather than as an abbrev. Interestingly if defaults to the "if" template but then let's you update what the default is.

I'm not totally sold on the value of skeletons for python. The syntax is already so minimal that the distraction of writing code through the minibuffer doesn't seem like a big win but it's something i'd like to explore in detail sometime and see if it works for me if i give it a chance.

Next let's get bicycle repair man support working.

Saturday, November 8, 2008

99 problems - python - 31

Determine whether a given integer number is prime.


import math
def is_prime(n):
    for test in range(2, int(math.sqrt(n) + 1)):
        if n % test == 0:
            return False

    return True

(no, I'm not skipping problems, the original set skips too)

BTW, below is the prime number generator function in Haskell in case you haven't seen it before. For my money it's about the most beautiful line (or 2) of code I've ever seen. It almost gives me goose bumps when I see it.

It's deceptively simple looking, but the more you look at it the more your mind is guaranteed to be blown as it sinks in how subtly cool this is


primes = sieve [2..]
    where sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]

* (guarantee not binding any where)

99 problems - python - 28

Sorting a list of lists according to length of sublists

a) We suppose that a list contains elements that are lists themselves. The objective is to sort the elements of this list according to their length. E.g. short lists first, longer lists later, or vice versa.

Example:
* (lsort '((a b c) (d e) (f g h) (d e) (i j k l) (m n) (o)))
((O) (D E) (D E) (M N) (A B C) (F G H) (I J K L))

b) Again, we suppose that a list contains elements that are lists themselves. But this time the objective is to sort the elements of this list according to their length frequency; i.e., in the default, where sorting is done ascendingly, lists with rare lengths are placed first, others with a more frequent length come later.

Example:
* (lfsort '((a b c) (d e) (f g h) (d e) (i j k l) (m n) (o)))
((i j k l) (o) (a b c) (f g h) (d e) (d e) (m n))

(a) it should be enough to do

sorted(lst_of_lsts, key=lambda x: len(x))

If you want the lists of same length to be alphabetically sorted as well then you could do:

sorted(sorted(lst_of_lsts), key=lambda x: len(x))

(b)


def length_frequency_sort(lst_of_lists):
    freqs = {}

    for lst in lst_of_lists:
        freqs.setdefault(len(lst), 0)
        freqs[len(lst)] += 1

    return sorted(lst_of_lists, key=lambda x: freqs[len(x)])

99 problems - python - 27

Group the elements of a set into disjoint subsets.

a) In how many ways can a group of 9 people work in 3 disjoint subgroups of 2, 3 and 4 persons? Write a function that generates all the possibilities and returns them in a list.

Example:
* (group3 '(aldo beat carla david evi flip gary hugo ida))
( ( (ALDO BEAT) (CARLA DAVID EVI) (FLIP GARY HUGO IDA) )
... )

b) Generalize the above predicate in a way that we can specify a list of group sizes and the predicate will return a list of groups.

Example:
* (group '(aldo beat carla david evi flip gary hugo ida) '(2 2 5))
( ( (ALDO BEAT) (CARLA DAVID) (EVI FLIP GARY HUGO IDA) )
... )


def combination(lst, size):
    #defined as in previous problem

def grouped_combos(lst, sizes):
    assert len(lst) == sum(sizes)

    if not sizes:
        yield []
    else:
        for combo in combinations(lst, sizes[0]):
            rest = sorted(list(set(lst) - set(combo)))
            for rest_combo in grouped_combos(rest, sizes[1:]):
                yield [combo] + rest_combo

I have a sorted thrown in since using set gave unreliable ordering after the set difference. I really love recursion. It seems like the above would be quite annoying without it.

99 problems - python - 26

Generate the combinations of K distinct objects chosen from the N elements of a list In how many ways can a committee of 3 be chosen from a group of 12 people? We all know that there are C(12,3) = 220 possibilities (C(N,K) denotes the well-known binomial coefficients). For pure mathematicians, this result may be great. But we want to really generate all the possibilities in a list.


def combinations(lst, size):
    if len(lst) < size:
        return
    elif size == 0:
        yield []
    else:
        # keep first
        for combo in combinations(lst[1:], size-1):
            yield [lst[0]] + combo
        # skip first
        for combo in combinations(lst[1:], size):
            yield combo

After I wrote this I remembered that python's test/test_generators.py also had an implementation of this:


>>> def gcomb(x, k):
...     "Generate all combinations of k elements from list x."
...
...     if k > len(x):
...         return
...     if k == 0:
...         yield []
...     else:
...         first, rest = x[0], x[1:]
...         # A combination does or doesn't contain first.
...         # If it does, the remainder is a k-1 comb of rest.
...         for c in gcomb(rest, k-1):
...             c.insert(0, first)
...             yield c
...         # If it doesn't contain first, it's a k comb of rest.
...         for c in gcomb(rest, k):
...             yield c

Which is amazingly similar. I guess in python there really is OBWTDI (one best way to do it).

So anyway I think from here on out the questions will continue to be less and less "trivial".

99 problems - python - 25

Generate a random permutation of the elements of a list.

Once again python provides us with a solution, so we don't have to code at all.

random.shuffle(lst)

Since this works in place you may want to copy your list first before operating on.

Keep your shirt on, these do get interesting after a while. But it is nice to see how many "problems" are already solved for us.

Friday, November 7, 2008

99 problems - python - 24

Lotto: Draw N different random numbers from the set 1..M.

No reason to use anything other than random.sample(population, count)

emacs python mode from scratch: stage 10 - pychecker

My next step is to get pychecker working. Turns out it's not much code. Here are the functions:

python-check-command

the default string to use as the command for running pychecker
python-saved-check-command

a cache of the command above and the name of the file we are
currently working with
python-check

Pretty straightforward but a couple of things to note.

I wasn't aware that the "interactive" command took straight elisp as an argument. I had just assumed you had to use the magic argument strings.

It's a little strange that they use an old school emacs lisp regular expression ("(\\([^,]+\\), line \\([0-9]+\\))") rather than the rx style they seemed to use everywhere else.

A note on pychecker. I find this utility fascinating. For good and bad reasons. It's really quite useful. C-c C-w is reflexively typed after saving a file. And the fact that it is so useful makes it extra confusing that it it seems sort of abandoned (at least last I checked).

As a future project I'd like to look at it (or one of it's cohorts, pylint, pyflakes, etc) and see how it works and perhaps complain less and help more,but it seems strangely abandoned now. I did start to take a look at one point and it seemed very complex so I ran away with my tail between my legs. But fortunately I have a short memory for scary things and I'll likely return to it someday.

Getting pychecker working required less code than I suspected.

Next I'll do skeleton support which if nothing else is a lot more lines of code.

Wednesday, November 5, 2008

99 problems - python - 23

Extract a given number of randomly selected elements from a list.


import random
def random_select(lst, n):
    for x in range(n):
        yield random.choice(lst)

Depending on how you interpret the problem you may just want to use random.sample

99 problems - python - 22

Create a list containing all integers within a given range.

OK, just use range(). Anything else would be silly. Depending on how you interpret this problem you may need to add one to the second parameter.

Monday, November 3, 2008

99 problems - python - 21

Insert an element at a given position into a list.


def insert_at(x, lst, n):
    return lst[:n-1] + [x] + lst[n-1:]

Sunday, November 2, 2008

99 problems - python - 20

Remove the K'th element from a list.


def remove_at(lst, n):
    return lst[:(n-1)] + lst[n:]

The simplest thing to do would be to just use .pop(), but I assuming that a non-destructive operation would be preferred in this case.

99 problems - python - 19

Rotate a list N places to the left.


def rotate(lst, n):
    n = n % len(lst)
    x, y = lst[:n], lst[n:]
    return y + x

99 problems - python - 18

Extract a slice from a list.

Given two indices, i and k, the slice is the list containing the elements between the i'th and k'th element of the original list (both limits included). Start counting the elements with 1


def slice(lst, i, k):
    return lst[i-1:k]

99 problems - python - 17

Split a list into two parts; the length of the first part is given.


def split(lst, n):
    return (lst[:n], lst[n:])

Trust me, these problems do get more challenging after a while. But it is a nice testament to the beauty of python how straightforward something like this is.

99 problems - python - 16

Drop every N'th element from a list.


def drop(lst, n):
    for i,x in enumerate(lst):
        if (i+1) % n != 0:
            yield x

99 problems - python - 15

Replicate the elements of a list a given number of times.


def replicate_elements(lst, n):
    for x in lst:
        for y in range(n):
            yield x

Saturday, November 1, 2008

99 problems - python - 14

Duplicate the elements of a list.


def duplicate_elements(lst):
    for x in lst:
        yield x
        yield x

real age

I had set a goal to get to a certain weight (210) and then get a physical and blood profile (cholesterol, etc). As it happens I received my results just as it was my 40th birthday.

One of my objectives of getting the blood profile was so I could complete the realage survey in excruciating detail. So I finally filled it in today and was pretty pleased with my results.

physical age = 40
"real age" = 27.9

Of course that is an absurdly precise number, but I'm still proud of my result. And my wife was part jealous and part suspicious. So she went over all my answers with me and helped me adjust them to less fantasy based answers. And my score went from 26.7 to 27.9.

Now we are in serious competition mode. She will be getting a blood profile done later this week.

Any way, I have the body of a 27.9 year old. I can do 50 pushups (and working towards 100). Things could be worse.

Wednesday, October 29, 2008

emacs python mode from scratch: stage 9 - rest of the python indent code

So my idea was to move onto something completely new but wouldn't you know it, I found a couple more indentation related functions that needed dealing with.

The functions/variables in question are:


python-guess-indent (both variable and function with this name)
python-guess-indent
python-beginning-of-defun
python-end-of-defun
python-backspace

And just for fun we'll throw in the key mapping for python-backspace

(define-key map "\177" 'python-backspace)

So let's quickly browse through these functions and see how they work and make sure each works as promised.

python-guess-indent (variable)
Seems weird to me that there can be a variable and function with the same name. I'm guessing this is a due to emacslisp being a lisp-2
This guess seems to be confirmed by: http://coding.derkeiler.com/Archive/Lisp/comp.lang.lisp/2008-01/msg00683.html
In any case having this being true means that python mode will be using the following indentation guessing function.
python-guess-indent (function)
This function does what it's name implies.
Looks for a ":" character and then moves to the beginning of that line. Then it moves to the following line and takes the difference. If the different is >= 2 and <= 8 then it take that as a guess. Otherwise it continues to the next ":".
If the value it found is different from the default value then it sets it to this new value.
python-beginning-of-defun
Look for a regular expression consisting of space and either def or class.
python-end-of-defun
This is a little trickier than python-beginning-of-defun. Here's another place where python's whitespace makes life hard. Good thing it's pretty to look at.
First find the beginning of the block we are currently in. If we are at the beginning of a line then but not at the start of a block, move forward until we find a block starting line.
From here python-end-of-block does the rest of the work.
python-backspace
Look at the list of valid indentation levels for the current line and then back track to the next smallest.
Also present a message explaining what block you just closed.

I'm now 40% through. Almost starting to seem like I might make it all the way through this puppy.

Tuesday, October 28, 2008

99 problems - python - 13

Run-length encoding of a list (direct solution). Implement the so-called run-length encoding data compression method directly. I.e. don't explicitly create the sublists containing the duplicates, as in problem 9, but only count them. As in problem P11, simplify the result list by replacing the singleton lists (1 X) by X


def last_element(lst):
    if not lst:
        return None
    elif type(lst[-1]) == tuple:
        return lst[-1][1]
    else:
        return lst[-1]

def increment_last(lst,x):
    if not lst:
        lst.append(x)
    elif type(lst[-1]) == tuple:
        count = lst[-1][0]+1
    else:
        count = 2

    lst[-1] = (count, x)

def encode_frugal(lst):
    result = []
    for x in lst:
        if x == last_element(result):
            increment_last(result,x)
        else:
            result.append(x)

    return result

Here's another one that seems a little more verbose than necessary, but that's what I come up with for a first pass.

Monday, October 27, 2008

99 problems - python - 12

Decode a run-length encoded list. Given a run-length code list generated as specified in problem 11. Construct its uncompressed version.


def decode(encoded):
    for x in encoded:
        if type(x) == tuple:
            for _x in range(x[0]):
                yield x[1]
        else:
            yield x

By now you should know to throw a "list" into the mix if that's the way you swing.

Sunday, October 26, 2008

99 problems - python - 11

Modified run-length encoding. Modify the result of problem 10 in such a way that if an element has no duplicates it is simply copied into the result list. Only elements with duplicates are transferred as (N E) lists.


import itertools
def run_length_encode2(lst):
    # could just reuse run_length_encode function from last time
    groups = [list(x[1]) for x in itertools.groupby(lst)]
    return [len(g) > 1 and (len(g), g[0]) or g[0] for g in groups]

Hmmm.... I use 2.4 at work, but I need to keep up with the times, so here is the more modern way (let's get ternary with it)


import itertools
def _run_length_encode2(lst):
    groups = [list(x[1]) for x in itertools.groupby(lst)]
    return [(len(g), g[0]) if len(g) > 1 else g[0] for g in groups]

Saturday, October 25, 2008

99 problems - python - 10

Run-length encoding of a list. Use the result of problem P09 to implement the so-called run-length encoding data compression method. Consecutive duplicates of elements are encoded as lists (N E) where N is the number of duplicates of the element E.


import itertools
def run_length_encode(lst):
    # could just call "group" from last solution
    groups = [list(x[1]) for x in itertools.groupby(lst)]
    return [(len(g), g[0]) for g in groups]

99 problems - python - 9

Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists.


import itertools
def group(lst):
    return [list(x[1]) for x in itertools.groupby(lst)]

99 problems - python - 8

Eliminate consecutive duplicates of list elements.


import itertools
def squeeze(lst):
    return [x[0] for x in itertools.groupby(lst)]

What can I say? I like itertools.

99 problems - python - 7

Flatten a nested list structure.


import types
def flatten_list(lst):
    if type(lst) not in (list, tuple, types.GeneratorType):
        yield lst
    else:
        for x in lst:
            for _x in flatten_list(x):
                yield _x

I'm being a little loose with the definition of a list here but I think this a behavior I would want from a function like this. Call it as list(flatten_list(my_list)) to get an actual list.

Something doesn't sit quite right with me about that nested for statement. Maybe something more elegant will come to me later.

Friday, October 24, 2008

99 problems - python - 6

Find out whether a list is a palindrome. A palindrome can be read forward or backward; e.g. (x a m a x).

def is_palindrome(lst):
    return lst == lst[::-1]

The first thing that came to mind was lst == list(reversed(lst)), but that doesn't quite work if the "sequence" you pass in is a string. Besides I hate to pass up an opportunity to use "[::-1]"

I have to admit that using languages like Haskell has made me sensitive to weirdnesses of python that I never really noticed before. For instance the fact that doing list("some string") doesn't return a list of characters but instead a list of strings of length 1. Ugh.

Thursday, October 23, 2008

99 problems - python - 5

Reverse a list.

reversed(my_list)

# or for you purists
list(reversed(my_list))

BTW, sorted and reversed are two of my favorite semi-recent additions to python

99 problems - python - 4

Find the number of elements of a list.

len(my_list)

OK, bear with me, these really do get harder after a while.

Wednesday, October 22, 2008

99 problems - python - 3

Find the K'th element of a list. The first element in the list is number 1.


def kth_item(lst):
    return lst[k-1]

99 problems - python - 2

Find the last but one element of a list.

my_list[-2]

I could go on all day like this.

99 problems - python - 1

Find the last element of a list.

my_list[-1]

Talk about your l33t hacking skills. :)

Saturday, October 18, 2008

99 problems - python - intro

There is a fun looking set of problems making the rounds. Here is the haskell flavor of them.

They start off sort of ridiculously easy and then get to pretty damn challenging. I'll see how far I can get with my mad skillz.

Twelve Virtues of Rationality

I have a calendar entry set to remind me once a month to reread the following:

Twelve Virtues of Rationality

I highly recommend it for anyone who values clear thinking and considers pursuing the truth as a good in itself.

(If that sip was to your liking, you can turn on the firehose here)

Wednesday, October 15, 2008

emacs python mode from scratch: stage 8 - python-indent-line

OK, now that we have python-calculate-indentation finished let's see who calls it (and hopefully work our way up to the point that tab cycling actually works).

So here are the call chains for python-calculate-indentation and a couple of related functions.


python-indent-line
   python-indent-line-1
      python-calculate-indentation
   python-block-end-p

python-indent-region
   python-indent-line-1
      python-calculate-indentation

python-electric-colon
   python-calculate-indentation

So our new functions we will bring over this time are:


python-block-end-p
python-indent-line-1
python-indent-line
python-indent-region
python-electric-colon

That's about 90 lines so that's good for one session. In addition to these functions we'll add a in the key mapping for python-electric-colon and the indent assignments within define-derived-mode (indent-line-function, indent-region-function).

Well, I don't know yet know *why*, but I can confirm that tab cycling works as I expected. Hopefully it will be clear as I go through the code what exactly is happening.

As for the electric colon, after experimenting around I was able to figure out a scenario which makes a line of code "outdent" as promised:


if foo:
    bar
    else:

My first instinct is to try to understand the tab cycling by digging down from the top and seeing how the tab key itself is bound.


TAB (translated from ) runs the command indent-for-tab-command
  which is an interactive compiled Lisp function in `indent.el'.
It is bound to TAB.
(indent-for-tab-command &optional arg)

Indent line in proper way for current major mode or insert a tab.
Depending on `tab-always-indent', either insert a tab or indent.
If initial point was within line's indentation, position after
the indentation.  Else stay at same point in text.
The function actually called to indent is determined by the value of
`indent-line-function'.

So this points us to indent.el which is where the magic happens and defines why the major mode needs to define indent-line-function and indent-region-function.

My next step was to look for descriptions of this indent functionality in the emacs manual but I quickly got tired of poring over that. So back to looking at the code in the library.

So let's work through the functions above:


python-block-end-p
python-indent-line-1
python-indent-line
python-indent-region
python-electric-colon

python-block-end-p

Check to see if the current line is the last line of an indented block. It's sort of surprising that this functionality hasn't been needed so far.

It checks that the current line is not a comment and that that either: it's for sure the end of a block (e.g. raise, etc) or the current indentation is less that the previous line's indentation.

That seems a little loose on the meaning of what I would expect a block-end checker to do. But from looking at how it is used in context I see that it probably is sufficient for it purpose. Basically it is simply being used to determine whether or not to display a message about what block is being closed by the position that the current indenting level. So close enough. Might be confusing if someone tried to use this function more generally.
python-indent-line-1

The comments for this function indicate that this is how indenting would be done if there was no cycling. Which essentially means it will take the last value from the full list of possibilities and just use that.
python-indent-line

So this is essentially the money function for doing tab cycling and now that I know so much more about the work going on underneath the covers it seems comically short. And I certainly wouldn't have really understood just how magical this function is without looking at the gory details myself.

The basic logic is to check if we are already cycling. If so then we move to the next known position and try to show what code block we are closing if that makes sense.

If we are just starting a tab cycle then we just call python-indent-line-1 and set up python-indent-index to be ready for cycling.
python-indent-region

So this function is a little strange. According to the emacs manual indent-region-function is used to act as a fast way to indent lines. The default behavior is to just run the indent-line-function on each line. But this code just goes ahead and loops over every line and just runs python-indent-line for each. Nothing obvious in this code to indicate what the advantage of walking over each line by hand would be. A mystery.
python-electric-colon

So this guy is mapped to ":" and usually just inserts a ":" unless we have the case I outlined above. The basic logic is to check if we are at the end of a line, whether we are in an outdentable context (python-outdent-p), not in a string or comment and we are past the point that it is predicted we should be indented to.

This function is followed by:
```
(put 'python-electric-colon 'delete-selection t)
```
Which as far as I can tell is related to delsel.el. Apparently delete-selection is used by this mode when you are deleting selected text a la windows behavior (ie delete selected text when you type).

Now that I have tab cycling pretty much wrapped up I have this hope that much of the rest of the code will be a little more straight forward.What can I say, I'm an optimist at heart.

As far as code coverage goes, I've copied over 33% of the code. Wow, I'm a third of the way done.

Wednesday, October 1, 2008

emacs python mode from scratch: stage 7 - python-calculate-indentation

We continue with our task of trying to get tabbing working by copying over python-calculate-indentation and examining how it works.

First thing that is apparent is that we also need these variables to be defined and set:

python-indent-string-contents
- if true, then for doc strings do alignment
python-continuation-offset
- how many spaces to add if indenting within a line following a continuation (ie "\")
python-honour-comment-indentation
- go ahead and align a comment with a previous comment even if it's not itself in a "normal" position

So on a first glance python-calculate-indentation doesn't seem to work as i would have expected. I just get one position back and the value doesn't change on successive runs even if there are various logical candidates. So my guess is that this function returns the best first guess and then a higher level function calling this will changes some global variable so that this function will return the next best and so on until it cycles through.

So let's see what is really going and and how good my guess is.

This function (similarly to python-indentation-levels) is just one big "cond". The various sections are:

multi-line string
after a backslash or in brackets (ie continuation context)
beginning of buffer
non-indentable comment
the rest

The number of new cases here is a bit of a surprise since I had half expected that the previous function we looked at (python-indentation-levels) already handled the main cases. But I guess not.

multi-line string

If we are currently in a string then do one of two things. If we have set our preferences so that we don't want to indent string contents, then just take the current indentation.

Otherwise if we are on a line that starts the string, then just keep the current indentation. The actual work horse of this section is to keep walking backward a line at a time until we find a non-blank line and then use that as the indentation level.
after backslash or in brackets

The logic here might be best summarized with pseudo code
```
   if in bracketed expression
       find first item in list and align with that
       otherwise align with the start statement (plus a "step" per bracketing level)
   else # must be backslash continuation line
       if this is not the first continuation line
          take indent of previous line
       else
          indent one "step" (plus even one more if previous line
          is a opening block)
```
Fortunately this function is pretty well documented or it would be a bear to get through. Well, it's still a bear, but it's less of a bear. I wonder if instead of good documentation this would be a good place to create well named smaller functions. My personal style is to aim for functions that are the width of my hand (or two). Not being able to see the function in its entirety on one screen is pretty obnoxious to me.

Note: needed to remember that "if" has one clause for the then statement (hence we find a progn) but the else can be any number of clauses. Strange asymmetry.
beginning of buffer

No choice here. It's gotta be 0.
non-indentable comment

There is a note here that basically says that they copied this behavior from python-mode.el but that it's not necessarily a good idea to have done so.

A non-indentable comment seems to be a comment line that has no space-like characters between the comment starter ("#") and the text
```
    e.g.
       # hello
       #there
```
That seems weird. But now I know to look for this behavior when editing.
the rest.

So this is weird. This is where the last function we looked at (python-indentation-levels) will get called. but at this point I'm confused about how how they interact. Any way let's wade through this and see what we can figure out

There are four main items here
- move backward over white space if need to
- call our friend python-indentation-levels
- prefer indenting of comments with a following statement
- take the caar (car of the car) of the last thing in python-indent-list

I have to admit at this point that I can mostly grok the code but the actual details are quite daunting. I would be hard pressed to recreate this logic from scratch. Also I'm still completely nonplussed as to how the magic cycling of tabbing options occurs.

Peeking around the original file I see a reference to cycling so I hope things get more clear when I get to that. But, once again we've been through a bunch of code and we still don't have a working tab cycler yet. But I think we are within spitting distance now.

For what it's worth, I'm now about 30% of the way through the code.

Saturday, September 27, 2008

100 Pushups: week 5 (finished)

You'd think I had finished the whole stinking program, but I'm feeling pretty good about finishing week 5 after numerous iterations. I checked my records and I've been doing week 5 since the middle of July.

But I did finish it. So, I may suck, but I don't suck as much as people who *haven't* finished week 5 ever. :)

Any way, on to week 6 now. (For who knows how long).

Wednesday, September 17, 2008

emacs python mode from scratch: stage 6 - python-indentation-levels

OK now let's see if python-indentation-levels works as advertised when it is copied over.

For the following test code block (with cursor at position POINT):


class Foo(object):
    def __init__(self, *args, **kwargs):
        print "hi"
        print 'qwerty' [POINT]

it returns the following list:


((0 . #("class Foo(object):" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 9 (face font-lock-type-face fontified t) 9 18 (fontified t)))
 (4 . #("def __init__(self, *args, **kwargs):" 0 3 (face font-lock-keyword-face fontified t) 3 4 (fontified t) 4 12 (face font-lock-function-name-face fontified t) 12 13 (fontified t) 13 17 (face font-lock-keyword-face fontified t) 17 36 (fontified t)))
 (8 . #("print \"hi\"" 0 5 (face font-lock-keyword-face fontified t) 5 6 (fontified t) 6 10 (face font-lock-string-face fontified t))))

This is a list of lists where each internal list consists of a pair of tab position and the object with which it is matching.

So everything seems to be functioning correctly,now we just need to delve into how this works exactly.

Basically the whole function is a cond with three cases:

statement following a block open statement
comment following a comment
everything else

Before we go through these three cases I have to say that the style is very new to me. The "predicate" part for the first two cases of each case branch is a block of code that acts as if what it is testing is true. If it succeeds then it is true and it sets the indent list appropriately, otherwise it tries the next case. It is just an unusual style (to me) to run code in a "if" statement like this. But it obviously works, so I'll just adjust my expectations accordingly.

statement following a block open statement

The logic here is to that check all of the following actions/tests work:
- move to a previous statement
- check that it is an opening block statement
- save the value of the indent
- move to the end of the statement
- skip comments and blanks
- make sure we are at a ":"
- add the fixed indent amount to the indent of the previous block statement
I don't really understand all the machinations here.

Intuitively I would have stopped after the first 3 steps.

Ahh... now I see. The comments are useful here.
```
   ;; Check we don't have something like:
   ;;   if ...: ...
```
So if we go to the end of the statement and don't find a ":" we have the above scenario and the "normal" indenting rule won't work.
comment following a comment
This is more straightforward. If the current line is a comment and the previous line is also a comment, then there is only one choice for indent levels: the indent level of the previous line.
all other cases.
This logic doesn't look as bad as I would have at first suspected.

The first thing added to the list of indentation levels will be the position of the previous lines indentation *if* the previous line is part of a pair like if/else that makes sense to line up with AND it is not a block closer (e.g. return) that doesn't make sense to line up with

Next we are going to crawl up a block at a time and collect indentation levels on the way up. We only skip a level if we had a word like "else" and the block we are examining doesn't match our "start" word

Even if we had nothing we throw a 0 position on the list for good measure and then set a couple global values (python-indent-list and python-indent-list-length)

So that was one of the first "big" functions I've had to work through and it wasn't too bad. It's amazing how many functions were required do something as simple sounding as get suggested indentation levels.

And of course we *still* can't indent. So the next phase will hopefully be using the output of this function to actually navigate using tab. I would never have guessed so much magic was going on when the tab key was hit.

I will need to keep in mind that these values are being set globally and see if I can guess why. As a guess I'd say its because this operation is expensive and while you are on a line there is no need to recalculate it. If that's true I should at some point see some code that recognizes that point has moved to a new line and invalidates the current python-indent-list

We are now 23% of the way through the code (by lines - including comments).

Tuesday, September 9, 2008

emacs python mode from scratch: stage 5 - more movement methods

So let's continue with the seemingly modest goal of getting tabbing to work

Presumably my current target is to get python-indent-line working

From some quick browsing this function has the following important dependencies:


db-python-indent-line
  db-python-indent-line-1 ;; which sets global: db-python-indent-list-length
    db-python-calculate-indentation (104 lines)

And then looking at python-calculate-indentation we find it requires


   - db-python-indent-string-contents # global var
   - db-python-indentation-levels  # 59 lines

So I guess this time our target is to get the support functions in place for python-indentation-levels.


- python-indent
- python-block-pairs

and the function(s)


- python-first-word
- python-initial-text
- python-beginning-of-block
- python-end-of-block

python-indent:

This is pretty straight forward, set a customizable variable for what the default number of columns for indentation will be. Just to be thorough we need to look up what safe-local-variable means.

From poking around in files.el


;; Safe local variables:
;;
;; For variables defined by major modes, the safety declarations can go into
;; the major mode's file, since that will be loaded before file variables are
;; processed.
;;
;; For variables defined by minor modes, put the safety declarations in the
;; file defining the minor mode after the defcustom/defvar using an autoload
;; cookie, e.g.:
;;
;;   ;;;###autoload(put 'variable 'safe-local-variable 'stringp)
;;
;; Otherwise, when Emacs visits a file specifying that local variable, the
;; minor mode file may not be loaded yet.
;;
;; For variables defined in the C source code the declaration should go here:

So basically it's a way to do some simple type checking on a variable.

python-block-pairs
This is an alist of python keywords and the keywords that they are the "closers" for.
python-first-word
A simple function that returns to the beginning of code for a line and calls current-word
python-initial-text
A function to grab the non-comment code on a line. Interestingly this function doesn't seem to work as advertised. Both in my hand copied subset of code and in the full working original both seem to keep the comments at the end of the line.

I will have to keep this in mind when looking at how it is used to see if this will affect the functionality.
python-beginning-of-block
Move point to beginning of containing block. It starts by moving past any blank space and/or comments and then going to start of current statement.

Then while point is not on the 0th column and/or the arg passed in is not 0 continue recursively until one of the target conditions is true

The logic here is a bit of a challenge (for me) to follow. Its a twisty maze of whiles, whens, ands, recursion and throw/catches.

It seems to be doing something like:

Move up a line and continue doing it until either we can't go up any more or we hit the condition where we generate a 'done (ie the conditions match a new outer block have been hit).

This is probably the most "lispy" code so far and seemed excessively weird when I first tried to wrap my brain around it. But it seems halfway sensible to me now
python-end-of-block
For some reason I was expecting this function to be a simple variation of python-beginning-of-block, but the logic is fairly different. It *does* use many of the same function but the way they are combined is not a simple reverse of the above logic.

This function and the above are definitely a bit more sophisticated in their logic and the code is more strange (to me). As an example:

within a let* expression there is the following "variable assignent" which is really not a variable assignment.
```
   (_ (if (db-python-comment-line-p)
          (db-python-skip-comments/blanks t)))
```
At this point I'm not sure whether to think this is an ugly hack or just to try and get used to this as idomatic emacslisp. Seems like it would be cleaner just to do this test and function call before the let*, or within a nested let. Hmmm... Can't do it before because we need the value of point before moving and having a nested let makes the code more complicated. I guess I'm already starting to see this more as a clever hack than first blush.

More strangely, python-beginning-of-block is called recursively and this function is an iteration. I wonder if these were done by different people or in different moods. Or perhaps there is some compelling reason not to do it recursively. Not obvious to me in any case why the difference in style.

Anyway, we are now closer to being able to implement some tab bevhavior. With any luck next phase will have us implementing python-indentation-levels

By lines of code I'm about 20% of the way through.

Sunday, September 7, 2008

100 Pushups: week 5

I have vowed to get to one hundred pushups. And I *will* keep at it well past the point where a sane person would just cut their loss. But in case any one is wondering, I've been on week 5 for about 6 weeks now.

*sigh*

True, there was a week off because I was sick and another following that for a week vacation. And yes, I did lose a lot of progress during those two weeks. But man, this is a non-trivial challenge.

I really have to wonder how many people have followed this (and exclusively this) plan to victory.

Any way, as long as each time I go through week 5 I get a little better then I don't feel *too* bad about being relentless. It's when I plateau that I have to start wondering about my sanity.

For what it's worth I can currently do a max of 40 pushups in one set. 100 stills seems absurdly far off.

Tuesday, August 26, 2008

Puzzle: Determine the smallest integer

Here is a puzzle a friend forwarded to me recently:

Ten (not necessarily distinct) integers have the property that if all but one of them are added the possible results are: 82, 83, 84, 85, 87, 89, 90, 91, 92. What is the smallest of the integers?

(I'll post the solution and my method in a few days.)

Tuesday, August 19, 2008

emacs python mode from scratch: stage 4 - utilities (the rest)

Let's continue from the last entry and finish off the utilities.

The rest of the utilities:


(defun python-comment-line-p ()
(defun python-blank-line-p ()
(defun python-beginning-of-string ()
(defun python-open-block-statement-p (&optional bos)
(defun python-close-block-statement-p (&optional bos)
(defun python-outdent-p ()

python-comment-line-p

Go to the end of the line, if the emacs parser says we we are in a comment, go to the beginning of the line per indentation and check if we are looking at a comment or the end of a line.
This is weird. Seems like there is no way for this to ever be an end of line.
comment-start seems to be a symbol required by rx not necessarily a value set in define-derived-mode (which I'm not setting yet)
As an aside, the regex for start of comment is \s<
I don't think that line-end is a necessary part of that regular expression or at least if i take it out it doesn't seem to change the behavior but I'll just leave it in for now.
Presumably we are just depending on the comment character set in python-mode-syntax-table

python-blank-line-p

This is about as straight forward as you get. Go to the beginning of the line and check if you are looking at 0 or more white space chars followed by an end of line. \\s- is the ever so strange looking way emacs regular expressions represent white space.

python-beginning-of-string

Hey, this stuff is starting to look familiar. First we determine what state a parser would be at the current position. Then if the parser says we are in a string, we go the the starting point (8th item of the state list)

python-open-block-statement-p

So this and python-close-block-statement-p and python-outdent-p call a number of things that aren't already defined which is strange since you'd think some functions described as utilities would be self contained. The additional functions are:
```
python-beginning-of-statement and python-previous-statement
```
So we'll assume these functions do what they claim for now and investigate them more closely down below
This logic is pretty straight forward. Go to the beginning of the statement and then use a regular expression match to see if we are at the beginning of a block.
NOTE: there is an optional parameter that let's you skip the movement to the beginning of the statement. Seems like a simple performance helper.

python-close-block-statement-p

This is the exact analog of python-open-block-statement-p. Except here we check to see if it is something that for sure is the last member of a block.

python-outdent-p

Check if current line should be "outdented". Again pretty straight forward
code. Move to the beginning of the the current indentation and check if all of the following are true: looking at (else, finally, except or elif) and not in a comment or string and check that the previous statement is neither a close block nor an open block.
In other words if on something like an else statement and the previous statement is not something that requires indentation you should outdent.
Seems like this is just sort of a heuristic and not quite accurate.
E.g. for an improperly nested "else" following a return. It will say not to outdent. Of course, what *should* you do when the code is invalid?

python-beginning-of-statement

(this requires python-skip-out so we add that to the list as well)
Here we will move to the beginning of the line and then if on a continuation of some sort we'll either check if we are on a backslash style of continuation and move backward over these or we will move backward over strings and "skip out" (jump up) from nested brackets.

python-previous-statement

(this needs python-next-statement so we add that to the list as well)
If it receives a negative argument they really want to go forward, so pass off to python-next-statement. Otherwise, go to the beginning of the statement and skip over comments and blanks and continue going to the beginnnig of the statement until counter is 0 (or reach beginning of buffer)

python-skip-out

Pop up out of nested brackets to the front by default and to the
end if "forward" is set. Additionally if "syntax" of point is already available, then that can be passed in. For well formed nesting we just call backward-up-list with the depth. For ill-matched brackets we try to go backward up the list over and over until we get an error.

python-next-statement

This is pretty much an exact analogue of python-previous-statement.

There were a surprising number of functions that the utilities them self depended on. I continue to be amazed by the complexity of making a language sensitive mode for emacs.

I also noticed that I'm over 10% of the way done. Making progress.

Thursday, August 7, 2008

emacs python mode from scratch: stage 3 - some utilities

As I mentioned last time my plan was to do tabbing/indentation next but there was a lot of reliance on the utility section of code so the next step becomes getting these helper functions copied over and understood.

There are about 100 lines in the utility section so let's do half at a time. In the first 50 lines we have the following functions/definitions:


(defsubst python-in-string/comment ()
(defconst python-space-backslash-table
(defun python-skip-comments/blanks (&optional backward)
(defun python-backslash-continuation-line-p ()
(defun python-continuation-line-p ()

python-in-string/comment

defsubst is a marcro for inlining a function which seems strange. I wonder if this was just a performance booster. (Is there any other reason to inline code?)

syntax-ppss returns info on what the parser state would be at the current position. Below are the states and its clear why nth 8 is the thing we need for determining if we are in a string or comment


  0. The depth in parentheses, counting from 0. Warning: this can be negative if there are more close parens than open parens between the start of the defun and point.
  1. The character position of the start of the innermost parenthetical grouping containing the stopping point; nil if none.
  2. The character position of the start of the last complete subexpression terminated; nil if none.
  3. Non-nil if inside a string. More precisely, this is the character that will terminate the string, or t if a generic string delimiter character should term inate it.
  4. t if inside a comment (of either style), or the comment nesting level if inside a kind of comment that can be nested.
  5. t if point is just after a quote character.
  6. The minimum parenthesis depth encountered during this scan.
  7. What kind of comment is active: nil for a comment of style â?oaâ?? or when not inside a comment, t for a comment of style â?ob,â?? and syntax-table for comment that should be ended by a generic comment delimiter character.
  8. The string or comment start position. While inside a comment, this is the position where the comment began; while inside a string, this is the position where the string began. When outside of strings and comments, this element is nil.
  9. Internal data for continuing the parsing. The meaning of this data is subject to change; it is used if you pass this list as the state argument to another call.

syntax-ppss seems like an insanely powerful feature. My respect for the sophistication going on behind the scenes when emacs opens a file for a certain language continues to grow.

python-space-backslash-table and python-skip-comments/blanks

This seems to be a "throw away" data structure for overriding the *real* syntax table temporarily
It provides a syntax table that redefines "\" as a whitespace class. Presumably this is for allowing movement over otherwise blank continuation lines as if they were whitespace.
What this meant wasn't immediately obvious, but it seems to be addressing the type of situation below (which honestly I haven't seen in the wild before, but seems to be legal python code).
```
if x and \
        \
        \
        \
        \
  y:
   print z
```
For movement we will use forward-comment which moves over up to X comments. It moves backward if arg is negative. The forward motion is straight forward. The backward motion starts by positioning point so that it is at the *start* of the comment (probably to avoid having it forward-comment move it to the start of the comment itself which would probably seem like a noop to the user)

python-backslash-continuation-line-p

This is pretty straight forward. Check if the last char on the previous line is a \ and make sure not in a comment or string
syntax-ppss-context is a an undocumented function (but trivial) that checks if the current syntax parsing state is string or comment

python-continuation-line-p

is an extension of the above function and checks for the case of a continuation char (ie the above function) or if in a matching paren type context that is allowed to span multiple lines.
The syntax-ppss-depth function tells you how far nested you are in parens, braces, etc. The interesting part of this function is that if you have unmatched parens then it tries to move up the list and assumes that if you succeed then you must be in something close enough to matching lists after all. I wasn't able come up with a set of braces that tickled this case but otherwise the logic makes sense.

So that is the first half of the utility functions.

I'm still overwhelmed by the amount of complexity required to get a language mode working. I haven't even gotten something seemingly simple like tab support working. I guess this is like a lot of things. When you start to understand a new system, many of the hard things turn out to be trivial and many of the seemingly trivial things turn out to be quite complex.

I had never really browsed through the emacs lisp manual before and I'm finding it is quite helpful and not too hard to navigate.

Eventually I'll have to understand a mode for some other language(s) as well. I'm curious how much python's significant whitespace makes makes things harder. Presumably the default language mode behaviors are tuned for something like lisp and/or C.

I also wonder if it would be possible to use pythons own parser in place of emacs for doing some of the syntax checking activities. That would be an interesting project to pursue. As will be looking at the pymacs module.

Settle down there tiger. One thing at a time....

Friday, August 1, 2008

emacs python mode from scratch: stage 2 - comments

So my next plan was to look at indenting/tabbing, but I found almost immediately that (a) it's really bizarrely complicated and (b) it depends (in some of the supporting functions) on comments being recognized correctly. So I'm forced to figure that out now. So, rather than reading the emacs manual (I mean *anyone* could do that) I used the tried and true binary code search technique and started commenting things out (starting with the original code) until font-locking for comments stopped working.

That worked like a charm. The missing piece of the puzzle was this little fella:


(defvar python-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; Give punctuation syntax to ASCII that normally has symbol
    ;; syntax or has word syntax and isn't a letter.
    (let ((symbol (string-to-syntax "_"))
   (sst (standard-syntax-table)))
      (dotimes (i 128)
 (unless (= i ?_)
   (if (equal symbol (aref sst i))
       (modify-syntax-entry i "." table)))))
    (modify-syntax-entry ?$ "." table)
    (modify-syntax-entry ?% "." table)
    ;; exceptions
    (modify-syntax-entry ?# "<" table)
    (modify-syntax-entry ?\n ">" table)
    (modify-syntax-entry ?' "\"" table)
    (modify-syntax-entry ?` "$" table)
    table))

Strangely you don't have to actually *use* this variable anywhere, it just needs to be defined. If I had to guess off the top of my head, I'd say that define-derived-mode uses this syntax-table if it's available. Otherwise it just falls back to some default behavior.

I was surprised also to see that my friends:


;;   (set (make-local-variable 'parse-sexp-lookup-properties) t)
;;   (set (make-local-variable 'parse-sexp-ignore-comments) t)
;;   (set (make-local-variable 'comment-start) "# ")

still don't seem to be necessary. So they remain commented.

Because I'm now using the block of code above in my "from scratch" python mode, technically I'm supposed to understand it now that I've copied it in. So let's look at what this guy does.

Looking in derived.el we find this little guy:


(defsubst derived-mode-syntax-table-name (mode)
  "Construct a syntax-table name based on a MODE name."
  (intern (concat (symbol-name mode) "-syntax-table")))

So it's not too surprising that it's using the name python-mode-syntax-table somewhat auto-magically.

And to really seal the deal we find in the elisp manual that "<" is the syntax class for comment starter. So that's part of what's getting set when this syntax-table variable is defined.

Also of interest is that initially all symbol constituent chars (except "_") are reassigned to the punctuation character class. This makes sense since python only allows numbers/letters and _ in variable names. Everything else is a type of punctuation.

Additionally we change the syntax class of a few more characters. Notably we tell it that "'" is a quote character and "`" is a paired delimiter.

Cool. Now we have comments that get correctly colored as comments.

Thursday, July 31, 2008

OSCON: Tee Hee Wii

My co-worker who also attended OSCON this year was making fun of me for putting my business card in all and every contest receptacle. But today I found that it paid off. I won a drawing for a wii.

Who's laughing now monkey boy?!

Wednesday, July 30, 2008

emacs python mode from scratch: stage 1 - syntax coloring

First a note on my method. In order to keep my regular python mode functional while I'm playing around, my copy of the python mode uses different function names where possible (prepend "db-" to the normal name) and I'm targeting .pydb files (instead of .py).

So far this seems to work fine. I have the following in my .emacs file:

;; (load-library "db-python-1")
;; (add-to-list 'auto-mode-alist '("\\.pydb\\'" . db-python-mode))

After changes, I run the load-library command and it seems to pick up any changes that I've made recently. (In case it's not clear my stage 1 library is in a file called "db-python-1.el" and it provides db-python-mode). I plan to change the library file name as I progress and leave the db-python-mode name constant.

I am surprised on how much stuff is going on to get a mode working. But even more surprising is how understandable it is. I definitely don't understand all the working parts so far and not everything I've tried has worked but in general the pieces are falling pleasantly into place.

For stage 1, syntax coloring *mostly* works. For whatever reason I wasn't able to get comments to color correctly. My first guess was that one or more of the following:

(set (make-local-variable 'parse-sexp-lookup-properties) t)
(set (make-local-variable 'parse-sexp-ignore-comments) t)
(set (make-local-variable 'comment-start) "# ")

would do it, but so far no avail. But since I didn't get it working that means I don't really understand the lines above so by my rules I can't add them to the file. I'm hoping that at some point as I'm schlepping code from one place to another it *will* start working and hopefully then everything will fall into place.

I'm intrigued by the rx syntax for regular expressions. A little verbose but it might be better than the fairly non-standard emacs regex syntax. It's interesting to look at the actual regular expression that rx produces. For instance the keyword list that is ORed together dumps out as:

"\\(?:\\_<\\(?:False\\|None\\|True\\|a\\(?:nd\\|s\\(?:sert\\)?\\)\\|break\\|continue\\|del\\|e\\(?:l\\(?:if\\|se\\)\\|x\\(?:cept\\|ec\\)\\)\\|f\\(?:inally\\|or\\|rom\\)\\|global\\|i\\(?:mport\\|[fns]\\)\\|lambda\\|not\\|or\\|p\\(?:ass\\|rint\\)\\|r\\(?:aise\\|eturn\\)\\|self\\|try\\|w\\(?:hile\\|ith\\)\\|yield\\)\\_>\\)"

Which seems pretty bad. If you look carefully you can see that it has done some consolidating of terms. I'm not sure how useful it is to do it like that, but if nothing else the rx style is quite readable compared to the above final output.

I also removed db-python-font-lock-syntactic-keywords since adding it in didn't seem to do anything and I really want to understand what all the pieces do.

One thing I had never noticed before is that globals at the top level of the module get a special syntax coloring.

So with no further ado, here is what I have so far:: db-python-1.el


(defvar db-python-font-lock-keywords
  `(,(rx symbol-start
         ;; From v 2.4 reference.
         ;; def and class dealt with separately below
         (or "and" "assert" "break" "continue" "del" "elif" "else"
             "except" "exec" "finally" "for" "from" "global" "if"
             "import" "in" "is" "lambda" "not" "or" "pass" "print"
             "raise" "return" "try" "while" "yield"
             ;; Future keywords
             "as" "None" "with"
             ;; Not real keywords, but close enough to be fontified as such
             "self" "True" "False")
         symbol-end)
    ;; Definitions
    (,(rx symbol-start (group "class") (1+ space) (group (1+ (or word ?_))))
     (1 font-lock-keyword-face) (2 font-lock-type-face))
    (,(rx symbol-start (group "def") (1+ space) (group (1+ (or word ?_))))
     (1 font-lock-keyword-face) (2 font-lock-function-name-face))
    ;; Top-level assignments are worth highlighting.
    (,(rx line-start (group (1+ (or word ?_))) (0+ space) "=")
     (1 font-lock-variable-name-face))
    (,(rx "@" (1+ (or word ?_))) ; decorators
    (0 font-lock-preprocessor-face))))

(define-derived-mode db-python-mode fundamental-mode "(DB)Python-1"
  (set (make-local-variable 'font-lock-defaults)
       '(db-python-font-lock-keywords nil nil nil nil
                                   ;;(font-lock-syntactic-keywords
                                   ;; . db-python-font-lock-syntactic-keywords)
                                   ))
)

(provide 'db-python)