Friday, August 1, 2008

emacs python mode from scratch: stage 2 - comments

My next plan was to look at indenting/tabbing, but I found almost immediately that (a) it's really bizarrely complicated and (b) some of its supporting functions depend on comments being recognized correctly. So I'm forced to figure comments out now. Rather than reading the emacs manual (I mean *anyone* could do that), I used the tried and true binary search technique: starting with the original code, I commented things out until font-locking for comments stopped working.

That worked like a charm. The missing piece of the puzzle was this little fella:


(defvar python-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; Give punctuation syntax to ASCII that normally has symbol
    ;; syntax or has word syntax and isn't a letter.
    (let ((symbol (string-to-syntax "_"))
          (sst (standard-syntax-table)))
      (dotimes (i 128)
        (unless (= i ?_)
          (if (equal symbol (aref sst i))
              (modify-syntax-entry i "." table)))))
    (modify-syntax-entry ?$ "." table)
    (modify-syntax-entry ?% "." table)
    ;; exceptions
    (modify-syntax-entry ?# "<" table)
    (modify-syntax-entry ?\n ">" table)
    (modify-syntax-entry ?' "\"" table)
    (modify-syntax-entry ?` "$" table)
    table))


Strangely, you don't have to actually *use* this variable anywhere; it just needs to be defined. If I had to guess off the top of my head, I'd say that define-derived-mode uses this syntax-table if it's available, and otherwise falls back to some default behavior.

I was surprised also to see that my friends:


;; (set (make-local-variable 'parse-sexp-lookup-properties) t)
;; (set (make-local-variable 'parse-sexp-ignore-comments) t)
;; (set (make-local-variable 'comment-start) "# ")


still don't seem to be necessary. So they remain commented.

Because I'm now using the block of code above in my "from scratch" python mode, technically I'm supposed to understand it now that I've copied it in. So let's look at what this guy does.

Looking in derived.el we find this little guy:


(defsubst derived-mode-syntax-table-name (mode)
  "Construct a syntax-table name based on a MODE name."
  (intern (concat (symbol-name mode) "-syntax-table")))


So it's not too surprising that it's using the name python-mode-syntax-table somewhat auto-magically.
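
To convince myself of the naming trick, here is a minimal sketch. The names my-demo-mode and my-demo-mode-uses-named-table-p are made up for this test and are not part of python.el; the point is only that the mode body never mentions the syntax table, yet the buffer ends up using it.

```elisp
(require 'derived)

;; Hypothetical toy mode, used only for this demonstration.
;; The variable name follows the <mode>-syntax-table pattern.
(defvar my-demo-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)   ; "#" starts a comment
    (modify-syntax-entry ?\n ">" table)  ; newline ends it
    table)
  "Syntax table for `my-demo-mode'.")

;; Note that the syntax table is never referenced here.
(define-derived-mode my-demo-mode fundamental-mode "Demo"
  "Toy mode that never mentions its syntax table explicitly.")

(defun my-demo-mode-uses-named-table-p ()
  "Return non-nil if `my-demo-mode' installs `my-demo-mode-syntax-table'."
  (with-temp-buffer
    (my-demo-mode)
    (eq (syntax-table) my-demo-mode-syntax-table)))
```

Evaluating (my-demo-mode-uses-named-table-p) should come back non-nil, and (derived-mode-syntax-table-name 'my-demo-mode) should give the symbol my-demo-mode-syntax-table.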

And to really seal the deal we find in the elisp manual that "<" is the syntax class for comment starter and ">" is the one for comment ender. So "#" starts a comment and a newline ends it, and that's part of what's getting set when this syntax-table variable is defined.
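
A rough way to check the comment entries without font-lock is to ask the parser directly. This is only a sketch: my-demo-in-comment-p is a hypothetical helper, and it builds a throwaway table containing just the two comment entries rather than the full python-mode table.

```elisp
(defun my-demo-in-comment-p (text pos)
  "Return non-nil if buffer position POS in TEXT falls inside a comment.
Uses a throwaway syntax table where \"#\" starts a comment and a
newline ends it.  POS is 1-based, like any buffer position."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)
    (modify-syntax-entry ?\n ">" table)
    (with-temp-buffer
      (insert text)
      (with-syntax-table table
        ;; Element 4 of the parse state is non-nil inside a comment.
        (nth 4 (parse-partial-sexp (point-min) pos))))))
```

With the text "x = 1 # hi\ny = 2\n", position 9 (the "h" after the "#") should be reported as inside a comment, while position 3 (the "=") and position 13 (just after the "y" on the next line) should not.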

Also of interest is that initially all symbol constituent chars (except "_") are reassigned to the punctuation syntax class. This makes sense since python only allows numbers/letters and _ in identifiers. Everything else is a type of punctuation.

Additionally we change the syntax class of a few more characters. Notably we tell it that "'" is a string quote character (the double quote already is one in the standard table) and "`" is a paired delimiter.
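
To see those classes for myself, here is a small sketch that rebuilds the relevant parts of the table under a made-up name (my-demo-python-syntax-table) and asks char-syntax what it thinks of a few characters. The helper my-demo-syntax-class is hypothetical too.

```elisp
;; Hypothetical copy of the interesting parts of the table above.
(defvar my-demo-python-syntax-table
  (let ((table (make-syntax-table))
        (symbol (string-to-syntax "_"))
        (sst (standard-syntax-table)))
    ;; Demote every ASCII symbol constituent except "_" to punctuation.
    (dotimes (i 128)
      (unless (= i ?_)
        (when (equal symbol (aref sst i))
          (modify-syntax-entry i "." table))))
    (modify-syntax-entry ?' "\"" table)  ; string quote
    (modify-syntax-entry ?` "$" table)   ; paired delimiter
    table)
  "Demo syntax table mirroring part of the python-mode one.")

(defun my-demo-syntax-class (char)
  "Return the syntax class character of CHAR in the demo table."
  (with-syntax-table my-demo-python-syntax-table
    (char-syntax char)))

;; Each of these should evaluate to t:
;; (eq (my-demo-syntax-class ?+) ?.)    ; "+" is now punctuation
;; (eq (my-demo-syntax-class ?_) ?_)    ; "_" is still a symbol constituent
;; (eq (my-demo-syntax-class ?') ?\")   ; "'" is a string quote
;; (eq (my-demo-syntax-class ?`) ?$)    ; "`" is a paired delimiter
```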

Cool. Now we have comments that get correctly colored as comments.
