Saturday, July 10, 2010

Compilation and Compression

Quick, which is bigger in general: source code or the resulting compiled executable?

Try to answer that before reading on. If you are being nit picky consider source code that has no comments and minimal whitespace. And consider *any* sort of language compilation process: python to bytecode, c to machine code, etc.

Recently while reading "What is Thought?" by Eric Baum (a great mix of AI, computational theory, cognitive science, evolutionary theory) he stated matter-of-factly that when you compile a program it gets bigger. He was using this in analogy with how DNA is (relatively) concise but the things it builds (e.g. all the connections in the human brain) are much, much bigger (in the Shannon information sense).

That seemed completely wrong to me. Humans are inefficient and verbose. Machines are efficient and concise. Surely some optimizing compiler (or even just a regular compiler) will show my code to be a joke and transform it into a small jewel-like essence that I wasn't capable of at my high level of abstraction.

And so I started looking at examples. .py to .pyc files. c to binaries. .hs to binaries. And wouldn't you know it, more often then not the resulting "compiled" entity was larger.

I was surprised that my intuition was so far off on this. I asked around with my co-workers and more often than not they had the same faulty intuition as me. So at least I'm not alone. But it's strange that my intuition was so far off on this.

Now I think my intuition is properly adjusted and here's how I think about it now.

Humans aren't inefficient and verbose. Quite the opposite. Saying something like:
print "hello world"

as a simple python program is a very terse way of making *lots* of low level details happen. In fact all programming languages (no matter how high or low you think of them) are very abstract (ie concise) compared to what is really going on. Even assembler is a massive collection of shortcuts and abstractions compared to the actual physics going on.

So no matter what language you are working in you are essentially working with shortcuts that need to actually be fleshed out into more precise and verbosely specific instructions.

What's really funny is that from reading "What Is Thought" I actually see a deep relationship between compilation and compression that I had never considered before. A little Information Theory is a dangerous thing...

Saturday, July 3, 2010

My Plan for Programming Language World Domination

I've recently been playing around with prolog (of all things). This happened after reading in "Coders at Work" that Erlang was originally written in prolog. This blew my mind. My impression from a few weeks of prolog usage in a programming languages course and another few weeks in an AI course that it was a *cute* idea but not really anything like a general purpose language. I guess this just emphasizes the point that you don't really know enough to have a valid opinion after such a short exposure. FWIW, I've been pleasantly surprised at the feel of declarative programming.

This sort of reactivated a crazy idea I had in my youth. Back in high school I had this notion of being a polyglot. At one point I had a different language for each day of the week (French on Mondays, German on Tuesdays, etc.). Needless to say this overwhelmed me and I never really mastered any of them (though I retained an interest in linguistics).

But I still love this idea of learning lots of languages. But now it occurs to me that I *can* do this with programming languages. The main flaw (besides the audacity) of my original goal was that I didn't have access to any native speakers with which to really practice my language skills. But with programming languages you can always practice to your hearts content.

So here is my plan. I will rotate through my list of "keeper" languages as I run through my usual diet of "99 problems", "euler problems", ACM contest problems and other puzzlers I come across. After I confirm that I can comfortably solve relatively interesting (though smallish) problems in my current list of "keeper" languages I'll start work on getting another language up to speed to add to my Bat utility belt.

Part of my interest in such a plan is that I just love learning to think in new ways. OK, that's a lie. Sometimes it's interesting. Sometimes it is maddening. But in either case it feels important and useful like going for long runs and eating right. I don't love exercise, but sometimes it feels good and I'm more worried about what will happen if I *don't* do it than discomfort/nuisance of actually doing it.

My other concern is related to the fact that I quite luckily have been able to program primarily in python for the last 9 years. This has been fun, but I worry that I could be missing out on other cool emerging technologies and that while python is a decent way to express your thoughts, it is not perfect and I shouldn't stop exercising different ways to bend computers to my will.

So here is my initial list of languages that I presume that I can currently solve programming puzzles in: python, prolog and haskell. Haskell is my language of the year and prolog has been quickly ramping up again. After I verify that these guys are reliably at my fingertips the next few languages will likely be from this short list:

  • oz: Gives me an excuse to read CTM and it feels like and important laboratory for programming ideas
  • c: I feel guilty that I've lost my fluency in this. I want to get comfortable with thinking at that level again.
  • java: this is a lingua franca that I should be comfortable with (though I think it has the least to teach me). I used to be paid to do this, but I don't miss it.
  • a lisp (commonlisp/scheme/clojure/emacslisp): Do I need a reason? OK, probably scheme so that I can go through SICP all the way and be able to wield the ninja star of continuations with confidence.
  • smalltalk: Is there a more direct road to OO mastery? I've spent some time before but probably didn't give it enough of a chance. Also being different is sort of the point of this exercise.

There are many more that seem worth cultivating, but I don't dare write them down for fear of overwhelming myself too early on.

Any way, here we go again.