Saturday, December 13, 2008

99 problems - python - 50

Huffman codes.

We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,12),fr(d,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'), hc(b,'101'), hc(c,'100'), hc(d,'111'), hc(e,'1101'), hc(f,'1100')] [hc(a,'01'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:

# stats: [(letter, count)]
# return: [(count, letter OR (left, right)]
def make_tree(stats):
pool = sorted([(count, letter) for letter, count in stats])

while len(pool) > 1:
first = pool.pop(0)
second = pool.pop(0)

node = (first[0]+second[0], (first, second))

return pool[0]

def codes_from_tree(node, path, results):
count, letter_or_trees = node
letter = left = right = None
if type(letter_or_trees) == tuple:
left, right = letter_or_trees
letter = letter_or_trees

if letter:
results.append((letter, path))
codes_from_tree(left, path+"0", results)
codes_from_tree(right, path+"1", results)

def huffman(stats):
results = []
tree = make_tree(stats)
codes_from_tree(tree, "", results)

#to sort by huffman code value
#results.sort(key=lambda x: int(x[1], 2))

return results

That solution *feels* a little hacky for me, but I can't think of why it bothers me. It might be that the last time I wrote this was in java, and it was a lot more work. Maybe this just seems too easy.

No comments: