I think if I had to blame anyone, I'd start with Alan Kay. I recently read the following quote by him:
I think it is safe to say that most of the Squeak community is dedicated to making this Smalltalk more useful and accessible, and not devoted to making something so much better as to render Smalltalk obsolete (a fate I would dearly love to see happen).
Alan Kay wants smalltalk to go away? Well, if I'm going to use the man's language I should also have the decency to try to improve upon it and replace it.
So that was the start of a tiny little snowball. And the snowball grew. And now I find myself keeping notebooks full of ideas for creating a new programming language.
And that is my obsession. I've got it in my head that I could create a new language. And that this is somehow a good use of my time. So I'm constantly comparing and contrasting the various features that different languages have. Looking for a good idea to steal or a wart to avoid. I've been shopping around for language parsers and VMs to use.
And you know what? It's fun as hell. I actually don't have any illusions that I'm going to set the world on fire or that anyone but me will ever use my language. Or, let's be serious, that there is much likelihood that I will get past the vaporware stage. But I really feel like I'm seeing the landscape of languages with new eyes. Sort of like when you take a class on drawing and your brain starts learning how to do the "switch". And you can almost magically just draw things. I feel like my eyes are really seeing languages and language features for the first time.
So what is my language like? Not much. I've actually been avoiding committing to any specific syntax, which is probably going to make it Lisp-like if I'm not careful. (Not that there's anything wrong with that.) When I do start committing to various features, it's been coming out something like a pythonized haskell (or a haskellized python) with strong nods to smalltalk minimalism. In a word, it's sort of an incoherent jumble. But I keep circling around and trying new things.
And like I said it's really fun. And I'm learning a lot. (And I've become addicted to starting sentences with "and").
I wonder if this is a common or rare affliction. I've never met anyone who said that they were trying to create their own language. Perhaps others are too smart to go down that road in the first place or too ashamed to admit they did and failed.
In any case, be prepared for the next big thing. Any decade now....
# @lines = open('filename');   # alternate universe interface
lines = open("filename")   # a less alternate universe interface

# open(FILEHANDLE, 'filename');
# while (<FILEHANDLE>) {
#   last if /Plutonium/;
# }
# close FILEHANDLE;
# # do something with $_;
fh = open("filename")
for line in fh:
    if "Plutonium" in line:
        break
fh.close()
# do something with line

# # alternate universe interface
# @lines = open('filename');
# for (@lines) {
#   last if /Plutonium/;
# }
# # do something with $_;
lines = open("filename").readlines()
for line in lines:
    if "Plutonium" in line:
        break
# do something with line
# sub parse_section {
#   my $fh = shift;
#   my $title = parse_section_title($fh);
#   my %variables = parse_variables($fh);
#   return [$title, \%variables];
# }
def parse_section(fh):
    title = parse_section_title(fh)
    variables = parse_variables(fh)
    return title, variables

# sub parse_section {
#   my @lines = @_;
#   my $title = parse_section_title(@lines);
#   my %variables = parse_variables(@lines);
#   return [$title, \%variables];
# }
# In the python case we *could* easily
# pop values from the list (but really you
# could in perl as well)
def parse_section(lines):
    title = parse_section_title(lines)
    variables = parse_variables(lines)
    return title, variables
# opendir D, "/tmp";
# @entries = readdir D;
# In python we get a list instead of an iterator
entries = os.listdir("/tmp")

# opendir D, "/tmp";
# while (my $entry = readdir D) {
#   # Do something with $entry
# }
# Python doesn't have "scalar" mode type behavior changes
for entry in os.listdir("/tmp"):
    pass  # Do something with entry

# while (my $file = glob("/tmp/*.[ch]")) {
#   # Do something with $file
# }
# glob in python is also not an iterator
for _file in glob.glob("/tmp/*.[ch]"):
    pass  # Do something with _file

# while (my $key = each %hash) {
#   # Do something with $key
# }
# depending on version of python
# hash may automagically be an iterator
# (maybe all versions...)
for key in hash.iterkeys():
    pass  # Do something with key

# while ("12:34:56" =~ m/(\d+)/g) {
#   # do something with $1
# }
for m in re.finditer(r"(\d+)", "12:34:56"):
    pass  # do something with m (where m is a "match" object)
### 4.2 Homemade Iterators
# sub dir_walk {
#   my ($dir, $filefunc, $dirfunc, $user) = @_;
#   my $iterator = make_iterator($dir);
#   while (my $filename = NEXTVAL($iterator)) {
#     if (-f $filename) { $filefunc->($filename, $user) }
#     else { $dirfunc->($filename, $user) }
#   }
# }
# In python os.walk returns an iterator as is
def dir_walk(dir, filefunc, dirfunc, user):
    iterator = make_iterator(dir)
    for filename in iterator:
        if os.path.isfile(filename):
            filefunc(filename, user)
        else:
            dirfunc(filename, user)
# sub upto {
#   my ($m, $n) = @_;
#   return sub {
#     return $m <= $n ? $m++ : undef;
#   };
# }
# my $it = upto(3, 5);
def upto(m, n):
    _i = [m]
    def foo():
        val = _i[0]
        _i[0] += 1
        if val > n:
            return None
        return val

    return foo

it = upto(3, 5)
## of course in python it's more natural to do the following:
# def upto(m,n):
#     for x in range(m,n+1):
#         yield x
# my $nextval = $it->();
nextval = it()

# while (defined(my $val = $it->())) {
#   # now do something with $val, such as:
#   print "$val\n";
# }
# this doesn't translate in a pretty way to python
# since we can't have statements in a while
# context
val = it()
while val is not None:
    # now do something with val, such as:
    print val
    val = it()
# but of course (with the generator version of upto) we'd just use:
for val in it:
    print val
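For what it's worth, Python does have a built-in answer to the closure-style iterator above: the two-argument form of `iter()` keeps calling a function until it returns a sentinel value. Here is a minimal sketch of that idea, written for Python 3 (the rest of this listing is Python 2). The name `upto_closure` is made up for this example.

```python
def upto_closure(m, n):
    """Closure-style iterator: each call returns the next value, then None."""
    current = m

    def next_value():
        nonlocal current
        if current > n:
            return None
        value = current
        current += 1
        return value

    return next_value


it = upto_closure(3, 5)
# iter(callable, sentinel) calls `it` repeatedly and stops when it returns None.
values = list(iter(it, None))
print(values)
```

The catch is the same one the Perl version has: the sentinel (`None` here) can never be a legitimate value of the sequence.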
# for my $val (1 .. 10000000) {
#   # now do something with $val
# }
for val in range(1, 10000001):   # range() excludes its upper bound
    pass  # now do something with val

# while (defined(my $val = NEXTVAL($it))) {
#   # now do something with $val
# }

# No need to do these machinations since this is already built into
# python except we'd do this with "for"
for val in it:
    pass  # now do something with val

# sub upto {
#   my ($m, $n) = @_;
#   return Iterator {
#     return $m <= $n ? $m++ : undef;
#   };
# }
# sub Iterator (&) { return $_[0] }
# in python we just do this with a yield
def upto(m, n):
    i = m
    while i <= n:
        yield i
        i += 1
# # iterator version
# sub dir_walk {
#   my @queue = shift;
#   return Iterator {
#     while (@queue) {
#       my $file = shift @queue;
#       if (-d $file) {
#         opendir my $dh, $file or next;
#         my @newfiles = grep {$_ ne "." && $_ ne ".."} readdir $dh;
#         push @queue, map "$file/$_", @newfiles;
#       }
#       return $file;
#     } else {
#       return;
#     }
#   };
# }
def dir_walk(root):
    queue = [root]
    while queue:
        _file = queue.pop(0)
        if os.path.isdir(_file):
            for newfile in os.listdir(_file):
                queue.append(os.path.join(_file, newfile))
        yield _file

# sub dir_walk {
#   my ($top, $code) = @_;
#   my $DIR;
#   $code->($top);
#   if (-d $top) {
#     my $file;
#     unless (opendir $DIR, $top) {
#       warn "Couldn’t open directory $top: $!; skipping.\n";
#       return;
#     }
#     while ($file = readdir $DIR) {
#       next if $file eq '.'|| $file eq '..'
#       dir_walk("$top/$file", $code);
#     }
#   }
# }
def dir_walk(top, code):
    code(top)
    if os.path.isdir(top):
        try:
            for _file in os.listdir(top):
                dir_walk(os.path.join(top,_file), code)
        except StandardError, why:
            print "Couldn't open directory %s: %s" % (top, why)
            return
### 4.3 Examples
# sub interesting_files {
#   my $is_interesting = shift;
#   my @queue = @_;
#   return Iterator {
#     while (@queue) {
#       my $file = shift @queue;
#       if (-d $file) {
#         opendir my $dh, $file or next;
#         my @newfiles = grep {$_ ne "." && $_ ne ".."} readdir $dh;
#         push @queue, map "$file/$_", @newfiles;
#       }
#       return $file if $is_interesting->($file);
#     }
#     return;
#   };
# }
def interesting_files(is_interesting, *top_dirs):
    queue = list(top_dirs)

    while queue:
        _file = queue.pop(0)
        if os.path.isdir(_file):
            for newfile in os.listdir(_file):
                queue.append(os.path.join(_file, newfile))
        if is_interesting(_file):
            yield _file
# # Files are deemed to be interesting if they mention octopuses
# sub contains_octopuses {
#   my $file = shift;
#   return unless -T $file && open my($fh), "<", $file;
#   while (<$fh>) {
#     return 1 if /octopus/i;
#   }
#   return;
# }
# my $octopus_file =
#   interesting_files(\&contains_octopuses, 'uploads', 'downloads');
# while ($file = NEXTVAL($octopus_file)) {
#   # do something with the file
# }
# if (NEXTVAL($next_octopus)) {
#   # yes, there is an interesting file
# } else {
#   # no, there isn’t.
# }
# undef $next_octopus;
def contains_octopuses(_file):
    if not os.path.isfile(_file):
        return False
    for line in file(_file):
        if "octopus" in line.lower():   # /octopus/i is case-insensitive
            return True
    return False

octopus_file = interesting_files(contains_octopuses, "uploads", "downloads")
for _octopus_file in octopus_file:
    pass  # do something with the file

next_octopus = interesting_files(contains_octopuses, "uploads", "downloads")
try:
    next_octopus.next()
    # yes, there is an interesting file
except StopIteration:
    pass  # no there isn't
del next_octopus
# sub permute {
#   my @items = @{ $_[0] };
#   my @perms = @{ $_[1] };
#   unless (@items) {
#     print "@perms\n";
#   } else {
#     my(@newitems,@newperms,$i);
#     foreach $i (0 .. $#items) {
#       @newitems = @items;
#       @newperms = @perms;
#       unshift(@newperms, splice(@newitems, $i, 1));
#       permute([@newitems], [@newperms]);
#     }
#   }
# }
# # sample call:
# permute([qw(red yellow blue green)], []);
# I suspect I don't have this quite right
# it produces the permutations but doesn't
# have the problem of waiting for the end to
# start showing the permutations
def permute(items, perms):
    if not items:
        print perms
    else:
        for i in range(len(items)):
            newitems = items[:]
            newitem = newitems.pop(i)
            newperms = [newperm+[newitem] for newperm in perms] or [[newitem]]
            permute(newitems, newperms)
# sample call
permute(["red", "yellow", "blue", "green"], [])

# my $it = permute('A'..'D');
# while (my @p = NEXTVAL($it)) {
#   print "@p\n";
# }
it = permute(["A","B","C","D"])
for p in it:
    print p
# sub permute {
#   my @items = @_;
#   my @pattern = (0) x @items;
#   return Iterator {
#     return unless @pattern;
#     my @result = pattern_to_permutation(\@pattern, \@items);
#     @pattern = increment_pattern(@pattern);
#     return @result;
#   };
# }
def permute(items):
    pattern = [0] * len(items)

    while pattern:
        result = pattern_to_permutation(pattern, items)
        pattern = increment_pattern(pattern)
        yield result

# sub pattern_to_permutation {
#   my $pattern = shift;
#   my @items = @{shift()};
#   my @r;
#   for (@$pattern) {
#     push @r, splice(@items, $_, 1);
#   }
#   @r;
# }
def pattern_to_permutation(pattern, items):
    items = items[:]
    r = []
    for _x in pattern:
        r.append(items.pop(_x))
    return r
# sub increment_odometer {
#   my @odometer = @_;
#   my $wheel = $#odometer;     # start at rightmost wheel
#   until ($odometer[$wheel] < 9 || $wheel < 0) {
#     $odometer[$wheel] = 0;
#     $wheel--;                 # next wheel to the left
#   }
#   if ($wheel < 0) {
#     return;                   # fell off the left end; no more sequences
#   } else {
#     $odometer[$wheel]++;      # this wheel now turns one notch
#     return @odometer;
#   }
# }
def increment_odometer(odometer):
    wheel = len(odometer) - 1
    # test wheel >= 0 first so we never index with a negative wheel
    while wheel >= 0 and odometer[wheel] >= 9:
        odometer[wheel] = 0
        wheel -= 1
    if wheel < 0:
        return
    else:
        odometer[wheel] += 1
        return odometer

# sub increment_pattern {
#   my @odometer = @_;
#   my $wheel = $#odometer;     # start at rightmost wheel
#   until ($odometer[$wheel] < $#odometer-$wheel || $wheel < 0) {
#     $odometer[$wheel] = 0;
#     $wheel--;                 # next wheel to the left
#   }
#   if ($wheel < 0) {
#     return;                   # fell off the left end; no more sequences
#   } else {
#     $odometer[$wheel]++;      # this wheel now turns one notch
#     return @odometer;
#   }
# }
def increment_pattern(odometer):
    wheel = len(odometer) - 1
    # test wheel >= 0 first so we never index with a negative wheel
    while wheel >= 0 and odometer[wheel] >= (len(odometer)-1-wheel):
        odometer[wheel] = 0
        wheel -= 1
    if wheel < 0:
        return
    else:
        odometer[wheel] += 1
        return odometer
# sub n_to_pat {
#   my @odometer;
#   my ($n, $length) = @_;
#   for my $i (1 .. $length) {
#     unshift @odometer, $n % $i;
#     $n = int($n/$i);
#   }
#   return $n ? () : @odometer;
# }
def n_to_pat(n, length):
    odometer = []
    for i in range(1, length+1):
        odometer.insert(0, n % i)
        n = n / i
    return not n and odometer or []

# sub permute {
#   my @items = @_;
#   my $n = 0;
#   return Iterator {
#     my @pattern = n_to_pat($n, scalar(@items));
#     my @result = pattern_to_permutation(\@pattern, \@items);
#     $n++;
#     return @result;
#   };
# }
def permute(items):
    n = 0
    while 1:
        pattern = n_to_pat(n, len(items))
        if not pattern:
            break
        result = pattern_to_permutation(pattern, items)
        yield result
        n += 1
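To convince myself that the counter-to-pattern trick really does enumerate every permutation, here is a small Python 3 sketch of the same three functions, checked against `itertools.permutations`. The changes from the listing above are my own assumptions for the sketch: `//` replaces `/` (plain `/` gives floats in Python 3), and `n_to_pat` returns `None` rather than an empty list once `n` runs past the last pattern.

```python
from itertools import permutations


def n_to_pat(n, length):
    """Turn n into a pattern of `length` digits, or None once n is too large."""
    pattern = []
    for i in range(1, length + 1):
        pattern.insert(0, n % i)
        n //= i  # integer division; a plain / would produce floats in Python 3
    return pattern if n == 0 else None


def pattern_to_permutation(pattern, items):
    """Pick items out of a copy of the list, one index per pattern digit."""
    remaining = list(items)
    return [remaining.pop(index) for index in pattern]


def permute(items):
    """Yield permutations by counting n upward and decoding each n."""
    n = 0
    while True:
        pattern = n_to_pat(n, len(items))
        if pattern is None:
            return
        yield pattern_to_permutation(pattern, items)
        n += 1


mine = [tuple(p) for p in permute("ABC")]
theirs = list(permutations("ABC"))
print(mine == theirs)
```

As far as I can tell the two agree on order as well as content, since both walk the positions in lexicographic order.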
# sub iterate_function {
#   my $n = 0;
#   my $f = shift;
#   return Iterator {
#     return $f->($n++);
#   };
# }
def iterate_function(f):
    n = 0
    while 1:
        yield f(n)
        n += 1
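Since `iterate_function` never stops on its own, the caller has to decide how much of it to consume. A quick Python 3 sketch of one way to do that, using `itertools.islice` to take a fixed number of values (the squaring function is just an illustration):

```python
from itertools import islice


def iterate_function(f):
    """Yield f(0), f(1), f(2), ... forever."""
    n = 0
    while True:
        yield f(n)
        n += 1


squares = iterate_function(lambda n: n * n)
# islice pulls only the first five values; the generator stays paused after that.
first_five = list(islice(squares, 5))
print(first_five)
```

Because the generator is only paused, a later `next(squares)` picks up where `islice` left off.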
# sub permute {
#   my @items = @_;
#   my $n = 0;
#   return Iterator {
#     $n++, return @items if $n==0;
#     my $i;
#     my $p = $n;
#     for ($i=1; $i<=@items && $p%$i==0; $i++) {
#       $p /= $i;
#     }
#     my $d = $p % $i;
#     my $j = @items - $i;
#     return if $j < 0;
#     @items[$j+1..$#items] = reverse @items[$j+1..$#items];
#     @items[$j,$j+$d] = @items[$j+$d,$j];
#     $n++;
#     return @items;
#   };
# }
def permute(_items):
    n = 0
    items = _items[:]

    if n == 0:
        yield items

    n += 1
    while 1:
        # make a copy so list(permute(my_list)) doesn't end up holding
        # n references to the same list; otherwise this can be removed
        items = items[:]
        i = 1
        p = n
        while i <= len(items)+1 and p % i == 0:
            p /= i
            i += 1
        d = p % i
        j = len(items) - i

        if j < 0:
            return

        items[j+1:len(items)] = reversed(items[j+1:len(items)])
        x,y = items[j+d], items[j]
        items[j] = x
        items[j+d] = y
        n += 1
        yield items
    for i in range(len(tokens))[1::2]:
        tokens[i] = [0] + list(tokens[i])

    FINISHED = False
    while not FINISHED:
        finished_incrementing = False
        result = ""
        for token in tokens:
            if token.__class__ is str:
                result += token
            else:
                n, c = token[0], token[1:]
                result += c[n]
                if not finished_incrementing:
                    if n == len(c) - 1:
                        token[0] = 0
                    else:
                        token[0] += 1
                        finished_incrementing = True
        if not finished_incrementing:
            FINISHED = True
        yield result
# %n_expand = qw(N ACGT
#                B CGT D AGT H ACT V ACG
#                K GT M AC R AG S CG W AT Y CT);
# sub make_dna_sequences {
#   my $pat = shift;
#   for my $abbrev (keys %n_expand) {
#     $pat =~ s/$abbrev/($n_expand{$abbrev})/g;
#   }
#   return make_genes($pat);
# }
n_expand = {"N" : "ACGT",
            "B" : "CGT", "D" : "AGT", "H" : "ACT", "V" : "ACG",
            "K" : "GT", "M" : "AC", "R" : "AG", "S" : "CG", "W" : "AT", "Y" : "CT"}
def make_dna_sequences(pat):
    for abbrev in n_expand:
        # wrap the expansion in parentheses, as the perl version does
        pat = re.sub(abbrev, "(" + n_expand[abbrev] + ")", pat)

    return make_genes(pat)
# sub filehandle_iterator {
#   my $fh = shift;
#   return Iterator { <$fh> };
# }

# my $it = filehandle_iterator(*STDIN);
# while (defined(my $line = NEXTVAL($it))) {
#   # do something with $line
# }
### python already does this by default
for line in file("foo"):
    pass  # do something with line
# package FlatDB;
# my $FIELDSEP = qr/:/;
# sub new {
#   my $class = shift;
#   my $file = shift;
#   open my $fh, "<", $file or return;
#   chomp(my $schema = <$fh>);

    def callbackquery(self, is_interesting):
        fh = self.fh
        fh.seek(0)
        fh.readline()   # discard schema line
        for line in fh:
            line = line.strip()
            fieldnames = self.field
            fields = line.split(FlatDB.FIELDSEP)
            F = dict(zip(fieldnames, fields))
            if is_interesting(F):
                yield line
# use FlatDB;
# my $dbh = FlatDB->new('db.txt') or die $!;
# my $q1 = $dbh->query('STATE', 'MA');
# my $q2 = $dbh->query('STATE', 'NY');
# for (1..2) {
#   print NEXTVAL($q1), NEXTVAL($q2);
# }
dbh = FlatDB("db.txt")
q1 = dbh.query("STATE","MA")
q2 = dbh.query("STATE","NY")
for x in range(1,3):
    print q1.next(), q2.next()
# # usage: $dbh->query(fieldname, value)
# # returns all records for which (fieldname) matches (value)
# use Fcntl ':seek';
# sub query {
#   my $self = shift;
#   my ($field, $value) = @_;
#   my $fieldnum = $self->{FIELDNUM}{uc $field};
#   return unless defined $fieldnum;
#   my $fh = $self->{FH};
#   seek $fh, 0, SEEK_SET;
#   <$fh>;                # discard header line
#   my $position = tell $fh;
#   return Iterator {
#     local $_;
#     seek $fh, $position, SEEK_SET;
#     while (<$fh>) {
#       chomp;
#       $position = tell $fh;
#       my @fields = split $self->{FIELDSEP};
#       my $fieldval = $fields[$fieldnum];
#       return $_ if $fieldval eq $value;
#     }
#     return;
#   };
# }
# # callbackquery with bug fix
# use Fcntl ':seek';
# sub callbackquery {
#   my $self = shift;
#   my $is_interesting = shift;
#   my $fh = $self->{FH};
#   seek $fh, 0, SEEK_SET;
#   <$fh>;                # discard header line
#   my $position = tell $fh;
#   return Iterator {
#     local $_;
#     seek $fh, $position, SEEK_SET;
#     while (<$fh>) {
#       $position = tell $fh;
#       my %F;
#       my @fieldnames = @{$self->{FIELDS}};
#       my @fields = split $self->{FIELDSEP};
#       for (0 .. $#fieldnames) {
#         $F{$fieldnames[$_]} = $fields[$_];
#       }
#       return $_ if $is_interesting->(%F);
#     }
#     return;
#   };
# }
# 1;
class FlatDB(object):
    FIELDSEP = ":"

    def __init__(self, _file):
        self._file = _file
        self.fh = file(self._file)
        self.schema = self.fh.readline().strip()
        self.field = self.schema.split(FlatDB.FIELDSEP)
        self.fieldnum = dict(zip([x.upper() for x in self.field], range(len(self.field))))

    def query(self, field, value):
        fieldnum = self.fieldnum.get(field.upper())
        if fieldnum is None:
            return
        fh = self.fh
        fh.seek(0)
        fh.readline()   # discard schema line
        while 1:
            line = fh.readline()
            if not line:
                break
            position = fh.tell()
            fields = line.split(FlatDB.FIELDSEP)
            fieldval = fields[fieldnum]
            if fieldval == value:
                yield line.strip()
                fh.seek(position)

    def callbackquery(self, is_interesting):
        fh = self.fh
        fh.seek(0)
        fh.readline()   # discard schema line
        while 1:
            line = fh.readline()
            if not line:
                break
            position = fh.tell()
            line = line.strip()
            fieldnames = self.field
            fields = line.split(FlatDB.FIELDSEP)
            F = dict(zip(fieldnames, fields))
            if is_interesting(F):
                yield line
                fh.seek(position)
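The reason for all the `tell`/`seek` bookkeeping is that two queries share one file handle, so each one has to remember its own position and restore it before reading. Here is a stripped-down Python 3 sketch of that idea. It is not the class above: the name `MiniFlatDB` and the sample records are made up, it reads from an in-memory `io.StringIO` instead of a real file, and it seeks before every read to keep the logic short.

```python
import io


class MiniFlatDB:
    """Tiny flat-file database: first line is the schema, fields split on ':'."""
    FIELDSEP = ":"

    def __init__(self, fh):
        self.fh = fh
        self.fields = fh.readline().strip().split(self.FIELDSEP)
        self.fieldnum = {name.upper(): i for i, name in enumerate(self.fields)}
        self.data_start = fh.tell()  # position just past the schema line

    def query(self, field, value):
        fieldnum = self.fieldnum[field.upper()]
        position = self.data_start  # each query keeps its own position
        while True:
            self.fh.seek(position)  # another query may have moved the handle
            line = self.fh.readline()
            if not line:
                return
            position = self.fh.tell()
            record = line.rstrip("\n")
            if record.split(self.FIELDSEP)[fieldnum] == value:
                yield record


# Invented sample data, purely for illustration.
data = io.StringIO(
    "LASTNAME:FIRSTNAME:STATE\n"
    "Smith:Ann:MA\n"
    "Jones:Bob:NY\n"
    "Lee:Cho:MA\n"
    "Diaz:Eva:NY\n"
)
dbh = MiniFlatDB(data)
q1 = dbh.query("STATE", "MA")
q2 = dbh.query("STATE", "NY")
interleaved = [next(q1), next(q2), next(q1), next(q2)]
print(interleaved)
```

Without the per-query position, the second query would pick up wherever the first one happened to leave the handle and silently skip records.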
# package FlatDB::Iterator;
# my $FIELDSEP = qr/\s+/;
# sub new {
#   my $class = shift;
#   my $it = shift;
#   my @field = @_;
#   my %fieldnum = map { uc $field[$_] => $_ } (0..$#field);
#   bless { FH => $it, FIELDS => \@field, FIELDNUM => \%fieldnum,
class IterFlatDB(object):
    FIELDSEP = r"\s+"

    def __init__(self, it, *field):
        self.it = it
        self.field = field
        self.fieldnum = dict(zip([x.upper() for x in self.field], range(len(self.field))))
# sub SRand {
#   $seed = shift;
# }
def SRand(_seed):
    global seed
    seed = _seed

# SRand($$);
SRand(os.getpid())

# use CGI::Push;
# my $seed = shift || $$ ;
# srand($seed);
# open LOG, "> $logfile" or die ... ;
# print LOG "Random seed: $seed\n";
# do_push(...);
if len(sys.argv) > 1:
    seed = int(sys.argv[1])
else:
    seed = os.getpid()
random.seed(seed)   # python's counterpart to perl's srand
LOG = open(logfile, "w")
LOG.write("Random seed: " + str(seed) + "\n")
do_push(...)

# use Foo;
# while (<>) {
#   my $random = Rand();
#   # do something with $random
#   foo();
# }
import Foo
for line in sys.stdin:
    random = Rand()
    # do something with random
    Foo.foo()

# use Foo;
# my $rng = make_rand();
# while (<>) {
#   my $random = NEXTVAL($rng);
#   # do something with $random
#   foo();
# }
import Foo
rng = make_rand()
for line in sys.stdin:
    random = rng.next()
    # do something with random
    Foo.foo()
### 4.4 Filters and Transforms
# sub imap {
#   my ($transform, $it) = @_;
#   return Iterator {
#     my $next = NEXTVAL($it);
#     return unless defined $next;
#     return $transform->($next);
#   }
# }

# itertools.imap does this already
def imap(transform, it):
    for value in it:
        yield transform(value)

# sub imap (&$) {
#   my ($transform, $it) = @_;
#   return Iterator {
#     my $next = NEXTVAL($it);
#     return unless defined $next;
#     return $transform->($next);
#   }
# }

# my $rng = imap { $_[0] / 37268 } make_rand();

# sub imap (&$) {
#   my ($transform, $it) = @_;
#   return Iterator {
#     local $_ = NEXTVAL($it);
#     return unless defined $_;
#     return $transform->();
#   }
# }

# these are irrelevant changes for python

# sub igrep (&$) {
#   my ($is_interesting, $it) = @_;
#   return Iterator {
#     local $_;
#     while (defined ($_ = NEXTVAL($it))) {
#       return $_ if $is_interesting->();
#     }
#     return;
#   }
# }
def igrep(is_interesting, it):
    for x in it:
        if is_interesting(x):
            yield x

# # instead of my $next_octopus =
# #   interesting_files(\&contains_octopuses, 'uploads', 'downloads' ; # )
# my $next_octopus = igrep { contains_octopuses($_) }
#                      dir_walk('uploads', 'downloads');
# while ($file = NEXTVAL($next_octopus)) {
#   # do something with the file
# }
for _file in igrep(contains_octopuses, dir_walk("uploads", "downloads")):
    pass  # do something with the file

# sub list_iterator {
#   my @items = @_;
#   return Iterator {
#     return shift @items;
#   };
# }
def list_iterator(*args):
    for x in args:
        yield x
# or just iter(args)

# sub append {
#   my @its = @_;
#   return Iterator {
#     while (@its) {
#       my $val = NEXTVAL($its[0]);
#       return $val if defined $val;
#       shift @its;       # Discard exhausted iterator
#     }
#     return;
#   };
def append(its):
    for it in its:
        for x in it:
            yield x
# or just itertools.chain(*its)
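To back up the "or just itertools.chain" remark, here is a short Python 3 sketch that runs a hand-written `append` next to `itertools.chain` on the same inputs. The variadic signature (`*iterators`) is my own choice for the sketch; the version above takes a single list of iterators.

```python
from itertools import chain


def append(*iterators):
    """Yield everything from each iterator in turn."""
    for it in iterators:
        yield from it


by_hand = list(append(iter([1, 2]), iter("ab"), iter([])))
built_in = list(chain(iter([1, 2]), iter("ab"), iter([])))
print(by_hand == built_in)
```

If the iterators are already in a list, `chain.from_iterable(its)` does the same thing without unpacking.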
### 4.5 The Semipredicate Problem
# this whole section is irrelevant due to how
# python uses iterators/generators
# so i skipped it. if someone sees something
# in here that deserves a python translation
# let me know
### 4.6 Alternative Interfaces to Iterators
# sub equal_arrays (\@\@) {
#   my ($x, $y) = @_;
#   return unless @$x == @$y;   # arrays are the same length?
#   for my $i (0 .. $#$x) {
#     return unless $x->[$i] eq $y->[$i];   # mismatched elements
#   }
#   return 1;                   # arrays are equal
# }
def equal_arrays(x,y):
    if len(x) != len(y):
        return False
    for i in range(len(x)):
        if x[i] != y[i]:
            return False
    return True
# but this is unnecessary since we can already do x == y
# in place

    xy = each_array(x,y)
    for xe,ye in xy:
        if xe != ye:
            return False
    return True
# sub each_array {
#   my @arrays = @_;
#   my $cur_elt = 0;
#   my $max_size = 0;
#   # Get the length of the longest input array
#   for (@arrays) {
#     $max_size = @$_ if @$_ > $max_size;
#   }
#   return Iterator {
#     $cur_elt = 0, return () if $cur_elt >= $max_size;
#     my $i = $cur_elt++;
#     return map $_->[$i], @arrays;
#   };
# }
def each_array(*arrays):
    # pass max() a list so a single input array works too
    max_size = max([len(ar) for ar in arrays])

    def get_item(ar, i):
        if i < len(ar):
            return ar[i]
        return None

    for i in range(max_size):
        yield [get_item(ar, i) for ar in arrays]
# you could also probably do something clever with itertools.izip()

# my $buttons = each_array(\@labels, \@values);
# ...
# while (my ($label, $value) = NEXTVAL($buttons)) {
#   print HTML qq{ $label \n};
# }
buttons = each_array(labels, values)
for label, value in buttons:
    HTML.write(" %(label)s \n" % locals())
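The "something clever" with itertools turns out to be nearly a one-liner. A minimal Python 3 sketch using `itertools.zip_longest` (spelled `izip_longest` in Python 2.6 and later), which pads the shorter inputs with `None` just like `get_item` does. The sample labels and values are invented.

```python
from itertools import zip_longest


def each_array(*arrays):
    """Yield one list per index, padding the shorter arrays with None."""
    for row in zip_longest(*arrays):
        yield list(row)


labels = ["OK", "Cancel", "Help"]
values = [1, 0]
rows = list(each_array(labels, values))
print(rows)
```

Plain `zip` gives the "stop at the shortest array" behavior instead.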
# sub each_array {
#   my @arrays = @_;
#   my $stop_type = ref $arrays[0] ? 'maximum' : shift @arrays;
#   my $stop_size = @{$arrays[0]};
#   my $cur_elt = 0;
#   # Get the length of the longest (or shortest) input array
#   if ($stop_type eq 'maximum') {
#     for (@arrays) {
#       $stop_size = @$_ if @$_ > $stop_size;
#     }
#   } elsif ($stop_type eq 'minimum') {
#     for (@arrays) {
#       $stop_size = @$_ if @$_ < $stop_size;
#     }
#   } else {
#     croak "each_array: unknown stopping behavior '$stop_type'";
#   }
#   return Iterator {
#     return () if $cur_elt >= $stop_size;
#     my $i = $cur_elt++;
#     return map $_->[$i], @arrays;
#   };
# }
def each_array(arrays, stop_type="maximum"):
    assert stop_type in ("minimum", "maximum")

    # pass min()/max() a list so a single input array works too
    if stop_type == "minimum":
        stop_size = min([len(ar) for ar in arrays])
    else:
        stop_size = max([len(ar) for ar in arrays])

    def get_item(ar, i):
        if i < len(ar):
            return ar[i]
        return None

    for i in range(stop_size):
        yield [get_item(ar, i) for ar in arrays]

# sub eachlike (&$) {
#   my ($transform, $it) = @_;
#   return Iterator {
#     local $_ = NEXTVAL($it);
#     return unless defined $_;
#     my $value = $transform->();
#     return wantarray ? ($_, $value) : $value;
#   }
# }
# not sure if wantarray really maps to python
# style
# package CIA;
# sub TIESCALAR {
#   my $package = shift;
#   my $self = {};
#   bless $self => $package;
# }
# sub STORE { }
# sub FETCH { "<>" }

# tie $secret, 'CIA';

# $secret = 'atomic ray';

# print "The secret weapon is '$secret'.\n"

# the secret weapon is '<>'.

# I can't think of any reasonable way to do
# this in python. In part it seems like something
# you could handle with a descriptor and in part
# with "with". I'm just going to ignore TIE-ing
# for now
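For the record, here is roughly what the descriptor half of that idea might look like, as a Python 3 sketch. It is not a real equivalent of `tie`: Python can only intercept reads and writes on an attribute of an object, not on a bare variable. The class names are made up, and the fetched string is the same placeholder that appears in the Perl above.

```python
class Classified:
    """Data descriptor that ignores stores and always fetches the same text."""

    def __get__(self, instance, owner=None):
        return "<>"  # same placeholder string the Perl FETCH returns above

    def __set__(self, instance, value):
        pass  # like the empty STORE: silently discard the new value


class CIA:
    secret = Classified()


agency = CIA()
agency.secret = "atomic ray"  # goes through __set__, so nothing is stored
message = "The secret weapon is '%s'." % agency.secret
print(message)
```

Because `Classified` defines `__set__`, it takes priority over the instance dictionary, so the assignment never lands anywhere.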
### 4.7 An Extended Example: Web Spiders
# use HTML::LinkExtor;
# use LWP::Simple;
# sub traverse {
#   my @queue = @_;
#   my %seen;
#   return Iterator {
#     while (@queue) {
#       my $url = shift @queue;
#       $url =~ s/#.*$//;
#       next if $seen{$url}++;
#       my ($content_type) = head($url);
#       if ($content_type =~ m{^text/html\b}) {
#         my $html = get($url);
#         push @queue, get_links($url, $html);
#       }
#       return $url;
#     }
#     return;       # exhausted
#   }
# }
import urllib2
def traverse(_queue):
    queue = _queue[:]
    seen = {}
    while queue:
        url = queue.pop(0)
        url = url.split("#")[0]
        seen.setdefault(url,0)
        if seen[url] > 0:
            continue
        seen[url] += 1
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError:
            print "http error for:", url
            continue
        content_type = page.headers.getheader("content-type")
        if re.search(r"^text/html\b", content_type):
            html = page.read()
            queue.extend(get_links(url, html))
        yield url
# sub get_links {
#   my ($base, $html) = @_;
#   my @links;
#   my $more_links = sub {
#     my ($tag, %attrs) = @_;
#     push @links, values %attrs;
#   };
#   HTML::LinkExtor->new($more_links, $base)->parse($html);
#   return @links;
# }
# Off the top of my head I don't know a python library
# that provides this exact functionality, so we
# fake it.
import urlparse
import BeautifulSoup
def get_links(base, html):
    links = []

    parsed = urlparse.urlparse(base)

    for anchor in BeautifulSoup.BeautifulSoup(html)('a'):
        link = anchor.get("href")
        if not link:
            continue

        if link.startswith("./"):
            link = link[2:]

        if link.startswith("http"):
            links.append(link)
        elif link.startswith("/"):
            links.append(parsed[0]+"://"+parsed[1]+link)
        else:
            links.append(parsed[0]+"://"+parsed[1]+parsed[2]+link)

    return links
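If you'd rather not fake the URL joining by hand, the standard library can do most of it. Below is a Python 3 sketch of `get_links` built on `html.parser.HTMLParser` and `urllib.parse.urljoin`. It is only an approximation of HTML::LinkExtor: it looks at `href` attributes on `<a>` tags and nothing else. The helper class `_AnchorCollector` and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def get_links(base, html):
    """Return absolute URLs for all anchors in html, resolved against base."""
    collector = _AnchorCollector()
    collector.feed(html)
    return [urljoin(base, href) for href in collector.hrefs]


sample = (
    '<a href="c.html">c</a> <a href="/top.html">t</a> '
    '<a href="./d.html">d</a> <a href="http://other.example/x">x</a> '
    '<a name="no-href">n</a>'
)
links = get_links("http://example.com/a/b.html", sample)
print(links)
```

`urljoin` resolves a relative link against the directory of the base URL, so `c.html` next to `/a/b.html` becomes `/a/c.html` rather than being glued onto the end of the file name.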
# # Version with 'interesting links' callback
# sub traverse {
#   my $interesting_links = sub { @_ };
#   $interesting_links = shift if ref $_[0] eq 'CODE';
#   ...
#   push @queue, $interesting_links->(get_links($url, $html));
#   ...
# }
def traverse(queue, interesting_links=None):
    ...
    queue.extend(interesting_links(get_links(url, html)))
    ...

# my $top = 'http://perl.plover.com/';
# my $interesting = sub { grep /^\Q$top/o, @_ };
# my $urls = traverse($interesting, $top);
top = "http://perl.plover.com/"
# the callback receives a list of links and returns the ones worth following
interesting = lambda links: [link for link in links if link.startswith(top)]
urls = traverse([top], interesting)

# use File::Basename;
# while (my $url = NEXTVAL($urls)) {
#   my $file = $url;
#   $file =~ s/^\Q$top//o;
#   my $dir = dirname($file);
#   system('mkdir', '-p', $dir) == 0 or next;
#   open F, ">", $file or next;
#   print F get($url);
# }
for url in urls:
    _file = url.replace(top, "")
    _dir = os.path.dirname(_file)
    if os.system("mkdir -p %s" % _dir) != 0:
        continue
    try:
        F = open(_file, "w")
    except IOError:
        continue
    F.write(urllib2.urlopen(url).read())

# while (my $url = NEXTVAL($urls)) {
#   print "Bad link to: $url" unless head($url);
# }
for url in urls:
    try:
        urllib2.urlopen(url)
    except:
        print "Bad link to: %s" % url
# sub traverse {
#   ...
#   my (%head, $html);
#   @head{qw(TYPE LENGTH LAST_MODIFIED EXPIRES SERVER)} = head($url);
#   if ($head{TYPE} =~ m{^text/html\b}) {
#     $html = get($url);
#     push @queue, $interesting_links->(get_links($url,$html));
#   }
#   return wantarray ? ($url, \%head, $html) : $url;
#   ...
# }
# I don't think there is a straightforward way to duplicate
# "wantarray" type functionality in python. In any case
# it would be more uniform to *always* return the tuple
# sub traverse {
#   my $interesting_links = sub { shift; @_ };
#   $interesting_links = shift if ref $_[0] eq 'CODE';
#   my @queue = map [$_, 'supplied by user'], @_;
#   my %seen;
#   return Iterator {
#     while (@queue) {
#       my ($url, $referrer) = @{shift @queue};
#       $url =~ s/#.*$//;
#       next if $seen{$url}++;
#       my (%head, $html);
#       @head{qw(TYPE LENGTH LAST_MODIFIED EXPIRES SERVER)} = head($url);
#       if ($head{TYPE} =~ m{^text/html\b}) {
#         my $html = get($url);
#         push @queue,
#           map [$_, $url],
#           $interesting_links->($url, get_links($url, $html));
#       }
#       return wantarray ? ($url, \%head, $referrer, $html) : $url;
#     }
#     return;       #exhausted
#   }
# }
import urllib2
def traverse(queue, interesting_links=None):
    queue = [(x, "supplied by user") for x in queue]

    if interesting_links is None:
        def interesting_links(this_url, other_urls):
            return other_urls

    seen = {}

    while queue:
        url, referrer = queue.pop(0)
        url = url.split("#")[0]
        seen.setdefault(url,0)
        if seen[url] > 0:
            continue
        seen[url] += 1
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError:
            print "http error for:", url
            yield url, None, referrer, None
            continue
        content_type = page.headers.getheader("content-type")
        html = None   # stays None for anything that isn't HTML
        if re.search(r"^text/html\b", content_type):
            html = page.read()
            queue.extend([(x, url) for x in interesting_links(url, get_links(url, html))])
        yield url, page.headers, referrer, html
# my $top = 'http://perl.plover.com/'
# my $interesting = sub { shift; grep /^\Q$top/o, @_ };
# my $urls = traverse($interesting, $top);
# while (my ($url, $head, $referrer) = NEXTVAL($urls)) {
#   next if $head->{TYPE};
#   print "Page '$referrer' has a bad link to '$url'\n";
# }
top = "http://perl.plover.com"
interesting = (lambda x,y: [_y for _y in y if top in _y])
urls = traverse([top], interesting)
for url, head, referrer, html in urls:
    if head is not None:   # the page answered, so the link is fine
        continue
    print "Page '%s' has a bad link to '%s'" % (referrer, url)

# my $top = 'http://perl.plover.com/';
# my $interesting = sub { shift; grep /^\Q$top/o, @_ };
# my $urls = igrep_l { not $_[1]{TYPE} } traverse($interesting, $top);
# while (my ($url, $head, $referrer) = NEXTVAL($urls)) {
#   print "Page '$referrer' has a bad link to '$url'\n";
# }
top = "http://perl.plover.com"
interesting = (lambda x,y: [_y for _y in y if top in _y])
# traverse yields head=None for urls it could not open
urls = igrep_l((lambda url, head, referrer, html: head is None),
               traverse([top], interesting))
for url, head, referrer, html in urls:
    print "Page '%s' has a bad link to '%s'" % (referrer, url)

# sub igrep_l (&$) {
#   my ($is_interesting, $it) = @_;
#   return Iterator {
#     while (my @vals = NEXTVAL($it)) {
#       return @vals if $is_interesting->(@vals);
#     }
#     return;
#   }
# }
def igrep_l(is_interesting, it):
    for vals in it:
        if is_interesting(*vals):
            yield vals
# while (my ($url, $head, $referrer) = NEXTVAL($urls)) {
#   print "Page '$referrer' has a bad link to '$url'\n";
#   print "Edit now? ";
#   my $resp = <>;
#   if ($resp =~ /^y/i) {
#     system $ENV{EDITOR}, url_to_filename($referrer);
#   } elsif ($resp =~ /^q/i) {
#     last;
#   }
# }
for url, head, referrer, html in urls:
    print "Page '%(referrer)s' has a bad link to '%(url)s'" % locals()
    print "Edit now?"
    resp = raw_input()
    if resp.lower().startswith("y"):
        os.system(os.environ["EDITOR"] + " " + url_to_filename(referrer))
    elif resp.lower().startswith("q"):
        break
# sub traverse {
#   my $interesting_link;
#   $interesting_link = shift if ref $_[0] eq 'CODE';
#   my @queue = map [$_, 'supplied by user'], @_;
#   my %seen;
#   my $q_it = igrep { ! $seen{$_->[0]}++ }
#              imap { $_->[0] =~ s/#.*$//; $_}
#              Iterator { return shift(@queue) };
#   if ($interesting_link) {
#     $q_it = igrep {$interesting_link->(@$_)} $q_it;
#   }
#   return imap {
#     my ($url, $referrer) = @$_;
#     my (%head, $html);
#     @head{qw(TYPE LENGTH LAST_MODIFIED EXPIRES SERVER)} = head($url);
#     if ($head{TYPE} =~ m{^text/html\b}) {
#       $html = get($url);
#       push @queue,
#         map [$_, $url],
#         get_links($url, $html);
#     }
#     return wantarray ? ($url, \%head, $referrer, $html) : $url;
#   } $q_it;
# }
# this is not an exact match but is close enough for our
# purposes. what ever that purpose could be.
def traverse(queue, interesting_link=None):
    seen = {}
    queue = [(x, "supplied by user") for x in queue]

    def iterate_queue():
        while queue:
            yield queue.pop(0)

        content_type = page.headers.getheader("content-type")
        if re.search(r"^text/html\b", content_type):
            html = page.read()
            queue.extend([(x, url) for x in get_links(url, html)])
        return url, page.headers, referrer, html

    return imap(process_url, q_it)
# sub make_robot_filter {
#   my $agent = shift;
#   my %seen_site;
#   my $rules = WWW::RobotRules->new($agent);
#   return sub {
#     my $url = url(shift());
#     return 1 unless $url->scheme eq 'http';
#     unless ($seen_site{$url->netloc}++) {
#       my $robots = $url->clone;
#       $robots->path('/robots.txt');
#       $robots->frag(undef);
#       $rules->parse($robots, get($robots));
#     }
#     $rules->allowed($url)
#   };
# }
def make_robot_filter(agent):
    seen_site = {}

    rules = {}   #robotparser.RobotFileParser()

    def _filter(url):
        u = urlparse.urlparse(url)
        if u.scheme != "http":
            return True

        if u.netloc not in rules:
            rules[u.netloc] = robotparser.RobotFileParser()
            rules[u.netloc].set_url(u.scheme+"://"+u.netloc+"/robots.txt")
            rules[u.netloc].read()
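Here is how the whole robot filter might hang together, as a self-contained Python 3 sketch (where the module lives at `urllib.robotparser`). To keep it runnable without a network, the robots.txt text comes from a caller-supplied `fetch_robots_txt` function; that parameter, the fake fetcher and the example URLs are all assumptions of this sketch, not part of the code above.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_robot_filter(agent, fetch_robots_txt):
    """Return a predicate that says whether `agent` may fetch a URL.

    fetch_robots_txt(url) is a caller-supplied (hypothetical) function that
    returns the text of a robots.txt file; it stands in for the network call.
    """
    rules = {}  # one parser per site, filled in on first visit

    def allowed(url):
        parts = urlsplit(url)
        if parts.scheme != "http":
            return True  # like the Perl version, only police http URLs
        if parts.netloc not in rules:
            parser = RobotFileParser()
            robots_url = "http://%s/robots.txt" % parts.netloc
            parser.parse(fetch_robots_txt(robots_url).splitlines())
            parser.modified()  # mark the rules as loaded
            rules[parts.netloc] = parser
        return rules[parts.netloc].can_fetch(agent, url)

    return allowed


fetched = []


def fake_fetch(url):
    """Stand-in for a real HTTP GET; records what was requested."""
    fetched.append(url)
    return "User-agent: *\nDisallow: /private/\n"


robot_ok = make_robot_filter("my-spider", fake_fetch)
results = [
    robot_ok("http://example.com/index.html"),
    robot_ok("http://example.com/private/notes.html"),
    robot_ok("ftp://example.com/private/file"),
]
print(results, fetched)
```

Caching one parser per host means robots.txt is fetched once per site, which is the job `%seen_site` does in the Perl version.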
As I've mentioned, I'm working on smalltalk as my learn-a-language-a-year language. So where am I after a month?
My initial idea was to ease into things using etoys as a gateway drug. But after a week or so I decided that, while it's oddly fascinating and kinda fun, it's not really smalltalk per se. So while I'll probably dabble a little bit, I've decided that I need to work with more mainstream smalltalk learning.
In that vein, I'm reading Squeak By Example and am about 1/3 of the way through. It's a very nice no nonsense introduction to smalltalk the language and squeak the environment.
After that I'm thinking I'll either look at the Squeak Development Example for Squeak 3.9 tutorial or work on a Seaside Tutorial. I'm sure I'll eventually do both, but I'm leaning towards the latter, since for some reason I'm more drawn to the web development aspect of things these days. And how cool are you if you use continuations? (Pretty cool, I'd wager.)
I have to confess that already in the first month I've had to fight off the urge just to ditch this project. I'm a little embarrassed to admit that I'm not immediately overwhelmed with a love for smalltalk. And logically that's not really so unexpected. Learning a language is *hard*. In the initial phases you basically see everything through your blub-colored glasses: in the new language you see blub features, but they are distorted, and some blub features are completely missing or almost too awkward to be usable. You may dimly see some features that are interesting, but they are obscured by an alien syntax and semantics.
And that's where I am now with smalltalk. There are some things that seem interesting (elegant metaclass programming features, turtles all the way down, highly integrated development environment, etc.) but I've never used these features "in anger". So at best they just seem kinda interesting. On the other hand, the lack of modules feels archaic, the default user interface look-and-feel seems oddly clunky, the lack of list access syntax (e.g. foo[3:5]) makes me sad, and I miss emacs.
So I'm in the no man's land right now. I see things faintly off in the distance that seem interesting but everything in my reach is (seemingly) inferior and awkward.
I guess it helps me to lay it out like this. I'm surprised (even though I should know myself better by now) at how easily I could just abandon things and jump over to haskell for a while. Oooh, and then lisp is kinda cool, and then, oh yeah, I heard javascript is the NBL, I better look at that for a few minutes today.
I wonder if it's harder to fall in love with another language when your day job is python. But I'm a little afraid that python has become my blub and I must fight the tendencies of a blub programmer to be blind to non-blubby goodness.
Ok, enough dithering. I'm putting my blinders on again and focusing on smalltalk. I actually hope (and somewhat expect) that I will get over the oddness hurdle and really love smalltalk.
But it's not love at first sight. It's more like cautious optimism at first sight.