If you have a database of zipcodes and their latitudes and longitudes, you can use a version of this query to get the geographically closest zipcodes:

SELECT b.zipcode, b.city, b.state, b.latitude, b.longitude,
       ACOS(SIN(RADIANS(a.latitude))
          * SIN(RADIANS(b.latitude)) +
            COS(RADIANS(a.latitude))
          * COS(RADIANS(b.latitude))
          * COS(RADIANS(a.longitude - b.longitude))) as distance
 FROM zipcodes.zip_to_latlong a,
      zipcodes.zip_to_latlong b
WHERE a.zipcode=?
ORDER BY distance
LIMIT 20

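The “distance” there is in radians: it’s the central angle between the two points on a sphere, which is all you need for sorting. The query I adapted this from makes the same spherical assumption, converts the angle from radians to degrees, and then converts degrees to miles at 60 nautical miles per degree, using the 1.1515 statute-miles-per-nautical-mile factor that’s common in these scripts (the exact figure is about 1.1508).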
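To make the units concrete, here is a minimal Python sketch of the same spherical-law-of-cosines calculation, plus the radians-to-miles conversion described above. The function names and the `(zipcode, latitude, longitude)` row shape are hypothetical; the clamp before `acos` guards against rounding pushing the cosine slightly past 1 when two points coincide.

```python
import math

# Factor commonly used in zipcode-distance scripts (exact value is ~1.1508).
STATUTE_MILES_PER_NM = 1.1515


def central_angle(lat1, lon1, lat2, lon2):
    """Angle between two points on a sphere, in radians (same formula as the SQL)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to acos's domain; identical points can produce 1.0000000000000002.
    return math.acos(max(-1.0, min(1.0, c)))


def angle_to_statute_miles(angle):
    # radians -> degrees -> nautical miles (60 per degree) -> statute miles
    return math.degrees(angle) * 60 * STATUTE_MILES_PER_NM


def nearest(origin, rows, limit=20):
    """Hypothetical helper: rows are (zipcode, latitude, longitude) tuples."""
    lat, lon = origin
    return sorted(rows, key=lambda r: central_angle(lat, lon, r[1], r[2]))[:limit]
```

Because the miles conversion is a constant multiple of the angle, ordering by the raw radians (as the query does) gives the same ranking as ordering by miles.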

I’m mostly a Perl guy (with a secret love of JavaScript), so I try to stay out of the Python stuff at my day job where possible. But recently I’ve been taking the lead on a bunch of Memcached optimizations, which are starting to trickle over into the Python side.

A nice feature of the Perl Cache::Memcached module is the ability to define a “namespace” when you create the Memcached object:

my $memd = new Cache::Memcached (namespace => "foo_");

Then, any keys passed to the $memd object via get/set/etc. are automatically prefixed with “foo_”: $memd->get("123") actually requests the memcached key “foo_123”.

Python’s memcache module supports namespaces for the *_multi methods (through their key_prefix argument), but not on the individual get/set/etc. calls. Also, the prefix must be passed on each call; you can’t specify it in the constructor. Well, subclassing saves the day again:

import memcache

class Client(memcache.Client):
    """A memcache.Client that prefixes every key with a per-client namespace."""

    def __init__(self, servers=None, debug=0, namespace=None):
        super(Client, self).__init__(servers, debug=debug)
        self._namespace = namespace or ""

    def _ns(self, key):
        # Prefix a single key; leave it untouched when no namespace is set
        return self._namespace + str(key) if self._namespace else key

    # GET
    def get(self, key):
        # get_multi() strips the prefix from the keys of the dict it returns
        return self.get_multi([key]).get(key)

    def get_multi(self, keys, key_prefix=''):
        return super(Client, self).get_multi(keys, key_prefix=self._namespace + key_prefix)

    # SET
    def set(self, key, val, time=0, min_compress_len=0):
        # set_multi() returns the list of keys that failed, so invert it
        # to keep set()'s usual true-on-success result
        failed = self.set_multi({key: val}, time=time, min_compress_len=min_compress_len)
        return not failed

    def set_multi(self, mapping, time=0, key_prefix='', min_compress_len=0):
        return super(Client, self).set_multi(mapping, time=time,
                                             key_prefix=self._namespace + key_prefix,
                                             min_compress_len=min_compress_len)

    # DELETE
    def delete(self, key, time=0):
        return self.delete_multi([key], time)

    def delete_multi(self, keys, seconds=0, key_prefix=''):
        # Pass the delay positionally so this works whatever name
        # the parent class gives that parameter
        return super(Client, self).delete_multi(keys, seconds,
                                                key_prefix=self._namespace + key_prefix)

    # EVERYTHING ELSE (return the parent's result, e.g. incr's new value)
    def add(self, key, val, time=0, min_compress_len=0):
        return super(Client, self).add(self._ns(key), val, time=time, min_compress_len=min_compress_len)

    def incr(self, key, delta=1):
        return super(Client, self).incr(self._ns(key), delta=delta)

    def replace(self, key, val, time=0, min_compress_len=0):
        return super(Client, self).replace(self._ns(key), val, time=time, min_compress_len=min_compress_len)

    def decr(self, key, delta=1):
        return super(Client, self).decr(self._ns(key), delta=delta)

The __init__ method is overridden to take an additional “namespace” parameter, which is stored in self._namespace. The get/set/delete methods all have namespace-capable *_multi versions, so for those I just pass the calls off to the appropriate one. The *_multi methods themselves are overridden to prepend self._namespace to whatever key_prefix the caller passes, so the per-call prefix still works as normal. Finally, the add/incr/replace/decr methods prefix the key with self._namespace directly. Obviously, get/set/delete could have been done the same way.
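Subclassing isn’t the only option. A minimal sketch of the alternative, assuming nothing beyond a client object with get/set/delete methods, is to wrap the client and prefix keys on the way through. `Namespaced` and the dict-backed `DictClient` stand-in are hypothetical names for illustration, not part of the memcache module.

```python
class DictClient:
    """Hypothetical in-memory stand-in for a memcache client (testing only)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, val, time=0):
        self.store[key] = val
        return True

    def delete(self, key, time=0):
        return self.store.pop(key, None) is not None


class Namespaced:
    """Wraps any client with get/set/delete and prefixes every key."""

    def __init__(self, client, namespace=""):
        self._client = client
        self._namespace = namespace

    def _k(self, key):
        return self._namespace + str(key)

    def get(self, key):
        return self._client.get(self._k(key))

    def set(self, key, val, time=0):
        return self._client.set(self._k(key), val, time=time)

    def delete(self, key, time=0):
        return self._client.delete(self._k(key), time=time)
```

The trade-off: a wrapper only exposes the methods you forward, while the subclass inherits everything (including methods you forgot to namespace).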

Yesterday at work someone was trying to pass traditional Apache SSI directives through an XSL transformation on a Google search appliance. Long story short, they vanished: HTML comments don’t make it out of that device.

The fix was simple. Since we were already pumping the Google results through a Perl CGI, there was no reason we couldn’t output a fake HTML tag that the CGI would then turn into an SSI comment for Apache. Thus was born <ssi virtual="/foo/bar.html" />.

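Then a simple s{<ssi\s+(virtual="[^"]+")\s*(?:/>|></ssi>)}{<!--#include $1 -->}g; in Perl, which matches both the self-closing <ssi ... /> form and the <ssi ...></ssi> form the XSL output may produce, will give something Apache can understand.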

That solved the immediate problem, but got me thinking about emulating the full Apache mod_include set of SSI directives using the <ssi> tag. I’m thinking of something like this:

<ssi element="include" virtual="/foo/bar.html" />

<ssi element="set" var="FOO" value="BAR" />

<ssi element="if">
  <ssi_if expr="test_condition">YES!</ssi_if>
  <ssi_elif expr="test_condition">MAYBE!</ssi_elif>
  <ssi_else>NO!</ssi_else>
</ssi>

And with that format, the original version still works if you assume a missing “element” attribute implies element="include". The <!--#if --> block isn’t quite satisfying here: any text nodes inside the <ssi> block but outside the <ssi_(if|elif|else)> blocks would be ignored, but that’s no different from odd content in, say, a <table> that doesn’t actually fall into a cell.

I don’t actually have the Perl that would do the transformation, but it wouldn’t be hard. I’ll wait until someone actually needs it.
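For illustration only, here is a rough Python sketch (not Perl, and not the author’s code) of what that transformation might look like, using regular expressions. It assumes the tags are written exactly as in the examples above, that <ssi element="if"> blocks aren’t nested and don’t contain other <ssi> tags, and, as described, it silently drops any stray text between the branches. The function names are hypothetical.

```python
import re

ATTR = re.compile(r'(\w+)="([^"]*)"')
IF_BLOCK = re.compile(r'<ssi\s+element="if"\s*>(.*?)</ssi>', re.S)
BRANCH = re.compile(
    r'<ssi_(if|elif)\s+expr="([^"]*)"\s*>(.*?)</ssi_\1>'
    r'|<ssi_else\s*>(.*?)</ssi_else>', re.S)
# Single-element tags, either <ssi ... /> or <ssi ...></ssi>
SIMPLE = re.compile(r'<ssi((?:\s+\w+="[^"]*")*)\s*(?:/>|></ssi>)')


def _if_block(m):
    out = []
    # Text outside the <ssi_if>/<ssi_elif>/<ssi_else> branches is ignored
    for kind, expr, body, else_body in BRANCH.findall(m.group(1)):
        if kind:
            out.append('<!--#%s expr="%s" -->%s' % (kind, expr, body))
        else:
            out.append('<!--#else -->%s' % else_body)
    out.append('<!--#endif -->')
    return ''.join(out)


def _simple(m):
    attrs = dict(ATTR.findall(m.group(1)))
    # A missing element attribute means include, so the original tag still works
    element = attrs.pop('element', 'include')
    args = ''.join(' %s="%s"' % kv for kv in attrs.items())
    return '<!--#%s%s -->' % (element, args)


def ssi_to_apache(html):
    html = IF_BLOCK.sub(_if_block, html)
    return SIMPLE.sub(_simple, html)
```

For example, `ssi_to_apache('<ssi element="set" var="FOO" value="BAR" />')` gives `<!--#set var="FOO" value="BAR" -->`.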

mod_perl 2 has an annoying…feature. Because the C environ array is not thread-safe, mod_perl’s perl-script handler unties the %ENV hash from the actual process environment. That means anything that uses the C getenv/setenv/unsetenv functions to read the environment will not see changes made to %ENV.

An obvious example is Perl’s localtime function. It actually calls the system localtime function, which uses the C getenv to check the current value of the timezone environment variable TZ. If you try to change the timezone in a mod_perl2 program by assigning to $ENV{TZ}, localtime won’t know it.

The solution is to use the Env::C module and its getenv/setenv/unsetenv wrappers. That works, but calling them around every localtime call is cumbersome. Instead, a simple module loaded at server startup can wrap the system localtime in a function that takes care of the environment.

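package Apache2::Localtime;

use strict;
use Env::C;
use Exporter;

our @ISA = qw(Exporter);
our @EXPORT = qw(localtime);

# Install our localtime as CORE::GLOBAL::localtime, overriding the
# built-in for code compiled after this module is loaded.
sub import {
  my $class = shift;
  $class->export('CORE::GLOBAL', 'localtime');
}

sub localtime {
  # Default to the current time only when no argument is given,
  # so localtime(0) still means the epoch.
  my $time = @_ ? shift : time;
  return CORE::localtime($time) unless $ENV{TZ};

  # Copy $ENV{TZ} into the real C environment for the duration of the call
  my $orig_tz = Env::C::getenv('TZ');
  Env::C::setenv('TZ', $ENV{TZ}, 1);
  my(@ret, $ret);
  if(wantarray) {
    @ret = CORE::localtime($time);
  } else {
    $ret = CORE::localtime($time);
  }
  # Put the C environment back the way we found it
  if(defined $orig_tz) {
    Env::C::setenv('TZ', $orig_tz, 1);
  } else {
    Env::C::unsetenv('TZ');
  }
  return wantarray ? @ret : $ret;
}

1;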

Put that in your @INC path at Apache2/Localtime.pm, then add use Apache2::Localtime; to a PerlRequire .../initialize.pl script or something similar. The new function overrides the built-in localtime for any code compiled after the module is loaded, which keeps your time zones in sync.

The code was mostly taken from here.
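For comparison, here is the same save/set/restore idea sketched in Python, where assigning to os.environ does update the C environment (Python calls putenv for you) and time.tzset() tells the C library to re-read TZ. This is a Unix-only sketch (time.tzset doesn’t exist on Windows), and `temporary_tz`/`localtime_in` are hypothetical names.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def temporary_tz(tz):
    """Set TZ in the process environment, re-read it, and restore it afterward."""
    orig = os.environ.get('TZ')
    os.environ['TZ'] = tz          # also updates the C environment via putenv()
    time.tzset()                   # make the C library re-read TZ
    try:
        yield
    finally:
        if orig is None:
            del os.environ['TZ']   # also removes it from the C environment
        else:
            os.environ['TZ'] = orig
        time.tzset()


def localtime_in(tz, secs=None):
    with temporary_tz(tz):
        return time.localtime(secs)
```

Like the Perl version, it restores the original TZ (or removes it) even if the call in the middle raises.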

I’ve been using Memcached for a few weeks, trying to offload some VERY heavy database load. It’s nice and blazing fast, but the code that uses it gets sort of clunky. If I have this simple bit of code:

$key = "foobar";
$val = calculate_val($key);

It turns into this:

$key = "foobar";
$val = $memd->get($key);
if(! defined($val)) {
  $val = calculate_val($key);
  $memd->set($key, $val);
}

I kept coming back to the idea of a get_or_set method that would handle all of this, but I couldn’t see how to make it work until the obvious solution hit me:

$key = "foobar";
$val = $memd->get_or_set($key, sub { calculate_val($key) });

A simple closure around the actual calculation block, passed to the get_or_set method as a callback. If the lookup finds a value, the method returns it; otherwise it calls the callback, stores the result in memcached, and returns it.

The only change to Cache::Memcached is adding the get_or_set method. The easiest way is to just subclass Cache::Memcached:

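package My::Memcached;
use strict;
use base qw(Cache::Memcached);

# Return the cached value for $key; on a miss, compute it with
# the callback, store it, and return it.
sub get_or_set {
  my $self = shift;
  my($key, $callback) = @_;
  my $val = $self->get($key);
  unless(defined $val) {
    $val = $callback->();
    $self->set($key, $val);
  }
  return $val;
}

1;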

Now you have this simple interface:

use My::Memcached;

my $memd = new My::Memcached { servers => [...] };
my $foo = $memd->get_or_set("bar", sub { get_val("bar") });
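The same pattern carries over to the Python side. Below is a minimal sketch as a mixin, assuming only a client whose get returns None on a miss and whose set accepts a time keyword (python-memcached’s memcache.Client fits that description, so something like `class Client(GetOrSetMixin, memcache.Client)` should work, though that combination is untested here). `GetOrSetMixin` and the dict-backed `DictCache` stand-in are hypothetical.

```python
class GetOrSetMixin:
    """Adds get_or_set() to any client class with get(key) and set(key, val, time=...)."""

    def get_or_set(self, key, callback, time=0):
        val = self.get(key)
        if val is None:                 # memcache clients return None on a miss
            val = callback()
            self.set(key, val, time=time)
        return val


class DictCache:
    """Hypothetical in-memory stand-in for a memcache client (testing only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, val, time=0):
        self._store[key] = val
        return True


class Cache(GetOrSetMixin, DictCache):
    pass
```

As in Perl, a callback that returns None (undef) is never cached, so it will be called again on every lookup.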

mod_rewrite’s RewriteCond directive supports a filesize comparison like this:

RewriteCond TestString -s

This verifies that TestString is a file with non-zero size. This patch adds the ability to compare the file’s size with an arbitrary value:

RewriteCond TestString -s>1000
RewriteCond TestString -s=1024
RewriteCond TestString -s<5000

The patch was created against Apache 2.2.8 but will probably apply against the 2.0 series as well.
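The patch itself is C against mod_rewrite, but the intended semantics of the condition are easy to state. Here is a rough Python model of them, assuming the operator is exactly one of `<`, `=`, `>` followed by a decimal byte count, and that (as with plain -s) anything that isn’t a regular file fails. `size_condition` is a hypothetical name, not part of Apache.

```python
import os
import re

# "-s" alone, or "-s" followed by one of < = > and a byte count
_SIZE_COND = re.compile(r'^-s(?:([<=>])(\d+))?$')


def size_condition(path, pattern):
    m = _SIZE_COND.match(pattern)
    if not m:
        raise ValueError("not a -s condition: %r" % pattern)
    if not os.path.isfile(path):
        return False
    size = os.path.getsize(path)
    op, limit = m.group(1), m.group(2)
    if op is None:
        return size > 0                # plain -s: non-zero size
    limit = int(limit)
    return {'<': size < limit, '=': size == limit, '>': size > limit}[op]
```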

Until about six months ago, I ran my own mail and web servers. It all ran on Linux with mirrored hard drives and was fine except for the fan and the occasional problems with software versions and upgrades and shared libraries and spam, which is to say, it was a total pain.

When my last Linux box finally died, I bought a nice dual-core iMac and decided that, although I *could* run all that stuff on Mac OS X, I didn’t really feel like it anymore. I paid Pobox.com $62 to host my email and put a simple, static page on the Apache server that runs automatically on my iMac. But I still had a problem. I wanted some sort of site beyond what I was willing to install on my new desktop.

Web 2.0 has brought all sorts of community-enabled, RSS- and XML-accessible services. I upload photos to Flickr. I bookmark stuff using del.icio.us. I sync iTunes with Last.fm. My daily life is already web-enabled and feed-friendly. This got me thinking about why I didn’t want to put the effort into running a website for myself.

When it comes down to it, I don’t want to find a good way to upload pictures to my site because I already have a way I like: Flickr. I don’t want to create a special “link blog” sidebar on my site showing things I’ve read that other people might like, because I already have that: del.icio.us. My friends can already track all those things — if they know where to look.

Which leads me to the new Webkist.com. I still have my own domain. I want it to be a central place for finding my stuff: photos, music, whatever. But I’m not willing to manage all the tedious aspects of each datatype. Instead, I’m aggregating the data I create on *other* sites. Sites that take care of uploading and linking and categorizing and counting. I take the RSS feed from each service and run each one through a bit of XSL to create a simple block of HTML. I then combine those HTML files into a single page on webkist.com. It’s totally custom: I reformat the feeds however I like, and can style them simply using CSS. The result is pretty much what I would have created if I did it all from scratch, but without having to do ANY of it from scratch. I even used someone else’s XSL to start.
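The site itself does this with XSL. As a rough illustration of the same feed-to-fragment-to-page pipeline, here is a Python sketch that swaps in xml.etree.ElementTree for XSL. It assumes plain RSS 2.0 feeds (channel/item with title and link; namespaced Atom feeds would need more work), and the function names and CSS class names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def rss_to_html(rss_xml, css_class):
    """Render an RSS 2.0 feed's items as an HTML <ul> fragment."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        link = item.findtext('link', default='')
        items.append('<li><a href="%s">%s</a></li>'
                     % (html.escape(link, quote=True), html.escape(title)))
    return '<ul class="%s">%s</ul>' % (html.escape(css_class, quote=True), ''.join(items))


def combine(blocks):
    """Join per-feed fragments into one page body, each in its own styleable div."""
    return '\n'.join('<div class="feed">%s</div>' % b for b in blocks)
```

Keeping each feed as its own fragment mirrors the approach above: every service can be restyled or replaced without touching the others.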

The last piece of the puzzle is this blog, right here. A simple, free WordPress blog, to handle whatever won’t fit in a del.icio.us bookmark description field. We’ll see if I use it.