code


I was recently asked about enabling mod_concat on the work servers to make it easier to combine things like .js and .css files into a single request. It works nicely:

<script src="foo.js" type="text/javascript"></script>
<script src="bar.js" type="text/javascript"></script>

becomes

<script src="./??foo.js,bar.js" type="text/javascript"></script>

This is nice and it’s reasonable with Last-Modified-Date headers, etc. But if you have some PHP files in your document tree and request something like this:

http://www.foo.com/blah/??foo.php,bar.php

It’ll expose unparsed PHP which is almost never what you want to happen.

The right solution, obviously, would be to use subrequests to get the content back from each file and then concatenate all of that. But that’s much more complicated and 90% of the time unnecessary: how often do you really want to combine the output of two dynamic URLs but can’t just do it within the application?

Anyway…the obvious answer was to just not do it. Let mod_concat return static files that would have been returned but the standard static content handler, but bail on everything else. This patch does exactly that.

Advertisements

Edward Tufte’s sparklines are a nice way to present small minimal graphs in contexts where larger, full charts are not needed or warranted. The growing support for the HTML <canvas> tag gives an opportunity to create these graphs dynamically in Javascript.

(Download yahoo-widget-sparkline-1.0b.zip.)

This widget was built against YUI version 2.7.0 but may work in other versions. It subclasses the “charts” module and is easily loaded via YUILoader. The test.html creates a auto-updating sparkline based on a simple FunctionDataSource within a paragraph of text.

Features:

  • If passed a height of “font-size” or “line-height” will cause the module to examine the CSS of the element to which it will be rendered to determine its height. This allows you to easily insert a sparkline into blocks of text without worrying about scaling the graph by hand.
  • By default it will remove all content within the element that will contain the <canvas>. This allows “alt=” type data to be displayed when <canvas> is not available.
  • Arbitrary highlighting of points with custom colors.
  • mean, median, standard deviation and arbitrary “normal” range shading.
  • Uses the Chart object’s polling setup to auto-update charts by re-polling the DataSource.

Usage:

Pull in the module:

var loader = new YAHOO.util.YUILoader();
loader.addModule({ name: 'sparklinecss', type: 'css',
                   fullpath: 'Sparkline.css'});
loader.addModule({ name: 'widget.Sparkline', type: 'js',
                   requires: ['charts', 'sparklinecss'],
                   fullpath: 'Sparkline.js'});
loader.require('widget.Sparkline', 'datasource', 'element');

Then create a sparkline:

var sl = new YAHOO.widget.Sparkline("sparkline-container",
                                    myDataSource,
                                    { height: 20, yField: "value" });
sl.subscribe("sparklineUpdate", updateHandler);

Configuration Options:

max
Force a maximum y-value. Default is the largest value in the data.
min
Force a minimum y-value. Default is the smallest value in the data.
width
Width of the graph. Default is one pixel per data point.
height
A pixel value or “font-size” or “line-height”. Default is 20 pixels.
logscale
Change the y-axis to a logarithmic scale.
clearParent
Remove any children of the node passed to render() before adding the canvas. Default true
zero, mean, median
Add horizontal lines at the appropriate places. Each can have a boolean true value or a string specifying the color for the line.
stddev
Calculate the standard deviation of the data and shade the background between +/- one standard deviation from the mean. Can be true or a color.
normal
An object with two or three fields: min, max, and (optionally) color. A background rectangle covering the range will be drawn in the specified (or default) color.
color
Graph color
highlightPoints
An array of two-field objects: x and color. x is the index of the point in data to highlight and color is the color to use. x-values of max and min highlight the maximum and minimum values. An x-value of -1 highlights the last point in the data.

To Do:

  1. Graceful failure when <canvas> is unavailable.
  2. Allow graphing of data extracted from HTML markup.

(updated 7/7/2009)

Many years ago I wrote a simple perl+mysql polling script for work. It collected votes for multiple-choice polls, stored them in a database, and returned a results page. It was so simple and easy to use, we’ve continued using it for YEARS. Every so often we ask a question that is HIGHLY controversial (I’m looking at you NOW) and we are suddenly inundated with votes, causing much of the site to slow down or return errors.

The last time this happened I made some changes to the code that made it more difficult to vote more than once. Part of that involved storing the current vote totals in Memcached instead of asking the database each time. But even with that change, we’re still running a steady 3 or 4 votes per second and would be vulnerable to legitimate spikes in traffic from on-air promotion or something similar.

So today, I’ve reworked the script to run ALL traffic through memcached and NEVER hit the database at all unless memcached is completely unavailable. The basis for this is BroddlIT’s “memcached as simple message queue” at least as far as I can figure from his rough sketch.

The general idea is, there’s a lock, a pointer to the next slot in the queue, and the queue entries themselves.

Getting the lock

This waits up to a second to get the lock by sleeping for 1/1000th of a second between attempts. You can implement multiple queues by varying the $key_prefix. If the loop exits without getting the lock, you have to decide what to do.

  my $key_prefix = "poll-queue";
  my $total_sleep = 1000000;
  while($total_sleep > 0) {
    my $lock_key = $memd->incr("$key_prefix-lock");

    if(!defined $lock_key) { # No key in cache.
      $memd->set("$key_prefix-lock", $lock_key = 1);
      $memd->set("$key_prefix-curr", 0);
      last;
    } elsif($lock_key == 1) { # We got the lock.
      last;
    }

    # Someone else has the lock, so wait 1/1000th of a second.
    $total_sleep -= usleep(1000);
  }
  if($total_sleep <= 0) {
    # You didn't get the lock. Figure something out.
  }

If you make sure you normally hold the lock for a very small amount of time, you can assume that waiting significantly longer than that without success gives you reasonable cause to take over the lock. The only trick there, as you might expect, is that you need to then release it before another process makes the same assumption.

Get Next Key Pointer

The $key_prefix-curr memcached key contains the index of the last entry in the queue. By calling the $memd->incr() method we get the next one and update the memcache at the same time. If for some reason the pointer doesn’t exist, it is created. The $key variable becomes the memcached key for this message.

  my $next_key = $memd->incr("$key_prefix-curr");
  if(!defined($next_key)) { $memd->set("$key_prefix-curr", 1); }
  my $key = sprintf("$key_prefix-key-%d", $next_key);

Store the Message in the Queue

Simple. Run this and the previous step in a loop to insert multiple values. But be careful because the lock is active, so the longer you take the longer you block other requests.

  $memd->set($key, $message);

Release the Lock

This releases the lock, which allows another process to insert its messages.

  $memd->set("$key_prefix-lock", 0);

Reading and Flushing the Queue

Obtain the lock in the same way as above, then get the $key_prefix-curr value and loop from 1. Just make sure you process the values after you release the lock so you don’t block new entries.

  my $last_key = $memd->get("$key_prefix-curr");
  for(my $i=1; $i<=$last_key; $i++) {
    my $key = sprintf("$key_prefix-key-%d", $i);
    push @messages, $memd->get($key);
  }
  $memd->set("$key_prefix-curr", 0);
  $memd->set("$key_prefix-lock", 0);

  process_messages(@messages);

So that’s it. It should require only Cache::Memcached and Time::HiRes.

Update: if you run multiple memcached daemons and connect to them all (allowing key-hash load balancing), you will probably want to set the “no_rehash” flag and add a bit more error checking. If you allow Cache::Memcached to re-hash your keys if a server connection fails, you could end up losing the various keys temporarily.

If you have a database of zipcodes and their latitudes and longitudes, you can use a version of this query to get the geographically closest zipcodes:

SELECT b.zipcode, b.city, b.state, b.latitude, b.longitude,
       ACOS(SIN(RADIANS(a.latitude))
          * SIN(RADIANS(b.latitude)) +
            COS(RADIANS(a.latitude))
          * COS(RADIANS(b.latitude))
          * COS(RADIANS(a.longitude - b.longitude))) as distance
 FROM zipcodes.zip_to_latlong a,
      zipcodes.zip_to_latlong b
WHERE a.zipcode=?
ORDER BY distance
LIMIT 20

The “distance” there is…I dunno…radians? I think the original is assuming the points are on a sphere, and converts from radians to degrees to miles using the 1.1515 statue miles per nautical mile standard.

I’m mostly a Perl guy (with secret love of Javascript), so I try to stay out of the Python stuff at dayjob where possible. But recently I’ve been taking the lead on a bunch of Memcached optimizations, which are starting to trickle over into the Python side.

A nice feature of the Perl Cache::Memcached module is the ability to define a “namespace” when you create the Memcached object:

my $memd = new Cache::Memcached (namespace => "foo_");

Then, any keys passed to the $memd object via get/set/etc. are automatically prefixed with “foo_”: $memd->get("123") actually requests the memcached key “foo_123”.

Python’s memcache module supports namespaces for the *_multi methods, but not on the individual get/set/etc calls. Also, the namespace must be passed on each call — you can’t specify it in the constructor. Well, subclassing saves the day again:

class Client(memcache.Client):
    def __init__(self, servers=None, debug=0, namespace=None):
        super(Client, self).__init__(servers, debug=debug)

        if namespace:
            self._namespace = namespace
        else:
            self._namespace=""

    # GET
    def get(self, key):
        try:
            val=self.get_multi([ key ])[key]
        except KeyError:
            val=None
        return val

    def get_multi(self, keys, key_prefix=''):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).get_multi(keys, key_prefix=key_prefix)

    # SET
    def set(self, key, val, time=0, min_compress_len=0):
        return self.set_multi({ key : val }, time=time, min_compress_len=min_compress_len)

    def set_multi(self, mapping, time=0, key_prefix='', min_compress_len=0):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).set_multi(mapping, time=time, key_prefix=key_prefix, min_compress_len=min_compress_len)

    # DELETE
    def delete(self, key, time=0):
        return self.delete_multi([key], time=time)

    def delete_multi(self, keys, seconds=0, key_prefix=''):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).delete_multi(keys, seconds=seconds, key_prefix=key_prefix)

    # EVERYTHING ELSE
    def add(self, key, val, time=0, min_compress_len=0):
        if self._namespace: key=self._namespace + str(key)
        super(Client, self).add(key, val, time=time, min_compress_len=min_compress_len)

    def incr(self, key, delta=1):
        if self._namespace: key=self._namespace + str(key)
        super(Client, self).incr(key, delta=delta)

    def replace(self, key, val, time=0, min_compress_len=0):
        if self._namespace: key=self._namespace + str(key)
        super(Client, self).replace(key, val, time=time, min_compress_len=min_compress_len)

    def decr(self, key, delta=1):
        if self._namespace: key=self._namespace + str(key)
        super(Client, self).decr(key, delta=delta)

The __init__ method is overridden to take an additional “namespace” parameter, which is stored in self._namespace. The get/set/delete methods all have namespace-capable *_multi versions, so for those I just pass the calls off to the appropriate one. The *_multi methods themselves are subclassed to check the self._namespace value as well as the namespace parameter, like normal. Finally, the add/incr/replace/decr methods are all modified to check the self._namespace value and prefix it to the key. Obviously, get/set/delete could have been done the same way.

mod_perl 2 has an annoying…feature. Because the system environ struct is not thread safe, mod_perl’s perl-script handler unties the %ENV hash from the actual environment. That means, anything that uses the C getenv/setenv/unsetenv functions to read the environment will not see changes that were made to %ENV.

An obvious example is Perl’s localtime function. It actually calls the system localtime function, which uses the C getenv to check the current value of the timezone environment variable TZ. If you try to change the timezone in a mod_perl2 program by assigning to $ENV{TZ}, localtime won’t know it.

The solution is to use the Env::C module and it’s getenv/setenv/unsetenv wrappers. It works fine, but it’s a bit cumbersome. But a simple module, loaded at server-startup time, can wrap the system localtime in a function that takes care of the environment.

package Apache2::Localtime;

use Env::C;
use Exporter;
use strict;

our @ISA = qw(Exporter);
our @EXPORT = qw(localtime);

sub import {
  my $class = shift;
  $class->export('CORE::GLOBAL', 'localtime');
}

sub localtime {
  my $time = shift || time;
  return localtime($time) unless $ENV{TZ};

  my $orig_tz = Env::C::getenv('TZ');
  Env::C::setenv('TZ', $ENV{TZ}, 1);
  my(@ret, $ret);
  if(wantarray) {
    @ret = CORE::localtime($time);
  } else {
    $ret = CORE::localtime($time);
  }
  if(defined $orig_tz) {
    Env::C::setenv('TZ', $orig_tz, 1);
  } else {
    Env::C::unsetenv('TZ');
  }
  return wantarray ? @ret : $ret;
}

1;

Put that in your @INC path at Apache2/Localtime.pm and then add use Apache2::Localtime to a PerlRequire .../initialize.pl script or something similar. The new function should override the built-in localtime and keep your timeonzes in sync.

The code was mostly taken from here.

I’ve been using Memcached for a few weeks, trying to offload some VERY heavy database load. It’s nice and blazing fast, but the implementation is sort of clunky. If I have this simple bit of code:

$key = "foobar";
$val = calculate_val($key);

It turns into this:

$key = "foobar";
$val = $memd->get($key);
if(! defined($val)) {
  $val = calculate_val($key);
  $memd->set($key, $val);
}

Repeatedly I came back to the idea of a get_or_set method that would handle this stuff, but until the obvious solution hit me, I couldn’t get it:

$key = "foobar";
$val = $memd->get_or_set($key, sub { calculate_val($key) });

A simple closure around the actual calculation block which is then passed to the get_or_set method as a callback. If the lookup finds a value the method returns it, otherwise it returns the result of calling the callback function.

The only change to Cache::Memcached is adding the get_or_set function. The easiest way is to just subclass Cache::Memcached:

package My::Memcached
use base qw(Cache::Memcached);
sub get_or_set {
  my $self = shift;
  my($key, $callback) = @_;
  my $val = $self->get($key);
  unless(defined $val) {
    $val = &$callback;
    $self->set($key, $val);
  }
  return $val;
}

1;

Now you have this simple interface:

use My::Memcached;

my $memd = new My::Memcached { servers => [...] };
my $foo = $memd->get_or_set("bar", sub { get_val("bar") });

Next Page »