So you have a directory structure that contains all of the applications needed to run a set of servers, but no server needs the entire tree. This rsync configuration is the easiest way I can come up with to rsync only what is needed for a single server:

If /export/apps contains bin, lib, var, etc, share and sbin, but you only want bin, lib, and one subdirectory of share rsynced, then `hostname`-exclude-list.txt looks like this:

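+ /apps/bin/
+ /apps/lib/
# share/ itself must be included, or "- /apps/*" excludes it before rsync ever looks inside
+ /apps/share/
+ /apps/share/vim/
- /apps/share/*
- /apps/*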

Using this command, you can then sync the directories:

# rsync -avz --delete-excluded \
        --exclude-from=`hostname`-exclude-list.txt \
        /export/apps /local

With --delete-excluded, if you change the exclude file, newly excluded entries will be removed from the destination on the next run. The copy will end up in /local/apps. More usefully, /export/apps could be on a remote server and the destination would be /export.
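If you would rather drive this from a script, here is a minimal Python sketch that assembles the same command line. The function name build_rsync_command and the dry_run switch are my own additions for illustration; --dry-run is a standard rsync flag and is worth using before letting --delete-excluded loose on a real tree.

```python
import socket


def build_rsync_command(source, dest, hostname=None, dry_run=False):
    """Assemble the rsync argument list for one host (hypothetical helper)."""
    # Mirror the shell's `hostname` substitution unless a name is supplied.
    hostname = hostname or socket.gethostname()
    command = [
        "rsync",
        "-avz",
        "--delete-excluded",
        f"--exclude-from={hostname}-exclude-list.txt",
    ]
    if dry_run:
        # Show what would be copied or deleted without touching anything.
        command.append("--dry-run")
    command += [source, dest]
    return command


print(" ".join(build_rsync_command("/export/apps", "/local", dry_run=True)))
```

The list can be handed to subprocess.run as-is once the dry run looks right.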


A method for randomly selecting N values, without replacement, from a finite list of unknown length (reservoir sampling). The algorithm is from Taming Uncertainty. This version takes lines of input on STDIN (or from files named in @ARGV), selects 20 of them at random, and prints the results to STDOUT.

#!/usr/bin/perl

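use strict;
use warnings;

my $N = 20;   # sample size
my $k;        # number of lines read so far
my @r;        # the reservoir: the current sample

while(<>) {
  if(++$k <= $N) {
    # Fill the reservoir with the first $N lines.
    push @r, $_;
  } elsif(rand(1) <= ($N/$k)) {
    # Keep line $k with probability $N/$k, evicting a random earlier pick.
    $r[rand(@r)] = $_;
  }
}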

print @r;
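For comparison, here is a minimal Python sketch of the same idea. The function name reservoir_sample and the optional rng argument are mine, not from the book; the logic follows the Perl above line for line.

```python
import random


def reservoir_sample(items, n, rng=random):
    """Return up to n items chosen uniformly at random from any iterable."""
    sample = []
    for k, item in enumerate(items, start=1):
        if k <= n:
            # Fill the reservoir with the first n items.
            sample.append(item)
        elif rng.random() < n / k:
            # Keep item k with probability n/k, replacing a random earlier pick.
            sample[rng.randrange(n)] = item
    return sample


picked = reservoir_sample(range(1000), 20)
print(len(picked))
```

Passing sys.stdin as the iterable gives the same lines-in, lines-out behavior as the Perl version.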

The VGT Omnivore’s Hundred

1) Copy this list into your blog or journal, including these instructions.
2) Bold all the items you’ve eaten.
3) Cross out any items that you would never consider eating.
4) Optional extra: Post a comment at www.verygoodtaste.co.uk linking to your results.

I’ve eaten 60/100.
I’ll skip 4/100.

(via Dean Sabatino)

1. Venison
2. Nettle tea
3. Huevos rancheros
4. Steak tartare
5. Crocodile
6. Black pudding
7. Cheese fondue
8. Carp
9. Borscht
10. Baba ghanoush
11. Calamari
12. Pho
13. PB&J sandwich
14. Aloo gobi
15. Hot dog from a street cart
16. Epoisses
17. Black truffle
18. Fruit wine made from something other than grapes
19. Steamed pork buns
20. Pistachio ice cream
21. Heirloom tomatoes
22. Fresh wild berries
23. Foie gras
24. Rice and beans
25. Brawn, or head cheese
26. Raw Scotch Bonnet pepper
27. Dulce de leche
28. Oysters
29. Baklava
30. Bagna cauda
31. Wasabi peas
32. Clam chowder in a sourdough bowl
33. Salted lassi
34. Sauerkraut
35. Root beer float
36. Cognac with a fat cigar
37. Clotted cream tea
38. Vodka jelly/Jell-O
39. Gumbo
40. Oxtail
41. Curried goat
42. Whole insects
43. Phaal
44. Goat’s milk
45. Malt whisky from a bottle worth £60/$120 or more
46. Fugu
47. Chicken tikka masala
48. Eel
49. Krispy Kreme original glazed doughnut
50. Sea urchin
51. Prickly pear
52. Umeboshi
53. Abalone
54. Paneer
55. McDonald’s Big Mac Meal
56. Spaetzle
57. Dirty gin martini
58. Beer above 8% ABV
59. Poutine
60. Carob chips
61. S’mores
62. Sweetbreads
63. Kaolin
64. Currywurst
65. Durian
66. Frogs’ legs
67. Beignets, churros, elephant ears or funnel cake
68. Haggis
69. Fried plantain
70. Chitterlings, or andouillette
71. Gazpacho
72. Caviar and blini
73. Louche absinthe
74. Gjetost, or brunost
75. Roadkill
76. Baijiu
77. Hostess Fruit Pie
78. Snail
79. Lapsang souchong
80. Bellini
81. Tom yum
82. Eggs Benedict
83. Pocky
84. Tasting menu at a three-Michelin-star restaurant
85. Kobe beef
86. Hare
87. Goulash
88. Flowers
89. Horse
90. Criollo chocolate
91. Spam
92. Soft shell crab
93. Rose harissa
94. Catfish
95. Mole poblano
96. Bagel and lox
97. Lobster Thermidor
98. Polenta
99. Jamaican Blue Mountain coffee
100. Snake

If you have a database of zipcodes and their latitudes and longitudes, you can use a version of this query to get the geographically closest zipcodes:

SELECT b.zipcode, b.city, b.state, b.latitude, b.longitude,
       ACOS(SIN(RADIANS(a.latitude))
          * SIN(RADIANS(b.latitude)) +
            COS(RADIANS(a.latitude))
          * COS(RADIANS(b.latitude))
          * COS(RADIANS(a.longitude - b.longitude))) as distance
 FROM zipcodes.zip_to_latlong a,
      zipcodes.zip_to_latlong b
WHERE a.zipcode=?
ORDER BY distance
LIMIT 20

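The “distance” there is in radians: ACOS returns the central angle between the two points, which treats the earth as a sphere. Multiply it by the earth’s radius (roughly 3,959 miles) to get miles. I think the original converts from radians to degrees to miles instead, using 60 nautical miles per degree and the 1.1515 statute miles per nautical mile standard.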
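To make the units concrete, here is a small Python sketch of the same spherical formula with both conversions to miles. The function names and the 3958.8-mile mean radius are my assumptions for illustration; the 60 and 1.1515 factors are the ones mentioned above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius of a spherical earth


def central_angle(lat1, lon1, lat2, lon2):
    """Angle in radians between two points, same formula as the SQL above."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp so rounding error cannot push the value outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def miles_via_radius(angle):
    # Arc length = angle in radians times radius.
    return angle * EARTH_RADIUS_MILES


def miles_via_nautical(angle):
    # radians -> degrees -> nautical miles (60 per degree) -> statute miles.
    return math.degrees(angle) * 60 * 1.1515


angle = central_angle(40.0, -75.0, 41.0, -74.0)
print(miles_via_radius(angle), miles_via_nautical(angle))
```

The two conversions agree to within a fraction of a percent, so either one works for ranking nearby zipcodes.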

I’m mostly a Perl guy (with a secret love of JavaScript), so I try to stay out of the Python stuff at my day job where possible. But recently I’ve been taking the lead on a bunch of Memcached optimizations, which are starting to trickle over into the Python side.

A nice feature of the Perl Cache::Memcached module is the ability to define a “namespace” when you create the Memcached object:

my $memd = Cache::Memcached->new(namespace => "foo_");

Then, any keys passed to the $memd object via get/set/etc. are automatically prefixed with “foo_”: $memd->get("123") actually requests the memcached key “foo_123”.

Python’s memcache module supports namespaces for the *_multi methods (via their key_prefix parameter), but not on the individual get/set/etc. calls. Also, the prefix must be passed on each call; you can’t specify it in the constructor. Well, subclassing saves the day again:

import memcache

class Client(memcache.Client):
    def __init__(self, servers=None, debug=0, namespace=None):
        super(Client, self).__init__(servers, debug=debug)

        # Fall back to an empty prefix when no namespace is given
        self._namespace = namespace or ""

    # GET
    def get(self, key):
        try:
            val=self.get_multi([ key ])[key]
        except KeyError:
            val=None
        return val

    def get_multi(self, keys, key_prefix=''):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).get_multi(keys, key_prefix=key_prefix)

    # SET
    def set(self, key, val, time=0, min_compress_len=0):
        return self.set_multi({ key : val }, time=time, min_compress_len=min_compress_len)

    def set_multi(self, mapping, time=0, key_prefix='', min_compress_len=0):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).set_multi(mapping, time=time, key_prefix=key_prefix, min_compress_len=min_compress_len)

    # DELETE
    def delete(self, key, time=0):
        # delete_multi below names this parameter "seconds", not "time"
        return self.delete_multi([key], seconds=time)

    def delete_multi(self, keys, seconds=0, key_prefix=''):
        if self._namespace: key_prefix=self._namespace + key_prefix
        return super(Client, self).delete_multi(keys, seconds=seconds, key_prefix=key_prefix)

    # EVERYTHING ELSE
    # These prefix the key directly; each returns the parent's result
    # so callers can still see whether the operation succeeded.
    def add(self, key, val, time=0, min_compress_len=0):
        if self._namespace: key=self._namespace + str(key)
        return super(Client, self).add(key, val, time=time, min_compress_len=min_compress_len)

    def incr(self, key, delta=1):
        if self._namespace: key=self._namespace + str(key)
        return super(Client, self).incr(key, delta=delta)

    def replace(self, key, val, time=0, min_compress_len=0):
        if self._namespace: key=self._namespace + str(key)
        return super(Client, self).replace(key, val, time=time, min_compress_len=min_compress_len)

    def decr(self, key, delta=1):
        if self._namespace: key=self._namespace + str(key)
        return super(Client, self).decr(key, delta=delta)

The __init__ method is overridden to take an additional “namespace” parameter, which is stored in self._namespace. The get/set/delete methods all have namespace-capable *_multi versions, so for those I just pass the calls off to the appropriate one. The *_multi methods themselves are overridden to prepend self._namespace to whatever key_prefix was passed in, and otherwise behave as normal. Finally, the add/incr/replace/decr methods are all modified to check the self._namespace value and prefix it to the key. Obviously, get/set/delete could have been done the same way.
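To see the prefixing without a memcached server handy, here is a stripped-down sketch of the same pattern. FakeClient is a hypothetical in-memory stand-in I made up for memcache.Client; it only imitates the key_prefix behavior of get_multi and set_multi, and NamespacedClient is a cut-down cousin of the class above.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for memcache.Client (not the real API)."""

    def __init__(self):
        self.store = {}

    def set_multi(self, mapping, key_prefix=''):
        for key, val in mapping.items():
            self.store[key_prefix + str(key)] = val
        return []  # stand-in for "no keys failed"

    def get_multi(self, keys, key_prefix=''):
        found = {}
        for key in keys:
            full_key = key_prefix + str(key)
            if full_key in self.store:
                # Results are keyed by the caller's unprefixed key.
                found[key] = self.store[full_key]
        return found


class NamespacedClient(FakeClient):
    def __init__(self, namespace=None):
        super().__init__()
        self._namespace = namespace or ''

    def get_multi(self, keys, key_prefix=''):
        return super().get_multi(keys, key_prefix=self._namespace + key_prefix)

    def set_multi(self, mapping, key_prefix=''):
        return super().set_multi(mapping, key_prefix=self._namespace + key_prefix)

    def get(self, key):
        # Single-key calls ride on the namespace-aware *_multi versions.
        return self.get_multi([key]).get(key)

    def set(self, key, val):
        return self.set_multi({key: val})


client = NamespacedClient(namespace="foo_")
client.set("123", "hello")
print(client.store)       # the stored key carries the namespace
print(client.get("123"))  # the caller never sees the prefix
```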

Yesterday at work someone was trying to pass traditional Apache SSI directives through an XSL transformation on a Google search appliance. Long story short, they vanished: HTML comments don’t make it out of that device.

Anyway…I had a simple solution. Since we were pumping the Google results through a Perl CGI anyway, there was no reason we couldn’t just output a fake HTML tag which the CGI would then turn into an SSI comment for Apache. Thus was born <ssi virtual="/foo/bar.html" />.

Then a simple s{<ssi (virtual="[^"]+")\s*(?:/>|></ssi>)}{<!--#include $1 -->}g; in Perl will give something Apache can understand, whether the XSLT writes the empty tag in its self-closing form or as an open/close pair.

That solved the immediate problem, but got me thinking about emulating the full Apache mod_include set of SSI directives using the <ssi> tag. I’m thinking of something like this:

<ssi element="include" virtual="/foo/bar.html" />

<ssi element="set" var="FOO" value="BAR" />

<ssi element="if">
  <ssi_if expr="test_condition">YES!</ssi_if>
  <ssi_elif expr="test_condition">MAYBE!</ssi_elif>
  <ssi_else>NO!</ssi_else>
</ssi>

And with that format, the original version still works if you assume a missing “element” attribute implies element="include". The <!--#if --> block isn’t quite satisfying here — any text nodes inside the <ssi> block but outside the <ssi_(if|elif|else)> blocks would be ignored, but that’s no different than odd content in, say, a <table> that doesn’t actually fall into a cell.

I don’t actually have the Perl that would do the transformation, but it wouldn’t be hard. I’ll wait until someone actually needs it.
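In the meantime, here is a rough Python sketch of what the transformation could look like. It is regex-based, assumes double-quoted attributes, and does not handle <ssi> tags nested inside an if block; the function and pattern names are made up for illustration.

```python
import re

ATTR = re.compile(r'(\w+)="([^"]*)"')
# An empty <ssi> tag, written either self-closing or as an open/close pair.
SIMPLE = re.compile(r'<ssi\b([^>]*?)\s*(?:/>|></ssi>)')
IF_BLOCK = re.compile(r'<ssi\s+element="if"\s*>(.*?)</ssi>', re.DOTALL)
BRANCH = re.compile(
    r'<ssi_(if|elif|else)(?:\s+expr="([^"]*)")?\s*>(.*?)</ssi_\1>', re.DOTALL)


def _simple(match):
    attrs = dict(ATTR.findall(match.group(1)))
    # A missing "element" attribute implies element="include".
    element = attrs.pop("element", "include")
    rest = " ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"<!--#{element} {rest} -->"


def _if_block(match):
    parts = []
    # Text outside the branch tags is silently dropped, as described above.
    for kind, expr, body in BRANCH.findall(match.group(1)):
        if kind == "else":
            parts.append(f"<!--#else -->{body}")
        else:
            parts.append(f'<!--#{kind} expr="{expr}" -->{body}')
    parts.append("<!--#endif -->")
    return "".join(parts)


def convert(html):
    """Turn <ssi> pseudo-tags into mod_include comment directives."""
    html = IF_BLOCK.sub(_if_block, html)
    return SIMPLE.sub(_simple, html)


print(convert('<ssi element="set" var="FOO" value="BAR" />'))
```

A real version would want a proper parser rather than regexes, but this is enough to show the mapping.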

mod_perl 2 has an annoying…feature. Because the system environ struct is not thread safe, mod_perl’s perl-script handler unties the %ENV hash from the actual environment. That means anything that uses the C getenv/setenv/unsetenv functions to read the environment will not see changes that were made to %ENV.

An obvious example is Perl’s localtime function. It actually calls the system localtime function, which uses the C getenv to check the current value of the timezone environment variable TZ. If you try to change the timezone in a mod_perl2 program by assigning to $ENV{TZ}, localtime won’t know it.

The solution is to use the Env::C module and its getenv/setenv/unsetenv wrappers. It works fine, but it’s a bit cumbersome. A simple module, loaded at server startup, can instead wrap the system localtime in a function that takes care of the environment.

package Apache2::Localtime;

use strict;
use warnings;
use Env::C;
use Exporter;

our @ISA = qw(Exporter);
our @EXPORT = qw(localtime);

sub import {
  my $class = shift;
  $class->export('CORE::GLOBAL', 'localtime');
}

sub localtime {
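  # Only default to "now" when no time was passed, so an epoch of 0 still works.
  my $time = defined $_[0] ? shift : time;
  # Name CORE::localtime explicitly so this can never call the override itself.
  return CORE::localtime($time) unless $ENV{TZ};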

  my $orig_tz = Env::C::getenv('TZ');
  Env::C::setenv('TZ', $ENV{TZ}, 1);
  my(@ret, $ret);
  if(wantarray) {
    @ret = CORE::localtime($time);
  } else {
    $ret = CORE::localtime($time);
  }
  if(defined $orig_tz) {
    Env::C::setenv('TZ', $orig_tz, 1);
  } else {
    Env::C::unsetenv('TZ');
  }
  return wantarray ? @ret : $ret;
}

1;

Put that in your @INC path at Apache2/Localtime.pm and then add use Apache2::Localtime to a PerlRequire .../initialize.pl script or something similar. The new function should override the built-in localtime and keep your timezones in sync.

The code was mostly taken from here.
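The save/set/call/restore dance in that wrapper is not specific to Perl. As a loose analogy only (this is plain Python, not mod_perl, and time.tzset is Unix-only), the same pattern looks like this; the names temporary_tz and localtime_in are mine.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def temporary_tz(tz):
    """Set TZ in the process environment for the duration of the block."""
    original = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # make the C library re-read TZ
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if original is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = original
        time.tzset()


def localtime_in(tz, seconds=None):
    """Return time.localtime(seconds) as seen from timezone tz."""
    with temporary_tz(tz):
        return time.localtime(seconds)


print(localtime_in("UTC0", 0).tm_hour)
```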