Harvard Business School of Echec


Tuesday, 3 April 2007

ruby regexp

I am working on a bunch of utilities in Ruby to process logs and parse hundreds of webalizer reports. I track my work with monotone, which runs fine on Windows. I also use monotone to track appliance configurations.

One of my scripts analyses about a thousand HTTP documents of ~50 KiB each. Ruby's net/http module is perfect for retrieving them. The script is quite slow because of the regexes it uses, such as %r|Foo.+?<B>(\d+)</B>(?:.+?Bar.+?<B>(\d+)</B>)*|m. But as the script runs every night on my desktop, I don't really care whether it takes 10 seconds or 10 minutes. It is clean and easy to extend, and that's what matters to me.
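To make the pattern above concrete, here is a minimal sketch of how such a regex can be applied to one page and timed. The page contents, sample_page and extract_counts are hypothetical stand-ins, not the real reports or the real script.

```ruby
require 'benchmark'

# The pattern from the post: a lazy scan from "Foo" to the first bold number,
# then zero or more "Bar ... bold number" groups. The m flag lets "." match
# newlines, so one match can span many lines of the page.
PATTERN = %r|Foo.+?<B>(\d+)</B>(?:.+?Bar.+?<B>(\d+)</B>)*|m

# Hypothetical stand-in for one report page (the real ones are ~50 KiB).
def sample_page(padding_lines)
  filler = "<TR><TD>filler</TD></TR>\n" * padding_lines
  "<HTML>\n#{filler}Foo total <B>12</B>\n#{filler}Bar hits <B>34</B>\n#{filler}</HTML>\n"
end

# Returns the captured numbers as integers, or nil if the page does not match.
# compact drops the second capture when no "Bar" group was found.
def extract_counts(page)
  m = PATTERN.match(page)
  m && m.captures.compact.map { |s| s.to_i }
end

page = sample_page(500)
counts = nil
elapsed = Benchmark.realtime { counts = extract_counts(page) }
puts "counts=#{counts.inspect} in #{'%.4f' % elapsed}s"
```

Note that a capture group inside a repeated (?:...)* keeps only the value from its last repetition, so a page with several "Bar" entries still yields a single second number.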

I have found that slow regexps are a common Ruby issue: here's a simple demonstration. More details about regular expression performance.

Saturday, 17 February 2007

evolution + bogofilter

There is no standard bogofilter plugin for Evolution. I am not happy with the CPU and RAM requirements of spamassassin/spamd/spamc. For example, 'sa-learn --version' takes 1.8 s even on a warm start...

So I wrote a tiny (stupid) ruby script to emulate SpamAssassin programs with bogofilter.

#!/usr/bin/env ruby

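# Run bogofilter in the given mode, then exit with SpamAssassin's
# convention (1 = spam, 0 = not spam). bogofilter itself exits 0 for spam.
def bogo_exec(mode)
  system 'bogofilter', '-l', mode

  exit case $?.exitstatus
    when 0 then 1
    else 0
  end
end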

if ARGV.include?('--version') then
  print "SpamAssassin version 3.1.0-bogo\n"
  exit
end

if ARGV.include?('--spam') then
  bogo_exec '-s'
end

if ARGV.include?('--ham') then
  bogo_exec '-n'
end

if ARGV.include?('-c') or ARGV.include?('--exit-code') then
  bogo_exec '-u'
end

I then created symlinks {spamassassin, spamc, spamd, sa-learn} -> spam.rb in my PATH. Maybe it WAS incomplete and broken, but it NOW works. No more daemon. Mail retrieval is no longer CPU-bound. Evolution and I are happy.
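The symlink step can also be scripted. This is a rough sketch, assuming the wrapper is saved as spam.rb and is executable; link_wrapper and the ~/bin directory in the example are hypothetical names, not part of the original setup.

```ruby
require 'fileutils'

# Program names from the post; each becomes a symlink to the wrapper script.
SA_NAMES = %w[spamassassin spamc spamd sa-learn]

# Create one symlink per SpamAssassin program name in bin_dir, all pointing
# at the wrapper script. force: true replaces links left by an earlier run.
# Returns the list of link paths that were created.
def link_wrapper(script, bin_dir)
  target = File.expand_path(script)
  SA_NAMES.map do |name|
    link = File.join(bin_dir, name)
    FileUtils.ln_s(target, link, force: true)
    link
  end
end

# Example (both paths are assumptions, adjust them to your setup):
# link_wrapper('spam.rb', File.expand_path('~/bin'))
```

The directory holding the links has to come before any real SpamAssassin install in PATH, otherwise the original programs are still the ones that get run.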

Update

I have fixed the script. To feed it spam/ham, here is what I did:

  1. Selected a bunch of messages in Evolution and saved them as '/tmp/mails'
  2. Ran sa-learn --spam < /tmp/mails or sa-learn --ham < /tmp/mails