Recursiveness

One of the big things I learned at University was that while "Recursion is a Wonderful Thing" (Thank you, Dr. Roelants), sometimes the performance can really hurt. Those times, it can pay to spend the effort turning that recursive function into a simple loop. Sure, it might not be as clean, or as elegant, or as natural to understand, but we're looking at performance here, right?

Ryan Davis recently posted about using RubyInline to optimize a recursive factorial method. He ended with a caveat that sometimes you need to look at other things than just moving the code into C for speed. His idea was to cache the data as it goes along. There are times when that won't help you in the log run (for example, generating a stats graph where caching as you draw helps, but the cached values will be stale the next time you need to do it) but changing it around to iterative can sometimes give you a further speedup.

def fib_iter(n)
  return 1 if n < 3  
  f = f1 = 1
  (2..n).each do
    f, f1 = (f+f1), f
  end
  f
end

The benchmarking speaks for itself. (Same parameters as Ryan's benching, 10,000 runs doing fib(15)):

                      user     system      total        real
fib-ruby         21.180000   3.640000  24.820000 ( 24.989140)
fib-hash-reset    0.510000   0.070000   0.580000 (  0.609976)
fib-cache-reset   0.510000   0.050000   0.560000 (  0.570715)
fib-iter          0.160000   0.020000   0.180000 (  0.209565)
fib-hash          0.020000   0.000000   0.020000 (  0.034616)
fib-cached        0.020000   0.010000   0.030000 (  0.035222)

Benchmarks for fib-ruby and fib-cached come from Ryan's post. fib-iter and fib-hash are mine.

The two "-reset" methods are indicative of times when global caching won't help you, which is still a significant speedup over the uncached versions. (For fib(15), uncached will need ~610 method calls, compared to ~15) The iterative method is about 1/3 their speed, but when you can globally cache you can get huge gains - if I increased the number of runs in the benchmark, the discrepancy between fib-iter and fib-cached would increase even more.

So once again, it seems that there's a different best solution for two different problems.

And the fib-hash benchmark? It's not significantly faster than Ryan's fib-cached method, but it bumps the fib logic from a method that uses a hash into the hash itself. It's a neat trick I picked up a while ago, but probably too ugly to make significant use of unless your benchmarking tells you otherwise - it's really hard to read at first glance:

def hashfib(n)
  return 1 if n <= 1
  h = Hash.new{|h,k| h[k] = h[k-1] + h[k-2] }
  h[1] = 1
  h[2] = 1
  h[n]
end

The cached version uses @@h instead of h, and ||=s it.

Dynamic ActiveRecord Attributes

Chris Abad wrote yesterday about his experience with dynamic attributes, and I thought I'd share mine.

I'm doing something similar to collect data POSTed to a form, but my data is slightly more structured than Chris'. I have a few fields that I expect to be populated most of the time, and the possiblity of arbitrary fields being set as well. My models look like this:

create_table "leads", :force => true do |t|
  t.column "email", :string, :default => "", :null => false
  t.column "firstname", :string
  t.column "lastname", :string
  t.column "ip", :string, :default => "", :null => false
end
create_table "lead_infos", :force => true do |t|
  t.column "lead_id", :integer, :default => 0, :null => false
  t.column "name", :string, :default => "", :null => false
  t.column "value", :string, :default => "", :null => false
end

Lead, of course, has_many :lead_infos. Thus, I can assume that most leads will have an email, first and last name, and an IP address. The name fields are optional, but common enough to warrant being in the main table (also makes for easier lookups and duplicate checking). Other things, like address, city, zip, etc. I want to hang on to if provided, so I store them as a LeadInfo.

I'm in the same boat as Chris though, as I want to provide uniform access to the data points in a Lead, as well as its LeadInfos, using the 'name' field as a key. Chris added an after_find hook that moved all the correct data in, but since I'm such a fan of metaprogramming, I decided that I would use method_missing like so:

class Lead < ActiveRecord::Base
  has_many :lead_infos, :dependent => true

  def method_missing(methodname, *args)
    begin
      super
    rescue NameError
      name = methodname.to_s.chomp('=')
      if (methodname.to_s =~ /=$/)
        LeadInfo.create(:name => name, :value => args.first, :lead_id => self.id)
        self
      else
        LeadInfo.find(:first, :conditions => ['name = ?', name]) or raise
        lead_infos.find_by_name(name).value rescue ''
      end
    end
  end

  ...
end

Walking through this, I first make sure to call super so that ActiveRecord's method_missing gets run first. If it can't find anything to do, then it is the Lead's turn to try.

We start by stripping a trailing = if it exists to get the correct name to use. If the = existed, it's being used as a setter, so we just create the new LeadInfo record based off the current Lead, and return self so that we can chain calls if we desire.

Otherwise, it's a getter, so I first verify that there is at least one LeadInfo with the supplied name - if not, then something is very wrong and we want to re-raise the NameError. Elsewise we so a search for the given value and return it, or an empty string if it is not set (the empty string catch-all is specific for the things I'm doing with this bit of hackery, so might not be applicable to everybody).

The performance difference between my code and Chris' is that I take 2 DB queries for each access to the LeadInfo data, but only when you ask for it. I'm not caching the result (which would take some strain away) because my use of it is solely one access each time I load the Lead from the DB, so caching wouldn't help me. On the other hand, if I do a grand find of a bunch of leads (and in a few places in my code that's quite a lot) I'm not getting hit with extra db hits that I'm not going to use most of the time.

There's benefits to both what I've done and what Chris has done, so anyone reading these can take their pick :)

Comments

Nice write up. I was going to go the MethodMissing route if it weren't for my requirement to have all the attributes insterted into the objects attributes hash. I think you're write about the catch-all being specific to you. Most people would probably want to re-raise the error and deal with that appropriately elsewhere.

  • Chris Abad, at 18:45, Sep 13 2006