Parsing JSON in SQL

November 19th 2011 | Tags: ruby rails sql

The Problem: You have a database column with some data serialzed as JSON in it that you'd like to pull out into its own column to index it.

The Solution: Run a data migration to pull the value out. Table has 5 million rows and you don't want to round trip all that data through ActiveRecord? Just parse the json directly with some SQL:

def json(key, field='params')
  key_json = "\"#{key}\":"

  # key start/end locations, including ""
  k_a = "LOCATE('#{key_json}', #{field})"
  k_z = "LOCATE('\"', #{field}, #{k_a}+1)" # this is terminating "

  # is there a space after colons?
  spad = "IF(LOCATE('\": ', #{field}), 1, 0)"

  # is value a string?
  val_string = "LOCATE(CONCAT('#{key_json}', IF(#{spad},' ',''), '\"'), #{field}, #{k_a})"
  qpad = "IF(#{val_string}, 1, 0)"

  # value start/end locations, excluding "" if present
  v_a = "(#{k_z}+1 + 1 + #{spad} + #{qpad})" # 1 for colon, spad for optional space, qpad for possible quote

  end_if_string = "LOCATE('\"', #{field}, #{v_a})"
  end_if_not_string = "IF(LOCATE(',', #{field}, #{v_a}), LOCATE(',', #{field}, #{v_a}), LOCATE('}', #{field}, #{v_a}))"

  v_z = "IF(#{val_string}, #{end_if_string}, #{end_if_not_string})"

  value_string = "SUBSTRING(#{field} FROM #{v_a} FOR (#{v_z} - #{v_a}))"
  "IF(#{k_a}, #{value_string}, NULL)"
end

up do
  execute "
    UPDATE model_table
    SET status = #{json('status')}
  "
end

The generated sql looks pretty gnarly but mysql ran through it stupidly fast. I shudder to think how long it'd take activerecord to load and update each record individually.


Isolating Rails

January 19th 2011 | Tags: rails ruby

Rails 3 is now very friendly with regards to dropping Bundler support, only loading it if it's installed and a Gemfile exists. Since Isolate is so awesome, I thought I'd just drop a quick script in here to convert an existing Rails app to use Isolate instead of Bundler.

#!/usr/bin/env ruby

require 'fileutils'

File.open("Isolate", 'w') do |isolate|
  File.readlines("Gemfile").each do |line|
    next if line =~ /^\w*#/
    next if line =~ /^source/
    next if line =~ /^\w*$/

    line.sub!(/, :require.*(,|$)/, '\1')
    line.sub!(/^([ \t#]*)group/, '\1environment')

    if line =~ /:git/
      line = "# Don't use git, build it as a gem\n# " + line
    end

    isolate.puts line
  end
end

File.open("config/boot.rb", 'a') do |boot|
  boot.puts
  boot.puts("require 'isolate/now'")
end

FileUtils.rm('Gemfile')

This should convert an existing Gemfile to an Isolate file, remove the Gemfile (so that rails won't try to load it), and update the app to load Isolate appropriately.

I'm basically only guessing that the group/environment setup is correct, so if anyone has any corrections to this let me know and I'll update it.


Migrating Disqus

December 22nd 2009 | Tags: blog

In changing this blog over to jekyll, my urls changed (there's now a trailing slash). Easy enough to tell google about it, just set up redirects, but there's no easy way to tell Disqus about it so my comments migrate over.

The good news is that it's pretty straightforward using their API, the only bad news is that I can't delete the new threads auto-generated for the new urls, so I'm just moving them out of the way.

I'm using the HTTParty gem to wrap API access, like so:

require 'rubygems'
require 'httparty'
require 'json'

class Disqus
  include HTTParty
  base_uri 'disqus.com'
  format :json

  def initialize(key, version='1.1')
    @key = key
    @version = version
  end

  def auth
    {:user_api_key => @key, :api_version => @version}
  end

  def get(action, opts={})
    result = self.class.get("/api/#{action}/", :query => opts.merge(auth))
    result["message"]
  end

  def post(action, opts={})
    result = self.class.post("/api/#{action}/", :body => opts.merge(auth).to_params)
    p result
    result["message"]
  end
end

Do note that I'm adding trailing slashes to the api calls to avoid a redirect. Doesn't matter for the GET, but the redirect on POST was causing issues.

With this in hand, I'm grabbing my forum, looping through the threads, and renaming any that have comments (a whopping 3 of them).

key = "secret" # get yours at http://disqus.com/api/get_my_key/
disqus = Disqus.new(key)

forum = disqus.get(:get_forum_list).first # I just have one
forum_api_key = disqus.get(:get_forum_api_key, :forum_id => forum["id"])

start = 0 # manual pagination, eww
loop do
  threads = disqus.get(:get_thread_list, :forum_id => forum["id"], :start => start)
  break if threads.empty?

  threads.each do |thread|
    posts = disqus.get(:get_thread_posts, :thread_id => thread["id"])
    next if !posts.empty?

    target_url = thread["url"]+"/"

    # There's another thread in the way...
    if other_thread = disqus.get(
      :get_thread_by_url, 
      :forum_api_key => forum_api_key,
      :url => target_url
    )
      # free up the url we want to use
      disqus.post(
        :update_thread,
        :forum_api_key => forum_api_key,
        :thread_id => other_thread["id"],
        :url => target_url + 'old'
      )
    end

    # update thread url
    disqus.post(
      :update_thread,
      :forum_api_key => forum_api_key,
      :thread_id => thread["id"],
      :url => target_url
    )
  end

  start += 25
end

Et voilĂ , old comments are in the right place now.


Jekyll: Custom Liquid Tags

December 4th 2009 | Tags: blog

The base install of Jekyll at the moment doesn't let you run any arbitrary ruby code. This is so that they can use it for github pages and not need to worry about making a super-secure sandbox just to generate some HTML.

Unfortunately, that means we're out of luck for creating custom liquid filters. The most annoying deficiency for me is tags. The way the default liquid map filter works isn't friendly with @site.tags, so to generate my Tags page I had to do some really crazy stuff with capture:

<div id="articles">
  <table>
    {% for tag_ in @site.tags %}
      {% capture tag %}{{ tag_ | first }}{% endcapture %}
      <tr><th>{{ tag }}</th>
          <th><a name="{{ tag }}" class="anchor">&nbsp;</th></tr>
      {% for post in @site.posts %}
        {% if post.tags contains tag %}
          <tr><td>{{ post.date | date: '%b %e, %Y' }}</td>
              <td><a href="{{ post.url }}">{{ post.title }}</a></td></tr>
        {% endif %}
      {% endfor %}
    {% endfor %}
  </table>
</div>

Fortunately, it wasn't to hard to make a fork, and in my fork I added a super simple code loading option. Now, I can add a quick extension in _lib/filters.rb like so:

module Jekyll
  module Filters
    def keys(input)
      input.keys
    end

    def tagged(input, tag)
      input.select{|post| post.tags.include? tag}
    end
  end
end

Now tags.html looks like this:

<div id="articles">
  <table>
    {% for tag in @site.tags|keys|sort %}
      <tr><th>{{ tag }}</th>
          <th><a name="{{ tag }}" class="anchor">&nbsp;</th></tr>
      {% for post in @site.posts|tagged:tag %}
        <tr><td>{{ post.date | date: '%b %e, %Y' }}</td>
            <td><a href="{{ post.url }}">{{ post.title }}</a></td></tr>
      {% endfor %}
    {% endfor %}
  </table>
</div>

There's a bit of trickery there that liquid doesn't document very well on lines 3 and 5 - in the second half of the for block you can chain filters on the collection you're iterating over. The short format used is something along the lines of collection|filter:arg,arg,arg|filter...

Similarly, I had some ugly code in my regular archive page to group by year and put headings in:

{% for post in site.posts %}
  {% unless post.next %}
    <tr><th>{{ post.date | date: '%Y' }}</th><th>&nbsp;</th></tr>
  {% else %}
    {% capture year %}{{ post.date | date: '%Y' }}{% endcapture %}
    {% capture nyear %}{{ post.next.date | date: '%Y' }}{% endcapture %}
    {% if year != nyear %}
      <tr><th>{{ post.date | date: '%Y' }}</th><th>&nbsp;</th></tr>
    {% endif %}
  {% endunless %}

  ...
{% endfor %}

Now, a few extra liquid filters later, it looks like this:

{% for post in site.posts %}
  {% if post|last_of_year? %}
    <tr><th>{{ post.date | date: '%Y' }}</th><th>&nbsp;</th></tr>
  {% endif %}

  ...
{% endfor %}

If you want to get easy extensions in your own project, rather than maintaining Yet Another Jekyll Fork, please vote up my merge request on github.


As a side note, blogging about liquid is a pain. The least pain I've found so far is to use liquid to output the leading open brace for all tags. Looks like garbage in my text editor, but it gets the job done:

{{'{'}}{ post.title }}

Blogging about blogging about liquid (as above) I leave as an exercise to the reader.


Hoptoad v2 in Merb

December 1st 2009 | Tags: hoptoad merb

Hoptoad has been bugging us for a week or two now to upgrade to v2 of its API, so we did that this week for our project at work. Except we're running Merb, not Rails.

Previously, we've been using Atmos' merb_hoptoad plugin, but it looks like it's been abandoned now in favor of a rack handler, and we needed to hack it a bit to support running multiple sites (with different API keys) off our single codebase. I'm always happy throwing code away, though, so I thought I'd try using the official Hoptoad Notifier plugin and see how hard it is to get working. It wasn't.

First, you probably want to make a gem of the plugin. I forked it a few days back, and just added a jeweller task to create the gem locally. Install in the local gems directory (or system-wide in production if you want to do it the hard way) and add it as a dependency.

Then, to actually use it, just add the following in the right places:

# init.rb, in Merb::Bootloader.after_app_loads

HoptoadNotifier.configure do |config|
  config.api_key = '...'
  config.environment_name = Merb.env
  config.project_root = Merb.root
  # See http://github.com/thoughtbot/hoptoad_notifier/ README.rdoc
  # for other config settings. You probably want to think about
  # params_filters and maybe ignore.
end

# application.rb, if you want available manually.
# If you just want it for completely unexpected errors you can stick it in
# exceptions.rb instead

def notify_hoptoad(error=nil)
  error ||= request.exceptions.first

  data = {
    :controller       => params[:controller],
    :action           => params[:action],
    :url              => "#{request.protocol}://#{request.host}#{request.uri}",
    # Looks like hoptoad is filtering these itself, we don't need to worry about it
    # other than configuring what needs to be filtered
    :parameters       => params.to_hash,
    :session_data     => session.to_hash,
    :cgi_data         => request.env.to_hash,
    :environment_vars => ENV.to_hash.merge(:RAILS_ENV => Merb.env)
  }

  HoptoadNotifier.notify(error, data)
end

# exceptions.rb, override default error handlers to submit to hoptoad

def internal_server_error
  notify_hoptoad
  render
end

def standard_error
  notify_hoptoad
  render
end    

# other spots you handle exceptions inline, just pass in the exception

begin
 ...
rescue => e
  notify_hoptoad(e)
end

It's not exactly as magical as the Rails plugin's auto-including, but it looks like it's getting the job done for us.


Enter Jekyll

November 30th 2009 | Tags: blog

So I've gone and switched up blogging platforms again. This time, it's Jekyll. It's not that I had any problems with webby, just thought I'd try it out. After spending a few hours porting things over (mostly layout and helpers, my posts are already in markdown) I think I'm going to stick with it.

Being blog-aware is looking like a huge win - if I'm ok with my posts just being recorded by date, rather than date+time, there's a lot less overhead that needs to get done to push a new post up. I have smaller YAML preambles, and the only custom code I'm maintaining is some custom Liquid filters. With less time being spent worrying about the blog software, I might be more successful at the whole blog-writing bit. Maybe.