Git Rebase by Example

June 20th 2017 | Tags: git

Today at work, I determined that my current work-in-progress branch would need to be merged into production ahead of our next scheduled merge/deploy of master. To make it easier to merge into our production branch for a hot-fix deploy, I want to have the branch based off production instead of master.

So, I could do this manually:

$ git reset HEAD~ # move HEAD commit back to staging
$ git stash save # move uncommitted changes to stash
# repeat until all my changes are in stash
$ git checkout production
$ git stash pop
$ git commit ...
# again, repeat until complete

However, this is a pain in the neck because it's all manual, and it loses the commit metadata (timestamps, but also author info if you're moving commits from multiple people). Also, because I prefer small merges to big merges, I've merged master into my branch a few times, and that means deciding which of those merge commits to keep and which to wipe.

Instead, let's let git do all the work, using git rebase to move my commits to their new home and to filter out those merges.

Before getting started, we should back up the branch by doing a git checkout -b wip-rebase. Then, to confirm exactly what my changes are, we can run git cherry -v origin/master. This lists all the commits on the current branch that are not on the specified branch.
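To make the output concrete, here's a throwaway-repo sketch (branch and commit names are invented) of what git cherry reports; each '+' line is a commit that exists on the current branch but not on the named upstream:

```shell
# Set up a disposable repo with two commits on a feature branch
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com && git config user.name "You"

echo base > file && git add file && git commit -qm "base commit"
git branch -M master
git checkout -qb feature
echo one >> file && git commit -qam "feature: step one"
echo two >> file && git commit -qam "feature: step two"

# '+' marks commits on feature that are not on master
git cherry -v master
```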

Then, I want to start by getting rid of all my extra merges, and filter the branch to just these outbound changes. For this we'll do an interactive rebase: git rebase -i acfad~, where acfad is the commit hash at the top of my change list (you can also just use a visual git browser or whatever to find the base commit for your changes). This will pop open an editor, and let me decide exactly how I want to handle each commit.

This is a powerful tool, and all its options are beyond the scope of this article, but it will let you do anything from editing a commit message, to combining or reordering commits, to executing shell commands between specific commits.

Here, we're just dropping commits, so we can scroll through the editor and, for any commit that wasn't listed in our git cherry output, change the pick to drop. Or, even easier, just delete the line.

Now save and quit the editor, and git will do its job. We can verify with another git cherry -v origin/master and compare its output with the earlier run. Note that our oldest commits are unchanged, but the hashes of the newer ones have changed. This is because the hash of a commit depends on the hashes of its parents, and by deleting merged commits from our history we can of course no longer use them as parents. Also, looking directly at the git log, we can see that deleting all the commits from one branch of a merge commit deletes the merge itself. For our purposes, as long as the commit messages in both git cherry outputs are the same, we're doing fine.
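For illustration, the drop step can even be scripted: GIT_SEQUENCE_EDITOR rewrites the rebase todo list, here deleting the line for an unwanted commit by its (invented) message. In real use you'd just edit the list by hand:

```shell
# Disposable repo with three commits; we'll drop the middle one
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com && git config user.name "You"
echo a > first  && git add first  && git commit -qm "keep: first"
echo b > extra  && git add extra  && git commit -qm "drop: already merged"
echo c > second && git add second && git commit -qm "keep: second"

# Deleting a todo line is the same as changing its pick to drop
GIT_SEQUENCE_EDITOR="sed -i '/drop: already merged/d'" git rebase -i --root
git log --oneline
```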

Now that the busy work is behind us, we can get to the actual goal of moving the commits. The command for this will be git rebase --onto origin/production acfad~ – again we're referring to the parent of the first commit in our branch.

This is the part that works exactly as I described above; git is "rewinding head to replay your work on top of it". As with any merge (applying a commit in a changed context) you can expect to see some conflicts, and you can resolve them as usual. However, when done, instead of committing we'll move on with git rebase --continue. If at any point things get confused and you make a mistake, git rebase --abort will roll you back to your branch as it was before rebasing.

Once you're done with the conflicts, git will finalize the rebase behind the scenes, and leave you back at your current branch having updated the root parent, and touched all the commits on the way down. We can confirm this with a final git cherry -v origin/master.
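Putting it together, here's an end-to-end sketch in a throwaway repo, with two diverged branches standing in for master and production (all names invented); the upstream argument plays the role of acfad~ above:

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com && git config user.name "You"
echo base > file && git add file && git commit -qm "base"
git branch -M master
git branch production                       # production diverges at "base"
echo m > master-file && git add master-file && git commit -qm "master-only work"

git checkout -qb wip                        # feature branch based on master
echo f > fix-file && git add fix-file && git commit -qm "hot fix"

# Replay everything on wip since master onto production instead
git rebase --onto production master wip
git log --oneline                           # base + hot fix, no master-only work
```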

Now is a good time to run specs (or let CI do that for you) and do whatever other testing you need to confirm everything functions as it should.


Building an Object Graph in Rails

June 4th 2015 | Tags: ruby rails

I needed to do some object cleanup in our Rails app the other day and purge some malformed objects, so I put together a quick script using some ActiveRecord reflection to walk the object graph.

# Squelch SQL logs, if you're running from rails console
ActiveRecord::Base.logger.level = 1

# Output formatters. Trailing ':' on both, plus the staggered indent
# of 4n and 4n+2 makes the output valid YAML, if automated analysis
# is called for.
def puts_node(node, indent)
  puts "    "*indent + node.class.name + "#" + node.id.to_s + ":"
end
def puts_assoc(assoc, indent)
  puts "    "*indent + "  " + assoc.to_s + ":"
end

def puts_tree(node, seen=[], indent=0)
  puts_node(node, indent)

  unless seen.include? node
    # seen maintains a list of nodes to avoid mutual recursion
    seen << node
    node.class.reflections.keys.each do |assoc|
      # To see all locations an object is referenced, get rid of the
      # "- seen" here. I only cared about which objects were present
      # anywhere in the tree, so this was fine.
      associated = Array(node.send(assoc)) - seen
      next if associated.empty?

      # Print an entry for the association, then recurse
      puts_assoc(assoc, indent)
      associated.each do |subnode|
        puts_tree(subnode, seen, indent+1)
      end
    end
  end

  # Also outputs a footer listing all seen objects once, take it or leave it
  if indent.zero?
    puts
    seen.each {|node| puts_node(node, indent)}
  end

  nil # return nil to avoid flooding terminal in rails console
end

# Usage: call with the root node for the object graph
puts_tree(User.find(42))

Worked like a charm, and made it easy to compare my bad object with other good ones. Just be careful of any global objects (a common shared subscription package, for instance, that has_many :users) that could lead to traversing your entire database, or logging associations that could overwhelm your output on older or heavily used objects. Subtracting a blacklist from node.class.reflections.keys would do the trick there.
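That blacklist could look something like this (the association names are hypothetical); subtract it from the reflection keys before the loop so the walk never descends into globally-shared objects:

```ruby
# Associations we never want to traverse (hypothetical names)
SKIP_ASSOCIATIONS = %w[subscription_package audit_logs]

# Drop-in replacement for node.class.reflections.keys in puts_tree
def walkable_associations(reflection_keys)
  reflection_keys.map(&:to_s) - SKIP_ASSOCIATIONS
end

puts walkable_associations([:posts, :audit_logs, :comments]).inspect
# ["posts", "comments"]
```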


No More Bundle Exec

September 6th 2012 | Tags: programming ruby

Bundler is pretty darn good. Installing all your gems globally sucks. bundle install --path does a great job of fixing that but it means you need to bundle exec any shell commands you want to run, which again sucks. There are lots of attempts to fix this, but they're all fairly convoluted.

I'm a fan of simpler solutions wherever possible. I use zsh as my shell, which has a handler you can hook into if the command you're trying to run is not found. It's a simple matter to hook that into a custom shell script from your ~/.zshrc:

function command_not_found_handler() {
    ~/bin/command-not-found "$@"
}

I know bash supports this kind of handler (Ubuntu uses it to provide command helpers for not-yet-installed programs), but I don't know the exact details. Alas, my favorite shell ever, fish, only passes the executable name to its corresponding helper, so while it can suggest an alternate command, it can't auto-correct it.

My script happens to be in Ruby, but it could just as easily be a standard shell script as all I'm doing is some file existence tests:

#!/usr/bin/env ruby

# ARGV is the entire command we wanted to run, but we
# really only care about the actual executable for fallbacks
command = ARGV.first

def run(cmd)
  $stderr.puts "Running #{cmd.inspect} instead"
  system(cmd)
end

case
when File.exist?("./.bundle/config") && File.exist?("./bin/#{command}")
  run("bundle exec #{ARGV.join(' ')}")

else
  exit 127
end

Now, as long as you're being sure to bundle install --binstubs it should Just Work. And because it only functions if you're in a directory that's been bundled, you don't run into the security risks that you would by trying to get ./bin added to your $PATH directly.

Lastly, while the case statement instead of an if is a bit redundant in the simple case above, I've actually got a few more filters for things like isolate and git. Don't forget to quote anything that might need space literals:

# Paste git repo url to clone it
when command =~ /^git(@|:\/\/).*\.git$/
  run("git clone #{command.inspect}")

# paste compressed url to download+extract it
when command =~ /^(?:ftp|https?):\/\/.+\.t(?:ar\.)?gz$/
  run("curl #{command.inspect} | tar xzv")

when File.exist?("./tmp/isolate/ruby-1.8/bin/#{command}")
  run("rake isolate:sh['#{ARGV.join(' ')}']")
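If you're adapting those patterns, a quick self-contained check (the sample inputs are hypothetical) shows which filter each pasted command would hit:

```ruby
GIT_URL = /^git(@|:\/\/).*\.git$/
ARCHIVE = /^(?:ftp|https?):\/\/.+\.t(?:ar\.)?gz$/

[
  "git@github.com:example/myapp.git",  # clone
  "git://example.com/myapp.git",       # clone
  "https://example.com/pkg.tar.gz",    # download + extract
  "https://example.com/pkg.tgz",       # download + extract
  "bundle",                            # falls through to exit 127
].each do |command|
  action = case command
           when GIT_URL then :clone
           when ARCHIVE then :download
           else :fall_through
           end
  puts "#{command} -> #{action}"
end
```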


Joining Git Repositories

May 15th 2012 | Tags: git

At work, we provide an API for our app and maintain web-based documentation for said API. We originally had the documentation in a separate git repo, but as it makes much more sense to maintain the docs directly alongside the code it documents we wanted to merge the two repositories. This was done in two steps.

Moving files

First, we need to prep the docs repo such that the content is in a reasonable location, rather than the root directory. This is done fairly easily with git filter-branch (zsh):

% mkdir -p doc/api_docs
% git checkout -b for_transplant # work in a branch for safety
% for file in <files/dirs to move>;
      do git filter-branch -f --tree-filter \
        "test -e $file && mv $file doc/api_docs || echo skip" \
        HEAD;
      done

This goes through our commit history, running each commit through a tree filter that moves the old content into a subdirectory. It's slow in that it does a full history pass for each file, but I didn't care to figure out how to move everything except the docs directory.

Merging the repos

First, for clarity, let's make a new branch based off of the original commit in our destination repo:

% git log --oneline | tail -n 1
e7c9feb Initial commit
% git checkout e7c9feb
% git checkout -b doc_import

Next, prepare the main repo for the incoming transplant by cloning a bare copy (git refuses to pull an external repo into a normal checkout).

% git clone --bare git@github.com:example/myapp.git myapp-bare

We can then pull in the commit objects from the other repository:

% cd myapp-bare
% git fetch -f ../api_doc_site for_transplant:api_docs

The -f tells git to ignore the different initial commits, and then we explicitly specify the remote and local branches to move commits to. You might need to resolve a merge conflict at this point, if for example both repositories committed a different .gitignore in their initial commit.

We can now go back to our regular repo and pull those branches in:

% cd ../myapp
% git remote add bare ../myapp-bare
% git pull bare api_docs

Finally, merging that branch into master gets things all up-to-date, and we can start unified work while maintaining the full original history of the documentation.
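The whole dance can be sketched with two throwaway repos (names invented). One caveat from the future: git 2.9+ also requires --allow-unrelated-histories for the final merge, and modern git will fetch a foreign repo into a normal checkout directly, so the bare-clone detour may not be needed at all:

```shell
tmp=$(mktemp -d) && cd "$tmp"
for repo in myapp api_doc_site; do
  git init -q "$repo"
  ( cd "$repo" &&
    git config user.email you@example.com && git config user.name "You" &&
    echo "$repo" > "README_$repo" && git add . &&
    git commit -qm "Initial $repo commit" )
done

cd myapp
git fetch -q ../api_doc_site HEAD:doc_import   # foreign history as a local branch
git merge -q --allow-unrelated-histories -m "Merge in api docs" doc_import
ls   # files from both repos, full history from both
```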


Parsing JSON in SQL

November 19th 2011 | Tags: ruby rails sql

The Problem: You have a database column with some data serialized as JSON in it that you'd like to pull out into its own column so you can index it.

The Solution: Run a data migration to pull the value out. Table has 5 million rows and you don't want to round-trip all that data through ActiveRecord? Just parse the JSON directly with some SQL:

def json(key, field='params')
  key_json = "\"#{key}\":"

  # key start/end locations, including ""
  k_a = "LOCATE('#{key_json}', #{field})"
  k_z = "LOCATE('\"', #{field}, #{k_a}+1)" # this is terminating "

  # is there a space after colons?
  spad = "IF(LOCATE('\": ', #{field}), 1, 0)"

  # is value a string?
  val_string = "LOCATE(CONCAT('#{key_json}', IF(#{spad},' ',''), '\"'), #{field}, #{k_a})"
  qpad = "IF(#{val_string}, 1, 0)"

  # value start/end locations, excluding "" if present
  v_a = "(#{k_z}+1 + 1 + #{spad} + #{qpad})" # 1 for colon, spad for optional space, qpad for possible quote

  end_if_string = "LOCATE('\"', #{field}, #{v_a})"
  end_if_not_string = "IF(LOCATE(',', #{field}, #{v_a}), LOCATE(',', #{field}, #{v_a}), LOCATE('}', #{field}, #{v_a}))"

  v_z = "IF(#{val_string}, #{end_if_string}, #{end_if_not_string})"

  value_string = "SUBSTRING(#{field} FROM #{v_a} FOR (#{v_z} - #{v_a}))"
  "IF(#{k_a}, #{value_string}, NULL)"
end

up do
  execute "
    UPDATE model_table
    SET status = #{json('status')}
  "
end

The generated SQL looks pretty gnarly, but MySQL ran through it stupidly fast. I shudder to think how long it'd take ActiveRecord to load and update each record individually.
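The same find-the-key-then-scan-to-a-delimiter logic can be sanity-checked in plain Ruby before unleashing it on millions of rows; this sketch mirrors the SQL's approach, with String#index standing in for LOCATE (sample data is hypothetical):

```ruby
def extract(field, key)
  key_json = "\"#{key}\":"
  k_a = field.index(key_json) or return nil          # LOCATE(key_json, field)
  rest = field[(k_a + key_json.length)..-1].lstrip   # skip optional space after colon
  if rest.start_with?('"')                           # string value: scan to closing quote
    rest[1...rest.index('"', 1)]
  else                                               # bare value: scan to ',' or '}'
    rest[/\A[^,}]+/]
  end
end

params = '{"status": "shipped", "count": 3}'
puts extract(params, 'status')  # shipped
puts extract(params, 'count')   # 3
```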


Isolating Rails

January 19th 2011 | Tags: rails ruby

Rails 3 is now very friendly about dropping Bundler, only loading it if it's installed and a Gemfile exists. Since Isolate is so awesome, I thought I'd drop a quick script in here to convert an existing Rails app to use Isolate instead of Bundler.

#!/usr/bin/env ruby

require 'fileutils'

File.open("Isolate", 'w') do |isolate|
  File.readlines("Gemfile").each do |line|
    next if line =~ /^\s*#/
    next if line =~ /^source/
    next if line =~ /^\s*$/

    line.sub!(/, :require.*(,|$)/, '\1')
    line.sub!(/^([ \t#]*)group/, '\1environment')
    
    if line =~ /:git/
      line = "# Don't use git, build it as a gem\n# " + line
    end

    isolate.puts line
  end
end

File.open("config/boot.rb", 'a') do |boot|
  boot.puts
  boot.puts("require 'isolate/now'")
end

FileUtils.rm('Gemfile')

This should convert an existing Gemfile to an Isolate file, remove the Gemfile (so that rails won't try to load it), and update the app to load Isolate appropriately.
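For a concrete before/after, here are the two substitutions applied to hypothetical Gemfile lines:

```ruby
# group blocks become environment blocks
group_line = "group :test do\n"
group_line.sub!(/^([ \t#]*)group/, '\1environment')
puts group_line  # environment :test do

# :require options are stripped, since Isolate doesn't use them
gem_line = "gem 'rspec', :require => 'spec'\n"
gem_line.sub!(/, :require.*(,|$)/, '\1')
puts gem_line    # gem 'rspec'
```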

I'm basically only guessing that the group/environment setup is correct, so if anyone has any corrections, let me know and I'll update it.