Posts tagged Ruby

Tl;dr: Heroku's non-intelligent routing eats about 25% of your dynos' capacity

I'm a big fan of Heroku. For everything from really small to medium-sized projects, it has helped me focus on developing my applications rather than on how I deploy them. Recently, however, Heroku took some flak after RapGenius, previously a Heroku success story, noted that it did have scaling issues on the platform.

Specifically, the routing was not as smart as advertised: routers do indeed route to 'idle' dynos, but each router's list of busy dynos is based only on the requests passed through that specific router, without taking other routers into account. Heroku, due to its size, has so many routers that the result is practically random assignment of work to the nodes. What does this mean in practice? To find out, I've simulated a simple application to see what happens at different loads.

The simulation

We'll be simulating an application with 100 dynos. There is only 1 kind of job for the dynos, and it always takes the same time (1 tick). The number of incoming jobs is determined by the utilization, and the jobs are randomly distributed over the dynos. So, during a tick the following happens:

  1. A node may receive 1 or more jobs in its queue (to simulate incoming work)
  2. A node will remove exactly 1 job from its queue (to simulate doing work)

# objects.rb
class Node

  attr_reader :queue

  def initialize
    @queue = 0
  end

  # One tick of work: remove a job, or do nothing when the queue is empty
  def tick
    remove_job || 0
  end

  def add_job
    @queue += 1
  end

  def remove_job
    @queue -= 1 if @queue > 0
  end
end

For the simulation, I've run 10 samples for each utilization level. Each sample runs for 500 ticks and then writes the queue lengths of the dynos to a CSV.

# simulation.rb
require 'csv'
require './objects.rb'

UTILIZATION = (0..110)
NODES = 100
TICKS = 500
SAMPLES = 10

data = {}

UTILIZATION.each do |util|
  data[util] = []
  SAMPLES.times do
    nodes = []
    NODES.times { nodes << Node.new }
    TICKS.times do
      # randomly assign `util` incoming jobs, then let every node work
      util.times { nodes.sample.add_job }
      nodes.each { |node| node.tick }
    end
    nodes.each do |node|
      data[util] << node.queue
    end
  end
end

CSV.open('../data/simulation.csv', 'w') do |csv|
  UTILIZATION.each do |util|
    csv << data[util]
  end
end

I've used R to read and plot the data:

[Plot: Queue size vs. dyno utilization]


library(ggplot2)

data <- read.table("../data/simulation.csv", header=FALSE, sep=",")
mean <- apply(data,1,mean)
median <- apply(data,1,median)
max <- apply(data,1,max)
sum <- apply(data,1,sum)

df = data.frame(utilization=seq(from=0,to=110),mean=mean,median=median,max=max)

png('../plots/queue_vs_utilization.png', width=1200, height=900, res=200)
ggplot(data = df) + 
  scale_shape_manual(name="Type", values=c(2,3,4)) +
  geom_smooth(aes(x = utilization, y = mean), span=0.2) +
  geom_point(aes(x  = utilization, y = mean, shape = "mean")) +
  # geom_step( aes(x  = utilization, y = median)) +
  geom_point(aes(x  = utilization, y = median, shape = "median")) +
  geom_smooth(aes(x = utilization, y = max)) +
  geom_point(aes(x  = utilization, y = max, shape = "max")) +
  scale_y_continuous("Queue size") +
  scale_x_continuous("Dyno Utilization (%)") +
  coord_cartesian(ylim=c(0, 20), xlim=c(0,100))
dev.off()

What does the data tell us? If you don't want to keep your users waiting (on average!) behind more than 1 other request, you need to account for non-intelligent routing eating about 25% of your capacity, and overprovision by 33% in dynos. It could be even worse if several of your requests take a long time, because then the maximum queue size becomes more important, and you might even have to go down to a utilization of 50% to get decent results.

Do realize that in the ideal scenario (intelligent routing) the max, average, and median queue sizes would be 0 for ALL utilizations up to 100%. Random routing really has a measurable and visible impact on the responsiveness of your application.
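To get a feel for why random routing hurts even when capacity exactly matches load, here's a quick back-of-the-envelope check (my own addition, separate from the simulation above): when 100 jobs are spread uniformly over 100 dynos in a single tick, the number of jobs one dyno receives follows a Binomial(100, 1/100) distribution.

```ruby
# Probability that one dyno receives exactly k of n randomly routed jobs,
# when each job independently lands on it with probability p.
def binomial_pmf(k, n, p)
  # n-choose-k, built up iteratively to stay in Float range
  choose = (1..k).reduce(1.0) { |acc, i| acc * (n - k + i) / i }
  choose * p**k * (1 - p)**(n - k)
end

nodes = 100
jobs  = 100                 # 100% utilization: one job per dyno on average
p0 = binomial_pmf(0, jobs, 1.0 / nodes)
p1 = binomial_pmf(1, jobs, 1.0 / nodes)
printf("P(idle)=%.3f  P(1 job)=%.3f  P(2+ jobs)=%.3f\n", p0, p1, 1 - p0 - p1)
# => P(idle)=0.366  P(1 job)=0.370  P(2+ jobs)=0.264
```

So even at 100% utilization with perfectly sized dynos, in any given tick roughly a quarter of the work lands on dynos that already got work, while over a third of the dynos sit idle — exactly the queue growth the simulation measures.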

Perhaps, by the time you are spending money on over 100 dynos, it is time to run your own load balancers. I recently ran into Cloud66, which has a lot of overlap with AWS Elastic Beanstalk, and is something I might certainly consider for large apps. It also turns out their customer service is outstanding, but I'll save that for another post.

If you want to learn more about this kind of simulation, I highly recommend Exploring Everyday Things with R and Ruby by Sau Sheong Chang, available DRM-free from O'Reilly.

Back to index

As a programming language, I love almost all aspects of Ruby. Watching people program in other languages, or attempting to learn different ones myself (Erlang, the other day), makes me appreciate its syntax more and more. The combination with Rails is really great because, most of the time, it works as you'd expect.

And here comes the unexpected downside

Unfortunately, sometimes things go haywire, and it can be quite difficult to see where the problem comes from and how to fix it. Case in point: the views of this very blog --- the one you are reading right now --- sometimes render slowly. For example, I have a partial that renders the header above a post. On the homepage, this partial gets called about 10 times. 9 out of those 10 times, it renders in 2 or 3 milliseconds. However, one of them would randomly run up to 300ms. That's a 100x performance loss! Very rarely, this happens twice, which pushes the total view rendering time into the 700ms region. Ouch!

Note: If you want to see what I ended up with, just scroll down to the last option. If you are interested in the path that led me there, feel free to read everything.

So, who's the culprit?

Because it was a random partial, not the same one every time, I began thinking it was probably a larger issue, something bigger than my partial. As it turns out, the Ruby garbage collector (GC) locks your application when it decides to run, which usually happens after you allocate a lot of objects. Unfortunately, Rails 3 does exactly that: it allocates a lot of objects. So, I set out to improve the response time of my application. First try: delay the GC run until after the request is processed.

class ApplicationController < ActionController::Base
  around_filter :disable_gc

  def disable_gc
    GC.disable
    yield
  ensure
    GC.enable
    GC.start
  end
end
What I did here was disable the GC during the request in ApplicationController, and then enable it and force a cleanup afterwards. On a general level, this works. The results are shown below, and you can see that the long view render time is gone. Don't mistake this for extra capacity, though: the GC still needs to run, it just runs after we've handled our request.

Does this feel right? Do I really need to manually control the GC? I don't think so.

Before disabling the GC during the request:
  Rendered posts/_post_header.html.haml (2.7ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (2.5ms)
  Rendered posts/_post_header.html.haml (2.6ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (9.6ms)
  Rendered posts/_post_header.html.haml (269.5ms)
  Rendered posts/_post_header.html.haml (2.1ms)

After disabling the GC during the request:

  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (3.1ms)
  Rendered posts/_post_header.html.haml (3.3ms)
  Rendered posts/_post_header.html.haml (3.4ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (3.4ms)
  Rendered posts/_post_header.html.haml (3.8ms)
  Rendered posts/_post_header.html.haml (2.4ms)

Option 2: Improving the code to allocate fewer objects

I was using a for loop to generate the partial for each post object:

- for post in @posts
      = render :partial => 'post_header', :locals => {:post => post}

While discussing this issue on the Rails IRC channel, someone suggested using render :collection instead. Unfortunately, that wasn't so easy to factor in, due to a second row that had to be nested in the .post element. I did change the for loop to an each block, which resulted in this:

- @posts.each do |post|
      = render :partial => 'post_header', :locals => {:post => post}
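For reference, the suggested render :collection shorthand would have looked something like this (a sketch only --- as mentioned, it didn't fit my nested layout):

```haml
= render :partial => 'post_header', :collection => @posts, :as => :post
```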

In addition, I changed all for x in y loops in WebL to render @models calls or each blocks. This did bring some relief: the GC fired less often, which sped up requests. Still, I wanted to improve further. Two options seemed to remain:

  1. Start caching, thereby preventing the allocation of the objects
  2. Optimize Ruby's GC parameters

I ended up caching the bodies of the posts, since those usually contain quite a few code blocks, which feels like it could put extra pressure on the GC. But I also found that --- starting with Ruby 1.9.3 --- you can tune the GC parameters using environment variables.
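The caching itself is plain Rails fragment caching; a minimal sketch, assuming a hypothetical post_body partial (adapt the cache key and partial to your own views):

```haml
- cache post do
  = render :partial => 'post_body', :locals => {:post => post}
```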

Tuning the GC in Ruby 1.9.3

One of the strong features of Ruby Enterprise Edition (REE) was its tunable GC. Starting with Ruby 1.9.3, the GC in mainstream Ruby can also be tuned, using the following three environment variables:

  • RUBY_HEAP_MIN_SLOTS (default: 10000)
    The initial number of heap slots as well as the minimum number of slots allocated.

  • RUBY_GC_MALLOC_LIMIT (default: 8000000)
    The amount of C data structures that can be allocated before the GC kicks in. If set too low, the GC kicks in even if there are still heap slots available.

  • RUBY_FREE_MIN (default: 4096)
    The minimum number of heap slots that should be available after the GC runs. If fewer are available, Ruby will allocate more slots.

As an indication of how these values can change for Rails, REE ships with the values used by 37signals:

RUBY_HEAP_MIN_SLOTS=600000 # This is 60(!) times larger than default
RUBY_GC_MALLOC_LIMIT=59000000 # This is 7 times larger than default
RUBY_HEAP_FREE_MIN=100000 # This is 24 times larger than default

What does this mean? Apparently, Ruby's defaults for memory allocation are too conservative for Rails. By starting with a much bigger heap, and by delaying the GC until 7 times more memory has been allocated, we can gain some extra performance. Don't forget that when the GC does run, it will have more work to do. Still, Twitter claims they reached a performance improvement of 20% to 40%.

In practice

Passing these variables to ruby is easy if you control the shell:

$ irb   # type GC.count to see how many GC runs happened during startup; I had about 6
$ export RUBY_HEAP_MIN_SLOTS=600000
$ export RUBY_GC_MALLOC_LIMIT=59000000
$ export RUBY_FREE_MIN=100000
$ irb   # type GC.count again; this time I had about 2

But how do I tell Passenger to do this for me? It doesn't respect my .bashrc, so I can't set those variables there. To make matters worse, I'm also using RVM... Help!

No fear, RVM ruby wrappers are here!

RVM uses a wrapper to direct Passenger to the right Ruby while setting environment variables. We can hook into this in exactly the same way. I have an RVM setup much like the one I showed in the rvm, passenger, apache, and ubuntu server tutorial.

Passenger thinks ruby is located at a place like /usr/local/rvm/wrappers/ruby-1.9.3-p125/ruby, but under the hood this is a script from RVM that sets up the right Ruby and environment variables:

if [[ -s "/usr/local/rvm/environments/ruby-1.9.3-p125" ]]
then
  source "/usr/local/rvm/environments/ruby-1.9.3-p125"
  exec ruby "$@"
else
  echo "ERROR: Missing RVM environment file: '/usr/local/rvm/environments/ruby-1.9.3-p125'" >&2
  exit 1
fi

See that line with source "/usr/local/rvm/environments/ruby-1.9.3-p125"? That file is where we can set up the environment variables. And here is its source, with the extra parameters already added as the last 3 lines:

export PATH ; PATH="/usr/local/rvm/gems/ruby-1.9.3-p125/bin:/usr/local/rvm/gems/ruby-1.9.3-p125@global/bin:/usr/local/rvm/rubies/ruby-1.9.3-p125/bin:/usr/local/rvm/bin:$PATH"
export rvm_env_string ; rvm_env_string='ruby-1.9.3-p125'
export rvm_path ; rvm_path='/usr/local/rvm'
export rvm_ruby_string ; rvm_ruby_string='ruby-1.9.3-p125'
unset rvm_gemset_name
export RUBY_VERSION ; RUBY_VERSION='ruby-1.9.3-p125'
export GEM_HOME ; GEM_HOME='/usr/local/rvm/gems/ruby-1.9.3-p125'
export GEM_PATH ; GEM_PATH='/usr/local/rvm/gems/ruby-1.9.3-p125:/usr/local/rvm/gems/ruby-1.9.3-p125@global'
export MY_RUBY_HOME ; MY_RUBY_HOME='/usr/local/rvm/rubies/ruby-1.9.3-p125'
export IRBRC ; IRBRC='/usr/local/rvm/rubies/ruby-1.9.3-p125/.irbrc'
unset RBXOPT
# Change Ruby GC settings to those of 37signals
# Note to reader, optimize to your own liking, this is an example!
export RUBY_GC_MALLOC_LIMIT=59000000
export RUBY_HEAP_MIN_SLOTS=600000
export RUBY_FREE_MIN=100000

The result: slightly higher memory consumption on your server, but much less pressure on the GC, which therefore gets called less often, improving your response times.
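As a final sanity check, GC.count offers a crude way to see whether the tuning does anything; a minimal sketch (the string-building loop is just a hypothetical stand-in for a real request):

```ruby
# Count how often the GC runs during an allocation-heavy burst. With a
# larger heap and a higher malloc limit, this number should come out lower
# for the same amount of work.
runs_before = GC.count
200_000.times { "x" * 50 }   # allocate a pile of short-lived strings
puts "GC runs during burst: #{GC.count - runs_before}"
```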

Got questions? Post them in the comments!
