I love almost all aspects of Ruby as a programming language. When I watch people program in other languages, or try to learn a different language myself (Erlang, the other day), I appreciate Ruby's syntax more and more. The combination with Rails is really great because, most of the time, it works as you'd expect.

And here comes the unexpected downside

Unfortunately, sometimes things go haywire, and it can be quite difficult to see where the problem comes from and how to fix it. Case in point: the views of this very blog (the one you are reading right now) sometimes render slowly. For example, I have a partial that renders the header above a post. On the homepage, this partial gets called about 10 times. 9 out of these 10 times, it renders in 2 or 3 milliseconds. However, one of them randomly takes up to 300ms. That's a 100-fold slowdown! Very rarely, this happens twice, which pushes the total view rendering time into the 700ms region. Ouch!

Note: If you want to see what I ended up with, just scroll down to the last option. If you are interested in the path that led me there, feel free to read everything.

So, who's the culprit?

Because it was a random partial, not the same one every time, I began to suspect that the issue was bigger than my partial. As it turns out, the Ruby Garbage Collector (GC) pauses your application whenever it runs, which usually happens when you allocate a lot of objects. Unfortunately, Rails 3 does exactly this: it allocates many objects. So, I set out to improve the response time of my application. First try: delay the GC run until after the request is processed.

class ApplicationController < ActionController::Base
  around_filter :disable_gc

  private

  def disable_gc
    GC.disable
    begin
      yield
    ensure
      GC.enable
      GC.start
    end
  end
end

What I did here was disable the GC during the request in ApplicationController, then re-enable it and force a cleanup afterwards. On a general level, this works. The results are shown below, and you can see that the long view render time is gone. Don't mistake this for extra capacity, though: the GC still needs to run, it just runs after we've handled the request.

Does this feel right? Do I really need to manually control the GC? I don't think so.

Before disabling the GC during the request
  Rendered posts/_post_header.html.haml (2.7ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (2.5ms)
  Rendered posts/_post_header.html.haml (2.6ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (9.6ms)
  Rendered posts/_post_header.html.haml (269.5ms)
  Rendered posts/_post_header.html.haml (2.1ms)

After
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (3.1ms)
  Rendered posts/_post_header.html.haml (3.3ms)
  Rendered posts/_post_header.html.haml (3.4ms)
  Rendered posts/_post_header.html.haml (3.7ms)
  Rendered posts/_post_header.html.haml (3.4ms)
  Rendered posts/_post_header.html.haml (3.8ms)
  Rendered posts/_post_header.html.haml (2.4ms)
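
To check whether a slow section really coincides with a GC run, you can count the GC runs around it. This is only a minimal sketch and not code from this blog: measure_gc is a hypothetical helper, and the string allocation loop merely stands in for the work a view does.

```ruby
require 'benchmark'

# Hypothetical helper (not part of Rails): runs the block and reports how
# long it took and how many GC runs happened while it was executing.
def measure_gc
  runs_before = GC.count
  seconds = Benchmark.realtime { yield }
  { :gc_runs => GC.count - runs_before, :ms => (seconds * 1000).round(1) }
end

# Allocate many short-lived strings as a stand-in for rendering partials.
result = measure_gc do
  200_000.times { |i| "post header #{i}" }
end

puts "GC runs: #{result[:gc_runs]}, elapsed: #{result[:ms]}ms"
```

If the slow renders line up with a non-zero GC count, the GC is a plausible suspect. The exact numbers depend on your Ruby version and settings.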

Option 2: Improving code to allocate fewer objects

I was using a for loop to generate the partial for each post object:

- for post in @posts
  .post
    .row
      = render :partial => 'post_header', :locals => {:post => post}
    .row
      .etc.etc

While discussing this issue on the Rails IRC, someone suggested using render :collection instead. Unfortunately, that wasn't so easy to factor in due to the second row that had to be nested in the .post element. I did change the for loop to an each block, which resulted in this:

- @posts.each do |post|
  .post
    .row
      = render :partial => 'post_header', :locals => {:post => post}

In addition, I changed all for x in y loops in WebL to render @models or each blocks. This did bring some relief. The GC would fire less often, thereby increasing the request speed. Still, I wanted to improve more. Two options seemed to remain:

  1. Start caching, thereby preventing the allocation of the objects
  2. Optimize ruby's GC parameters.

I ended up caching the bodies of the posts, since those usually contain quite a few code blocks, which feels like it could put extra pressure on the GC. But I also found that, starting with Ruby 1.9.3, you can tune the GC parameters using environment variables.

Tuning the GC in Ruby 1.9.3

One of the strong features of Ruby Enterprise Edition (REE) was its tunable GC. Starting with Ruby 1.9.3, the GC in mainstream Ruby can also be tuned using the following three environment variables:

  • RUBY_HEAP_MIN_SLOTS (default: 10000)
    The initial number of heap slots as well as the minimum number of slots allocated.

  • RUBY_GC_MALLOC_LIMIT (default: 8000000)
    The amount of memory, in bytes, that can be allocated for C data structures before the GC kicks in. If set too low, the GC kicks in even if there are still heap slots available.

  • RUBY_FREE_MIN (default: 4096)
    The minimum number of heap slots that should be available after the GC runs. If fewer are available, Ruby allocates more slots.

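As an indication of how these values can change for Rails, REE gives you the values from 37signals. Note that REE calls the last variable RUBY_HEAP_FREE_MIN, while Ruby 1.9.3 calls it RUBY_FREE_MIN.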

RUBY_HEAP_MIN_SLOTS=600000 # This is 60(!) times larger than default
RUBY_GC_MALLOC_LIMIT=59000000 # This is 7 times larger than default
RUBY_HEAP_FREE_MIN=100000 # This is 24 times larger than default

What does this mean? Apparently, Ruby's defaults for memory allocation are too conservative for Rails. By starting with a much bigger heap, and by delaying the GC until the malloc limit (now about 7 times higher) is reached, we can gain some extra performance. Don't forget that when the GC does run, it will have to do more work. Still, Twitter claims that they reached a performance improvement of 20% to 40%.
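
The multipliers quoted above are easy to verify with a few lines of Ruby. This is just the arithmetic on the numbers already listed in this post. The constant names are my own, and the hash uses the Ruby 1.9.3 variable names for both sets of values.

```ruby
# Defaults and 37signals values as listed in this post.
DEFAULTS = {
  'RUBY_HEAP_MIN_SLOTS'  => 10_000,
  'RUBY_GC_MALLOC_LIMIT' => 8_000_000,
  'RUBY_FREE_MIN'        => 4_096
}

TUNED = {
  'RUBY_HEAP_MIN_SLOTS'  => 600_000,
  'RUBY_GC_MALLOC_LIMIT' => 59_000_000,
  'RUBY_FREE_MIN'        => 100_000
}

# How many times larger each tuned value is than its default.
RATIOS = DEFAULTS.each_with_object({}) do |(name, default), ratios|
  ratios[name] = (TUNED[name] / default.to_f).round(1)
end

RATIOS.each { |name, ratio| puts "#{name}: #{ratio}x the default" }
```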

In practice

Passing these variables to ruby is easy if you control the shell:

$ ~: irb   # type GC.count to see how many GC runs happened while starting; I had about 6
$ ~: export RUBY_HEAP_MIN_SLOTS=600000
$ ~: export RUBY_GC_MALLOC_LIMIT=59000000
$ ~: export RUBY_FREE_MIN=100000
$ ~: irb   # type GC.count again; I had about 2

But how do I tell Passenger to do this for me? It doesn't respect my .bashrc, so I can't set those variables there. To make matters worse, I'm also using RVM... Help!

No fear, rvm ruby wrappers are here!

RVM uses a wrapper to direct Passenger to the right Ruby while setting environment variables. We can hook into this in exactly the same way. I have an RVM setup much like the one I showed in the rvm, passenger, apache, and ubuntu server tutorial.

Passenger thinks Ruby is located at a place like /usr/local/rvm/wrappers/ruby-1.9.3-p125/ruby, but under the hood, this is a script from RVM that sets up the right Ruby and environment variables.

if [[ -s "/usr/local/rvm/environments/ruby-1.9.3-p125" ]]
then
  source "/usr/local/rvm/environments/ruby-1.9.3-p125"
  exec ruby "$@"
else
  echo "ERROR: Missing RVM environment file: '/usr/local/rvm/environments/ruby-1.9.3-p125'" >&2
  exit 1
fi

See that line with source "/usr/local/rvm/environments/ruby-1.9.3-p125"? That's where we can set up the environment variables. Here is the sourced file, with the extra parameters already added in the last 3 lines.

export PATH ; PATH="/usr/local/rvm/gems/ruby-1.9.3-p125/bin:/usr/local/rvm/gems/ruby-1.9.3-p125@global/bin:/usr/local/rvm/rubies/ruby-1.9.3-p125/bin:/usr/local/rvm/bin:$PATH"
export rvm_env_string ; rvm_env_string='ruby-1.9.3-p125'
export rvm_path ; rvm_path='/usr/local/rvm'
export rvm_ruby_string ; rvm_ruby_string='ruby-1.9.3-p125'
unset rvm_gemset_name
export RUBY_VERSION ; RUBY_VERSION='ruby-1.9.3-p125'
export GEM_HOME ; GEM_HOME='/usr/local/rvm/gems/ruby-1.9.3-p125'
export GEM_PATH ; GEM_PATH='/usr/local/rvm/gems/ruby-1.9.3-p125:/usr/local/rvm/gems/ruby-1.9.3-p125@global'
export MY_RUBY_HOME ; MY_RUBY_HOME='/usr/local/rvm/rubies/ruby-1.9.3-p125'
export IRBRC ; IRBRC='/usr/local/rvm/rubies/ruby-1.9.3-p125/.irbrc'
unset MAGLEV_HOME
unset RBXOPT
# Change Ruby GC settings to those of 37signals
# Note to reader, optimize to your own liking, this is an example!
export RUBY_GC_MALLOC_LIMIT=59000000
export RUBY_HEAP_MIN_SLOTS=600000
export RUBY_FREE_MIN=100000

The result: slightly higher memory consumption on your server, but much less pressure on the GC, which gets called less often, and that improves your response times.
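
One way to check that the variables from the wrapper actually reach the Ruby process is to read them back from ENV, for example in a Rails console or initializer. This is a rough sketch under my own assumptions: gc_env_report is a hypothetical helper, and it only shows what is in the process environment, not whether the VM applied the values.

```ruby
# Variable names as used in this post for Ruby 1.9.3.
GC_VARS = %w[RUBY_HEAP_MIN_SLOTS RUBY_GC_MALLOC_LIMIT RUBY_FREE_MIN]

# Hypothetical helper: returns what the given environment (ENV by default)
# holds for each GC variable, or a placeholder when it is missing.
def gc_env_report(env = ENV)
  GC_VARS.each_with_object({}) do |name, report|
    report[name] = env.fetch(name, '(not set)')
  end
end

gc_env_report.each { |name, value| puts "#{name}=#{value}" }

# A lower GC count after boot is an indirect hint that the settings are active.
puts "GC runs so far: #{GC.count}"
```

If a variable shows up as "(not set)", the wrapper or environment file is probably not the one Passenger is using.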

Got questions? Post them in the comments!


Comments:

Letronje, almost 2 years ago

is there a way to print these values from within ruby/rails app ? How do i know if they are being picked up correctly ?

Dennis, almost 2 years ago

I'm not sure, there wasn't that much documentation on this GC stuff when I wrote it...
