Tuesday, 7 May 2013

Rails ActiveRecord - common performance pitfalls

I just found this great blog entry on Engine Yard about common pitfalls when using ActiveRecord: an innocent bit of code works great in development, but in the wild, faced with thousands of records, it grinds to a halt. Here's a summary:

1. Model.find(:all)

In versions of Rails before 2.3, this is a memory killer. The most common form in the wild is:

Comment.find(:all).each { |record| do_something_with_each(record) }

If you have 100,000 Comments, this will load and instantiate all 100k records in memory, then go through each one. In Rails 2.3 you can switch to find_each or find_in_batches, which load records in small batches, but that won’t save you from the following variations:

@records = Comment.all
@records = Comment.find(:all)
@record_ids = Comment.find(:all).collect{|record| record.id }

Each of these will load all Comment records into an instance variable, regardless of whether you have 100 or 100,000, and regardless of whether you are on Rails 2.1 or 2.3.
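
To make the batching idea concrete, here is a minimal plain-Ruby sketch. It deliberately does not use ActiveRecord: FakeCommentTable, fetch_after and each_in_batches are hypothetical stand-ins that mimic, roughly, what batched finders such as find_each do (fetch a limited slice ordered by id, process it, then fetch the next slice). The row count and batch size are assumptions chosen for illustration.

```ruby
# Hypothetical stand-in for a comments table; this is not ActiveRecord.
class FakeCommentTable
  attr_reader :max_loaded

  def initialize(row_count)
    @ids = (1..row_count).to_a
    @max_loaded = 0 # largest number of rows held in memory at once
  end

  # Mimics "SELECT ... WHERE id > last_id ORDER BY id LIMIT batch_size".
  def fetch_after(last_id, batch_size)
    rows = @ids.select { |id| id > last_id }.first(batch_size)
    @max_loaded = [@max_loaded, rows.size].max
    rows
  end

  # Yields every row, but only ever holds one batch at a time.
  def each_in_batches(batch_size)
    last_id = 0
    loop do
      batch = fetch_after(last_id, batch_size)
      break if batch.empty?
      batch.each { |row| yield row }
      last_id = batch.last
    end
  end
end

table = FakeCommentTable.new(10_000)
processed = 0
table.each_in_batches(1_000) { |_row| processed += 1 }
puts "processed=#{processed} max_loaded=#{table.max_loaded}"
```

Every row is still visited, but the number of rows held at once is capped by the batch size rather than by the size of the table.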

2. :includes are Including Too Much

Article.find(:all, :include => [:user => [:posts => :comments]])

This is a variant of the above, intensified by one or more joins on other tables. If you only have 1000 articles, you may have thought loading them in is not a big deal. But when you multiply that 1000 by the number of users, the posts they have and the comments on those posts… it adds up.
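
A quick back-of-the-envelope calculation shows how fast this grows. The figures below (200 distinct users, 20 posts per user, 10 comments per post) are made-up assumptions for illustration, not measurements; substitute your own.

```ruby
# Assumed, illustrative numbers only.
articles          = 1_000
distinct_users    = 200 # users who wrote those articles
posts_per_user    = 20
comments_per_post = 10

posts    = distinct_users * posts_per_user
comments = posts * comments_per_post
total    = articles + distinct_users + posts + comments

puts "posts=#{posts} comments=#{comments} total objects=#{total}"
```

With these assumed numbers, 1,000 articles turn into 45,200 instantiated objects.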

3. :includes on a has_many

@articles.users.find(:all, :include => [:posts => :comments])

Variation on the above, but through a has_many.

4. @model_instance.relationship

Referring to a has_many relationship directly like so:

@authors.comments

is a shortcut to the potentially bloated:

@authors.comments.find(:all)

Be sure that you don’t have thousands of related records, because you will be loading them all up.

5. Filtering Records with Ruby Instead of SQL

This is also fairly common, especially as requirements change or when folks are in a hurry to just get the results they want:

Model.find(:all).detect{ |record| record.attribute == "some_value" }

ActiveRecord almost always has the ability to efficiently give you what you need:

Model.find(:all, :conditions => {:attribute => "some_value"})

This is a simple example to make the point clear, but I’ve seen more convoluted chunks of code where detect or reject is using some non-attribute model method to determine inclusion. Almost always, these queries can be written with ActiveRecord, and if not, with SQL.
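
The difference in cost can be sketched in plain Ruby. FakeTable below is a hypothetical stand-in, not ActiveRecord: it simply counts how many rows get turned into objects when the filter runs after loading versus when it is handed to the data source, as a :conditions hash would be. The table size and match rate are assumptions.

```ruby
# Hypothetical stand-in for a table; counts how many rows become objects.
class FakeTable
  attr_reader :instantiated

  def initialize(rows)
    @rows = rows
    @instantiated = 0
  end

  # Loosely mirrors find(:all, :conditions => {...}): the "database"
  # filters first, and only matching rows are instantiated.
  def find_all(conditions = {})
    matching = @rows.select do |row|
      conditions.all? { |key, value| row[key] == value }
    end
    @instantiated += matching.size
    matching.map { |row| row.dup }
  end
end

# 5,000 rows, of which every 100th matches (assumed data).
rows = (1..5_000).map do |i|
  { :id => i, :attribute => (i % 100).zero? ? "some_value" : "other" }
end

# Filter in Ruby: everything is loaded, then most of it is thrown away.
ruby_side = FakeTable.new(rows)
found_in_ruby = ruby_side.find_all.select { |r| r[:attribute] == "some_value" }

# Filter at the source: only the matches are ever loaded.
sql_side = FakeTable.new(rows)
found_in_sql = sql_side.find_all(:attribute => "some_value")

puts "Ruby filter instantiated #{ruby_side.instantiated} rows"
puts "Source filter instantiated #{sql_side.instantiated} rows"
```

Both approaches return the same records; the only difference is how many rows had to be instantiated to get there.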

6. Evil Callbacks in the Model

I’ve helped a couple of customers track down memory issues where their controller action looked perfectly reasonable:

def update
  @model = Model.find_by_id(params[:id])
end

However, a look at the filters on the model showed something like this:

after_save :update_something_on_related_model
.
.
def update_something_on_related_model
  self.relationship.each do |instance|
    instance.update_attribute(:status, self.status)
  end
end

Every save of this model then loads all of its related records into memory and saves them one at a time, so an innocent-looking update can end up touching thousands of rows.

7. Named scopes, default scopes, and has_many relationships that specify :include Where Inappropriate

Remember the first time you set up your model’s relationships? Maybe you were thinking smartly and did something like this:

class User
  has_many :posts, :include => :comments
end

So, by default, posts includes :comments, which is great when you are displaying posts and comments on the same page together. But let’s say you are doing something in a migration that involves all posts and has nothing to do with comments:

@posts = User.find(:all, :conditions => {:activated => true}).collect { |user| user.posts }.flatten

This could feel ‘safe’ to you, because you only have 50 users and maybe a total of 1000 posts, but the include specified on the has_many will load in all related comments – something you probably weren’t expecting.

8. Use :select When You Must Instantiate Large Quantities of Records

Sometimes, in the reality of running a real production site, you need to have a query return a large data set, and no, you can’t paginate. In that case, the first question you should ask is “Do I need to instantiate all of the attributes?”

Maybe you need all the comment_ids in an Array for some reason.

@comment_ids = Comment.find(:all).collect{|comment| comment.id }

In this case, you are looking for an array of ids. Maybe you will be delivering them via JSON, maybe you need to cache them in memcached, maybe they are the first step of some calculation you need. Whatever the need, this is a much more efficient query:

@comment_ids = Comment.find(:all, :select => 'comments.id').collect{|comment| comment.id }

9. Overfed Feeds

Check all the places you are making XML sandwiches. Often these controllers are written early on and don’t scale well. Maybe you have a sitemap XML feed that delivers every record under the sun to Google, or are rendering some large amount of data for an API.

10. Monster Migrations

Finally, watch out for your Migrations, as this is a common place where you need to do things like iterate over every record of a Model, or instantiate and save a ton of records. Watch the process size on the server with top or with “watch ‘ps aux | grep migrate’”.
