The N + 1 query problem is a common performance bottleneck found in queries that are written using an ORM (Object-Relational Mapping) tool, such as ActiveRecord.

The problem arises happens when N queries have to be executed, for each association, in addition to the first one made to fetch all the associated records.

For instance, if we had the following relationship between User and Appointment models

# app/models/user.rb
class User < ApplicationRecord
 has_many :appointments
end
# app/models/appointment.rb
class Appointment < ApplicationRecord
 belongs_to :user
end

and executed the query below in our code

Appointment.all.limit(5).each do |appointment|
 puts "#{appointment.user.name} has an appointment at #{appointment.date}"
end

this is what we would see in our rails console:

Appointment Load (0.5ms)  SELECT "appointments".* FROM "appointments" LIMIT ?  [["LIMIT", 5]]
 User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
 User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 3], ["LIMIT", 1]]
 User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 5], ["LIMIT", 1]]
 User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 7], ["LIMIT", 1]]
 User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 9], ["LIMIT", 1]]

The query above seems to be harmless, but imagine we had to load 1,000 appointments! We’d need to hit the database 1 time (to grab all appointments) plus 1,000 times (to grab each appointment’s user), totalling 1,001 queries. In technical terms, the results are being lazily loaded, meaning that the “N” consecutive queries will be made as they are requested in appointment.user.name.

How to Reduce the Number of Independent Database Queries?

A possible solution for this is to use .includes, which will eager load the query result, meaning that the related associations (parent and children) will be loaded all at once through only a few queries.

For example, if we run the following code in the console

Appointment.includes(:user).limit(5).each do |appointment|
 puts "#{appointment.user.name} has an appointment at #{appointment.date}"
end

this is what we would get back:

Appointment Load (0.2ms)  SELECT "appointments".* FROM "appointments" LIMIT ?  [["LIMIT", 5]]
 User Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["id", 1], ["id", 3], ["id", 5], ["id", 7], ["id", 9]]

As we can see from the result above, only 2 queries are being ran against the database, instead of 6 when the results were being lazily loaded.

Backing it Up with Data!

At this point you may be wondering “Okay, enough talking! Where is the data showing that using .includes actually improves query performance?”.

When attempting to lazy load 100 users’ appointments, we get the results below in miliseconds:

user     system      total        real
0.101984   0.096377   0.198361 (  0.761556)

On the other hand, when we try eager loading the same 100 records, we get faster execution times:

user     system      total        real
0.026250   0.010747   0.036997 (  0.078877)

If we take in consideration the total time, which is a sum of the time spent executing the code (user) plus the time spent in the kernel (system), eager loading the records is 5.36 times faster!

You can read more about Ruby’s benchmark module here.

How to Identify N + 1 Queries in an Application

In order to find all those N + 1 queries that are slowing down in your application, the community recommends using the Bullet gem.

References