Friday, January 31, 2014

Passing Func When you Meant Expression

Queries are Very Slow

While working on a proof of concept project, I noticed my queries took longer than expected to run. The repository class uses an Entity Framework (Code First) DbContext. The function in the repository class in question had a signature that looked like:

public IEnumerable<entity> Find(Func<Entity, bool> whereExpression,int take,int skip){
return this.dbContext.Entities.Where(whereExpression).Skip(skip).Take(take);
}

The idea is I will let the client of the repository pass any expression that will return a boolean value to filter the entities collection.

I noticed the generated SQL was something like

select column1, .... columnn from dbo.table where <some_condition>
I did expect the output query to specify a limit to the query at the end (this was PostgreSQL). So I expected:
select column1, .... columnn from dbo.table where <some_condition> limit n;
I ran the same code against Microsoft SQL Server 2012 and the generated SQL had not top keyword to limit the number of returned rows.

What was Happening?

Basically the where operation and the take calls were not happening on the database server, they were being executed on the client-side, the C# console app in my case. The entire table (thousands of rows) of Entities was brought to the client and loaded in memory after the filtering and limiting took place. Very expensive and that explained why the query was much slower than I expected. 

Why was the Entire Table Loaded to Memory?

I just needed the top n rows so why the entire table loaded in memory? After browsing the documentation I came to the following understanding and conclusion. In LINQ if you enumerate a query it will cause the runtime to execute it to get the result. Enumerating a query is typically done via calls to function such as ToList(), or via a foreach expression. So which call in my two-line function was causing the Entities property to be executed and enumerated. Entities is of type DbSet which implements both IEnumerable and IQueryable and both interfaces have a Where method http://msdn.microsoft.com/en-us/library/gg696460(v=vs.113).aspx. Which implementation of where to call is determined by the type of the argument. Notice that IEnumerable's where looks like 
Whereas IQueryable's where looks like 

I noticed that when I called my Find function with Func<T,bool> as the type of the whereExpression variable, the Entities set was enumerated and loaded to memory as soon as the Where method is called. When I however pass an Expression (and not just a delegate) the following happened:
  • I got my limit statement (or top statement in T-SQL) in the generated SQL code
  • Therefore not all instances of Entity was loaded in memory
  • The query was faster as a result.
So the new signature of my Find function is 

public IEnumerable<entity> Find(Expression<func>Entity,bool>> whereExpression,int take,int skip)

Summary

There is a difference between passing a delegate Func or an Expression. Here is a good answer to the difference http://stackoverflow.com/a/3123181. Applying either a delegate or an expression has an effect on LINQ to Entities queries.