fd Blog

Daniel Hilgarth on software development

A Fluent Repository Implementation - Part 1

Problem with classical implementations of the Repository pattern

After establishing that the Repository pattern is still the pattern of choice for separating the domain layer from the data access layer, I want to talk about a common problem when implementing it: method explosion.

A classical implementation of the Repository pattern has one Get* method per attribute of the corresponding entity. I call these the primary operations.
Each of these methods returns the result of the specific request. After that, the query to the data store is finished and further filtering of the result - by another attribute - is only possible on the caller's side.
As such, the repository can’t support this further filtering and, again, the domain layer has filtering logic in it. But this time with the additional disadvantage that part of the filtering now happens in memory. This leads to data being needlessly requested, materialized and transferred.
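
To make the cost concrete, here is a minimal sketch of that situation. It is not code from a real project: ClassicalEmployeeRepository, its in-memory list and the MaterializedCount counter are hypothetical stand-ins for a repository in front of a real data store, and the Employee and Gender types are assumed to look roughly like this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Gender { Female, Male }

public class Employee
{
    public string Name { get; set; }
    public Gender Gender { get; set; }
    public bool IsTeamLeader { get; set; }
}

// Hypothetical classical repository. The list stands in for the data store.
public class ClassicalEmployeeRepository
{
    private readonly List<Employee> _store;

    // Counts how many entities were handed to the caller (illustration only).
    public int MaterializedCount { get; private set; }

    public ClassicalEmployeeRepository(IEnumerable<Employee> store)
    {
        if (store == null) throw new ArgumentNullException("store");
        _store = store.ToList();
    }

    // Primary operation: the query is finished once this method returns.
    public IEnumerable<Employee> GetByGender(Gender gender)
    {
        var result = _store.Where(x => x.Gender == gender).ToList();
        MaterializedCount += result.Count;
        return result;
    }
}
```

A caller that wants female team leaders has to write repository.GetByGender(Gender.Female).Where(x => x.IsTeamLeader). The second condition never reaches the repository: every female employee is materialized first and the team leader filter then runs in memory on the caller's side.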

To get around this problem, the classical repository contains secondary operations.
Those are basically combinations of two or more primary operations.

Example: An EmployeeRepository could contain the following primary operations:

IEnumerable<Employee> GetByName(string name)
IEnumerable<Employee> GetByGender(Gender gender)
IEnumerable<Employee> GetTeamLeaders()
IEnumerable<Employee> GetByBirthday(DateTime birthday)

Now assume that a company wants to get all female team leaders. None of the primary operations can provide this data on its own, so a secondary operation is introduced:

IEnumerable<Employee> GetTeamLeadersByGender(Gender gender)

Taking this further, you will end up with more secondary operations than primary ones for every entity with more than a few attributes. If you wanted to provide a secondary operation for every possible combination of primary operations, you would end up with 2^n - 1 operations in total (primary and secondary together), where n is the number of primary operations.
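
As a quick check of that count, assuming a "combination" means any non-empty subset of the n primary operations:

```latex
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad
n = 4:\quad \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 4 + 6 + 4 + 1 = 15
```

For the four primary operations of the example above, that is 15 operations: the 4 primary ones plus 11 secondary ones.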

The secondary operations not only clutter the interface of the repository, they are also a possible violation of DRY: Often, they basically repeat the code of each of the primary operations they combine.

Example:

// primary operation
public IEnumerable<Employee> GetByGender(Gender gender)
{
    return Session.Query<Employee>().Where(x => x.Gender == gender);
}

// primary operation
public IEnumerable<Employee> GetTeamLeaders()
{
    return Session.Query<Employee>().Where(x => x.IsTeamLeader);
}

// secondary operation
public IEnumerable<Employee> GetTeamLeadersByGender(Gender gender)
{
    return Session.Query<Employee>().Where(x => x.Gender == gender && x.IsTeamLeader);
}

Improving on the classical implementations: Introduce the Fluent Repository

It would be nice if we could change the primary operations so that they can be chained, but without re-introducing all the problems that come along with using IQueryable<T>.
The goal is to be able to write queries like this in the domain layer:

var femaleTeamLeaders = query.Employees.ByGender(Gender.Female).ThatAre.TeamLeaders;
var femaleEmployees = query.Employees.ByGender(Gender.Female);
var teamLeaders = query.Employees.ThatAre.TeamLeaders;

The result of each of these lines should be directly enumerable while still allowing the chaining we are seeing. A separate method to execute the query should not be needed. On the other hand, all specified operations should be taken into account when executing the query against the data store. For example, the first line should send both conditions to the data store. It should not send one to the data store and perform the filtering of the other in memory!

Analyzing these requirements leads us to two points:

  1. Being able to enumerate the result of the methods means that some kind of IEnumerable<T> has to be returned.
  2. Being able to chain the methods and properties means that a type needs to be returned that provides them.

Interfaces

That leads us to an interface that looks something like this:

public interface IQueryEmployees : IEnumerable<Employee>
{
    IQueryEmployees ByGender(Gender gender);
    IQueryEmployees TeamLeaders { get; }
    IQueryEmployees ThatAre { get; }
    IQueryEmployees And { get; }
    // more
}

This interface has a clearly defined structure and doesn’t leak implementation details about the data store. The implementation of IQueryEmployees could access a Web service, a database or files stored on disk.
This interface represents our employees repository. Please note the name: You can easily say it out loud and it will tell you exactly what it does: “I query employees”. You don’t have to call your repository EmployeesRepository.

In the introductory sample, I used query.Employees... and so far I didn’t explain where that came from.
query is an instance of a simple interface that has one property per entity:

public interface IQueries
{
    IQueryEmployees Employees { get; }
    IQueryCustomers Customers { get; }
    // more
}

This interface can be used to access all entities. For the consumer, that’s it.

Implementation

A simple implementation of IQueryEmployees that uses LINQ to NHibernate could look like this:

public class Employees : IQueryEmployees
{
    IQueryable<Employee> _query;
    
    public Employees(ISession session)
    {
        _query = session.Query<Employee>();
    }
    
    public IQueryEmployees And { get { return this; } }
    public IQueryEmployees ThatAre { get { return this; } }
    
    public IQueryEmployees TeamLeaders
    {
        get
        {
            _query = _query.Where(x => x.IsTeamLeader);
            return this;
        }
    }
    
    public IQueryEmployees ByGender(Gender gender)
    {
        _query = _query.Where(x => x.Gender == gender);
        return this;
    }
    
    public IEnumerator<Employee> GetEnumerator()
    {
        return _query.GetEnumerator();
    }
}

I omitted the implementation of IEnumerable.GetEnumerator and the null guard in the constructor for brevity.
I will walk you through the implementation:

  1. And and ThatAre just return this, because they are no-ops. They just exist to support the fluent syntax and make the query readable like a sentence.
  2. The constructor creates an initial query for the Employee entity. This query will return all employees without filter.
  3. GetEnumerator returns the enumerator of the underlying query. That means that the query is executed as soon as the repository instance is enumerated:

     var femaleTeamLeaders = query.Employees.ByGender(Gender.Female)
                                            .ThatAre.TeamLeaders; // No database hit here
     foreach (var femaleTeamLeader in femaleTeamLeaders)          // Database will be hit here
     {
         // ...
     }
    
  4. TeamLeaders and ByGender modify the initial query and store the result back to the _query field. Enumerating the instance after these calls will now use the updated query.

This simple implementation will work and do the job.
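
To try the pattern without NHibernate, here is a self-contained sketch under a few assumptions: InMemoryEmployees and InMemoryQueries are hypothetical names, the data lives in a plain collection exposed through AsQueryable(), and the Employee and Gender types are guessed from the snippets above. It also fills in the two parts I omitted: the null guard and the non-generic GetEnumerator.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public enum Gender { Female, Male }

public class Employee
{
    public string Name { get; set; }
    public Gender Gender { get; set; }
    public bool IsTeamLeader { get; set; }
}

public interface IQueryEmployees : IEnumerable<Employee>
{
    IQueryEmployees ByGender(Gender gender);
    IQueryEmployees TeamLeaders { get; }
    IQueryEmployees ThatAre { get; }
    IQueryEmployees And { get; }
}

public interface IQueries
{
    IQueryEmployees Employees { get; }
}

// Hypothetical in-memory counterpart of the Employees class above.
public class InMemoryEmployees : IQueryEmployees
{
    private IQueryable<Employee> _query;

    public InMemoryEmployees(IEnumerable<Employee> source)
    {
        // The null guard that was omitted above.
        if (source == null) throw new ArgumentNullException("source");
        _query = source.AsQueryable();
    }

    // No-ops that only exist for readability.
    public IQueryEmployees And { get { return this; } }
    public IQueryEmployees ThatAre { get { return this; } }

    public IQueryEmployees TeamLeaders
    {
        get
        {
            _query = _query.Where(x => x.IsTeamLeader);
            return this;
        }
    }

    public IQueryEmployees ByGender(Gender gender)
    {
        _query = _query.Where(x => x.Gender == gender);
        return this;
    }

    public IEnumerator<Employee> GetEnumerator()
    {
        return _query.GetEnumerator();
    }

    // The non-generic enumerator that was omitted above.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

// Hypothetical IQueries implementation for the in-memory case.
public class InMemoryQueries : IQueries
{
    private readonly IEnumerable<Employee> _source;

    public InMemoryQueries(IEnumerable<Employee> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _source = source;
    }

    // A fresh instance per access, because the filter members mutate it.
    public IQueryEmployees Employees
    {
        get { return new InMemoryEmployees(_source); }
    }
}
```

With an instance of InMemoryQueries named query, the three queries from the beginning of this post can be written exactly as shown there. One design choice in this sketch is worth pointing out: because TeamLeaders and ByGender change the instance they are called on, Employees hands out a new InMemoryEmployees on every access. If a single instance were shared, the filters of an earlier query would still be applied to the next one.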
