Repository pattern: Common implementation mistakes

Most articles about the repository pattern present good theory and incorrect implementations. Here, I want to focus on common mistakes when implementing this pattern with .NET, the Entity Framework (EF) library, and DDD principles.

Before starting, I want to highlight the spirit of this pattern: to give a program the capacity to operate over collections of domain objects that must persist between executions. Two important facts follow from this spirit in the context of DDD:

A concrete repository is tied to a concrete domain object type. In DDD, that type is an aggregate root.
Repository interfaces belong to the domain layer, and their methods must describe domain actions, using domain terminology.

For a full description of this pattern, see Patterns of Enterprise Application Architecture (P of EAA), page 322.

With some minor variations, most articles about repositories suggest an interface like this one:

public interface IRepository<T>
{
    List<T> ReadAll();
    T Read(Criteria criteria);
    T Create(T entity);
    T Update(T entity);
    T Delete(T entity);
}

What could be wrong with it? Let’s analyse it.

Mistake 1: Use of generics

This popular interface imposes a fixed set of actions on any persisted type T. But a repository implementation must be tied to one domain object type, and each domain object type has a different set of actions, so there is no point in forcing a fixed set.

Let’s suppose we are working on a program that traces user activities. Instances of UserActivity are only expected to be recorded: not updated, not deleted, not searched. So we can only use IRepository<UserActivity>.Create. What should we do with the other methods? Should they throw NotImplementedException?
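A narrower alternative, as a minimal sketch: an interface that exposes only the action the domain needs. The names IUserActivityRepository and Record, and the shape of UserActivity, are assumptions made for illustration. The in-memory class only stands in for a real EF-backed implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain object: one traced user action.
public sealed class UserActivity
{
    public UserActivity(string userName, string action, DateTime occurredAt)
    {
        UserName = userName;
        Action = action;
        OccurredAt = occurredAt;
    }

    public string UserName { get; }
    public string Action { get; }
    public DateTime OccurredAt { get; }
}

// Domain layer: recording is the only action this type needs,
// so it is the only method on the interface.
public interface IUserActivityRepository
{
    void Record(UserActivity activity);
}

// Stand-in implementation (assumption): a real one would persist through EF.
public sealed class InMemoryUserActivityRepository : IUserActivityRepository
{
    private readonly List<UserActivity> recorded = new List<UserActivity>();

    // Exposed only so the sketch can be checked; not part of the interface.
    public int Count => recorded.Count;

    public void Record(UserActivity activity)
    {
        if (activity == null) throw new ArgumentNullException(nameof(activity));
        recorded.Add(activity);
    }
}
```

With this shape there is nothing left to stub out: clients that depend on IUserActivityRepository cannot even attempt an update or a delete.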

But it can get worse. Many programs with this interface also include a single implementation, something like:

public class Repository<T>: IRepository<T>
{
  // implementations go here
}

This class represents a generic repository for all our persisted domain object types, with a very limited set of actions. It is a restricted DbSet, without its flexibility and with additional drawbacks:

All repositories, and indirectly all services that use them, become coupled to a single repository interface and implementation. Coupling means rigidity: maintaining these types or introducing any change becomes more and more difficult as the program grows.
When the program needs new actions related to only one aggregate root, programmers can (1) expand this single repository or (2) try to move the new repository logic out. Both alternatives have problems.

Let’s take the first approach, expanding this single repository for every new action needed. We cannot add methods that are valid for only some entities, because nothing prevents their use with the others. So we are forced to add only methods with generic logic, even when we are sure those new methods will be used only with some specific entities. Forget about optimization. With this growing single repository, we are breaking the S in SOLID (single responsibility), and because we are artificially coupling all repository clients to methods they do not need, we are breaking the I in SOLID (interface segregation).

Mistake 2: Use of a base repository

Let’s take the second approach. Two alternatives are common. One is moving the new logic to application layer services, and it is easy to see that this breaks the S in SOLID.

Another frequent, and better, alternative is deriving concrete repositories from this generic one. So, in practice, we end up with a base repository and many concrete ones. To show the problem with this, let’s suppose we have a program that handles students and grades, with specific actions like:

public interface IStudentRepository : IRepository<Student>
{
  Student GetStudentByEmail(string email);
  IEnumerable<Student> GetTopStudents(uint topLimit);
}

GetStudentByEmail could easily be handled using the method IRepository<T>.Read. What about GetTopStudents? We need to perform an ORDER BY and then a TOP in SQL, but Criteria should correspond only to a WHERE clause. Of course, the programmer can extend the base repository, but then we end up back in the first approach.
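To make the limitation concrete, here is a minimal sketch of both methods written against an IQueryable<Student>. With EF the source would typically be a DbSet<Student>, and the LINQ query would be translated to SQL. The Student shape (Email, AverageGrade) and the constructor parameter are assumptions for illustration, not a real schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain object with just enough data for the example.
public sealed class Student
{
    public Student(string email, double averageGrade)
    {
        Email = email;
        AverageGrade = averageGrade;
    }

    public string Email { get; }
    public double AverageGrade { get; }
}

public interface IStudentRepository
{
    Student GetStudentByEmail(string email);
    IEnumerable<Student> GetTopStudents(uint topLimit);
}

public sealed class StudentRepository : IStudentRepository
{
    // Assumption: with EF this would be a DbSet<Student> from the context.
    private readonly IQueryable<Student> students;

    public StudentRepository(IQueryable<Student> students)
    {
        this.students = students;
    }

    // A pure filter: this is what a WHERE-only Criteria can express.
    public Student GetStudentByEmail(string email)
    {
        return students.SingleOrDefault(s => s.Email == email);
    }

    // Needs ordering and a row limit, which a WHERE-only Criteria cannot express.
    public IEnumerable<Student> GetTopStudents(uint topLimit)
    {
        return students
            .OrderByDescending(s => s.AverageGrade)
            .Take(checked((int)topLimit))
            .ToList();
    }
}
```

The ordering and the limit live inside the concrete repository, so no generic Criteria type has to grow to describe them.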

An additional consequence of using a base repository is that it tends to absorb more and more of the flexibility of the original DbSet class. But repository implementations belong to the infrastructure layer, so we can tie them to any infrastructure library (like EF) we want. So the question is: why not use DbSet directly instead of constraining it?

Mistake 3: Pollute the domain layer

Please read again the second fact about the spirit of the repository pattern: use domain terminology. Now look again at the IRepository interface described initially. Does it use domain terminology? If your project is about implementing EF (or any generic ORM), the answer is yes. Otherwise, it is no.

Going back to the students and grades example, what does the method IRepository<T>.Create mean? Assuming the context is a school, students are not created. Students are registered. The term Create belongs to the infrastructure layer. Because the repository interface belongs to the domain layer, we are polluting the domain.
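In domain terms, the interface might look like the following sketch. Register and IsRegistered are assumed names chosen to match a school's vocabulary, and the in-memory class is only a stand-in for an EF-backed implementation; your own ubiquitous language may call for different words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain object, identified here by email.
public sealed class Student
{
    public Student(string email)
    {
        Email = email;
    }

    public string Email { get; }
}

// Domain layer: method names follow the school's vocabulary.
public interface IStudentsRepository
{
    void Register(Student student);   // instead of "Create"
    bool IsRegistered(string email);  // instead of a generic "Read"
}

// Stand-in implementation (assumption): a real one would use EF.
public sealed class InMemoryStudentsRepository : IStudentsRepository
{
    private readonly Dictionary<string, Student> byEmail =
        new Dictionary<string, Student>(StringComparer.OrdinalIgnoreCase);

    public void Register(Student student)
    {
        if (byEmail.ContainsKey(student.Email))
            throw new InvalidOperationException("Student is already registered.");
        byEmail.Add(student.Email, student);
    }

    public bool IsRegistered(string email)
    {
        return byEmail.ContainsKey(email);
    }
}
```

The domain word also carries a rule that Create hides: a student cannot be registered twice.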

Let’s see other examples of domain pollution:

///////////////////////////////////////////////////////////
// in domain layer

public interface IStudentsRepository
{
  IEnumerable<StudentRecord> GetTopStudents(StudentsRequest request);
}

///////////////////////////////////////////////////////////
// in infrastructure layer

public class StudentsRepository : IStudentsRepository
{
  public IEnumerable<StudentRecord> GetTopStudents(StudentsRequest request)
  {
    var studentRecords = ...;
    return studentRecords;
  }

  // other code
}

///////////////////////////////////////////////////////////
// in application layer

public class SomeService
{
  private readonly IStudentsRepository students;

  public IEnumerable<Student> GetBestStudent(StudentsRequest request)
  {
    // The local is named topStudents so it does not shadow the students field.
    var studentRecords = students.GetTopStudents(request);
    var topStudents = MapToDomainObjects(studentRecords);
    return topStudents;
  }

  // other code
}

In the previous code, IStudentsRepository is contaminated with a type that belongs to the infrastructure layer (StudentRecord) and a type that belongs to the application or presentation layer (StudentsRequest). A better version could be:

///////////////////////////////////////////////////////////
// in domain layer

public interface IStudentsRepository
{
  IEnumerable<Student> GetTopStudents(uint topLimit);
}

///////////////////////////////////////////////////////////
// in infrastructure layer

public class StudentsRepository : IStudentsRepository
{
  public IEnumerable<Student> GetTopStudents(uint topLimit)
  {
    var studentRecords = ...;
    var students = MapToDomainObjects(studentRecords);
    return students;
  }

  // other code
}

///////////////////////////////////////////////////////////
// in application layer

public class SomeService
{
  private readonly IStudentsRepository students;

  public IEnumerable<Student> GetBestStudent(GetTopStudentsRequest request)
  {
    // The local is named topStudents so it does not shadow the students field.
    var topStudents = students.GetTopStudents(request.top);
    return topStudents;
  }

  // other code
}

One important detail here: the call to MapToDomainObjects was moved from SomeService to StudentsRepository. In this way, the repository is forced to reply in terms of the domain.

Using the right terminology is not an aesthetic matter. It has important consequences: it guides you to implement in one specific direction.

Mistake 4: Use repositories

The repository pattern has limitations. Please read again the first fact about it: one repository is tied to one aggregate root. What can we do when we need a query that combines two or more aggregate roots? It is easy to find use cases for such queries, like a reporting section in a program.

Use cases like reporting are better handled with specific services. Moreover, those services can be defined outside the domain layer and be implemented with libraries other than EF, such as Dapper, which are more efficient in these cases.
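As a rough sketch of such a service: the report returns a flat read model instead of domain objects, and joins data that belongs to two aggregate roots. Every name here (IGradesReport, StudentGradeRow, GradesAtLeast) is hypothetical. LINQ over in-memory collections stands in for the SQL that a Dapper-based implementation would run.

```csharp
using System.Collections.Generic;
using System.Linq;

// Read model shaped for the report; it is not a domain object.
public sealed class StudentGradeRow
{
    public StudentGradeRow(string studentEmail, string course, double grade)
    {
        StudentEmail = studentEmail;
        Course = course;
        Grade = grade;
    }

    public string StudentEmail { get; }
    public string Course { get; }
    public double Grade { get; }
}

// Defined outside the domain layer: it answers a reporting question.
public interface IGradesReport
{
    IReadOnlyList<StudentGradeRow> GradesAtLeast(double minimumGrade);
}

// Stand-in implementation (assumption): a real one could issue a SQL JOIN.
public sealed class InMemoryGradesReport : IGradesReport
{
    private readonly IEnumerable<(int Id, string Email)> students;
    private readonly IEnumerable<(int StudentId, string Course, double Grade)> grades;

    public InMemoryGradesReport(
        IEnumerable<(int Id, string Email)> students,
        IEnumerable<(int StudentId, string Course, double Grade)> grades)
    {
        this.students = students;
        this.grades = grades;
    }

    public IReadOnlyList<StudentGradeRow> GradesAtLeast(double minimumGrade)
    {
        // Combines two aggregates in one query, which no single repository owns.
        var rows =
            from s in students
            join g in grades on s.Id equals g.StudentId
            where g.Grade >= minimumGrade
            orderby g.Grade descending
            select new StudentGradeRow(s.Email, g.Course, g.Grade);

        return rows.ToList();
    }
}
```

Because the service only reads and returns rows, it does not need to load or map full aggregates.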

What about other cases? If a use case in your application layer needs to perform queries or changes that involve two or more aggregate roots, you can use the Unit of Work (UoW) pattern in your solution. You can read about common mistakes with UoW here.

Final comments

Having a common repository interface and implementation could be seen as a good example of the DRY principle. If we want to read entities by Id, why not have only one implementation? That works until you must deal with an entity with a more natural identity. For example, an email address could be the natural identity of some system users. And this is the reason DRY is not easily applied here: even though all entities are similar by nature from the infrastructure point of view (in the end, all are records in tables), they are different from the domain point of view.