31 March 2021

Don't let Entity Framework call the shots

With .NET Core, Entity Framework is the default choice when it comes to database access. And even though it definitely is a powerful beast that can do alot of things, it is also slightly complicated to use to be honest. I guess that is the reason that I keep seeing these really weird “EF patterns” being used in different projects… Most of these annoying “patterns” involve people replicating the database in the C# code, or writing code to fit EF instead of the other way around.

In most cases, there is no reason for your C# code to replicate the database. Sure, there are things that you will have to adapt to make it work with EF. But definitely not as much as a lot of people do. And when you do, you can often hide it, so that the outside of your classes don’t expose the ugly parts.

If there is one thing you take away from this post, it is that EF is there to map data from the database to your objects, in the way that you tell it to. In most cases that means that you can make EF map the data to your classes, instead of writing classes that fit into the way that EF maps the database…

Note: Yes, I do know of micro-ORMs like Dapper, but however much I want to love them, they always seem to leave me wanting a bit more. They are awesome for quickly pulling out data from the db, but they just aren’t that awesome when you want to work with relationships in my opinion…

So…for that reason, I thought I would write this post about how to get away from some of the ugliness I see in some EF projects. It isn’t that hard, it just takes some tweaking. And it makes your code so much more readable.

The “domain”

To demonstrate these things, I have created a tiny sample that should show the most commons “issues” I keep seeing.

The database looks like this

Database diagram

As you can see, with only 4 tables, it’s pretty simplistic. It consists of a student that has a many-to-many relationship to classes. And the classes in turn have a many-to-one relationship to teachers.

I know this is overly simplified and so on, but it is enough to show some of the annoying things I think you should stop doing…if you are doing them.

This database layout often ends up being mapped to something that looks like this

A Student class

public class Student
{
    public Student(string studentId, string firstName, string lastName)
    {
        StudentId = studentId;
        FirstName = firstName;
        LastName = lastName;
    }

    public int Id { get; private set; }
    public string StudentId { get; private set; }
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public IList<StudentClass> Classes { get; private set; } = new List<StudentClass>();
}

A StudentClass class

public class StudentClass
{
    public int StudentId { get; set; }
    public Student Student { get; set; }
    public int ClassId { get; set; }
    public Class Class { get; set; }
}

A Class class

public class Class
{
    protected Class()
    {
        // Required by EF as it cannot pass in a Teacher
    }
    public Class(string name, Teacher teacher)
    {
        Name = name;
        Teacher = teacher;
    }

    public void ChangeTeacher(int teacherId)
    {
        TeacherId = teacherId;
        Teacher = null;
    }

    public int Id { get; private set; }
    public string Name { get; private set; }
    public int TeacherId { get; private set; }
    public Teacher Teacher { get; private set; }
    public IList<StudentClass> Students { get; private set; }
}

and finally a Teacher class

public class Teacher
{
    public Teacher(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    public int Id { get; private set; }
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public IList<Class> Classes { get; private set; }
}

Tip: If you find it annoying to read the code in-line in the post like this, you can find the whole project on GitHub. The Entities above are available here. And there are 2 branches, clean and clean-5, that contains the “fixed” code.

The actual retrieval (and yes, I am only retrieving stuff for this post to keep it simple) is done through a basic repo that has the following interface

public class StudentRepository
{
    public Task<Student> WithId(int id) {
      ...
    }

    public Task<Student> WithStudentId(string studentId){
      ...
    }

    public Task<Student[]> InClass(string className){
      ...
    }
}

I’m not going to cover the mapping of the entities because it is pretty basic. Instead I want to focus on the problems I see here…

The “problems”

The above implementation contains a few things that are “problematic” for me. They might not be a problem for everyone, and in some cases they might not even be a problem at all. But if this was the core classes in my application, I would definitely make some changes to clean it up.

Mapping entities and IDs

Let’s start by looking at my number one annoyance to be honest. People mapping relationships using both an entity and an ID.

For some reason, people keep exposing both an ID and the related entity when defining a relationship. This is definitely not necessary, and makes the code confusing…

An example of it would be in the Class class

public class Class
{
    ...

    public int TeacherId { get; private set; }
    public Teacher Teacher { get; private set; }

    ...
}

If you know how EF works, you probably know that if you just set the TeacherId property to an existing teacher’s ID, EF will automatically populate the Teacher property for you when calling SaveChanges(). However, this is not obvious when looking at the code in any way. And if you aren’t familiar with EF, it’s almost impossible to grasp.

The fact what setting one property causes another property to be set at some other point in time, is just weird and confusing. Not to mention the fact that in some cases one or the other might actually not be set, even if they represent the same thing. On top of that it just looks ugly…

Sure, exposing just the Teacher property requires me to retrieve a Teacher instance before Im able to set it. And this might be a tiny performance hit, and a little bit annoying, but it is also a great time to verify that a teacher with the specified ID actually exists… It makes a lot more sense to find out that a teacher with that ID doesn’t exist, when you set up the relationship, than getting it when you tell EF to save the changes. Not to mention that it is often a lot easier to fix at that point, than trying to do it during the save. At least in my mind…

Note: When talking to a friend about this, I was told that this was not a common practice at all. However, I beg the difference. And looking at Microsoft’s own examples, we see this being demonstrated. For example at https://docs.microsoft.com/en-us/ef/core/modeling/relationships

So, to fix this, all we have to do is to remove the TeacherId and we are good to go.

public class Class
{
    ...

    public Teacher Teacher { get; set; }
}

That’s pretty much all we have to do from the entity’s point of view. However, depending on the way you have set up the mapping for the Class entity, you might need to remove the mapping for the TeacherId property. In my case, I depended on EF conventions to do that specific mapping, so I didn’t need to do anything else. It just works. And the mapping, if you are curious, looks like this

modelBuilder.Entity<Class>(x =>
{
    x.Property(x => x.Id).UseIdentityColumn();

    x.HasOne(x => x.Teacher).WithMany(x => x.Classes);

    x.HasKey(x => x.Id);
    x.ToTable("Classes");
});

I also tend to make the properties read only, and instead expose methods to make modifications. This allows the class to take more responsibility for its own functionality. For the Teacher relationship, that would look something like this

public class Class
{
    ...

    public void ChangeTeacher(Teacher teacher)
    {
        Teacher = teacher;
    }

    ...
    
    public Teacher Teacher { get; private set; }
}

Once again, a much cleaner looking class without confusing IDs that makes you wonder if you should set the id or the instance… And a clear methods that tells you what actions you can perform when using this class.

While on the topic of relationships, I might as well cover my other annoyance in this area. And that is over use of bi-directional navigation. Not every relationships has to support bi-directional navigation. In a lot of cases it is good enough to be able to go in one direction.

For example, an order row for an order does generally not need to provide a way to navigate to the order. Most of the time you don’t pull up an order row from the database, and then retrieve the order from that. Instead, you get the order, and then find the order rows. Because of this, we don’t need bi-directional navigation…

Note: There are definitely cases when you need bi-directional navigation, but I personally try to look at every relationship to verify wether or not that is actually needed…

However, if you look at the documentation, they are pretty much always setting up bi-directional navigation properties. And I get that we might just as well do that, as it is very easy. But I disagree with the idea that it is the best solution in all cases. Especially if you try to not use lazy loading.

Unfortunately, in my example, it might actually make sense to expose bi-directional navigation on all the entities. Depending on how the entities are to be used… So, to demonstrate this, we will have to ignore that and just pretend that there is no reason for the Teacher entity to keep a list of all the classes that the teacher has taught. But, the Class still needs to know who is teaching it.

Right now, the code entities look like this

public class Teacher
{
    ...

    public IList<Class> Classes { get; private set; }
}

public class Class
{
    ...

    public Teacher Teacher { get; private set; }
}

and the mappings look like this

modelBuilder.Entity<Teacher>(x =>
{
    x.Property(x => x.Id).UseIdentityColumn();

    x.HasKey(x => x.Id);
    x.ToTable("Teachers");
});

modelBuilder.Entity<Class>(x =>
{
    x.Property(x => x.Id).UseIdentityColumn();

    x.HasOne(x => x.Teacher).WithMany(x => x.Classes);

    x.HasKey(x => x.Id);
    x.ToTable("Classes");
});

As you can see, the Class entity maps the Teacher property with bi-directional navigation properties. From Class to Teacher using the Teacher property. And from Teacher to Class using the Classes property.

To simplify this, we can remove the Teacher.Classes property and just map it in one direction like this

modelBuilder.Entity<Class>(x =>
{
    x.Property(x => x.Id).UseIdentityColumn();

    x.HasOne(x => x.Teacher).WithMany().HasForeignKey("TeacherId");

    x.HasKey(x => x.Id);
    x.ToTable("Classes");
});

In this case we still map the Teacher property on Class, but we skip setting the inverse property. Instead we define the name of the column that contains the foreign key.

Mapping using attributes

At this point, I want to take a little side step and talk about a slightly different annoyance I have. The constant need to configure mappings using attributes…

As you can see from my examples, I use manually configured mappings in the DbContext.OnModelCreating method. This is a conscious choice over attributes. And even if I have received a lot of comments from colleagues that it is much easier and better to use attributes, I stand my ground. Why? Well, I have 2 distinct reasons…

First of all, I have not been through a single project where we haven’t ended up having to map at least some things manually anyway. And I really dislike using 2 different ways to solve the same thing, in the same project, as it makes it very hard to figure out where to find things.

Nice, you are using attribute based mapping! But why does this property work like this even if there is no attribute? Oh…wait…ok…so that specific thing is mapped in a different way in a different place…?

You see the problem?

So, I would much rather have it all in one place where it is easy to find. And since some things are either really hard to do, or impossible to do with attributes, I would much rather just do it all manually in OnModelCreating.

The second reason is a lot less technical… It is that I really dislike adding ORM specific stuff to my clean entities. I prefer to keep them as clean POCOs as much as possible. And adding mapping attributes to them just feels wrong to me. Not to mention the problems it would introduce if I ever decided to switch out EF.

Anyhow, now that I have explained why I use manually configured mappings over attributes, we can carry on with my next pet peeve…

Exposing IDs all over the code

This is another ID related thing… I really don’t see the need to expose the entities’ IDs all over the place. In a lot of cases, they aren’t really used for anything. They are just there because EF needs an ID to identify the entity, and the database needs one to handle relationships. So, if we want nice and clean entities that look like the real world, why would we have storage specific IDs exposed all over the place?

And in the cases then we have IDs in the real world, they often do not correspond to the simple IDs used in the database… In the demo code, the Student class actually has a completely different identifier called StudentId. This would be the “official” identifier for the student. Because of this, the database specific ID really doesn’t need to be exposed.

Note: The only reason that I have the WithId() method on the StudentRepository is that I want to show that you can still query entities by using the database specific ID, even if it has been hidden away as I am about to do on the Student class.

Ok…so how do we get away from exposing the IDs all over the place? Well, it isn’t hard at all. You just need to tell EF to map the value to a private field instead of a public property.

The first step is to internalize the ID using a private field instead of the public property.

public class Student
{
    private int? id;

    ...
}

Next, we need to update the EF mapping from the old mapping that uses a property like this

modelBuilder.Entity<Student>(x =>
{
    x.Property(x => x.Id).UseIdentityColumn();

    x.HasKey(x => x.Id);
    x.ToTable("Students");
});

to a mapping that uses a private field

modelBuilder.Entity<Student>(x =>
{
    x.Property("id").UseIdentityColumn(); // String instead of lambda

    x.HasKey("id");
    x.ToTable("Students");
});

As you can see, the only change made, was that instead of using a lambda expression to point out the Id property, I use a string with the name of the private field to use. And all of the sudden, Student doesn’t expose that unnecessary Id property anymore. I don’t even have to tell EF that it is a private field. It just works.

Note: It is true that in a lot of cases it makes sense to expose the ID property, or it might even be required to be exposed for some reasons. But in this case, with Student having another unique identifier (the StudentId property), the ID is really only there to handle the relationships in the DB.

Even if you want/need to expose the ID as a property on the entity, it could still be a good idea to hide it inside a private field and expose it using a more specific type, or value object.

Imagine that the Student class actually used the assigned integer as its ID. In that case, we could add a value object to represent the ID like this

public class Student
{
    private int? id;

    ...

    public StudentId ID => StudentId.Create(id);
}

public struct StudentId
{
    private int _id;

    public static StudentId Create(int id) => new StudentId { _id = id };

    ...
}

Just remember to override ToString, Equals etc on the value object to make sure it behaves like a value object…

Note: I’ll return to this idea of using a value object together with EF in a couple of minutes. At that point, I’ll also includes a more complete value object implementation…

That’s cool, but what about the WithId() method on the repository now that the ID property is gone? Well, it used to look like this (minus some includes)

public Task<Student> WithId(int id)
{
    return context.Students
                    ...
                    .Where(x => x.Id == id)
                    .SingleOrDefaultAsync();
}

But now that the Id property is gone, we can’t use this query anymore. The ID is still mapped though, so EF still knows about it. And that means that we can still query base on that value. We just need to change the query a little to make it work

public Task<Student> WithId(int id)
{
    return context.Students
                    ...
                    .Where(x => EF.Property<int>(x, "id") == id)
                    .SingleOrDefaultAsync();
}

As you can see, EF has this static helper class called EF that allows us to do things like query mapped private fields. So even if the public ID property has been removed, it is still possible to create the query we need. We just need to know the type and the name of the field, and we are good to go.

Long “include chains”

Another common thing is to use these long “include chains” to make sure that related items are included when lazy-loading is not enabled. And even though I personally never use lazy-loading, I really dislike these “include chains”…

public Task<Student> WithId(int id)
{
    return context.Students
                    .Include(x => x.Classes)
                    .ThenInclude(x => x.Class)
                    .ThenInclude(x => x.Teacher)
                    .Where(x => x.Id == id)
                    .SingleOrDefaultAsync();
}

This looks horrible! And on top of that, I often see these chains being added over and over again in different query methods… And they often look the same all over, since the returned objects often need to include the same data.

There are a few different solutions to this… One solution is to add it to the constructor, and expose a property that allows you to perform queries with the includes already in place. Like this

public class StudentRepository
{
    ...

    public StudentRepository(SchoolContext context)
    {
        Students = context.Students.Include(x => x.Classes)
                                    .ThenInclude(x => x.Class)
                                    .ThenInclude(x => x.Teacher);
    }

    public Task<Student> WithId(int id)
    {
        return Students.Where(x => EF.Property<int>(x, "id") == id)
                    .SingleOrDefaultAsync();
    }

    ...

    private IQueryable<Student> Students { get; }
}

That is so much cleaner! And it makes sure that all methods that performs queries against the Students property will get all the required includes. And if you ever need to change the included values, you can just make a change in a single place!

Another solution is to use an extension method like this

public static class DbSetExtensions
{
    public static IQueryable<Student> WithDefaultIncludes(this DbSet<Student> dbSet) => 
        dbSet.Include(x => x.Classes)
                .ThenInclude(x => x.Class)
                .ThenInclude(x => x.Teacher);
}

public class StudentRepository
{
    ...

    public Task<Student> WithId(int id)
    {
        return context.Students
                        .WithDefaultIncludes()
                        .Where(x => x.Id == id)
                        .SingleOrDefaultAsync();
    }
}

This still keeps the WithId method quite clean, but also allows you to have different includes by using different extension methods.

Another cool option is to skip the lambda based Include calls and use the string based overload. This allows you to defined several levels of includes in a single call like this

public class StudentRepository
{
    public StudentRepository(SchoolContext context)
    {
        Students = context.Students.Include("Classes.Class.Teacher");
    }

    ...
}

However, the downside to this is that it will remove the compile time check that verifies that the properties exist… On the other hand, it also allows you to include none public members…

Many-to-many mapping tables

Another annoying mapping thing I keep seeing, is many-to-many mappings that expose the connection table. This is a clear case of entities exposing the underlying database implementation/limitations…

An example of this would be the Student > StudentClass > Class mapping that is in the example… The Student class exposes a list of StudentClasses that in turn exposes a list of Class. This is only there because of the way that the database handles many-to-many mappings. In the real world, we would expect the Student entity to have a list of classes.

So, why aren’t we exposing it like that? “Because the database doesn’t look like that.” But that is a really poor excuse! We know that EF can map private fields. And we know that we can add our own read-only properties that expose private data. So why aren’t we using that?

The first step is to re-map the List<StudentClass> to a private field, and expose the actual classes through a read-only property. Like this

public class Student
{
    private List<StudentClass> classes = new ();
    ...
    public IReadOnlyList<Class> Classes => classes.Select(x => x.Class).ToList().AsReadOnly();
}

Note: I use new () to make sure that the field is set to an empty list. If we don’t, it will be null when no related classes are found. This in turn forces us to have null checks for that field…

Next,we can re-map the Student > StudentClass relationship to use the private field instead

public class SchoolContext : DbContext
{
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        ...
        modelBuilder.Entity<StudentClass>(x =>
        {
            x.HasOne(x => x.Class).WithMany().HasForeignKey("ClassId");
            x.HasOne(x => x.Student).WithMany("classes").HasForeignKey("StudentId");

            x.HasKey("StudentId", "ClassId");
            x.ToTable("StudentClasses");
        });

        modelBuilder.Entity<Student>(x =>
        {
            ...
            x.Ignore(x => x.Classes);
            ...
        });
    }
}

As you can see, the StudentClass mapping is responsible for the actual mapping. It maps “itself” to the Student class using the WithMany("classes"). This will tell EF to map the list of StudentClass instances to the private field classes in the Student class. And then we need to tell EF to ignore the Classes property on the Student class, as it is not supposed to be mapped.

Thats it!

Note: Any changes to the list will have to be done by exposing methods on the Student class.

This looks so much better when using the Student class. It also reflects the real world a lot better. Sure, the ugliness is still there, but it is internalized inside the Student class.

However, if you are .NET 5, you can make this even better by using a new EF feature for this particular scenario. It allows you to remove the StudentClass class completely, but the code is a bit of a mouthful. It looks like this

modelBuilder.Entity<Class>(x =>
{
    ...

    x.HasMany<Student>("students").WithMany(x => x.Classes)
                                .UsingEntity<Dictionary<string, object>>(
                                    "StudentClasses",
                                    x => x.HasOne<Student>()
                                            .WithMany()
                                            .HasForeignKey("StudentId"),
                                    x => x.HasOne<Class>()
                                            .WithMany()
                                            .HasForeignKey("ClassId")
                                );

    ...
});

This tells EF to map a many-to-many relationship between Class and Student using the StudentClasses table. Using this syntax, we can skip the extra class completely, and let EF handle it. On the entities, this means that we can simplify them down to this

public class Student
{
    ... 

    public List<Class> Classes { get; private set; } = new();
}

public class Class
{
    private ICollection<Student> students;
    
    ...
}

One thing to note here, is that the Class class is now responsible for keeping track of the students in some way. In this case I have opted for using a private member called students

Converters

Previously I talked a bit about how IDs can be handled in different ways. And I mentioned that the Student class has it’s own identifier called StudentId. Right now, it is a string that is mapped like this

public class Student
{
    public Student(string studentId, string firstName, string lastName)
    {
        StudentId = studentId;
        ...
    }
    ...
    public string StudentId { get; private set; }

However, imagine that there are rules around the format of the string. Exposing it as just a string is not the best solution for that, even if we could add getters and setters on the property to limit what values are ok… This is one of those cases where a value object makes a lot of sense.

Instead of adding logic around the identity formatting to the Student class, it is much better put that in a separate value object. It could look something like this

public struct StudentId
{
    private static Regex ValidationRegex = new Regex("^[A-Z]{2}[0-9]{3}$");
    private string id;

    public StudentId(string id)
    {
        if (!ValidationRegex.IsMatch(id))
        {
            throw new ArgumentException("Invalid StudentId format", nameof(id));
        }

        this.id = id;
    }

    public override string ToString()
    {
        return id;
    }
    public override int GetHashCode()
    {
        return HashCode.Combine("StudentId", id);
    }
    public override bool Equals(object obj) => obj switch
        {
            string str when str == this.id => true,
            StudentId id when id.id == this.id => true,
            _ => false
        };
    public static bool operator ==(StudentId a, StudentId b) => a.id == b.id;
    public static bool operator !=(StudentId a, StudentId b) => a.id != b.id;
    public static bool operator ==(StudentId a, string b) => a.id == b;
    public static bool operator !=(StudentId a, string b) => a.id != b;
}

That is a lot of code, sorry about that, but essentially it is a struct that wraps a string identifier that consists of 2 letters and 3 digits. The actual functionality doesn’t really matter. The important part here is how we can get EF to do the mapping when we update the Student.StudentId property to be of this type

public class Student
{
    ...
    public StudentId StudentId { get; private set; }
    ...
}

If we just leave the mapping as it is right now, EF will throw an exception because it does not know how to handle StudentId. Luckily, we can tell EF how to handle this new type quite easily using a value converter. We just need to update the mapping of the StudentId property to look like this

modelBuilder.Entity<Student>(x =>
{
    x.Property(x => x.StudentId)
        .HasConversion(
            x => x.ToString(), 
            x => new StudentId(x)
        );
    ...
});

This tells EF that it needs to convert the value from the database before it can be added to the entity, as well as before it can be added to the database. All that is needed are 2 lambdas. The first one tells EF how to convert the StudentId to a string that can be stored in the database. And the second one tells EF how to convert the string from the database to a StudentId instance that can be added to the student. That’s it!

I promise, I’m getting very close to the end of this humongous blog post, but there is still one more thing I want to cover…

DbSet properties vs Set()

I see a lot of developers adding DbSet<T> properties for every one of the mapped types to their DbContext class. And even if I get that that makes it a bit easier to see what is available, I have to highlight that it isn’t required!

You don’t need to add a property for each type you want to be able to retrieve! As long as you have mapped the type in the DbContext, you can get hold of a DbSet<T> by calling DbContext.Set<T>(). This allows you to get hold of a DbSet<T> for any type that you have mapped, without having to expose a property.

So instead of adding a property for each type that you have mapped in the context, including many-to-many mapping objects and other things you will never actually query, you add just the ones you use very often, and use Set<T>() for the not so common types. Like this

return context.Set<Student>()
            .Where(x => x.Id == id)
            .SingleOrDefaultAsync();

This can make the DbContext class a lot cleaner, as you don’t add a ton of extra properties that you don’t actually use that often…

Conclusion

When you use EF, make EF do what you want, and not the other way around. Always try to make EF map your entities to the way that you want them to look, not the way that makes it easy to map using EF. And when that isn’t possible, try to hide the ugly stuff inside your classes, so that the ones using the classes don’t have to see it. This offers more flexibility to fix things when EF evolves, or when you decide to switch ORM. It also makes your code easier to use, not just better looking.

If you have any question, I’m available at @ZeroKoll as usual! And if you want to have a look at the code, it’s available on GitHub. Just remember to switch branches to see the “fixed” versions!