GraphQL with .NET Core (Part - XI: Data Loader)

Code samples used in this blog series have been updated to the latest version of .NET Core (5.0.4) and GraphQL-Dotnet (4.2.0). Follow this link to get the updated samples.

With the updated version (4.2.0) of GraphQL-Dotnet, the DataLoader feature is no longer part of the core library. It ships as a stand-alone NuGet package,

Install-Package GraphQL.DataLoader -Version 4.2.0

Our GraphQL queries are not quite optimized. Take the Orders query from CustomerType for example,

FieldAsync<ListGraphType<OrderType>, IReadOnlyCollection<Order>>(
    "orders",
    resolve: ctx =>
    {
        return repository.GetOrdersByCustomerId(ctx.Source.CustomerId);
    });

CustomerType.cs

Here, we are getting all the orders from the data store. This is all fun and games as long as you stay in the scalar zone of OrderType, i.e. you only query the scalar properties of OrderType. But what happens when you query one of the navigation properties? For example, the code in OrderType is as follows,

public OrderType(IRepository repository)
{
    Field(o => o.Tag);
    Field(o => o.CreatedAt);

    FieldAsync<CustomerType, Customer>("customer",
        resolve: ctx =>
        {
            return repository.GetCustomerById(ctx.Source.CustomerId);
        });
}

So, when you try to access the customer field, you are effectively initiating a separate request to your repository to load the related customer for each individual order.

If you run the application from the dotnet CLI, you can see the EF query logs in the console for a query such as,

query GetOrders {
  orders {
    tag
    createdAt
    customer {
      name
      billingAddress
    }
  }
}

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (26ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [o].[OrderId], [o].[CreatedAt], [o].[CustomerId], [o].[Tag]
      FROM [Orders] AS [o]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (23ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
      FROM [Customers] AS [c]
      WHERE [c].[CustomerId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
      FROM [Customers] AS [c]
      WHERE [c].[CustomerId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
      FROM [Customers] AS [c]
      WHERE [c].[CustomerId] = @__p_0

The logs make it clear that we first query for all the orders and then, for each order, we query for its customer as well. Here, for 3 orders we have 1 + 3 = 4 queries (4 hits on the database in total). Now do the math and figure out how many times we will hit the database if the first query returns N orders. We will have a total of N + 1 queries, hence the name: the N + 1 problem.
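The counting above can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical in-memory CountingRepository (not part of the sample project) whose only job is to count round trips; the customer names and order tags are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the repository; it only counts round trips.
public class CountingRepository
{
    private readonly Dictionary<int, string> _customers = new Dictionary<int, string>
    {
        [1] = "Alice",
        [2] = "Bob",
        [3] = "Carol"
    };

    public int QueryCount { get; private set; }

    public List<(string Tag, int CustomerId)> GetOrders()
    {
        QueryCount++; // stands in for the single "SELECT ... FROM Orders"
        return new List<(string Tag, int CustomerId)> { ("A-1", 1), ("B-1", 2), ("C-1", 3) };
    }

    public string GetCustomerById(int id)
    {
        QueryCount++; // stands in for one "SELECT TOP(1) ... FROM Customers" per order
        return _customers[id];
    }
}

public static class NPlusOneDemo
{
    public static int Run()
    {
        var repository = new CountingRepository();

        // Resolving the customer per order is what the field resolver does today.
        foreach (var order in repository.GetOrders())
        {
            var name = repository.GetCustomerById(order.CustomerId);
            Console.WriteLine($"{order.Tag} -> {name}");
        }

        return repository.QueryCount; // 1 query for the orders + 1 per order
    }
}
```

With three orders, NPlusOneDemo.Run() reports four round trips, which matches the four DbCommand entries in the log.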

To overcome this problem, we introduce DataLoader in our solution. DataLoader adds support for batching and caching in your GraphQL queries.

Adding support for DataLoader needs some configuration up front. Register the IDataLoaderContextAccessor and DataLoaderDocumentListener with a singleton lifetime in your ConfigureServices method,

services.AddSingleton<IDataLoaderContextAccessor, DataLoaderContextAccessor>(); 
services.AddSingleton<DataLoaderDocumentListener>();

IDataLoaderContextAccessor will be injected later into the constructors of the graph types that need a data loader. But first, in the middleware, we have to add the DataLoaderDocumentListener to the list of listeners on the ExecutionOptions passed to IDocumentExecuter.

public async Task InvokeAsync(HttpContext httpContext, ISchema schema, IServiceProvider serviceProvider)
{
    if (httpContext.Request.Path.StartsWithSegments(_options.EndPoint) && string.Equals(httpContext.Request.Method, "POST", StringComparison.OrdinalIgnoreCase))
    {
        var request = await JsonSerializer
                                .DeserializeAsync<GraphQLRequest>(
                                    httpContext.Request.Body,
                                    new JsonSerializerOptions
                                    {
                                        PropertyNameCaseInsensitive = true
                                    });

        var result = await _executor
                        .ExecuteAsync(doc =>
                        {
                            doc.Schema = schema;
                            doc.Query = request.Query;
                            doc.Inputs = request.Variables.ToInputs();
                            doc.Listeners.Add(serviceProvider.GetRequiredService<DataLoaderDocumentListener>());
                        }).ConfigureAwait(false);

        httpContext.Response.ContentType = "application/json";
        httpContext.Response.StatusCode = 200;

        await _writer.WriteAsync(httpContext.Response.Body, result);
    }
    else
    {
        await _next(httpContext);
    }
}

Next, add a new method to your repository which takes a list of customer ids and returns a dictionary of customers with their ids as keys.

public async Task<IDictionary<int, Customer>> GetCustomersById(IEnumerable<int> ids, CancellationToken cancellationToken)
{
    // One round trip for the whole batch; the token lets the query be cancelled with the request.
    return await _applicationDbContext.Customers
        .Where(c => ids.Contains(c.CustomerId))
        .AsNoTracking()
        .ToDictionaryAsync(c => c.CustomerId, cancellationToken);
}

Repository.cs

You can replace the Customer field with the following,

Field<CustomerType, Customer>()
    .Name("customer")
    .ResolveAsync(ctx =>
    {
        var loader = accessor.Context.GetOrAddBatchLoader<int, Customer>("GetCustomersById", repository.GetCustomersById);
        return loader.LoadAsync(ctx.Source.CustomerId);
    });

OrderType.cs

The idea behind GetOrAddBatchLoader is that it waits until all the customer ids are queued and fires off the GetCustomersById method only once every id has been collected. Once the dictionary of customers is returned for the passed-in ids, the loader maps each customer back to the order field that asked for it. This technique of queueing up ids is called batching. No matter how many orders there are, the related customers are loaded with a single request, i.e. we will have at most 2 requests.

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (26ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [o].[OrderId], [o].[CreatedAt], [o].[CustomerId], [o].[Tag]
      FROM [Orders] AS [o]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (5ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [c].[CustomerId], [c].[BillingAddress], [c].[Name]
      FROM [Customers] AS [c]
      WHERE [c].[CustomerId] IN (1, 2, 3)

Notice the second query. See how it queries for all the customers with the incoming ids.
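To make the queue-then-fetch idea concrete, here is a minimal sketch of batching. ToyBatchLoader is a hypothetical, synchronous toy written for this post; it is not how GraphQL.DataLoader is implemented, and the sample customers are made up. It only shows the shape of the technique: Load queues a key and returns a deferred lookup, and the first deferred lookup that is invoked triggers one fetch for every queued key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy batch loader for illustration only; NOT GraphQL.DataLoader's implementation.
public class ToyBatchLoader<TKey, TValue>
{
    private readonly Func<IReadOnlyCollection<TKey>, IDictionary<TKey, TValue>> _fetch;
    private readonly HashSet<TKey> _pending = new HashSet<TKey>();
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>();

    public int FetchCount { get; private set; }

    public ToyBatchLoader(Func<IReadOnlyCollection<TKey>, IDictionary<TKey, TValue>> fetch)
    {
        _fetch = fetch;
    }

    // Queue the key and hand back a deferred lookup instead of hitting the store now.
    public Func<TValue> Load(TKey key)
    {
        if (!_cache.ContainsKey(key))
        {
            _pending.Add(key);
        }

        return () =>
        {
            Dispatch();
            return _cache[key];
        };
    }

    // Fetch every queued key in one call, then remember the results.
    private void Dispatch()
    {
        if (_pending.Count == 0)
        {
            return;
        }

        FetchCount++;
        foreach (var pair in _fetch(_pending.ToList()))
        {
            _cache[pair.Key] = pair.Value;
        }
        _pending.Clear();
    }
}

public static class BatchingDemo
{
    public static (int Fetches, string Names) Run()
    {
        var customers = new Dictionary<int, string> { [1] = "Alice", [2] = "Bob", [3] = "Carol" };
        var loader = new ToyBatchLoader<int, string>(
            ids => customers.Where(c => ids.Contains(c.Key)).ToDictionary(c => c.Key, c => c.Value));

        // Four orders, two of which share customer 1.
        var orderCustomerIds = new[] { 1, 2, 3, 1 };
        var deferred = orderCustomerIds.Select(id => loader.Load(id)).ToList(); // queue every key first
        var names = deferred.Select(d => d()).ToList();                         // first call runs the one fetch

        return (loader.FetchCount, string.Join(",", names));
    }
}
```

In this sketch the four lookups cost a single fetch, and the repeated key for customer 1 is served from the loader's cache rather than being requested twice.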

Similarly, for a collection navigation property, you have GetOrAddCollectionBatchLoader. Take the Orders field of the CustomerType for example. You add a new repository method as follows,

public async Task<ILookup<int, Order>> GetOrdersByCustomerId(IEnumerable<int> customerIds, CancellationToken cancellationToken)
{
    // Load the orders of every requested customer in one query, then group them by customer id.
    var orders = await _applicationDbContext.Orders
        .Where(i => customerIds.Contains(i.CustomerId))
        .ToListAsync(cancellationToken);
    return orders.ToLookup(i => i.CustomerId);
}

Notice that here we return an ILookup data structure instead of a dictionary. The key difference is that an ILookup can hold multiple values against a single key, whereas in a dictionary a single key maps to a single value.

Modify the Orders field inside the CustomerType as follows,
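A small sketch of that difference, using made-up (Tag, CustomerId) tuples in place of the Order entity:

```csharp
using System;
using System.Linq;

public static class LookupDemo
{
    public static (int OrdersForCustomer1, int OrdersForCustomer9) Run()
    {
        // Hypothetical sample rows; customer 1 has two orders.
        var orders = new[]
        {
            (Tag: "A-1", CustomerId: 1),
            (Tag: "A-2", CustomerId: 1),
            (Tag: "B-1", CustomerId: 2),
        };

        // ToLookup groups all rows that share a key.
        ILookup<int, (string Tag, int CustomerId)> byCustomer = orders.ToLookup(o => o.CustomerId);

        // orders.ToDictionary(o => o.CustomerId) would throw an ArgumentException here,
        // because key 1 appears twice.

        // Indexing a lookup with an unknown key yields an empty sequence instead of throwing.
        return (byCustomer[1].Count(), byCustomer[9].Count());
    }
}
```

The empty sequence for an unknown key is convenient for a collection field: a customer without orders simply resolves to an empty list.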

Field<ListGraphType<OrderType>, IEnumerable<Order>>()
    .Name("orders")
    .ResolveAsync(ctx =>
    {
        var loader = accessor.Context.GetOrAddCollectionBatchLoader<int, Order>("GetOrdersByCustomerId", repository.GetOrdersByCustomerId);
        return loader.LoadAsync(ctx.Source.CustomerId);
    });

CustomerType.cs

GetOrAddCollectionBatchLoader and GetOrAddBatchLoader both cache the values of the field for the lifetime of a GraphQL query. If you only want the caching feature and do not need batching, you can simply use GetOrAddLoader.

Caching is good for fields that are requested frequently. So, you can add caching to the Items field of the GameStoreQuery as follows,

Field<ListGraphType<ItemType>, IReadOnlyCollection<Item>>()
    .Name("items")
    .ResolveAsync(ctx =>
    {
        var loader = accessor.Context.GetOrAddLoader("GetAllItems", repository.GetItems);
        return loader.LoadAsync();
    });

GameStoreQuery.cs
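The caching behaviour can be pictured as a per-request table of named loaders, each of which runs its fetch at most once. The following is a rough sketch under that assumption; ToyRequestCache is a hypothetical helper written for illustration and is not the library's GetOrAddLoader, and the item names are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Toy per-request cache for illustration only; NOT the library's GetOrAddLoader.
public class ToyRequestCache
{
    private readonly Dictionary<string, object> _loaders = new Dictionary<string, object>();

    // Return the loader registered under the key, creating it on first use.
    public Lazy<Task<T>> GetOrAdd<T>(string key, Func<Task<T>> fetch)
    {
        if (!_loaders.TryGetValue(key, out var existing))
        {
            existing = new Lazy<Task<T>>(fetch); // the fetch runs only when Value is first read
            _loaders[key] = existing;
        }

        return (Lazy<Task<T>>)existing;
    }
}

public static class CachingDemo
{
    public static async Task<(int Fetches, int ItemCount)> RunAsync()
    {
        var fetchCount = 0;

        // Stands in for repository.GetItems; counts how often the store is hit.
        Task<string[]> GetItems()
        {
            fetchCount++;
            return Task.FromResult(new[] { "Sword", "Shield" });
        }

        var cache = new ToyRequestCache(); // one instance per GraphQL request
        var first = await cache.GetOrAdd("GetAllItems", GetItems).Value;
        var second = await cache.GetOrAdd("GetAllItems", GetItems).Value;

        return (fetchCount, first.Length + second.Length);
    }
}
```

Both reads return the same two items, but the fetch function runs only once because the second call finds the loader already registered under "GetAllItems".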

Using a data loader can also resolve the issue of parallel query execution that we tackled in the last post. We can go with the default DocumentExecuter instead of implementing our own SerialDocumentExecuter.