Batching queries with GraphQL

GraphQL has seen booming growth over the last few years. A large number of API providers are switching to this new paradigm for querying data on the Web.

At Mergify, we’re no exception. To implement our automation engine, we’re heavy users of the GitHub API, whose version 4 is provided through GraphQL. However, this latest version does not provide every feature that version 3 supports. That means we’re actually mixing both APIs depending on our needs.

GraphQL is fantastic as it lets you retrieve a lot of information in a single call where an old-school REST API would have required several (tens of?) calls.

We did, however, struggle to optimize this in certain cases, as we’ll explain below.

Retrieving several items

Imagine that you want to retrieve a user with various attributes. With GraphQL, this is easily done with:

{
  human(id: "1000") {
    name
    height
  }
}

The human whose id is 1000 is retrieved with its name and height. Now, what if you want to retrieve all humans whose id is between 1000 and 1010, except 1003? That’s a weird query, and it’s unlikely that the API offers such a filtering method. You now have to do 10 queries.

The good news is that you can batch those queries in a single call: many modern GraphQL servers accept a JSON array of queries as input.
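For a server that does accept arrays, the request body could be built roughly as follows. This is only a sketch: the `Human` operation name, the `ID!` variable type and the `build_batch_payload` helper are assumptions made for illustration, not part of any specific server’s API.

```python
import json

# Hypothetical parameterized query; the ID! type is an assumption
# about the schema.
QUERY = """
query Human($id: ID!) {
  human(id: $id) {
    name
    height
  }
}
"""


def build_batch_payload(ids):
    """Return a JSON array holding one {query, variables} object per id."""
    return json.dumps(
        [{"query": QUERY, "variables": {"id": human_id}} for human_id in ids]
    )


# Every id from 1000 to 1010, except 1003.
ids = [str(i) for i in range(1000, 1011) if i != 1003]
payload = build_batch_payload(ids)
```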

Unfortunately for us, that’s not the case for the GitHub GraphQL API.

The trick, in our case, is to concatenate all the queries into one and give each of them a distinct alias by adding a prefix:

{
  Q0: human(id: "1000") {
    name
    height
  }
  Q1: human(id: "1001") {
    name
    height
  }
  Q2: human(id: "1002") {
    name
    height
  }
  Q3: human(id: "1004") {
    name
    height
  }
  # etc...
  Q9: human(id: "1010") {
    name
    height
  }
}

In the request above, 10 queries named Q0 to Q9 are sent at once. The server replies with 10 results, keyed by the same Q0 to Q9 names.
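Building such a request by hand is mostly string concatenation. The sketch below shows one possible way to do it in Python; `build_aliased_query` and `split_response` are hypothetical helper names chosen for this example, and the response is assumed to be the decoded `data` object of the GraphQL reply.

```python
import json


def build_aliased_query(ids, fields=("name", "height")):
    """Concatenate one `human` selection per id, aliased Q0, Q1, ..."""
    body = " ".join(fields)
    parts = [
        # json.dumps adds the surrounding double quotes and escapes the id.
        f"  Q{n}: human(id: {json.dumps(human_id)}) {{ {body} }}"
        for n, human_id in enumerate(ids)
    ]
    return "{\n" + "\n".join(parts) + "\n}"


def split_response(data, ids):
    """Map each id back to its result, using the alias position."""
    return {human_id: data[f"Q{n}"] for n, human_id in enumerate(ids)}
```

The alias index is the only link between a request and its result, so the same `ids` list has to be passed to both functions in the same order.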

As far as we’ve seen, there’s no easy way to build this kind of query without doing it yourself, and we found no library doing this. That’s why we had to build our own code.

What about… pagination?

Things can get a little uglier if you add pagination to the request above. Let’s imagine we want to request the list of friends of each of the humans above. Our single request looks like:

{
  human(id: "1000") {
    name
    friends(first: 100) {
      name
    }
  }
}

Most GraphQL queries that return a list use a pagination system. That means you need to ask for pagination data, usually through the pageInfo attribute:

{
  human(id: "1000") {
    name
    friends(first: 100) {
      name
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
}

Once you retrieve the returned cursor, you can pass it as the after value when retrieving the friends attribute, which gets you the next batch of friends.
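For a single human, following the cursor boils down to a simple loop. Here is a minimal sketch, assuming a caller-supplied `fetch_page(human_id, after)` function (hypothetical, not part of any library) that runs the query above and returns a `(friends, has_next_page, end_cursor)` tuple:

```python
def fetch_all_friends(fetch_page, human_id):
    """Collect every friend of one human by following the cursor.

    fetch_page is a hypothetical callable that sends the request and
    returns (friends, has_next_page, end_cursor) for one page.
    """
    friends = []
    after = None  # no cursor on the first request
    while True:
        page, has_next_page, end_cursor = fetch_page(human_id, after)
        friends.extend(page)
        if not has_next_page:
            return friends
        # Ask for the page that starts right after the last item received.
        after = end_cursor
```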

That being said, when batching queries, some humans might have more than 100 friends while others do not. That’s where things get complicated, as you’d need to create a new batch of queries to iterate over the remaining friends.
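One way to handle this is to keep a map of the humans that still have pages left and to shrink the batch at every round. The sketch below illustrates the idea only; it is not the graphql-utils API. It assumes a hypothetical `fetch_batch(cursors)` callable that takes an `{id: cursor}` dict (with `None` for a first page), sends one aliased request, and returns `{id: (friends, has_next_page, end_cursor)}`.

```python
def fetch_all_friends_batched(fetch_batch, ids):
    """Collect all friends of several humans, one batched request per round.

    fetch_batch is a hypothetical callable: it receives {id: cursor_or_None}
    and returns {id: (friends, has_next_page, end_cursor)}.
    """
    friends = {human_id: [] for human_id in ids}
    pending = {human_id: None for human_id in ids}  # None: first page
    while pending:
        results = fetch_batch(pending)
        next_pending = {}
        for human_id, (page, has_next_page, end_cursor) in results.items():
            friends[human_id].extend(page)
            if has_next_page:
                # Only humans with friends left go into the next batch.
                next_pending[human_id] = end_cursor
        pending = next_pending
    return friends
```

With this approach, the number of requests equals the number of pages of the human with the most friends, rather than the sum of pages over all humans.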

Our solution

None of the GraphQL libraries that we found out there seems to handle the combination of all the issues listed above. This is why we built our own set of functions to handle this kind of case.

Today, we’re releasing our Python package named graphql-utils as open source. It’s definitely still an alpha version, but it serves us well to leverage the GitHub GraphQL API. It solves all the issues listed above.

We’d be happy to see it more widely used, adopted and improved. Feel free to fork it and hack on it!