Protecting against large GraphQL query attacks

Jacob Voytko
6 min read · Jan 18, 2021

Learn how attackers can scrape your site, or execute denial-of-service attacks, using your publicly available GraphQL interface. They can do this in one of four ways: by carefully constructing a single oversized query, by packing many aliased sub-queries for related data into one request, by using batched requests to issue lots of queries back-to-back, and finally by simply sending lots of requests.

How the attack works

Scrapers are a fact of the web. They do everything possible to pull information from your site for their own purposes. Heck, scrapers aren’t even necessarily bad. After all, the GoogleBot that indexes sites for Google Search? That’s a scraper. But scrapers can be abusive and harm your site. They want all of the data that your site has to offer, but they don’t have direct access to your database. So they poll your data as often as they can get away with, which can run up your server bills or cause service-quality issues for your users.

Denial-of-service attacks are also a reality. Some attackers will attempt to take down your site by forcing the site to do as much work as possible. They don’t always have a reason; they just enjoy breaking things.

GraphQL is the perfect API for this: it is a flexible query language that allows attackers to craft whatever queries they want.

Let’s say that you work for a site that has users, and users can sell items. Lots of sites fit this format: Amazon, Gumroad, etc. “Drop shippers” would love to know the price of every item on the site, so that they can sell the item for more money elsewhere and pocket the difference. Let’s walk through each of the four attacks.

First, the overly large query. If your GraphQL endpoint allows completely unfettered access to all objects on the site, an attacker could write a query that returns every single object in your database:

query {
  users {
    items_for_sale {
      name
      description
      price_usd
    }
  }
}

Okay, so you weren’t born yesterday. Of course you don’t allow every single user to be queried from a single field. So here’s where clever attacks come in: what if the attacker just requests the items for every single user individually?

query {
  # Request data for user 1
  user_1: user(id: 1) {
    items_for_sale(first: 100) {
      name
      description
      price
    }
  }
  # Request data for user 2
  user_2: user(id: 2) {
    items_for_sale(first: 100) {
      name
      description
      price
    }
  }
  # ... a bunch of requests omitted
  # Request data for user 10000
  user_10000: user(id: 10000) {
    items_for_sale(first: 100) {
      name
      description
      price
    }
  }
}

They carefully constructed a query that requested data for the first 10,000 users. Granted, they didn’t get ALL of the listings, since they only requested the first 100 per user, but most users don’t have that many. They can also issue follow-up pagination requests for the users who do return 100 items.

Okay, so you patch the endpoint to handle this. Now you have to ask yourself: does your GraphQL server allow multiple requests to be batched together? Many client frameworks gather up requests and send them all together. Apollo GraphQL allows this kind of batching; they call it “transport-level batching” in their blog post about it.

[
  {
    query: <first query>,
    variables: <variables for first query>
  },
  {
    query: <second query>,
    variables: <variables for second query>
  }
]

Surely you can imagine how this plays out: instead of issuing one enormous megaquery with 10,000 aliased sub-queries, the attacker can issue a single HTTP request containing a batch of 10,000 queries.

And then there’s the final dimension: the attacker can simply send 10,000 separate requests, one after another.

Persisting queries

One mitigation strategy is to use persisted queries.

Normally, GraphQL requests have two payloads: a query and variables.

In JSON format, the query bundle might look like this:

{
  "query": "query ($user_id: ID!) { user(id: $user_id) { name } }",
  "variables": {
    "user_id": 123
  }
}

This isn’t the only way to execute GraphQL queries, though. A server can be configured to use persisted queries instead. The client can no longer send arbitrary GraphQL queries of its own; instead, each query is defined on the server and referenced by the client.

Here’s an example of what that request might look like:

{
  "query": "SomeHashValueReferringToTheQuery",
  "variables": {
    "user_id": 123
  }
}

On the server, the handler would look up "SomeHashValueReferringToTheQuery" and substitute it with the appropriate query before passing the query and variables to the GraphQL engine.
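
For illustration, here’s a minimal sketch of what that lookup could look like in a Node server, using Express and the graphql package. The schema, the PERSISTED_QUERIES map, and the resolver are all made up for this example; a real deployment would load the hashes from a generated manifest or a database.

import express from "express";
import { graphql, buildSchema } from "graphql";

// Illustrative schema; yours would come from your own codebase.
const schema = buildSchema(`
  type Query {
    user(id: ID!): User
  }
  type User {
    name: String
  }
`);

// The only queries clients are allowed to execute, keyed by hash.
const PERSISTED_QUERIES: Record<string, string> = {
  SomeHashValueReferringToTheQuery:
    "query ($user_id: ID!) { user(id: $user_id) { name } }",
};

const app = express();
app.use(express.json());

app.post("/graphql", async (req, res) => {
  const { query: hash, variables } = req.body;

  // Reject anything that isn't a known, pre-registered query.
  const source = PERSISTED_QUERIES[hash];
  if (!source) {
    res.status(400).json({ errors: [{ message: "Unknown query hash" }] });
    return;
  }

  const result = await graphql({
    schema,
    source,
    variableValues: variables,
    // Toy resolver so the sketch runs end-to-end.
    rootValue: { user: ({ id }: { id: string }) => ({ name: `User ${id}` }) },
  });
  res.json(result);
});

app.listen(4000);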

This can help mitigate the first two attacks: the client can no longer execute arbitrary queries that fetch too much data, and it can no longer craft queries that enumerate exactly the data it wants. It also combines well with other mitigations against the remaining attack vectors: if you limit the number of queries that can be batched together, persisted queries cap the overall complexity of each batch, and a rate limiter can keep repeated requests from pulling too much data.
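
For example, if your server accepts transport-level batches as a JSON array, a small middleware can cap the batch size before anything executes. Here’s a sketch, assuming Express and an arbitrary limit of 10 queries per batch:

import { Request, Response, NextFunction } from "express";

// Arbitrary cap; tune it to what your own clients legitimately send.
const MAX_QUERIES_PER_BATCH = 10;

export function limitBatchSize(req: Request, res: Response, next: NextFunction) {
  // Transport-level batches arrive as an array of { query, variables } objects.
  if (Array.isArray(req.body) && req.body.length > MAX_QUERIES_PER_BATCH) {
    res.status(400).json({
      errors: [{ message: `Batches are limited to ${MAX_QUERIES_PER_BATCH} queries` }],
    });
    return;
  }
  next();
}

// Mount it before your GraphQL handler:
// app.use("/graphql", express.json(), limitBatchSize);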

Persisted queries aren’t a silver bullet, though. Query hashing is challenging to productionize. For example, once a version of your client is released into the wild, the server needs to understand that version’s hashed queries for as long as the app is supported. If your users are still running a mobile app from 2019, then the query hashes from that version need to keep working. That means storing hashes for queries that no longer exist anywhere in your current codebase.

Limiting query complexity

There is another solution that is much more complex, but produces much better results: limiting query complexity.

In this scheme, you provide a specification for how “expensive” every field in your schema is. For example, maybe a scalar field is worth 1 point and an object is worth 10 points. The server calculates how many objects are being requested and the cost of each, and compares the total to a query-complexity threshold. If the total exceeds the configured limit, the server rejects the request.

Here’s an example of how this works:

query {
  # 1310 points total
  user(id: 1) {                  # 10 points
    # This expression is 100 * (10 + 3) = 1300 points
    items_for_sale(first: 100) { # 10 points for each object
      # plus 1 point for each field
      name                       # one point
      description                # one point
      price                      # one point
    }
  }
}

Overall, the query costs 1310 “points.”

  • Each inner field costs 1 point.
  • Each items_for_sale object costs 10 points, plus the cost of its fields, which adds up to 13 points per item.
  • There are 100 items for sale being requested, so that’s 1300 points.
  • The outermost user object adds another 10 points.
  • Thus, the query adds up to 1310 points.

If this is the heaviest query you anticipate executing, then you could set the single-query limit to something like 2000 points to give yourself some headroom. See how your service performs under that load and decide whether you need to tighten it.
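
To make this concrete, here’s a rough sketch of how a server could compute that score before executing a query, using the parser from the graphql package. The point values (1 per scalar field, 10 per object, multiplied by the first argument) come from the example above; the function names and the fallback multiplier of 1 are just for illustration, and a real implementation would also handle fragments and variables.

import { parse, FieldNode, SelectionSetNode } from "graphql";

const SCALAR_COST = 1;
const OBJECT_COST = 10;
const MAX_COMPLEXITY = 2000; // tune to your heaviest legitimate query

// Walks a selection set and adds up points. A field with children is an
// "object" (10 points plus its children); a leaf field is a scalar (1 point).
// If the field has a `first: N` argument, its cost is multiplied by N.
function selectionSetCost(selectionSet: SelectionSetNode): number {
  let total = 0;
  for (const selection of selectionSet.selections) {
    if (selection.kind !== "Field") continue; // ignore fragments in this sketch
    total += fieldCost(selection);
  }
  return total;
}

function fieldCost(field: FieldNode): number {
  if (!field.selectionSet) return SCALAR_COST;

  const perObject = OBJECT_COST + selectionSetCost(field.selectionSet);
  const firstArg = (field.arguments ?? []).find((a) => a.name.value === "first");
  const multiplier =
    firstArg && firstArg.value.kind === "IntValue"
      ? parseInt(firstArg.value.value, 10)
      : 1;
  return perObject * multiplier;
}

export function queryComplexity(query: string): number {
  const document = parse(query);
  let total = 0;
  for (const definition of document.definitions) {
    if (definition.kind === "OperationDefinition") {
      total += selectionSetCost(definition.selectionSet);
    }
  }
  return total;
}

// queryComplexity(exampleQuery) returns 1310 for the query above;
// anything over MAX_COMPLEXITY gets rejected before execution.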

You could even combine this with user roles and allow specific users to execute heavier queries. For example, your employees could be allowed to execute queries for up to 100,000 points so that they can perform the large-scale analysis they need for their jobs.
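
A sketch of how that could plug in, assuming a hypothetical role field on the authenticated user and the queryComplexity helper from the previous sketch:

// Hypothetical per-role budgets; the role names and numbers are illustrative.
const COMPLEXITY_LIMITS: Record<string, number> = {
  employee: 100_000,
  default: 2_000,
};

function complexityLimitFor(role: string): number {
  return COMPLEXITY_LIMITS[role] ?? COMPLEXITY_LIMITS.default;
}

// Before executing a request:
// if (queryComplexity(query) > complexityLimitFor(user.role)) { reject the request }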

Advanced rate-limiting with query complexity

Of course, this doesn’t prevent attackers from simply sending multiple requests back-to-back. For that, you need to track the number of points that each IP or user token has consumed while executing requests against the API, and stop them from using too many within a certain time span.

For example, let’s say that there is a 10-minute limit of 10,000 points. If a user executes the 1310-point query above, they have 8690 points left, and they can issue 6 more of those requests in the next 10 minutes before their account is throttled.
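
Here’s a minimal sketch of that kind of point-based limiter, using a fixed 10-minute window and an in-memory map keyed by user token or IP address. (A real deployment would more likely keep these counts in something like Redis so they survive restarts and are shared across servers.)

const WINDOW_MS = 10 * 60 * 1000; // 10 minutes
const POINTS_PER_WINDOW = 10_000;

interface Bucket {
  windowStart: number;
  pointsUsed: number;
}

// Keyed by user token or client IP.
const buckets = new Map<string, Bucket>();

// Returns true if the request may proceed, and records its cost if so.
export function consumePoints(key: string, cost: number): boolean {
  const now = Date.now();
  let bucket = buckets.get(key);

  // Start a fresh window if there is no bucket yet or the old one has expired.
  if (!bucket || now - bucket.windowStart >= WINDOW_MS) {
    bucket = { windowStart: now, pointsUsed: 0 };
    buckets.set(key, bucket);
  }

  if (bucket.pointsUsed + cost > POINTS_PER_WINDOW) {
    return false; // throttled: this window's budget is spent
  }

  bucket.pointsUsed += cost;
  return true;
}

// Combining it with the complexity calculation:
// const cost = queryComplexity(query);
// if (!consumePoints(userTokenOrIp, cost)) { return a 429 response }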

In general, you should set your limits so that a person continually using your client would struggle to hit them. People can only issue requests so quickly. But a machine can issue requests as fast as it is commanded to, limited only by the laws of physics and the quality of the network between you and the machine. You want your limits to protect against machines, not against people on their third cup of coffee.

Protect yourself with a general-purpose GraphQL proxy

Not every GraphQL framework offers features to limit query complexity. If you are in one of those situations, you can use a proxy server to reject those requests on your behalf.

I am an independent software developer, and I am working on a proxy that is capable of doing this. If you don’t want to go through the trouble of figuring out how to set up a rate limiter that can calculate query complexity and block requests based on it, then I feel like I can help you. The first 25 people to sign up will get 25% off their first year when the service launches, and everyone else will get 10% off their first year.

Learn more here


Jacob Voytko

Runnin’ my own business. Previously staff engineer @ Etsy, before that I worked on Google Docs