Issues with Third-Party Packages

Following my recent Scratch for an Itch article, I’d like to follow-up by examining problems with third-party packages. In this case, we’ll use the example of django-oauth2-provider

This awesome and small django app makes your life easier when using oauth2 authentication. It provides very customizable models and methods to deal with tokens, so creating your own authentication backend based on auth2 for a django app is really straightforward.

Some days ago, we noticed that some queries were taking too long to be executed, especially the ones related to authentication. It was relatively easy to see that this time was increasing in the last weeks. It turned out that this queries were performed on a model/table included in django-oauth2-provider.

This is the ‘problematic’ query:

SELECT `oauth2_accesstoken`.`id`, `oauth2_accesstoken`.`user_id`, `oauth2_accesstoken`.`token`, `oauth2_accesstoken`.`client_id`, `oauth2_accesstoken`.`expires`, `oauth2_accesstoken`.`scope` FROM `oauth2_accesstoken` WHERE (`oauth2_accesstoken`.`expires` > %s  AND `oauth2_accesstoken`.`token` = %s )

This query retrieves the user that is related to a token that is not expired or, in other words, a valid user. This is executed on all the requests that need to be authenticated, which is pretty much always. All our requests are authenticated.

So, basically, we have a query whose time to be run is increasing and it is used on each request. Maybe this is not a problem now, but we can assure this will become a big problem  in the future.

If we have a look at the models for this table:

class AccessToken(models.Model):
user = models.ForeignKey(User)
token = models.CharField(max_length=255, default=long_token)
client = models.ForeignKey(Client)
expires = models.DateTimeField(default=get_token_expiry)
scope = models.IntegerField(default=constants.SCOPES[0][0], choices=constants.SCOPES)

And we double-check the database:

mysql> SHOW INDEXES FROM oauth2_accesstoken;
+--------------------+------------+-----------------------------+--------------+-------------+-----------+
| Table              | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+
| oauth2_accesstoken |          0 | PRIMARY                     |            1 | id          | A         |
| oauth2_accesstoken |          1 | oauth2_accesstoken_fbfc09f1 |            1 | user_id     | A         |
| oauth2_accesstoken |          1 | oauth2_accesstoken_4a4e8ffb |            1 | client_id   | A         |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+
4 rows in set (0.10 sec)

…we noticed that we don’t have an index for token!!

This means we are performing a full table scan on every query, on every request, which results in really bad performance.

Just to show how bad it is, I will show some numbers in different situations (I’m using a very low capacity virtual machine, so numbers are scaled, but we can compare them though)

Number of tokens:  100  1k   10k  30k

Time to run query:  1ms  3ms 12ms 20ms

The proper solution to this would be adding a multiple column index for ‘token’ and ‘expires’.
This is possible in django 1.5, using the index-together feature, but not in the previous versions.

A compromise solution could be using just one index on token. Token is a 32-character string. So this index will definitely help, eliminating many rows from consideration, making the query faster.

Another compromise solution could be having 2 separate index, one on token and the other on expires. This doesn’t improve a lot the process time and at the same time increases the database size.

So we choose the 1 index solution as the best of the bad solutions to deal the problem and works on all django versions.

Last, as member of the open source community, ezeep submitted a pull request to the upstream project.

Feel free to add any insights in the comment section below.