Avoid Counting in Django Pagination

0 0 4 minutes read

This article looks at how to avoid the object count query in Django pagination.

If you are using Postgres and would like to use an approximate count instead of entirely removing it, check out Approximate Counting in Django and Postgres.

—

Django’s default Paginator relies on the object COUNT(*) query to calculate the number of pages. As your database grows, this query can become considerably slow and may end up causing most of the page load time — sometimes over 90%.

In this article, we’ll implement a paginator that skips the object COUNT(*) query. The implemented paginator assumes that knowing the number of pages isn’t essential for your use case. In other words, the paginator won’t know how many pages there are.

Project Setup

To make the tutorial easier to follow along, I’ve prepared a simple Django app. All the app does is store log messages in the database and returns them in a paginated API response.

If you want, you can follow along with your own Django project. However, your project might be a bit more difficult to benchmark.

First, clone the base branch of the GitHub repo:

$ git clone https://github.com/duplxey/django-count-avoid.git \
    --single-branch --branch base && cd django-count-avoid

Next, create a new virtual environment and activate it:

$ python3 -m venv venv && source venv/bin/activate

Install the requirements and migrate the database:

(venv)$ pip install -r requirements.txt
(venv)$ python manage.py migrate

Populate the database:

$ python manage.py populate_db

Each run of the populate command will add 2.5 million records to the database. I suggest you run the command two or three times.

The command also creates a superuser with the following creds: admin:password.

Run the server:

(venv)$ python manage.py runserver

Great, you’ve successfully setup the project.

Initial Benchmark

To benchmark the API, we’ll use the Django Silk package. Django Silk is a simple tool for profiling and monitoring Django projects. It tracks database query times and view load times and helps identify bottlenecks in your project.

Silk is already installed if you’re using the suggested log management project.

First, open your favorite web browser and navigate to http://localhost:8000/api/.

You’ll notice a relatively long load time.

Next, navigate to http://localhost:8000/silk/requests/ to see the requests’ statistics.

Ouch, the request took 867 milliseconds, 855 of which were spent on SQL queries.

Selecting “SQL” in the navigation bar shows that the SQL query that took the most time was SELECT COUNT(*), which took 854 milliseconds.

Countless Paginator

Let’s implement a paginator that doesn’t require the SELECT COUNT(*) query.

To do that, we’ll define two classes:

A CountlessPage class (based on Django’s Page class)
A CountlessPaginator class (based on Django’s Paginator class)

Create a paginators.py file within the logs app and put the following code inside:

# logs/paginators.py

import collections

from django.core.paginator import EmptyPage, PageNotAnInteger
from django.utils.translation import gettext_lazy as _


class CountlessPage(collections.abc.Sequence):
    def __init__(self, object_list, number, page_size):
        self.object_list = object_list
        self.number = number
        self.page_size = page_size

        if not isinstance(self.object_list, list):
            self.object_list = list(self.object_list)

        self._has_next = \
            len(self.object_list) > len(self.object_list[: self.page_size])
        self._has_previous = self.number > 1

    def __repr__(self):
        return "<Page %s>" % self.number

    def __len__(self):
        return len(self.object_list)

    def __getitem__(self, index):
        if not isinstance(index, (int, slice)):
            raise TypeError
        return self.object_list[index]

    def has_next(self):
        return self._has_next

    def has_previous(self):
        return self._has_previous

    def has_other_pages(self):
        return self.has_next() or self.has_previous()

    def next_page_number(self):
        if self.has_next():
            return self.number + 1
        else:
            raise EmptyPage(_("Next page does not exist"))

    def previous_page_number(self):
        if self.has_previous():
            return self.number - 1
        else:
            raise EmptyPage(_("Previous page does not exist"))


class CountlessPaginator:
    def __init__(self, object_list, per_page) -> None:
        self.object_list = object_list
        self.per_page = per_page

    def validate_number(self, number):
        try:
            if isinstance(number, float) and not number.is_integer():
                raise ValueError
            number = int(number)
        except (TypeError, ValueError):
            raise PageNotAnInteger(_("Page number is not an integer"))
        if number < 1:
            raise EmptyPage(_("Page number is less than 1"))
        return number

    def get_page(self, number):
        try:
            number = self.validate_number(number)
        except (PageNotAnInteger, EmptyPage):
            number = 1
        return self.page(number)

    def page(self, number):
        number = self.validate_number(number)
        bottom = (number - 1) * self.per_page
        top = bottom + self.per_page + 1
        return CountlessPage(self.object_list[bottom:top], number, self.per_page)

Both classes are primarily based on Django’s source code. The only two differences are that they skip the count query and use a different method to check whether the next and previous pages exist.

To determine whether the next page exists, we try to pass an extra object from the paginator to the page. In the page, we slice the object list back to its original size and check if an extra object is present. If so, the next page has at least one object; therefore, it exists.

In contrast, for the previous page, we just check if the page number is number > 1.

Next, use the CountlessPaginator in the index_view in logs/views.py like so:

# logs/views.py

def index_view(request):
    logs = Log.objects.all()
    paginator = CountlessPaginator(logs, 25)  # modified
    page_number = request.GET.get("page")
    page_obj = paginator.get_page(page_number)

    return JsonResponse(
        {
            "has_next": page_obj.has_next(),
            "has_previous": page_obj.has_previous(),
            "results": [log.to_json() for log in page_obj],
        }
    )

Don’t forget about the import:

from logs.paginators import CountlessPaginator

Benchmark the app again.

The request now only takes 12 milliseconds. That’s around 70 times faster than before.

Conclusion

In this article, we’ve looked at implementing a paginator that doesn’t rely on counting objects.

By swapping the default Django paginator with a custom one, we significantly improved the paginated API response time. In our case, the response time went from around 800 milliseconds to 12 milliseconds.

For more information, check out the following resources: