Understanding & Implementing Pagination

Pagination is a technique in APIs to manage and retrieve large data sets efficiently. When an API response contains many items, the server divides the data into smaller, manageable subsets called "pages." These pages are then returned one at a time to the client, allowing developers to control the amount of data they retrieve and reducing the impact on performance.

Pagination works in the following manner: When a client makes a request to a server, instead of sending all the data at once, the server sends a "page" with a specific number of results and provides a way for the client to request the next set of data. This reduces the load on the server and improves the user experience by providing faster response times.
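
As a rough illustration of the idea (not how OneTrust's servers are implemented), the sketch below splits an in-memory list into pages and requests them one at a time. The function name get_page and the sample records are hypothetical.

```python
def get_page(items, page, size):
    """Return the slice of items that belongs to a zero-based page number."""
    start = page * size
    return items[start:start + size]


records = ["a", "b", "c", "d", "e"]

# Request two results at a time until a page comes back empty
page = 0
while True:
    chunk = get_page(records, page, size=2)
    if not chunk:
        break
    print(f"page {page}: {chunk}")
    page += 1
```

Each call returns at most size results, so the client decides how much data to handle per request.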

There are several ways to implement pagination, but OneTrust uses the following methods: Page-Based Pagination (also known as Offset Pagination) and Request Continuation-Based Pagination (also known as Keyset Pagination or Cursor Pagination). In the context of OneTrust APIs, pagination is commonly employed to handle scenarios where the volume of data is significant, such as retrieving consent transactions or data subject profiles. OneTrust's approaches to pagination are detailed further below.

Page-Based (Offset) Pagination

In Page-Based Pagination, or Offset Pagination, the API response includes information about the total number of pages and the current page. Developers use the size and page query parameters to paginate the response: the size parameter defines the number of results returned on each page, and the page parameter defines the page number to return, as detailed in the Request Parameters table below.

Request Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| page | The page number of the results. page refers to a subset of the total results returned by the API. When paginating through a large data set, the results are divided into discrete pages to make them more manageable. Page numbering starts at 0, which is also the default. Users can request different pages to view different sets of data. | 0 |
| size | The number of results per page. size represents the number of items or entries returned on each page. It determines the size of the subset of data returned with each request. The size parameter allows users to control the granularity of the pagination and tailor it to their specific needs. | 2,000 |
| sort | The sort criteria that dictates the order of the results. sort specifies the order in which the results are presented. Sorting can be based on various criteria such as alphabetical order, numerical order, date, relevance, etc. Generally, the sort parameter uses the format property,direction, where property is the attribute's name and direction is asc or desc. | name,asc |
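
To show how these three parameters combine into a single request URL, here is a minimal sketch using Python's standard library. The host and path are the ones used in the code example later in this guide; whether a given endpoint accepts name as a sort property is an assumption to check against that endpoint's reference.

```python
from urllib.parse import urlencode

# Host and path as used in this guide's example; substitute your own environment
api_host = "https://app.onetrust.com"
path = "/api/consentmanager/v1/datasubjects/profiles"

# Example values taken from the Request Parameters table
params = {"page": 0, "size": 2000, "sort": "name,asc"}

# safe="," keeps the comma in the sort value readable instead of encoding it as %2C
query = urlencode(params, safe=",")
url = f"{api_host}{path}?{query}"
print(url)
```

Building the query string from a dictionary avoids hand-concatenating values and makes it easy to change one parameter at a time.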

Code Example

For example, let's take a look at the Get List of Data Subjects API, which is used to retrieve a list of all data subjects. Organizations tend to have a large volume of data subjects, which means that typically the response for this API will return a large number of results to page through. Let's jump in and implement offset pagination using the steps below, which can be repeated for any offset pagination API.

Step 1: Define the Base URL

First, we will construct our base URL for the API endpoint, which will include the initial pagination parameters (page=0, size=1). For demonstrative purposes, we will use these simple values to keep the code blocks as small as possible, and we'll assume the organization has a total of 2 data subjects. When we set page to 0, we are telling the API to retrieve the initial page of results, which is something you should always do. When we set size to 1, we are telling the API to return only 1 result per page. Size is optional, but since we want to depict pagination, we will use 1 and page to pull in the 2 data subjects within the organization.

# Page set to 0; Optional Size set to 1
url = api_host + "/api/consentmanager/v1/datasubjects/profiles?page=0&size=1"
# Page set to 0; Optional Size set to 1
url = "https://app.onetrust.com/api/consentmanager/v1/datasubjects/profiles?page=0&size=1"

📘

In the real world, you will page hundreds, if not thousands, of times to retrieve all of your data subjects, so ensure you account for rate limits and error handling. Think of it like building a bookmark feature to allow you to find your place if something goes wrong. Additionally, select a size that makes sense for your application/script. A larger size will take longer to load but will require fewer requests, and a smaller size will load quickly but will require more requests.
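
One possible way to account for errors and rate limits is to retry a failed page with an increasing delay. The sketch below is an assumption-laden illustration, not a OneTrust-recommended implementation: fetch_with_retry, flaky_fetch, and the delay values are all hypothetical, and the request itself is replaced by a stand-in function so the example runs without network access.

```python
import time


def fetch_with_retry(fetch_page, page, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(page), retrying with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(page)
        except Exception as error:
            if attempt == max_attempts - 1:
                # Out of attempts: re-raise so the caller can record the page
                # number (the "bookmark") and resume from it later
                raise
            delay = base_delay * (2 ** attempt)
            print(f"page {page} failed ({error}); retrying in {delay}s")
            sleep(delay)


# Stand-in for a real request: fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch(page):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return {"content": [f"record-{page}"], "last": True}


# Record the delays instead of sleeping so the demo finishes immediately
delays = []
result = fetch_with_retry(flaky_fetch, page=0, sleep=delays.append)
print(result, delays)
```

In a real script, fetch_page would wrap the HTTP call, and the default time.sleep would be left in place.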

Step 2: Retrieve the First Page of Data & Extract Metadata from the Initial Response

Now that we've defined the base URL, we can make an initial request (seen on the Request tab) to the API endpoint using the url to retrieve the first page of data, which you can see in the Response tab below. This first response is stored in the first_response variable. Notice that the response includes paging parameters such as totalPages, numberOfElements, number, and size. For our example, we set the size to 1 and asked for the first page using a value of 0. As a result, 1 data subject is returned in the content array, and totalElements tells us there are a total of 2 data subjects.

first_response = requests.get(url, headers=headers).json()
{
    "content": [
        {
            "Id": "0d32a9e9-b56d-44e0-b5c2-313e44bb9b1f",
            "Language": None,
            "Identifier": "+1123749283",
            "LastUpdatedDate": "2024-02-21T16:15:45.84",
            "CreatedDate": "2024-02-21T16:15:45.84",
            "DataElements": [],
            "TestDataSubject": False,
        }
    ],
    "pageable": {
        "sort": {"empty": False, "sorted": True, "unsorted": False},
        "offset": 0,
        "requestContinuation": "AAEAZTU2...uXyilvY=",
        "pageNumber": 0,
        "pageSize": 1,
        "paged": True,
        "unpaged": False,
    },
    "last": False,
    "totalPages": 1,
    "totalElements": 2,
    "size": 1,
    "number": 0,
    "sort": {"empty": False, "sorted": True, "unsorted": False},
    "first": True,
    "numberOfElements": 2,
    "empty": False,
}

When dealing with pagination programmatically, it's best practice to extract data, such as totalPages, size, numberOfElements, etc., directly from the first response. This way, if you passed in a size that is not supported, you can work with the value/size that is being used. Remember that size is optional, so if a size was not provided, you want to grab the default size being used. Now that we've set our variables with values from the first request, we will loop through the pages to obtain all of our data subjects.

# Number of items per page
page_size = first_response["size"]
# Total number of elements across all pages
# (numberOfElements only counts the results on the current page)
total_elements = first_response["totalElements"]
# Total number of pages
n_pages = first_response["totalPages"]
# Number of items per page
page_size = 1
# Total number of elements across all pages
total_elements = 2
# Total number of pages (pages start at 0, so 1 means 2 pages: page 0 and page 1)
n_pages = 1
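
The idea of working with the size the API actually applied, rather than the one requested, can be sketched as a small helper. The helper name effective_page_size is hypothetical, and the values 2000 and 20 below are invented for illustration; they are not documented limits or defaults.

```python
def effective_page_size(requested_size, response):
    """Return the page size the API actually applied, noting any mismatch."""
    applied = response["size"]
    if requested_size is not None and applied != requested_size:
        print(f"requested size {requested_size}, but the API used {applied}")
    return applied


# Hypothetical response fragments, trimmed to the paging field used here
honoured = effective_page_size(1, {"size": 1})           # requested size was applied
replaced = effective_page_size(50000, {"size": 2000})    # API applied a different size
defaulted = effective_page_size(None, {"size": 20})      # no size sent, default applied
print(honoured, replaced, defaulted)
```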

📘

Refer to the Response Parameters section below for a full explanation of the parameters included in the response.

Step 3: Use a Pagination Loop to Retrieve All Data

Use a loop to iterate through each page and fetch data accordingly, adjusting the URL parameters for each request.

A loop iterates through each page (n_pages) to retrieve all the data. Inside the loop:

  • The URL for the current page is constructed with the appropriate page number (n) and size.
  • A request is made to fetch data for the current page using the constructed URL.
  • The content of the current page is extracted from the response and appended to the profiles_response list.

Lastly, aggregate the data from all pages into a single response structure for further processing or display. On the Response tab, you will see that 2 data subjects are displayed. In the real world, this would be a much larger number of data subjects, but you can use a similar loop to paginate through all of those records.

# Initialize an empty list to store the aggregated response
profiles_response = []

# Iterate through each page to retrieve all data. Per Step 2, a totalPages
# value of 1 here covers page 0 and page 1, so the range includes n_pages itself.
for n in range(n_pages + 1):
    # Construct the URL for the current page
    url = f"{api_host}/api/consentmanager/v1/datasubjects/profiles?page={n}&size={page_size}"

    # Make a request to fetch data for the current page
    page_response = requests.get(url, headers=headers).json()

    # Append the content of the current page to the profiles_response list
    profiles_response += page_response["content"]

    # Stop as soon as the API reports that this was the last page
    if page_response["last"]:
        break

# Print the aggregated response containing data from all pages
print(profiles_response)
[
    {
        "Id": "0d32a9e9-b56d-44e0-b5c2-313e44bb9b1f",
        "Language": None,
        "Identifier": "+1123749283",
        "LastUpdatedDate": "2024-02-21T16:15:45.84",
        "CreatedDate": "2024-02-21T16:15:45.84",
        "DataElements": [],
        "TestDataSubject": False,
    },
    {
        "Id": "bfbf3ce5-4f95-4494-9053-f12540b0606b",
        "Language": None,
        "Identifier": "[email protected]",
        "LastUpdatedDate": "2024-02-21T16:15:45.84",
        "CreatedDate": "2024-02-21T16:15:45.84",
        "DataElements": [],
        "TestDataSubject": False,
    },
]
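
As an alternative sketch, the same loop can be written as a reusable generator that relies on the last flag instead of a precomputed page count. This is one possible structure rather than the documented approach: iter_offset_pages and fake_fetch are hypothetical names, and fake_fetch stands in for the real request so the example runs offline.

```python
def iter_offset_pages(fetch_page, size):
    """Yield every record, requesting pages 0, 1, 2, ... until `last` is true."""
    page = 0
    while True:
        response = fetch_page(page, size)
        yield from response["content"]
        # Stop on the last page, or defensively if a page comes back empty
        if response["last"] or not response["content"]:
            break
        page += 1


# Stand-in for the API: serves slices of an in-memory list
subjects = [
    {"Identifier": "subject-1"},
    {"Identifier": "subject-2"},
    {"Identifier": "subject-3"},
]


def fake_fetch(page, size):
    start = page * size
    content = subjects[start:start + size]
    return {"content": content, "last": start + size >= len(subjects)}


all_subjects = list(iter_offset_pages(fake_fetch, size=2))
print(len(all_subjects))
```

In a real script, fetch_page would build the URL with the page and size values and return the parsed JSON response.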

Response Parameters

| Parameter | Description | Examples |
| --- | --- | --- |
| empty | Indicates whether no results exist on the page. | true, false |
| first | Indicates whether the current page is the first page of the list. | true, false |
| last | Indicates whether the current page is the last page of the list. | true, false |
| number | The page number of the results. | 1, 2, 3 |
| numberOfElements | The number of results on the current page. | 10, 20, 30 |
| pageable | The configuration parameters of the page. | |
| pageable.offset | The number of results to exclude from the start of the list. | 0, 10, 20 |
| pageable.paged | Indicates whether the list is paged. | true, false |
| pageable.pageNumber | The page number of the results. | 1, 2, 3 |
| pageable.pageSize | The number of results per page. | 10, 20, 30 |
| pageable.sort | The sort criteria that dictates the order of the results. | |
| pageable.sort.empty | Indicates whether sort criteria was left undefined. | true, false |
| pageable.sort.sorted | Indicates whether the list is sorted. | true, false |
| pageable.sort.unsorted | Indicates whether the list is unsorted. | true, false |
| pageable.unpaged | Indicates whether the list is not paged. | true, false |
| size | The number of results per page. | 10, 20, 30 |
| sort | The sort criteria that dictates the order of the results. | |
| sort.empty | Indicates whether sort criteria was left undefined. | true, false |
| sort.sorted | Indicates whether the results were sorted. | true, false |
| sort.unsorted | Indicates whether the results were unsorted. | true, false |
| totalElements | The total number of results in the list. | 100, 200, 500 |
| totalPages | The total number of pages in the list. | 10, 20, 25 |

Request Continuation-Based (Keyset / Cursor-Based) Pagination

Request Continuation-Based Pagination, or Keyset / Cursor-Based Pagination, uses a "cursor", known as a request continuation token, to navigate through pages. If the result set spans more than one page, the API returns a requestContinuation token in the response. Pass this token in the body (or header, depending on the API) of the next request to retrieve the following page. This approach is often used when dealing with real-time or frequently changing data.
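
To make the cursor idea concrete, here is a toy simulation that runs entirely in memory. It is a sketch under simplifying assumptions: the cursor is the id of the last record returned, whereas a real requestContinuation token is opaque and should only be passed back unchanged. The function fetch_after and the sample records are hypothetical.

```python
def fetch_after(records, cursor, size):
    """Return up to `size` records whose id is greater than `cursor`.

    Assumes `records` is sorted by id in ascending order.
    """
    remaining = [r for r in records if cursor is None or r["id"] > cursor]
    content = remaining[:size]
    last = len(remaining) <= size
    return {
        "content": content,
        "last": last,
        # Toy cursor: the id of the last record returned
        "requestContinuation": None if last else content[-1]["id"],
    }


records = [{"id": 10}, {"id": 20}, {"id": 30}]

cursor = None  # the first request has no cursor
collected = []
while True:
    page = fetch_after(records, cursor, size=2)
    collected += page["content"]
    if page["last"]:
        break
    cursor = page["requestContinuation"]

print(collected)
```

Because each request asks for "everything after this record" rather than "page number N", records added or removed earlier in the data set do not shift the position of later pages.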

Request Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| size | The number of results per page. size represents the number of items or entries returned on each page. It determines the size of the subset of data returned with each request. The size parameter allows users to control the granularity of the pagination and tailor it to their specific needs. | 2,000 |
| requestContinuation | The token used to paginate a response if the number of records is more than a page. The requestContinuation token is a reference point (or "pointer") that represents a specific position in the data set. Cursor-based pagination works by returning this pointer to a specific item in the data set. On subsequent requests, the server returns the results after the given pointer (the token passed in the request's body or header, depending on the API). | {"compositeToken": "{\"token\": \"...} |
| sort | The sort criteria that dictates the order of the results. sort specifies the order in which the results are presented. Sorting can be based on various criteria such as alphabetical order, numerical order, date, relevance, etc. Generally, the sort parameter uses the format property,direction, where property is the attribute's name and direction is asc or desc. | name,asc |
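
The property,direction format described for sort can be assembled with a small helper that rejects an invalid direction before any request is sent. This is only a sketch; build_sort is a hypothetical name, and which properties an endpoint accepts depends on that endpoint.

```python
def build_sort(prop, direction="asc"):
    """Return a sort value in the `property,direction` format."""
    direction = direction.lower()
    if direction not in ("asc", "desc"):
        raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
    return f"{prop},{direction}"


print(build_sort("name"))
print(build_sort("name", "DESC"))
```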

Code Example

For this example, we'll look at the Get List of Transactions API, which is used to retrieve a list of all consent transactions. With organizations continuously ingesting large volumes of consent transactions, the response for this API frequently changes, which is why cursor-based pagination is the method used to paginate through the response. Let's jump in.

Step 1: Define Base URL

First, we will construct our base URL, which will include the initial pagination parameters (size=1). Notice that with this method, we are only using the size parameter and will not be using the page parameter. In cursor-based pagination, you flip the pages using a cursor, not the page number.

# Optional Size set to 1
url = api_host + "/api/consent/v2/transactions?size=1"
# Optional Size set to 1
url = "https://app.onetrust.com/api/consent/v2/transactions?size=1"

📘

In the real world, you will page hundreds, if not thousands, of times to retrieve all of your transactions, so ensure you account for rate limits and error handling. Think of it like building a bookmark feature to allow you to find your place if something goes wrong. Additionally, select a size that makes sense for your application/script. A larger size will take longer to load but will require fewer requests, and a smaller size will load quickly but will require more requests.
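
One way to build the "bookmark" mentioned above is to save the most recent requestContinuation token to a file after each page, so a script can pick up from that point after a failure. The sketch below uses hypothetical helper names and a temporary file path. Whether a saved token remains valid across separate runs is an assumption that should be verified against the API.

```python
import json
import os
import tempfile


def save_bookmark(path, token):
    """Persist the most recent requestContinuation token."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"requestContinuation": token}, handle)


def load_bookmark(path):
    """Return the saved token, or None when no bookmark exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["requestContinuation"]


# Hypothetical bookmark location; a temporary directory keeps the demo tidy
bookmark_path = os.path.join(tempfile.mkdtemp(), "transactions_bookmark.json")

before = load_bookmark(bookmark_path)   # nothing saved yet, so this is None
save_bookmark(bookmark_path, "example-token")
after = load_bookmark(bookmark_path)
print(before, after)
```

Starting from load_bookmark means a fresh run naturally sends None as the first cursor, while a resumed run continues from the saved position.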

Step 2: Retrieve First Page of Data & Extract Metadata from Initial Response

Now that we've defined the base URL, we can make an initial request (seen on the Request tab) to the API endpoint using the url to retrieve the first page of data, which you can see in the Response tab below. This first response is stored in the n_response variable. Notice that the response includes paging parameters such as size, first, last, and requestContinuation. For our example, we set the size to 1, and the first page is returned by default. Only one transaction is returned in the content array, but unlike the page-based example, we don't know the total number of transactions or pages. We will use our cursor, requestContinuation, to page through all the transactions. Note that the first request should always pass a payload with requestContinuation set to null (None in Python).

payload = {"requestContinuation": None}
n_response = requests.post(url, json=payload, headers=headers).json()
{
    "content": [
        {
            "transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b39",
            "guid": "ac9a1096-a290-4699-a804-5782ed68a05e",
            "purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
            "purposeVersion": 1,
            "expiryDate": None,
            "topics": [],
            "customPreferences": [],
            "transactionType": "CONFIRMED",
            "attributes": {},
            "purposeNote": None,
            "autoGenerated": False,
            "purposeAttachments": [],
            "receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
            "collectionPointUUID": "00000000-0000-0000-0000-000000000000",
            "identifier": "[email protected]",
            "consentCreationDate": "2024-02-22T16:21:31.728086165",
            "interactionDate": "2024-02-22T16:21:31.728087965",
            "collectionPointAttributes": None,
        }
    ],
    "pageable": {
        "sort": {"sorted": False, "unsorted": True, "empty": True},
        "offset": 0,
        "requestContinuation": '{"compositeToken":"{\\"token\\":\\"+RID:~...AAAAAAA=\\",\\"range\\":{\\"min\\":\\"\\",\\"max\\":\\"FF\\"}}","orderByItems":[{"item":"2024-02-05T00:05:05.761565896"}],"rid":"iNFkAI-ei-4lBz8AAAAAAA==","inclusive":true}',
        "pageNumber": 0,
        "pageSize": 1,
        "paged": True,
        "unpaged": False,
    },
    "size": 1,
    "number": 0,
    "sort": {"sorted": False, "unsorted": True, "empty": True},
    "numberOfElements": 1,
    "first": True,
    "last": False,
    "empty": False,
}

When dealing with pagination programmatically, it's best practice to extract data, such as size and last, directly from the response. This way, if you passed in a size that is not supported, you can work with the value/size that is being used. Since we don't know the number of pages or transactions, we will use the last parameter to detect when we've reached the last page.

page_size = n_response["size"]
last = n_response["last"]
page_size = 1
last = False

📘

Refer to the Response Parameters section below for a full explanation of the parameters included in the response.

Step 3: Handle Last Page Logic

To handle last page logic, we need to create a loop that paginates through all the transactions. But what happens if the first page is also the last page? To account for this, we first check whether the page we already retrieved is the last page before trying to request more transactions. If it is, we simply store the data from this page.

if last:
    # The first page is also the last page, so store its content and stop
    transactions = n_response["content"]
else:
    # Loop logic, refer to Step 4 below
    pass

Step 4: Pagination Loop to Retrieve All Data

Use a loop to iterate through each page and fetch data accordingly, adjusting the payload with the new requestContinuation parameter for each request. This is our cursor which tells the new request where we left off last time and to return the next set of transactions.

📘

If the result set spans more than one page, the API returns a requestContinuation token in the response. Pass this token in the next request to paginate. If you pass requestContinuation in the body, the first request must submit a null value.
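
In Python, the null value for the first request is written as None. The short sketch below shows how that payload looks once serialized to JSON; the requests library's json= argument serializes a dictionary in the same way.

```python
import json

# Payload for the first request: no cursor yet
first_payload = {"requestContinuation": None}

# Python's None is serialized as JSON null
body = json.dumps(first_payload)
print(body)  # → {"requestContinuation": null}
```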

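A loop iterates through each page to retrieve all the data. Inside the loop:

  • The requestContinuation token from the most recent response is passed in the payload of the next request.
  • The content of the current page is extracted from the response and appended to the transactions list.
  • The last value from the response is checked to determine whether another page remains.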

Lastly, aggregate the data from all pages into a single response structure for further processing or display. On the Response tab, you will see that 2 transactions are displayed. In the real world, this would be a large number of records, but you can use a similar loop to paginate through all of the records.

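else:
    # Start with the content of the first page retrieved in Step 2
    transactions = n_response["content"]
    while not last:
        # Pass the cursor from the previous response to request the next page
        payload = {"requestContinuation": n_response["pageable"]["requestContinuation"]}
        n_url = api_host + "/api/consent/v2/transactions?size=" + str(page_size)
        n_response = requests.post(n_url, headers=headers, json=payload).json()
        # Append the content of the current page to the transactions list
        transactions = transactions + n_response["content"]
        # Update the flag that tells the loop when to stop
        last = n_response["last"]
print("Number of Transactions: ", len(transactions))
print(transactions)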
[
    {
        "transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b39",
        "guid": "ac9a1096-a290-4699-a804-5782ed68a05e",
        "purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
        "purposeVersion": 1,
        "expiryDate": None,
        "topics": [],
        "customPreferences": [],
        "transactionType": "CONFIRMED",
        "attributes": {},
        "purposeNote": None,
        "autoGenerated": False,
        "purposeAttachments": [],
        "receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
        "collectionPointUUID": "00000000-0000-0000-0000-000000000000",
        "identifier": "[email protected]",
        "consentCreationDate": "2024-02-22T16:21:31.728086165",
        "interactionDate": "2024-02-22T16:21:31.728087965",
        "collectionPointAttributes": None,
    },
    {
        "transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b40",
        "guid": "ac9a1096-a290-4699-a804-5782ed68a05d",
        "purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
        "purposeVersion": 1,
        "expiryDate": None,
        "topics": [],
        "customPreferences": [],
        "transactionType": "CONFIRMED",
        "attributes": {},
        "purposeNote": None,
        "autoGenerated": False,
        "purposeAttachments": [],
        "receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
        "collectionPointUUID": "00000000-0000-0000-0000-000000000000",
        "identifier": "[email protected]",
        "consentCreationDate": "2024-02-22T16:21:31.728086165",
        "interactionDate": "2024-02-22T16:21:31.728087965",
        "collectionPointAttributes": None,
    },
]
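
Steps 3 and 4 can also be folded into a single generator, so the "first page is the last page" case needs no separate branch. The sketch below is one possible structure, not the documented approach: iter_transactions and fake_post are hypothetical names, and fake_post stands in for the real POST request with a toy token that is simply the index of the next record.

```python
def iter_transactions(post_page):
    """Yield every record, following requestContinuation until `last` is true."""
    payload = {"requestContinuation": None}  # the first request sends a null cursor
    while True:
        response = post_page(payload)
        yield from response["content"]
        if response["last"]:
            break
        # Carry the cursor from this response into the next request
        payload = {"requestContinuation": response["pageable"]["requestContinuation"]}


# Stand-in for the API: returns one record per page
data = [{"identifier": "a"}, {"identifier": "b"}, {"identifier": "c"}]


def fake_post(payload):
    start = payload["requestContinuation"] or 0
    next_index = start + 1
    return {
        "content": data[start:next_index],
        "last": next_index >= len(data),
        "pageable": {"requestContinuation": next_index},
    }


transactions = list(iter_transactions(fake_post))
print("Number of Transactions:", len(transactions))
```

In a real script, post_page would send the payload to the transactions endpoint and return the parsed JSON response.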

Response Parameters

| Parameter | Description | Examples |
| --- | --- | --- |
| empty | Indicates whether no results exist on the page. | true, false |
| first | Indicates whether the current page is the first page of the list. | true, false |
| last | Indicates whether the current page is the last page of the list. | true, false |
| number | The page number of the results. | 1, 2, 3 |
| numberOfElements | The number of results on the current page. | 10, 20, 30 |
| pageable | The configuration parameters of the page. | |
| pageable.pageSize | The number of results per page. | 10, 20, 30 |
| pageable.requestContinuation | Request continuation token used to paginate. If the number of records in the response is more than a page, it returns a requestContinuation token in the response. This requestContinuation token should be passed to the next request's body (or header, depending on the API) to paginate. | {"compositeToken": "{\"token\": \"...} |
| pageable.sort | The sort criteria that dictates the order of the results. | |
| pageable.sort.empty | Indicates whether sort criteria was left undefined. | true, false |
| pageable.sort.sorted | Indicates whether the list is sorted. | true, false |
| pageable.sort.unsorted | Indicates whether the list is unsorted. | true, false |
| pageable.unpaged | Indicates whether the list is not paged. | true, false |
| size | The number of results per page. | 10, 20, 30 |
| sort | The sort criteria that dictates the order of the results. | |
| sort.empty | Indicates whether sort criteria was left undefined. | true, false |
| sort.sorted | Indicates whether the results were sorted. | true, false |
| sort.unsorted | Indicates whether the results were unsorted. | true, false |