Pagination is a technique in APIs to manage and retrieve large data sets efficiently. When an API response contains many items, the server divides the data into smaller, manageable subsets called "pages." These pages are then returned one at a time to the client, allowing developers to control the amount of data they retrieve and reducing the impact on performance.
Pagination works in the following manner: When a client makes a request to a server, instead of sending all the data at once, the server sends a "page" with a specific number of results and provides a way for the client to request the next set of data. This reduces the load on the server and improves the user experience by providing faster response times.
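To make the mechanics concrete, here is a minimal in-memory sketch of what a server does when it splits a result set into pages. The `paginate` function is a hypothetical illustration, not OneTrust's server code; its field names are borrowed from the response fields used later in this guide.

```python
def paginate(items, page=0, size=2):
    """Return one zero-indexed page of `items` plus basic paging metadata."""
    if size < 1:
        raise ValueError("size must be at least 1")
    start = page * size
    content = items[start:start + size]
    return {
        "content": content,
        "number": page,
        "size": size,
        "numberOfElements": len(content),
        "totalElements": len(items),
        "first": page == 0,
        # The page is the last one when nothing remains after it
        "last": start + size >= len(items),
    }


records = ["a", "b", "c", "d", "e"]
first_page = paginate(records, page=0, size=2)
last_page = paginate(records, page=2, size=2)
print(first_page["content"], last_page["content"], last_page["last"])
```

The client never sees the full list; it only sees one slice at a time and uses the metadata to decide whether to ask for more.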
There are several ways to implement pagination, but OneTrust uses the following methods: Page-Based Pagination (also known as Offset Pagination) and Request Continuation-Based Pagination (also known as Keyset Pagination or Cursor Pagination). In the context of OneTrust APIs, pagination is commonly employed to handle scenarios where the volume of data is significant, such as retrieving consent transactions or data subject profiles. OneTrust's approaches to pagination are detailed further below.
Page-Based (Offset) Pagination
In Page-Based pagination, or Offset Pagination, the API response includes information about the total number of pages and the current page. Developers use the size and page query parameters to paginate the response: size defines the number of results returned on each page, and page defines the page number to return, as detailed in the Request Parameters table below.
Request Parameters
Parameter | Description | Example |
---|---|---|
page | The page number of the results. page refers to a subset of the total results returned by the API. When paginating through a large data set, the results are divided into discrete pages to make them more manageable. Pages are zero-indexed: the first page is 0, which is also the default. Users can request different pages to view different sets of data. | 0 |
size | The number of results per page. size represents the number of items or entries returned on each page. It determines the size of the subset of data returned with each request. The size parameter allows users to control the granularity of the pagination and tailor it to their specific needs. | 2,000 |
sort | The sort criteria that dictates the order of the results. sort specifies the order in which the results are presented. Sorting can be based on various criteria such as alphabetical order, numerical order, date, relevance, etc. Users can specify the sorting order and criteria using the sort parameter to organize the data according to their preferences. Generally, the sort parameter uses the format property,direction, where property is the attribute's name and direction is asc or desc. | name,asc |
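As a rough sketch of how these three parameters combine into a query string, the helper below builds a paginated URL with the standard library. The `build_page_url` name and the `example.com` endpoint are hypothetical, and whether a given API accepts a particular sort property is something to confirm in that API's reference.

```python
from urllib.parse import urlencode


def build_page_url(base_url, page=0, size=None, sort=None):
    """Append page, size, and sort query parameters to base_url."""
    params = {"page": page}
    if size is not None:
        params["size"] = size
    if sort is not None:
        params["sort"] = sort  # expected format: property,direction
    # safe="," keeps the comma in "name,asc" literal instead of encoding it as %2C
    return base_url + "?" + urlencode(params, safe=",")


url = build_page_url("https://example.com/api/items", page=0, size=1, sort="name,asc")
print(url)
```

Leaving size and sort out of the call omits them from the URL, so the server's defaults apply.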
Code Example
For example, let's take a look at the Get List of Data Subjects API, which is used to retrieve a list of all data subjects. Organizations tend to have a large volume of data subjects, which means that typically the response for this API will return a large number of results to page through. Let's jump in and implement offset pagination using the steps below, which can be repeated for any offset pagination API.
Step 1: Define the Base URL
First, we will construct our base URL for the API endpoint, which will include the initial pagination parameters (page=0, size=1). For demonstration purposes, we will use these simple values to keep the code blocks as small as possible, and we'll assume the organization has a total of 2 data subjects. Setting page to 0 tells the API to retrieve the first page of results, which is always where you should start. Setting size to 1 tells the API to return only 1 result per page. The size parameter is optional, but since we want to demonstrate pagination, we will use a size of 1 and step through the pages to pull in both data subjects in the organization.
# Page set to 0; Optional Size set to 1
url = api_host + "/api/consentmanager/v1/datasubjects/profiles?page=0&size=1"
# Page set to 0; Optional Size set to 1
url = "https://app.onetrust.com/api/consentmanager/v1/datasubjects/profiles?page=0&size=1"
In the real world, you will page hundreds, if not thousands, of times to retrieve all of your data subjects, so ensure you account for rate limits and error handling. Think of it like building a bookmark feature to allow you to find your place if something goes wrong. Additionally, select a size that makes sense for your application/script. A larger size will take longer to load but will require fewer requests, and a smaller size will load quickly but will require more requests.
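One way to approach the error-handling advice above is to wrap each page request in a retry with exponential backoff. This is a minimal sketch under stated assumptions: `fetch_with_retry` and `flaky_fetch` are hypothetical names, the fetcher is a stand-in for a real HTTP call, and the delays are illustrative rather than tuned to any published rate limit.

```python
import time


def fetch_with_retry(fetch_page, page, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(page), retrying with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(page)
        except Exception:  # in real code, catch only the errors worth retrying
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x ... base_delay


# Demonstration with a stand-in fetcher that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(page):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"content": ["record"], "number": page}


delays = []  # record the requested delays instead of actually sleeping
result = fetch_with_retry(flaky_fetch, page=0, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demonstration instant; in a real script the default `time.sleep` would pause between attempts.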
Step 2: Retrieve the First Page of Data & Extract Metadata from the Initial Response
Now that we've defined the base URL, we can make an initial request (seen on the Request tab) to the API endpoint using the url to retrieve the first page of data, which you can see in the Response tab below. This first response is stored in the first_response variable. Notice that the Response includes paging parameters such as totalPages, numberOfElements, number, and size. For our example, we set the size to 1 and asked for the first page using a value of 0. As a result, 1 data subject is returned in the content array, and totalElements tells us there are a total of 2 data subjects.
first_response = requests.get(url, headers=headers).json()
{
"content": [
{
"Id": "0d32a9e9-b56d-44e0-b5c2-313e44bb9b1f",
"Language": None,
"Identifier": "+1123749283",
"LastUpdatedDate": "2024-02-21T16:15:45.84",
"CreatedDate": "2024-02-21T16:15:45.84",
"DataElements": [],
"TestDataSubject": False,
}
],
"pageable": {
"sort": {"empty": False, "sorted": True, "unsorted": False},
"offset": 0,
"requestContinuation": "AAEAZTU2...uXyilvY=",
"pageNumber": 0,
"pageSize": 1,
"paged": True,
"unpaged": False,
},
"last": False,
"totalPages": 1,
"totalElements": 2,
"size": 1,
"number": 0,
"sort": {"empty": False, "sorted": True, "unsorted": False},
"first": True,
"numberOfElements": 2,
"empty": False,
}
When dealing with pagination programmatically, it's best practice to extract data, such as totalPages, size, numberOfElements, etc., directly from the first response. This way, if you passed in a size that is not supported, you can work with the size that is actually being used. Remember that size is optional, so if a size was not provided, you want to grab the default size being used. Now that we've set our variables with values from the first request, we will loop through the pages to obtain all of our data subjects.
# Number of items per page
page_size = first_response["size"]
# Total number of elements across all pages
total_elements = first_response["totalElements"]
# Total number of pages
n_pages = first_response["totalPages"]
# Number of items per page
page_size = 1
# Total number of elements across all pages
total_elements = 2
# Total number of pages (pages start at 0, so 1 means 2 pages: page 0 and page 1)
n_pages = 1
Refer to the Response Parameters section below for a full explanation of the parameters included in the response.
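With the page size and total count in hand, it is easy to estimate how many requests a full export will take, which helps when choosing a size. The sketch below is illustrative only: `requests_needed` is a hypothetical helper, and the 250,000-record figure is an invented example rather than a real tenant's volume.

```python
import math


def requests_needed(total_elements, page_size):
    """Return how many page requests are needed to read every element."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(total_elements / page_size)


# Values from the walkthrough: 2 data subjects, 1 per page
walkthrough = requests_needed(total_elements=2, page_size=1)

# Hypothetical larger export, comparing two page sizes
large_pages = requests_needed(total_elements=250_000, page_size=2_000)
small_pages = requests_needed(total_elements=250_000, page_size=100)
print(walkthrough, large_pages, small_pages)
```

Fewer, larger pages mean fewer round trips, while smaller pages return faster individually; the count above makes that trade-off visible before the loop runs.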
Step 3: Use a Pagination Loop to Retrieve All Data
Use a loop to iterate through each page and fetch data accordingly, adjusting the URL parameters for each request.
A loop iterates through each page (n_pages) to retrieve all the data. Inside the loop:
- The URL for the current page is constructed with the appropriate page number (n) and size.
- A request is made to fetch data for the current page using the constructed URL.
- The content of the current page is extracted from the response and appended to the profiles_response list.
Lastly, aggregate the data from all pages into a single response structure for further processing or display. On the Response tab, you will see that 2 data subjects are displayed. In the real world, this would be a much larger number of data subjects, but you can use a similar loop to paginate through all of those records.
# Initialize an empty list to store the aggregated response
profiles_response = []
# Iterate through each page to retrieve all data.
# Step 2 treats totalPages as zero-based (1 means pages 0 and 1), so add 1 to include the final page.
for n in range(n_pages + 1):
    # Construct the URL for the current page
    url = api_host + "/api/consentmanager/v1/datasubjects/profiles?page=" + str(n) + "&size=" + str(page_size)
    # Make a request to fetch data for the current page and append it to the profiles_response list
    profiles_response += requests.get(url, headers=headers).json()["content"]
# Print the aggregated response containing data from all pages
print(profiles_response)
[
{
"Id": "0d32a9e9-b56d-44e0-b5c2-313e44bb9b1f",
"Language": None,
"Identifier": "+1123749283",
"LastUpdatedDate": "2024-02-21T16:15:45.84",
"CreatedDate": "2024-02-21T16:15:45.84",
"DataElements": [],
"TestDataSubject": False,
},
{
"Id": "bfbf3ce5-4f95-4494-9053-f12540b0606b",
"Language": None,
"Identifier": "[email protected]",
"LastUpdatedDate": "2024-02-21T16:15:45.84",
"CreatedDate": "2024-02-21T16:15:45.84",
"DataElements": [],
"TestDataSubject": False,
},
]
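As an alternative to counting pages up front, a loop can keep requesting pages until the response reports last as true. The sketch below assumes the endpoint returns the last flag described in the Response Parameters table; `fetch_all_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real HTTP call so the example runs without network access.

```python
def fetch_all_pages(fetch_page, size):
    """Collect content from page 0 onward until a page reports last=True."""
    results = []
    page = 0
    while True:
        body = fetch_page(page, size)
        results.extend(body["content"])
        # Treat a missing flag as the last page to avoid looping forever
        if body.get("last", True):
            break
        page += 1
    return results


# Stand-in for the HTTP request: serves slices of an in-memory list.
SUBJECTS = [{"Identifier": "subject-1"}, {"Identifier": "subject-2"}]


def fake_fetch(page, size):
    start = page * size
    return {
        "content": SUBJECTS[start:start + size],
        "number": page,
        "size": size,
        "last": start + size >= len(SUBJECTS),
    }


all_subjects = fetch_all_pages(fake_fetch, size=1)
print(len(all_subjects))
```

In a real script, `fetch_page` could wrap `requests.get(url, headers=headers, params={"page": page, "size": size}).json()`. Stopping on last avoids any dependence on how the page count is reported.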
Response Parameters
Parameter | Description | Examples |
---|---|---|
empty | Indicates whether no results exist on the page. | true, false |
first | Indicates whether the current page is the first page of the list. | true, false |
last | Indicates whether the current page is the last page of the list. | true, false |
number | The page number of the results. | 1, 2, 3 |
numberOfElements | The number of results on the current page. | 10, 20, 30 |
pageable | The configuration parameters of the page. | |
pageable.offset | The number of results to exclude from the start of the list. | 0, 10, 20 |
pageable.paged | Indicates whether the list is paged. | true, false |
pageable.pageNumber | The page number of the results. | 1, 2, 3 |
pageable.pageSize | The number of results per page. | 10, 20, 30 |
pageable.sort | The sort criteria that dictates the order of the results. | |
pageable.sort.empty | Indicates whether sort criteria was left undefined. | true, false |
pageable.sort.sorted | Indicates whether the list is sorted. | true, false |
pageable.sort.unsorted | Indicates whether the list is unsorted. | true, false |
pageable.unpaged | Indicates whether the list is not paged. | true, false |
size | The number of results per page. | 10, 20, 30 |
sort | The sort criteria that dictates the order of the results. | |
sort.empty | Indicates whether sort criteria was left undefined. | true, false |
sort.sorted | Indicates whether the results were sorted. | true, false |
sort.unsorted | Indicates whether the results were unsorted. | true, false |
totalElements | The total number of results in the list. | 100, 200, 500 |
totalPages | The total number of pages in the list. | 10, 20, 25 |
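For scripts that read these fields in several places, it can help to copy the flat paging fields into a small typed object once. The `PageInfo` dataclass below is a hypothetical convenience, not part of any OneTrust SDK, and the sample dictionary is a trimmed, made-up response containing only the fields it reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    """Typed view of the top-level paging fields in a response."""
    number: int
    size: int
    number_of_elements: int
    first: bool
    last: bool
    empty: bool

    @classmethod
    def from_response(cls, body):
        return cls(
            number=body["number"],
            size=body["size"],
            number_of_elements=body["numberOfElements"],
            first=body["first"],
            last=body["last"],
            empty=body["empty"],
        )


sample = {"number": 0, "size": 1, "numberOfElements": 1,
          "first": True, "last": False, "empty": False, "content": ["record"]}
info = PageInfo.from_response(sample)
print(info)
```

A missing field raises a KeyError immediately, which surfaces an unexpected response shape early instead of deep inside a loop.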
Request Continuation-Based (Keyset / Cursor-Based) Pagination
Request Continuation-Based Pagination, or Keyset / Cursor-Based Pagination, uses a "cursor" known as a request continuation token to navigate through pages. If the result set contains more records than fit on one page, the API returns a requestContinuation token in the response. Pass this requestContinuation token in the next request's body (or header, depending on the API) to retrieve the next page. This approach is often used when dealing with real-time or frequently changing data.
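To see why a cursor suits frequently changing data, the in-memory sketch below reads the same list with both approaches while a row is deleted between requests. It is only an illustration: the cursor here is a plain id, whereas the real requestContinuation token is opaque, and `offset_page` and `cursor_page` are hypothetical helpers.

```python
def offset_page(rows, page, size):
    """Return the slice of rows at the given zero-indexed page."""
    return rows[page * size:(page + 1) * size]


def cursor_page(rows, after_id, size):
    """Return up to `size` rows whose id is greater than the cursor."""
    page = [r for r in rows if after_id is None or r["id"] > after_id][:size]
    next_cursor = page[-1]["id"] if page else after_id
    return page, next_cursor


rows = [{"id": i} for i in (10, 20, 30, 40)]  # kept sorted by id

# Read the first page with both approaches.
seen_offset = offset_page(rows, 0, 2)
seen_cursor, cursor = cursor_page(rows, None, 2)

# A row is deleted between the first and second request.
rows.pop(0)  # removes id 10

# Read the second page with both approaches.
seen_offset += offset_page(rows, 1, 2)
more, cursor = cursor_page(rows, cursor, 2)
seen_cursor += more

print([r["id"] for r in seen_offset])
print([r["id"] for r in seen_cursor])
```

The offset reader skips id 30 because every row shifted down one position, while the cursor reader resumes after the last id it saw and misses nothing.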
Request Parameters
Parameter | Description | Example |
---|---|---|
size | The number of results per page. size represents the number of items or entries returned on each page. It determines the size of the subset of data returned with each request. The size parameter allows users to control the granularity of the pagination and tailor it to their specific needs. | 2,000 |
requestContinuation | The token used to paginate a response if the number of records is more than a page. The requestContinuation token is a reference point (or "pointer") that represents a specific position in the data set. Cursor-based pagination works by returning this pointer to a specific item in the dataset. On subsequent requests, the server will return results after the given pointer (token passed in the request's body or header, depending on the API). | {"compositeToken": "{\"token\": \"...} |
sort | The sort criteria that dictates the order of the results. sort specifies the order in which the results are presented. Sorting can be based on various criteria such as alphabetical order, numerical order, date, relevance, etc. Users can specify the sorting order and criteria using the sort parameter to organize the data according to their preferences. Generally, the sort parameter uses the format property,direction, where property is the attribute's name and direction is asc or desc. | name,asc |
Code Example
For this example, we'll look at the Get List of Transactions API, which is used to retrieve a list of all consent transactions. With organizations continuously ingesting large volumes of consent transactions, the response for this API frequently changes, which is why cursor-based pagination is the method used to paginate through the response. Let's jump in.
Step 1: Define Base URL
First, we will construct our base URL, which will include the initial pagination parameters (size=1). Notice that with this method, we are only using the size parameter and will not be using the page parameter. In cursor-based pagination, you flip the pages using a cursor, not the page number.
# Optional Size set to 1
url = api_host + "/api/consent/v2/transactions?size=1"
# Optional Size set to 1
url = "https://app.onetrust.com/api/consent/v2/transactions?size=1"
In the real world, you will page hundreds, if not thousands, of times to retrieve all of your transactions, so ensure you account for rate limits and error handling. Think of it like building a bookmark feature to allow you to find your place if something goes wrong. Additionally, select a size that makes sense for your application/script. A larger size will take longer to load but will require fewer requests, and a smaller size will load quickly but will require more requests.
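The bookmark idea can be sketched as a small checkpoint file that stores the most recent continuation token. Everything here is an assumption made for illustration: the helper names, the file layout, and in particular the premise that a saved token can still be used after a restart, which should be verified against the API before relying on it.

```python
import json
import os
import tempfile


def save_checkpoint(path, token, fetched):
    """Record the latest continuation token and a running record count."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"requestContinuation": token, "fetched": fetched}, fh)
    # Replace in one step so a crash never leaves a half-written checkpoint
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved checkpoint, or a fresh one if none exists."""
    if not os.path.exists(path):
        return {"requestContinuation": None, "fetched": 0}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


checkpoint_file = os.path.join(tempfile.mkdtemp(), "transactions_checkpoint.json")
fresh = load_checkpoint(checkpoint_file)
save_checkpoint(checkpoint_file, "example-token", 100)
resumed = load_checkpoint(checkpoint_file)
print(fresh, resumed)
```

A fresh checkpoint carries a null token, which matches the payload expected on a first request, so the same loop can serve both a new run and a resumed one.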
Step 2: Retrieve First Page of Data & Extract Metadata from Initial Response
Now that we've defined the base URL, we can make an initial request (seen on the Request tab) to the API endpoint using the url to retrieve the first page of data, which you can see in the Response tab below. This first response is stored in the n_response variable. Notice that the Response includes paging parameters such as size, first, last, and requestContinuation. For our example, we set the size to 1, and the first page is returned by default. We see only one transaction returned in the content array, but unlike the page-based example, we don't know the total number of transactions or pages. We will leverage our cursor, requestContinuation, to page through all the transactions. It's important to note that the first request should always pass a payload with requestContinuation set to None/null.
payload = {"requestContinuation": None}
n_response = requests.post(url, json=payload, headers=headers).json()
{
"content": [
{
"transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b39",
"guid": "ac9a1096-a290-4699-a804-5782ed68a05e",
"purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
"purposeVersion": 1,
"expiryDate": None,
"topics": [],
"customPreferences": [],
"transactionType": "CONFIRMED",
"attributes": {},
"purposeNote": None,
"autoGenerated": False,
"purposeAttachments": [],
"receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
"collectionPointUUID": "00000000-0000-0000-0000-000000000000",
"identifier": "[email protected]",
"consentCreationDate": "2024-02-22T16:21:31.728086165",
"interactionDate": "2024-02-22T16:21:31.728087965",
"collectionPointAttributes": None,
}
],
"pageable": {
"sort": {"sorted": False, "unsorted": True, "empty": True},
"offset": 0,
"requestContinuation": '{"compositeToken":"{\\"token\\":\\"+RID:~...AAAAAAA=\\",\\"range\\":{\\"min\\":\\"\\",\\"max\\":\\"FF\\"}}","orderByItems":[{"item":"2024-02-05T00:05:05.761565896"}],"rid":"iNFkAI-ei-4lBz8AAAAAAA==","inclusive":true}',
"pageNumber": 0,
"pageSize": 1,
"paged": True,
"unpaged": False,
},
"size": 1,
"number": 0,
"sort": {"sorted": False, "unsorted": True, "empty": True},
"numberOfElements": 1,
"first": True,
"last": False,
"empty": False,
}
When dealing with pagination programmatically, it's best practice to extract data, such as size and last, directly from the response. This way, if you passed in a size that is not supported, you can work with the size that is actually being used. Since we don't know the number of pages or transactions, we will use the last parameter to detect when we've reached the last page.
page_size = n_response["size"]
last = n_response["last"]
page_size = 1
last = False
Refer to the Response Parameters section below for a full explanation of the parameters included in the response.
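The cursor for the next request can be pulled out of the response with a small guard. The `next_payload` helper below is a hypothetical sketch: it assumes the token sits under pageable.requestContinuation as in the sample response above and that it is sent back in the request body.

```python
def next_payload(response):
    """Return the payload for the next request, or None on the last page."""
    if response.get("last", False):
        return None
    token = response.get("pageable", {}).get("requestContinuation")
    if token is None:
        # Not the last page, yet no cursor to continue with
        raise ValueError("response carries no requestContinuation token")
    return {"requestContinuation": token}


middle_page = {"last": False, "pageable": {"requestContinuation": "token-1"}}
final_page = {"last": True, "pageable": {}}
print(next_payload(middle_page))
print(next_payload(final_page))
```

Raising on a missing token makes a malformed response fail loudly instead of silently restarting from the first page.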
Step 3: Handle Last Page Logic
To page through all the transactions, we need a loop. But what happens if the first page is also the last page? To account for this, we check whether the first page is the last page before trying to fetch more transactions. If it is, we simply store the data from this page.
if last:
    # The first page is also the last page, so store its content and stop
    transactions = n_response["content"]
else:
    # Loop logic, refer to Step 4 below
    pass
Step 4: Pagination Loop to Retrieve All Data
Use a loop to iterate through each page and fetch data accordingly, updating the payload with the new requestContinuation value for each request. This is our cursor, which tells the new request where we left off and to return the next set of transactions.
If the result set contains more records than fit on one page, the API returns a requestContinuation token in the response. Pass this requestContinuation token in the next request to paginate. If you pass the requestContinuation in the body and it is the first request, you must submit a null value.
A loop iterates through each page to retrieve all the data. Inside the loop:
- The content of the current page is extracted from the response and appended to the transactions list.
Lastly, aggregate the data from all pages into a single response structure for further processing or display. On the Response tab, you will see that 2 transactions are displayed. In the real world, this would be a large number of records, but you can use a similar loop to paginate through all of the records.
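else:
    # Start with the first page already retrieved in Step 2
    transactions = n_response["content"]
    while not last:
        # Pass the token from the previous response as the cursor
        payload = {"requestContinuation": n_response["pageable"]["requestContinuation"]}
        n_url = api_host + "/api/consent/v2/transactions?size=" + str(page_size)
        n_response = requests.post(n_url, headers=headers, json=payload).json()
        # Append the current page and check whether it is the last one
        transactions = transactions + n_response["content"]
        last = n_response["last"]
print("Number of Transactions: ", len(transactions))
print(transactions)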
[
{
"transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b39",
"guid": "ac9a1096-a290-4699-a804-5782ed68a05e",
"purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
"purposeVersion": 1,
"expiryDate": None,
"topics": [],
"customPreferences": [],
"transactionType": "CONFIRMED",
"attributes": {},
"purposeNote": None,
"autoGenerated": False,
"purposeAttachments": [],
"receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
"collectionPointUUID": "00000000-0000-0000-0000-000000000000",
"identifier": "[email protected]",
"consentCreationDate": "2024-02-22T16:21:31.728086165",
"interactionDate": "2024-02-22T16:21:31.728087965",
"collectionPointAttributes": None,
},
{
"transactionGuid": "fa2fa4f6-406e-487b-a07d-8d35ecba3b40",
"guid": "ac9a1096-a290-4699-a804-5782ed68a05d",
"purposeGuid": "ac9a1096-a290-4699-a804-5782ed68a05e",
"purposeVersion": 1,
"expiryDate": None,
"topics": [],
"customPreferences": [],
"transactionType": "CONFIRMED",
"attributes": {},
"purposeNote": None,
"autoGenerated": False,
"purposeAttachments": [],
"receiptId": "aca6c7ec-13d1-4d55-a11f-4d1194597a58",
"collectionPointUUID": "00000000-0000-0000-0000-000000000000",
"identifier": "[email protected]",
"consentCreationDate": "2024-02-22T16:21:31.728086165",
"interactionDate": "2024-02-22T16:21:31.728087965",
"collectionPointAttributes": None,
},
]
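The same loop can also be written as a generator that yields records as each page arrives, which avoids holding control flow in an if/else. This is a sketch under assumptions: `iter_transactions` is a hypothetical helper, and `fake_post` stands in for the real POST request, using a simple index as its token where the real token is opaque.

```python
def iter_transactions(post_page):
    """Yield every record, following requestContinuation until last is true."""
    payload = {"requestContinuation": None}  # the first request sends a null token
    while True:
        body = post_page(payload)
        yield from body["content"]
        if body["last"]:
            return
        payload = {"requestContinuation": body["pageable"]["requestContinuation"]}


# Stand-in for the HTTP request: the token is just the next list index.
DATA = ["txn-1", "txn-2", "txn-3"]


def fake_post(payload):
    start = int(payload["requestContinuation"] or 0)
    end = start + 1
    return {
        "content": DATA[start:end],
        "last": end >= len(DATA),
        "pageable": {"requestContinuation": str(end)},
    }


transactions = list(iter_transactions(fake_post))
print(len(transactions))
```

In a real script, `post_page` could wrap `requests.post(url, headers=headers, json=payload).json()`. Because the first page goes through the same loop, a single-page result needs no special case.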
Response Parameters
Parameter | Description | Examples |
---|---|---|
empty | Indicates whether no results exist on the page. | true, false |
first | Indicates whether the current page is the first page of the list. | true, false |
last | Indicates whether the current page is the last page of the list. | true, false |
number | The page number of the results. | 1, 2, 3 |
numberOfElements | The number of results on the current page. | 10, 20, 30 |
pageable | The configuration parameters of the page. | |
pageable.pageSize | The number of results per page. | 10, 20, 30 |
pageable.requestContinuation | Request continuation token used to paginate. If the number of records in the response is more than a page, it returns a requestContinuation token in the response. This requestContinuation token should be passed to the next request's body (or header, depending on the API) to paginate. | {"compositeToken": "{\"token\": \"...} |
pageable.sort | The sort criteria that dictates the order of the results. | |
pageable.sort.empty | Indicates whether sort criteria was left undefined. | true, false |
pageable.sort.sorted | Indicates whether the list is sorted. | true, false |
pageable.sort.unsorted | Indicates whether the list is unsorted. | true, false |
pageable.unpaged | Indicates whether the list is not paged. | true, false |
size | The number of results per page. | 10, 20, 30 |
sort | The sort criteria that dictates the order of the results. | |
sort.empty | Indicates whether sort criteria was left undefined. | true, false |
sort.sorted | Indicates whether the results were sorted. | true, false |
sort.unsorted | Indicates whether the results were unsorted. | true, false |