The worker node services APIs target Data Discovery worker nodes. These endpoints interact directly with the worker node and allow access to service integrations hosted by the worker nodes. Use the Custom Scan APIs to retrieve custom scan jobs, submit metadata for cataloging, and submit content for classification. This guide walks through the steps required to use the Custom Scan APIs.
Requirements
- Configure Ingress for Worker Node: For more information, see API Access for Worker Node Management.
- Client Credentials: For more information on creating OAuth 2.0 Client Credentials, see Managing OAuth 2.0 Client Credentials.
  - Generate a credential for the custom scanner. Download the credential file containing the client_id and secret. These values are used to authenticate and authorize the custom scanner.
- Custom Connector: (Worker Node Scan) Data Source
  - Give your data source a name, set up your credentials, and choose the default settings or opt for an advanced configuration. Once configured, activate the data source. Record the Data Source ID for the custom connector, which can be obtained from the URL (e.g., https://{hostname}/data-discovery/data-sources/wizard-details/v2/{Data_Source_ID}/summary) in the OneTrust application or from the corresponding id value returned by the Get Data Sources API.
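The Data Source ID can be pulled out of the wizard-details URL programmatically. The following is a minimal sketch that assumes the URL follows the shape quoted above; the function name `extract_data_source_id` and the example hostname are illustrative and not part of any OneTrust SDK.

```python
import re
from typing import Optional

# Pattern mirrors the wizard-details URL shape quoted in the requirement above.
_DATA_SOURCE_URL = re.compile(
    r"/data-discovery/data-sources/wizard-details/v2/(?P<id>[^/]+)/summary"
)


def extract_data_source_id(url: str) -> Optional[str]:
    """Return the Data Source ID path segment, or None if the URL does not match."""
    match = _DATA_SOURCE_URL.search(url)
    return match.group("id") if match else None


# Hypothetical hostname and ID, for illustration only.
example_url = (
    "https://example.invalid/data-discovery/data-sources/"
    "wizard-details/v2/abc-123/summary"
)
data_source_id = extract_data_source_id(example_url)
```

If the URL does not match, the helper returns None, in which case the id value from the Get Data Sources API is the alternative source.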
Custom Scan Workflows: How-To Steps
- Use the Generate Access Token API to retrieve an access token.
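As a rough illustration of the token step, the sketch below builds (but does not send) an OAuth 2.0 client-credentials request using only the Python standard library. The token path `/oauth/token`, the form field names, and the mapping of the credential file's secret to `client_secret` are assumptions; take the real URL and parameters from the Generate Access Token API reference.

```python
import urllib.parse
import urllib.request


def build_token_request(
    base_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build a client-credentials token request without sending it."""
    # ASSUMPTION: placeholder path; the real one is in the API reference.
    token_path = "/oauth/token"
    # ASSUMPTION: standard OAuth 2.0 form fields for the client credentials grant.
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + token_path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host and credentials, for illustration only.
token_request = build_token_request(
    "https://worker.example.invalid/", "my-client-id", "my-secret"
)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the response format should be checked against the API reference before parsing it.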
- Use the Get List of Scan Jobs API to query the worker node for pending scan jobs for a given data source. Make a note of the Job Identifier in the response body. The jobId will be used in the following steps.
  - datasourceId is the Custom Connector: (Worker Node Scan) Data Source identifier from the requirements above. This value can be obtained using the Get Data Sources API.
  - Optionally, extract the client credential reference from the credentialReference response parameter.
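Once the job list has been retrieved, picking the job to work on might look like the sketch below. The response shape is an assumption: this guide names only jobId and credentialReference, so the list-of-objects layout, the `status` field, and the sample values are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_pending_job(jobs: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first job whose status is PENDING, or None if there is none."""
    for job in jobs:
        # ASSUMPTION: each job object carries a "status" field.
        if job.get("status") == "PENDING":
            return job
    return None


# Hypothetical response content, for illustration only.
sample_jobs = [
    {"jobId": 41, "status": "COMPLETED"},
    {"jobId": 42, "status": "PENDING", "credentialReference": "cred-ref-1"},
]
pending_job = first_pending_job(sample_jobs)
# credentialReference is optional, so read it with .get().
credential_reference = pending_job.get("credentialReference") if pending_job else None
```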
- Once a scan job is available, use the Update Scan Job Status API to update the scan job's status from PENDING to IN_PROGRESS.
  { "status": "IN_PROGRESS" }
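A scanner may want to guard against out-of-order status updates before calling the API. The sketch below encodes only the two transitions this guide describes (PENDING to IN_PROGRESS, then IN_PROGRESS to COMPLETED); whether the API accepts other statuses or transitions is not covered here, and the helper name is illustrative.

```python
from typing import Dict

# Only the transitions described in this guide; others may exist in the API.
ALLOWED_TRANSITIONS = {
    "PENDING": "IN_PROGRESS",
    "IN_PROGRESS": "COMPLETED",
}


def status_update_body(current: str, target: str) -> Dict[str, str]:
    """Return the request body for a status update, rejecting unexpected jumps."""
    if ALLOWED_TRANSITIONS.get(current) != target:
        raise ValueError(f"unexpected transition: {current} -> {target}")
    return {"status": target}


start_body = status_update_body("PENDING", "IN_PROGRESS")
```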
- Use the Submit Data to Catalog API to submit catalog metadata for entities in your target data source.
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "Teradata-DS", "parentType": "DATASOURCE", "parentXPath": [], "metadataList": [ { "entityName": "Company", "entityType": "DATABASE", "childrenCount": 1, "description": "Entity Description.", "owner": "OT", "productVersion": "1", "productName": "SQL DB", "productFamily": "MySQL" } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "Company", "parentType": "DATABASE", "parentXPath": [ "Company" ], "metadataList": [ { "entityName": "TEST-SCHEMA", "entityType": "SCHEMA", "childrenCount": 1, "description": "Entity Description.", "version": 1 } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "TEST-SCHEMA", "parentType": "SCHEMA", "parentXPath": [ "Company", "TEST-SCHEMA" ], "metadataList": [ { "entityName": "Employee", "entityType": "OBJECT", "childrenCount": 3, "description": "Entity Description.", "rowsEstimate": 1000, "sizeEstimate": 120, "estimateTimestamp": 12345678, "rowsCalculated": 1000, "sizeCalculated": 120, "calculatedTimestamp": 12345678, "comments": "sample_comment_about_employee_table" } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "TEST-SCHEMA", "parentType": "SCHEMA", "parentXPath": [ "Company", "TEST-SCHEMA" ], "metadataList": [ { "entityName": "Employee", "entityType": "TABLE", "childrenCount": 3, "description": "Entity Description.", "rowsEstimate": 1000, "sizeEstimate": 120, "estimateTimestamp": 12345678, "rowsCalculated": 1000, "sizeCalculated": 120, "calculatedTimestamp": 12345678, "comments": "sample_comment_about_employee_table" } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "Employee", "parentType": "OBJECT", "parentXPath": [ "Company", "TEST-SCHEMA", "Employee" ], "metadataList": [ { "entityName": "first_name", "entityType": "FIELD", "description": "Entity Description.", "columnIndex": 1, "nullable": true, "columnDescription": "contains first_name", "encryption": "AES", "sizeEstimate": 10, "averageSizeCalculated": 15, "calculatedTimestamp": 12345 } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "Employee", "parentType": "TABLE", "parentXPath": [ "Company", "TEST-SCHEMA", "Employee" ], "metadataList": [ { "entityName": "first_name", "entityType": "COLUMN", "description": "Entity Description.", "columnIndex": 1, "nullable": true, "columnDescription": "contains first_name", "encryption": "AES", "sizeEstimate": 10, "averageSizeCalculated": 15, "calculatedTimestamp": 12345 }, { "entityName": "last_name", "entityType": "COLUMN", "description": "Entity Description.", "columnIndex": 2, "nullable": true, "columnDescription": "contains last_name", "encryption": "AES", "sizeEstimate": 10, "averageSizeCalculated": 15, "calculatedTimestamp": 12345 }, { "entityName": "email", "entityType": "COLUMN", "description": "Entity Description.", "columnIndex": 3, "nullable": true, "columnDescription": "contains email", "encryption": "AES", "sizeEstimate": 10, "averageSizeCalculated": 15, "calculatedTimestamp": 12345 } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "SMB-DS", "parentType": "DATASOURCE", "parentXPath": [ ], "metadataList": [ { "entityName": "TEST-FOLDER", "entityType": "FOLDER", "childrenCount": 1, "description": "Entity Description.", "sizeOnDisk": 372, "dataTotalSize": 0, "createdAt": 0, "lastModified": 0, "modifiedByUser": "user", "createdByUser": "user", "accessedTimestamp": 0, "fileOwner": "user", "encryption": "", "sharedAccess": "SHARED_EXTERNAL", "url": "", "eTag": "" } ] } ] }
{ "metadataRequestList": [ { "datasourceId": "", "jobId": 0, "parentEntityName": "TEST-FOLDER", "parentType": "FOLDER", "parentXPath": [ ], "metadataList": [ { "entityName": "TEST-FILE", "entityType": "FILE", "childrenCount": 1, "description": "Entity Description.", "sizeOnDisk": 372, "dataTotalSize": 0, "createdAt": 0, "lastModified": 0, "modifiedByUser": "user", "createdByUser": "user", "accessedTimestamp": 0, "fileOwner": "user", "encryption": "", "sharedAccess": "SHARED_EXTERNAL", "url": "", "eTag": "" } ] } ] }
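The database, schema, table, and column examples above share one envelope, with parentXPath listing the ancestors below the data source and parentEntityName repeating its last element. The sketch below wraps that pattern in a helper. The convention is inferred from those examples rather than taken from a specification (the file example above uses an empty parentXPath), and the function name and sample values are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_metadata_request(
    datasource_id: str,
    job_id: int,
    datasource_name: str,
    parent_path: List[str],
    parent_type: str,
    entities: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Wrap child entities in the metadataRequestList envelope shown above."""
    # ASSUMPTION: an empty path means the parent is the data source itself.
    parent_name = parent_path[-1] if parent_path else datasource_name
    return {
        "metadataRequestList": [
            {
                "datasourceId": datasource_id,
                "jobId": job_id,
                "parentEntityName": parent_name,
                "parentType": parent_type,
                "parentXPath": list(parent_path),
                "metadataList": entities,
            }
        ]
    }


# Hypothetical IDs; entity fields follow the table example above.
table_payload = build_metadata_request(
    "ds-1", 7, "Teradata-DS", ["Company", "TEST-SCHEMA"], "SCHEMA",
    [{"entityName": "Employee", "entityType": "TABLE", "childrenCount": 3}],
)
table_json = json.dumps(table_payload)
```

Serializing with `json.dumps` rather than writing JSON by hand avoids syntax slips such as a missing comma between fields.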
- For entities with content, use the Submit Data to Classify API to submit the content for classification.
{ "jobId": 10, "datasourceId": "", "totalCount" : 1000, "content": [ { "entityName": "first_name", "entityType": "COLUMN", "parentEntityName": "Employee", "parentEntityType": "TABLE", "parentXpath": [ "Company", "Employee" ], "data": [ "Donald" ], "endOfContent": true }, { "entityName": "last_name", "entityType": "COLUMN", "parentEntityName": "Employee", "parentEntityType": "TABLE", "parentXpath": [ "Company", "Employee" ], "data": [ "Trump" ], "endOfContent": true, "totalCount" : 1000 }, { "entityName": "email", "entityType": "COLUMN", "parentEntityName": "Employee", "parentEntityType": "TABLE", "parentXpath": [ "Company", "Employee" ], "data": [ "[email protected]" ], "endOfContent": true, "totalCount" : 1000 } ] }
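When a column holds more values than fit comfortably in one request, the content could be sent in batches. The sketch below assumes that endOfContent marks the final batch for an entity and that totalCount is supplied by the caller; neither meaning is spelled out in this guide, so confirm both against the Submit Data to Classify API reference. Note that the classification body spells the key parentXpath, unlike parentXPath in the catalog body.

```python
from typing import Any, Dict, Iterator, List


def classification_batches(
    *,
    job_id: int,
    datasource_id: str,
    total_count: int,
    entity: Dict[str, Any],
    values: List[str],
    batch_size: int,
) -> Iterator[Dict[str, Any]]:
    """Yield one request body per batch of values for a single entity."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        # ASSUMPTION: endOfContent is true only on the last batch.
        is_last = start + batch_size >= len(values)
        yield {
            "jobId": job_id,
            "datasourceId": datasource_id,
            "totalCount": total_count,
            "content": [{**entity, "data": chunk, "endOfContent": is_last}],
        }


# Hypothetical IDs and values; the entity keys follow the sample body above.
column = {
    "entityName": "first_name",
    "entityType": "COLUMN",
    "parentEntityName": "Employee",
    "parentEntityType": "TABLE",
    "parentXpath": ["Company", "Employee"],
}
batches = list(
    classification_batches(
        job_id=10, datasource_id="ds-1", total_count=3,
        entity=column, values=["Ada", "Grace", "Alan"], batch_size=2,
    )
)
```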
- When all entities have been extracted, use the Update Scan Job Status API to update the scan job's status from IN_PROGRESS to COMPLETED.
  { "status": "COMPLETED" }
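Taken together, the steps above amount to a fixed call order for each scan job. The sketch below expresses that order against a hypothetical `ScanClient` interface; the method names are placeholders for whatever HTTP wrapper the scanner uses, and `RecordingClient` is a stand-in that only logs calls so the sequence can be checked without a worker node.

```python
from typing import Any, Dict, Iterable, List, Protocol, Tuple


class ScanClient(Protocol):
    """Hypothetical wrapper around the Custom Scan APIs."""

    def update_status(self, job_id: int, status: str) -> None: ...
    def submit_catalog(self, payload: Dict[str, Any]) -> None: ...
    def submit_classification(self, payload: Dict[str, Any]) -> None: ...


def run_scan_job(
    client: ScanClient,
    job_id: int,
    catalog_payloads: Iterable[Dict[str, Any]],
    classification_payloads: Iterable[Dict[str, Any]],
) -> None:
    """Mark the job started, submit metadata and content, then mark it completed."""
    client.update_status(job_id, "IN_PROGRESS")
    for payload in catalog_payloads:
        client.submit_catalog(payload)
    for payload in classification_payloads:
        client.submit_classification(payload)
    client.update_status(job_id, "COMPLETED")


class RecordingClient:
    """Stand-in client that records the order of calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def update_status(self, job_id: int, status: str) -> None:
        self.calls.append(("status", status))

    def submit_catalog(self, payload: Dict[str, Any]) -> None:
        self.calls.append(("catalog",))

    def submit_classification(self, payload: Dict[str, Any]) -> None:
        self.calls.append(("classify",))


recorder = RecordingClient()
run_scan_job(recorder, 42, [{"metadataRequestList": []}], [{"content": []}])
```

A production scanner would also need to decide what to do when a submission fails partway through, which this guide does not address.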