Custom Scan using Worker Node APIs

The worker node services APIs target Data Discovery worker nodes. These endpoints interact directly with a worker node and give access to the service integrations it hosts. Use the Custom Scan APIs to retrieve custom scan jobs, submit metadata for cataloging, and submit content for classification. This guide walks through the steps required to use the Custom Scan APIs.

Requirements

  • Configure Ingress for Worker Node: For more information, see API Access for Worker Node Management.
  • Client Credentials: For more information on creating OAuth 2.0 Client Credentials, see Managing OAuth 2.0 Client Credentials.
    • Generate a credential for the custom scanner and download the credential file containing the client_id and secret. The custom scanner uses these values to authenticate and authorize its requests.
  • Custom Connector: (Worker Node Scan) Data Source
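
The client_id and secret from the credential file are the inputs to an OAuth 2.0 client credentials exchange. The Python sketch below builds such a token request without sending it. It assumes the Generate Access Token API accepts a standard form-encoded client credentials grant; TOKEN_URL and the helper name build_token_request are hypothetical, so check the API reference for the actual endpoint and field names.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL documented for the Generate Access Token API.
TOKEN_URL = "https://worker-node.example.com/oauth/token"


def build_token_request(client_id: str, client_secret: str,
                        token_url: str = TOKEN_URL) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    # Assumption: the endpoint accepts the grant as URL-encoded form fields.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen and reading the token from the JSON response would complete the exchange, again assuming a standard OAuth 2.0 token response.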

Custom Scan Workflow: How-To Steps

  1. Use the Generate Access Token API to retrieve an access token.

  2. Use the Get List of Scan Jobs API to query the worker node for pending scan jobs for a given data source. Note the job identifier (jobId) in the response body; it is used in the following steps.

    1. datasourceId is the Custom Connector: (Worker Node Scan) Data Source identifier from the requirements above. This value can be obtained using the Get Data Sources API.
    2. Optionally, extract the client credential reference from the credentialReference response parameter.
  3. Once a scan job is available, use the Update Scan Job Status API to update the scan job's status from PENDING to IN_PROGRESS.

    {
      "status": "IN_PROGRESS"
    }
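
This guide describes two status changes for a scan job: PENDING to IN_PROGRESS, and IN_PROGRESS to COMPLETED. A small Python sketch of a guard that builds the request body only for those transitions is shown here; the helper name status_update_body is hypothetical, and the API may support other statuses that this sketch does not cover.

```python
# Forward transitions taken from the status updates described in this guide.
ALLOWED_TRANSITIONS = {
    "PENDING": "IN_PROGRESS",
    "IN_PROGRESS": "COMPLETED",
}


def status_update_body(current: str, new: str) -> dict:
    """Return the JSON body for a status update, rejecting transitions the guide does not describe."""
    if ALLOWED_TRANSITIONS.get(current) != new:
        raise ValueError(f"unsupported transition: {current} -> {new}")
    return {"status": new}
```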
    
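  4. Use the Submit Data to Catalog API to submit catalog metadata for entities in your target data source. The sample request bodies that follow submit, in order: a DATABASE, a SCHEMA, an OBJECT, a TABLE, a FIELD, a set of COLUMN entities, a FOLDER, and a FILE.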

    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "Teradata-DS",
          "parentType": "DATASOURCE",
          "parentXPath": [],
          "metadataList": [
            {
              "entityName": "Company",
              "entityType": "DATABASE",
              "childrenCount": 1,
              "description": "Entity Description.",
              "owner": "OT",
              "productVersion": "1",
              "productName": "SQL DB",
              "productFamily": "MySQL"
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "Company",
          "parentType": "DATABASE",
          "parentXPath": [
            "Company"
          ],
          "metadataList": [
            {
              "entityName": "TEST-SCHEMA",
              "entityType": "SCHEMA",
              "childrenCount": 1,
              "description": "Entity Description.",
              "version": 1
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "TEST-SCHEMA",
          "parentType": "SCHEMA",
          "parentXPath": [
            "Company",
            "TEST-SCHEMA"
          ],
          "metadataList": [
            {
              "entityName": "Employee",
              "entityType": "OBJECT",
              "childrenCount": 3,
              "description": "Entity Description.",
              "rowsEstimate": 1000,
              "sizeEstimate": 120,
              "estimateTimestamp": 12345678,
              "rowsCalculated": 1000,
              "sizeCalculated": 120,
              "calculatedTimestamp": 12345678,
              "comments": "sample_comment_about_employee_table"
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "TEST-SCHEMA",
          "parentType": "SCHEMA",
          "parentXPath": [
            "Company",
            "TEST-SCHEMA"
          ],
          "metadataList": [
            {
              "entityName": "Employee",
              "entityType": "TABLE",
              "childrenCount": 3,
              "description": "Entity Description.",
              "rowsEstimate": 1000,
              "sizeEstimate": 120,
              "estimateTimestamp": 12345678,
              "rowsCalculated": 1000,
              "sizeCalculated": 120,
              "calculatedTimestamp": 12345678,
              "comments": "sample_comment_about_employee_table"
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "Employee",
          "parentType": "OBJECT",
          "parentXPath": [
            "Company",
            "TEST-SCHEMA",
            "Employee"
          ],
          "metadataList": [
            {
              "entityName": "first_name",
              "entityType": "FIELD",
              "description": "Entity Description.",
              "columnIndex": 1,
              "nullable": true,
              "columnDescription": "contains first_name",
              "encryption": "AES",
              "sizeEstimate": 10,
              "averageSizeCalculated": 15,
              "calculatedTimestamp": 12345
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "Employee",
          "parentType": "TABLE",
          "parentXPath": [
            "Company",
            "TEST-SCHEMA",
            "Employee"
          ],
          "metadataList": [
            {
              "entityName": "first_name",
              "entityType": "COLUMN",
              "description": "Entity Description.",
              "columnIndex": 1,
              "nullable": true,
              "columnDescription": "contains first_name",
              "encryption": "AES",
              "sizeEstimate": 10,
              "averageSizeCalculated": 15,
              "calculatedTimestamp": 12345
            },
            {
              "entityName": "last_name",
              "entityType": "COLUMN",
              "description": "Entity Description.",
              "columnIndex": 2,
              "nullable": true,
              "columnDescription": "contains last_name",
              "encryption": "AES",
              "sizeEstimate": 10,
              "averageSizeCalculated": 15,
              "calculatedTimestamp": 12345
            },
            {
              "entityName": "email",
              "entityType": "COLUMN",
              "description": "Entity Description.",
              "columnIndex": 3,
              "nullable": true,
              "columnDescription": "contains email",
              "encryption": "AES",
              "sizeEstimate": 10,
              "averageSizeCalculated": 15,
              "calculatedTimestamp": 12345
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "SMB-DS",
          "parentType": "DATASOURCE",
          "parentXPath": [],
          "metadataList": [
            {
              "entityName": "TEST-FOLDER",
              "entityType": "FOLDER",
              "childrenCount": 1,
              "description": "Entity Description.",
              "sizeOnDisk": 372,
              "dataTotalSize": 0,
              "createdAt": 0,
              "lastModified": 0,
              "modifiedByUser": "user",
              "createdByUser": "user",
              "accessedTimestamp": 0,
              "fileOwner": "user",
              "encryption": "",
              "sharedAccess": "SHARED_EXTERNAL",
              "url": "",
              "eTag": ""
            }
          ]
        }
      ]
    }
    
    {
      "metadataRequestList": [
        {
          "datasourceId": "",
          "jobId": 0,
          "parentEntityName": "TEST-FOLDER",
          "parentType": "FOLDER",
          "parentXPath": [],
          "metadataList": [
            {
              "entityName": "TEST-FILE",
              "entityType": "FILE",
              "childrenCount": 1,
              "description": "Entity Description.",
              "sizeOnDisk": 372,
              "dataTotalSize": 0,
              "createdAt": 0,
              "lastModified": 0,
              "modifiedByUser": "user",
              "createdByUser": "user",
              "accessedTimestamp": 0,
              "fileOwner": "user",
              "encryption": "",
              "sharedAccess": "SHARED_EXTERNAL",
              "url": "",
              "eTag": ""
            }
          ]
        }
      ]
    }
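
The catalog request bodies above share one envelope and differ only in the parent fields and the metadataList entries. A minimal Python sketch of a helper that builds that envelope follows. The function name catalog_request and its parameters are illustrative, not part of the API, and the parentXPath convention used in the usage example (the path from the top-level entity down to and including the parent) is inferred from the database samples.

```python
import json


def catalog_request(datasource_id, job_id, parent_name, parent_type, parent_xpath, entities):
    """Wrap entity metadata in the metadataRequestList envelope used by the samples above."""
    return {
        "metadataRequestList": [
            {
                "datasourceId": datasource_id,
                "jobId": job_id,
                "parentEntityName": parent_name,
                "parentType": parent_type,
                "parentXPath": list(parent_xpath),
                "metadataList": list(entities),
            }
        ]
    }


# Usage: rebuild the schema-level sample (a SCHEMA whose parent is the "Company" DATABASE).
request = catalog_request(
    datasource_id="",  # left blank as in the samples; supply your data source identifier
    job_id=0,
    parent_name="Company",
    parent_type="DATABASE",
    parent_xpath=["Company"],
    entities=[{"entityName": "TEST-SCHEMA", "entityType": "SCHEMA", "childrenCount": 1}],
)
print(json.dumps(request, indent=2))
```

Building the body from a dictionary and serializing it with json.dumps avoids hand-written JSON mistakes such as a missing comma between fields.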
    
  5. For entities with content, use the Submit Data to Classify API to submit the content for classification.

    {
      "jobId": 10,
      "datasourceId": "",
      "totalCount": 1000,
      "content": [
        {
          "entityName": "first_name",
          "entityType": "COLUMN",
          "parentEntityName": "Employee",
          "parentEntityType": "TABLE",
          "parentXpath": [
            "Company",
            "Employee"
          ],
          "data": [
            "Donald"
          ], 
          "endOfContent": true
        },
        {
          "entityName": "last_name",
          "entityType": "COLUMN",
          "parentEntityName": "Employee",
          "parentEntityType": "TABLE",
          "parentXpath": [
            "Company",
            "Employee"
          ],
          "data": [
            "Trump"
          ],
          "endOfContent": true,
          "totalCount": 1000
        },
        {
          "entityName": "email",
          "entityType": "COLUMN",
          "parentEntityName": "Employee",
          "parentEntityType": "TABLE",
          "parentXpath": [
            "Company",
            "Employee"
          ],
          "data": [
            "user@example.com"
          ],
          "endOfContent": true,
          "totalCount": 1000
        }
      ]
    }
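
The classification body above carries one content entry per column, with sampled values in data. As a rough Python sketch, the helper below pivots row-oriented samples into that shape. The name classification_request and its parameters are hypothetical; it sets totalCount only at the top level and marks every entry endOfContent, which may not match how the API expects larger result sets to be split.

```python
def classification_request(datasource_id, job_id, table, parent_xpath, rows, total_count):
    """Pivot row dicts into one content entry per column, shaped like the sample above."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            values = columns.setdefault(name, [])
            if value is not None:  # skip NULLs; there is nothing to classify
                values.append(str(value))
    content = [
        {
            "entityName": name,
            "entityType": "COLUMN",
            "parentEntityName": table,
            "parentEntityType": "TABLE",
            "parentXpath": list(parent_xpath),
            "data": values,
            "endOfContent": True,
        }
        for name, values in columns.items()
    ]
    return {
        "jobId": job_id,
        "datasourceId": datasource_id,
        "totalCount": total_count,
        "content": content,
    }
```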
    
  6. When all entities have been extracted, use the Update Scan Job Status API to update the scan job's status from IN_PROGRESS to COMPLETED.

    {
      "status": "COMPLETED"
    }
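
Steps 2 through 6 can be sketched end to end as a single Python function. The sketch takes the HTTP layer as a parameter so that it stays independent of routes and authentication, which are documented in the API reference. The operation names passed to call are hypothetical labels for the APIs named in this guide, and the assumed response of the job list call (a list of objects with a jobId field) should be checked against the actual response body.

```python
from typing import Any, Callable, Iterable, Optional

# call(operation, payload) performs one authenticated API request and returns
# the parsed response. Operation names are hypothetical labels, not real routes.
Call = Callable[[str, dict], Any]
Batches = Callable[[int], Iterable[dict]]


def run_custom_scan(call: Call, datasource_id: str,
                    catalog_batches: Batches, classify_batches: Batches) -> Optional[int]:
    """Run steps 2 to 6 for the first pending job; return its jobId, or None if none is pending."""
    # Step 2: assumed response shape is a list of objects that each carry a "jobId".
    jobs = call("get_scan_jobs", {"datasourceId": datasource_id})
    if not jobs:
        return None
    job_id = jobs[0]["jobId"]

    # Step 3: claim the job.
    call("update_scan_job_status", {"jobId": job_id, "status": "IN_PROGRESS"})
    # Step 4: submit catalog metadata.
    for body in catalog_batches(job_id):
        call("submit_data_to_catalog", body)
    # Step 5: submit content for classification.
    for body in classify_batches(job_id):
        call("submit_data_to_classify", body)
    # Step 6: close the job once everything has been submitted.
    call("update_scan_job_status", {"jobId": job_id, "status": "COMPLETED"})
    return job_id
```

Injecting call also makes the sequence easy to exercise with a fake transport before pointing it at a real worker node.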