# Troubleshooting

This guide covers common issues encountered when integrating the AI Guard SDK, deploying the classification service, and operating the metrics pipeline.

## SDK Issues

### PermissionError: 401 Unauthorized

Symptom: Every classification or metric request raises PermissionError.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Invalid API key | Verify the API key is correct and has Data Discovery scope |
| Expired API key | Generate a new API key in the OneTrust admin console |
| Token validation endpoint unreachable | Ensure the AI Guard service has outbound access to your OneTrust tenant URL |
| Wrong authorization mode | Confirm the service is configured for the correct authorization type (`onetrust` or `shared-secret`) |

Diagnostic Steps:

```python
try:
    response = client.classify(request)
except PermissionError as e:
    print(f"Auth error: {e}")
    # Check: Is the token correct?
    # Check: Can the service reach the OneTrust tenant?
```

To test token validation independently:

```shell
curl -X POST https://ai-guard.example.com:4443/classifications/v1 \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"classifierDescription":{"type":"default"},"structured":false,"text":"test","context":{"actor":"user"}}'
```
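The same request can be assembled from Python with only the standard library; the endpoint, headers, and payload shape below simply mirror the curl example (`build_classify_request` is a hypothetical helper, not part of the SDK):

```python
import json
import urllib.request

# Placeholder host from the examples in this guide; substitute your deployment.
AI_GUARD_URL = "https://ai-guard.example.com:4443/classifications/v1"

def build_classify_request(token: str, text: str) -> urllib.request.Request:
    """Assemble the same POST the curl example sends, without dispatching it."""
    payload = {
        "classifierDescription": {"type": "default"},
        "structured": False,
        "text": text,
        "context": {"actor": "user"},
    }
    return urllib.request.Request(
        AI_GUARD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the result with `urllib.request.urlopen` raises `urllib.error.HTTPError` with code 401 when the token is rejected, which maps back to the causes in the table above.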

### ConnectionError or ConnectionRefusedError

Symptom: The SDK cannot connect to the AI Guard service.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Service is not running | Check pod/container status (`kubectl get pods` or `docker ps`) |
| Wrong URL or port | Verify the URL matches the service's listen address and port (default: 4443) |
| Firewall blocking traffic | Ensure port 4443 is open between your application and the AI Guard service |
| TLS mismatch | If the service uses TLS, ensure the SDK URL uses `https://` |

Diagnostic Steps:

```shell
# Test basic connectivity
curl -k https://ai-guard.example.com:4443/health

# If using certificate pinning
curl --pinnedpubkey "sha256//<your-pin>" https://ai-guard.example.com:4443/health
```
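For the first three causes, it helps to separate raw TCP reachability from TLS and auth problems. A minimal stdlib sketch (`can_connect` is a hypothetical helper):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A False result points at the not-running / wrong-port / firewall rows;
    TLS and authorization failures happen after the socket connects.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("ai-guard.example.com", 4443)` returning False rules out TLS configuration as the culprit.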

### ValueError on Client Construction (Certificate Pin)

Symptom: ValueError raised when creating AIGuardClient with pin_sha256.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Invalid base64 encoding | Re-extract the pin using the `openssl` commands |
| Digest is not 32 bytes | Ensure you're using SHA-256 (not SHA-1 or SHA-512) |

Regenerate the pin:

```shell
openssl x509 -in server.crt -pubkey -noout \
  | openssl pkey -pubin -outform DER \
  | openssl dgst -sha256 -binary \
  | base64
```
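The digest-and-encode tail of that pipeline is easy to reproduce in Python when you want to sanity-check a pin, given the DER-encoded public key bytes (`spki_pin` is a hypothetical helper):

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    """base64(SHA-256(DER SubjectPublicKeyInfo)), the value the openssl
    pipeline prints; a well-formed pin is always 44 base64 characters."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")
```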

### ssl.SSLError or TLS Handshake Failure

Symptom: TLS connection fails before any HTTP request is sent.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Certificate pin mismatch | The server's key pair changed; update `pin_sha256` |
| Self-signed cert without pinning | Use `pin_sha256` or pass a custom session with `verify` pointing to the CA cert |
| Expired certificate | Renew the server certificate |

### ValueError: 400 Bad Request

Symptom: Classification requests fail with ValueError.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Malformed classifier description | Verify the classifier description type and required fields |
| Missing required fields | Ensure `context`, `classifier_description`, and `text` are all provided |
| Invalid profile UUID | Verify the profile UUID exists and is accessible |
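The missing-fields cause can be caught before the round trip with a pre-flight check; a sketch assuming the request is a plain dict keyed by the SDK field names listed above (`missing_fields` is a hypothetical helper):

```python
REQUIRED_FIELDS = ("context", "classifier_description", "text")

def missing_fields(request: dict) -> list:
    """Return the required fields that are absent or empty in the request."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]

# missing_fields({"text": "hello"}) -> ["context", "classifier_description"]
```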

### RuntimeError: 502 Bad Gateway

Symptom: Classification returns a 502 error.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Upstream classification service unavailable | Check that `scan-job-manager` is running and reachable |
| Classification profiles not loaded | Verify `JOB_EXECUTOR_BASE_URL` is configured correctly |

## Service Issues

### Service Fails to Start

Symptom: The AI Guard container exits immediately on startup.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Missing environment variable | An unset `${VAR}` reference without a default causes the service to exit. Check the logs for the missing variable name. |
| Invalid config file | Verify the YAML syntax of the config file |
| TLS certificate not found | Ensure `TLS_KEY_PATH` and `CERTIFICATE_PATH` point to valid PEM files |
| Crypto provider error | Check that the container has access to the required crypto libraries |
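The missing-variable case can be reproduced outside the container by scanning the config for bare `${VAR}` references that are unset; a sketch assuming shell-style syntax as described in the table (`unset_config_vars` is a hypothetical helper, and defaulted forms such as `${VAR:-fallback}` are deliberately not matched):

```python
import os
import re

# Matches ${VAR} with no default; forms carrying a default are skipped.
_BARE_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def unset_config_vars(config_text: str) -> list:
    """Return the bare ${VAR} references that are unset in this environment,
    i.e. the ones that would make the service exit on startup."""
    return sorted({name for name in _BARE_REF.findall(config_text)
                   if name not in os.environ})
```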

Diagnostic Steps:

```shell
# Check container logs
docker logs <container-id>
kubectl logs <pod-name>

# Look for error-level messages about configuration or TLS
```

### Metrics Not Appearing in AI Governance

Symptom: Metrics are being sent from the SDK but do not appear in the AI Governance dashboard.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Metrics not enabled | Ensure the `metrics` section exists in the config file |
| Wrong exporter type | Verify `metrics.exporter.type` is set to `onetrust` for production |
| onprem-agent not reachable | Check connectivity to `datadiscovery-onprem-agent:8080` |
| Export interval not elapsed | The default interval is one hour (3600 s); wait for it to elapse |
| Retry exhausted | Check the service logs for metrics export errors |

Check if metrics are enabled:

```shell
curl -X POST https://ai-guard.example.com:4443/metric \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"attributes":{"agent_id":"test","platform":"AMAZON_BEDROCK","new_session":"true"},"meter":{"name":"ai_guard.user","value":"1"}}'

# 200 OK = metrics enabled
# 400 Bad Request = metrics disabled
```

### Classification Returns Empty Matches

Symptom: response.matches is always an empty list, even for text that should trigger classifiers.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Wrong classifier profile | Try `ClassifierDescriptionDefault()` to use the default profile |
| Confidence threshold too high | Check the `classification.min-allowed-likelihood` setting |
| Text too short or ambiguous | Test with known PII patterns (e.g., 321-507-0525 for US phone numbers) |
| Classification profiles not loaded | Check the service logs for profile loading errors |

## Logging

AI Guard logs are emitted to stdout as Elastic Common Schema (ECS) JSON.

### Adjusting Log Level

Set the RUST_LOG environment variable:

```shell
RUST_LOG=debug  # Maximum detail for troubleshooting
RUST_LOG=info   # Default: lifecycle and request logs
RUST_LOG=warn   # Reduced: warnings and errors only
RUST_LOG=error  # Minimum: errors only
```

### Key Log Messages

| Message | Level | Meaning |
| --- | --- | --- |
| `server not configured for TLS` | WARN | TLS section is missing; the server is running plain HTTP |
| `no metrics defined in configuration file, metrics disabled` | WARN | Metrics section is missing |
| `received SIGINT, terminating` | INFO | Graceful shutdown initiated |
| `otel exporter closed` | INFO | Metrics flushed successfully |
| `server shutdown clean` | INFO | Server exited cleanly |
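Because each log line is a JSON object, level-filtering in an ad-hoc script is straightforward; a sketch assuming the dotted `log.level` and `message` keys that ECS JSON loggers commonly emit:

```python
import json

def warnings_and_errors(log_lines):
    """Yield the message of each ECS-JSON log line at warn or error level,
    skipping anything that is not valid JSON."""
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if str(record.get("log.level", "")).lower() in ("warn", "error"):
            yield record.get("message")
```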

## What's Next?