# Troubleshooting

This guide covers common issues encountered when integrating the AI Guard SDK, deploying the classification service, and operating the metrics pipeline.

## SDK Issues

### PermissionError: 401 Unauthorized

Symptom: Every classification or metric request raises PermissionError.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Invalid API key | Verify the API key is correct and has Data Discovery scope |
| Expired API key | Generate a new API key in the OneTrust admin console |
| Token validation endpoint unreachable | Ensure the AI Guard service has outbound access to your OneTrust tenant URL |
| Wrong authorization mode | Confirm the service is configured for the correct authorization type (`onetrust` or `shared-secret`) |

Diagnostic Steps:

```python
try:
    response = client.classify(request)
except PermissionError as e:
    print(f"Auth error: {e}")
    # Check: Is the token correct?
    # Check: Can the service reach the OneTrust tenant?
```

To test token validation independently:

```shell
curl -X POST https://ai-guard.example.com:4443/classifications/v1 \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"classifierDescription":{"type":"default"},"structured":false,"text":"test","context":{"actor":"user"}}'
```
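The same request can be assembled from Python with only the standard library; the endpoint, headers, and payload shape below simply mirror the curl example (`build_classify_request` is a hypothetical helper, not part of the SDK):

```python
import json
import urllib.request

# Placeholder host from the examples in this guide; substitute your deployment.
AI_GUARD_URL = "https://ai-guard.example.com:4443/classifications/v1"

def build_classify_request(token: str, text: str) -> urllib.request.Request:
    """Assemble the same POST the curl example sends, without dispatching it."""
    payload = {
        "classifierDescription": {"type": "default"},
        "structured": False,
        "text": text,
        "context": {"actor": "user"},
    }
    return urllib.request.Request(
        AI_GUARD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the result with `urllib.request.urlopen` raises `urllib.error.HTTPError` with code 401 when the token is rejected, which maps back to the causes in the table above.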

### ConnectionError or ConnectionRefusedError

Symptom: The SDK cannot connect to the AI Guard service.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Service is not running | Check pod/container status (`kubectl get pods` or `docker ps`) |
| Wrong URL or port | Verify the URL matches the service's listen address and port (default: 4443) |
| Firewall blocking traffic | Ensure port 4443 is open between your application and the AI Guard service |
| TLS mismatch | If the service uses TLS, ensure the SDK URL uses `https://` |

Diagnostic Steps:

```shell
# Test basic connectivity
curl -k https://ai-guard.example.com:4443/health

# If using certificate pinning
curl --pinnedpubkey "sha256//<your-pin>" https://ai-guard.example.com:4443/health
```
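For the first three causes, it helps to separate raw TCP reachability from TLS and auth problems. A minimal stdlib sketch (`can_connect` is a hypothetical helper):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A False result points at the not-running / wrong-port / firewall rows;
    TLS and authorization failures happen after the socket connects.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("ai-guard.example.com", 4443)` returning False rules out TLS configuration as the culprit.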

### ValueError on Client Construction (Certificate Pin)

Symptom: ValueError raised when creating AIGuardClient with pin_sha256.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Invalid base64 encoding | Re-extract the pin using the `openssl` commands |
| Digest is not 32 bytes | Ensure you're using SHA-256 (not SHA-1 or SHA-512) |

Regenerate the pin:

```shell
openssl x509 -in server.crt -pubkey -noout \
  | openssl pkey -pubin -outform DER \
  | openssl dgst -sha256 -binary \
  | base64
```
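The digest-and-encode tail of that pipeline is easy to reproduce in Python when you want to sanity-check a pin, given the DER-encoded public key bytes (`spki_pin` is a hypothetical helper):

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    """base64(SHA-256(DER SubjectPublicKeyInfo)), the value the openssl
    pipeline prints; a well-formed pin is always 44 base64 characters."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")
```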

### ssl.SSLError or TLS Handshake Failure

Symptom: TLS connection fails before any HTTP request is sent.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Certificate pin mismatch | The server's key pair changed; update `pin_sha256` |
| Self-signed cert without pinning | Use `pin_sha256` or pass a custom session with `verify` pointing to the CA cert |
| Expired certificate | Renew the server certificate |

### ValueError: 400 Bad Request

Symptom: Classification requests fail with ValueError.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Malformed classifier description | Verify the classifier description type and required fields |
| Missing required fields | Ensure `context`, `classifier_description`, and `text` are all provided |
| Invalid profile UUID | Verify the profile UUID exists and is accessible |
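The missing-fields cause can be caught before the round trip with a pre-flight check; a sketch assuming the request is a plain dict keyed by the SDK field names listed above (`missing_fields` is a hypothetical helper):

```python
REQUIRED_FIELDS = ("context", "classifier_description", "text")

def missing_fields(request: dict) -> list:
    """Return the required fields that are absent or empty in the request."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]

# missing_fields({"text": "hello"}) -> ["context", "classifier_description"]
```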

### RuntimeError: 502 Bad Gateway

Symptom: Classification returns a 502 error.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Upstream classification service unavailable | Check that `scan-job-manager` is running and reachable |
| Classification profiles not loaded | Verify `JOB_EXECUTOR_BASE_URL` is configured correctly |

## Service Issues

### Service Fails to Start

Symptom: The AI Guard container exits immediately on startup.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Missing environment variable | An unset `${VAR}` reference without a default causes the service to exit. Check the logs for the missing variable name. |
| Invalid config file | Verify the YAML syntax of the config file |
| TLS certificate not found | Ensure `TLS_KEY_PATH` and `CERTIFICATE_PATH` point to valid PEM files |
| Crypto provider error | Check that the container has access to the required crypto libraries |
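The missing-variable case can be reproduced outside the container by scanning the config for bare `${VAR}` references that are unset; a sketch assuming shell-style syntax as described in the table (`unset_config_vars` is a hypothetical helper, and defaulted forms such as `${VAR:-fallback}` are deliberately not matched):

```python
import os
import re

# Matches ${VAR} with no default; forms carrying a default are skipped.
_BARE_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def unset_config_vars(config_text: str) -> list:
    """Return the bare ${VAR} references that are unset in this environment,
    i.e. the ones that would make the service exit on startup."""
    return sorted({name for name in _BARE_REF.findall(config_text)
                   if name not in os.environ})
```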

Diagnostic Steps:

```shell
# Check container logs
docker logs <container-id>
kubectl logs <pod-name>

# Look for error-level messages about configuration or TLS
```

### Metrics Not Appearing in AI Governance

Symptom: Metrics are being sent from the SDK but do not appear in the AI Governance dashboard.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Metrics not enabled | Ensure the `metrics` section exists in the config file |
| Wrong exporter type | Verify `metrics.exporter.type` is set to `onetrust` for production |
| onprem-agent not reachable | Check connectivity to `datadiscovery-onprem-agent:8080` |
| Export interval not elapsed | The default interval is one hour (3600 s); wait for it to elapse |
| Retry exhausted | Check the service logs for metrics export errors |

Check if metrics are enabled:

```shell
curl -X POST https://ai-guard.example.com:4443/metric \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"attributes":{"agent_id":"test","platform":"AMAZON_BEDROCK","new_session":"true"},"meter":{"name":"ai_guard.user","value":"1"}}'

# 200 OK = metrics enabled
# 400 Bad Request = metrics disabled
```

### Classification Returns Empty Matches

Symptom: response.matches is always an empty list, even for text that should trigger classifiers.

Possible Causes:

| Cause | Resolution |
| --- | --- |
| Wrong classifier profile | Try `ClassifierDescriptionDefault()` to use the default profile |
| Confidence threshold too high | Check the `classification.min-allowed-likelihood` setting |
| Text too short or ambiguous | Test with known PII patterns (e.g., 321-507-0525 for US phone numbers) |
| Classification profiles not loaded | Check the service logs for profile loading errors |

## Logging

AI Guard logs are emitted to stdout as Elastic Common Schema (ECS) JSON.

### Adjusting Log Level

Set the RUST_LOG environment variable:

```shell
RUST_LOG=debug  # Maximum detail for troubleshooting
RUST_LOG=info   # Default: lifecycle and request logs
RUST_LOG=warn   # Reduced: warnings and errors only
RUST_LOG=error  # Minimum: errors only
```

### Key Log Messages

| Message | Level | Meaning |
| --- | --- | --- |
| `server not configured for TLS` | WARN | TLS section is missing; the server is running plain HTTP |
| `no metrics defined in configuration file, metrics disabled` | WARN | Metrics section is missing |
| `received SIGINT, terminating` | INFO | Graceful shutdown initiated |
| `otel exporter closed` | INFO | Metrics flushed successfully |
| `server shutdown clean` | INFO | Server exited cleanly |
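Because each log line is a JSON object, level-filtering in an ad-hoc script is straightforward; a sketch assuming the dotted `log.level` and `message` keys that ECS JSON loggers commonly emit:

```python
import json

def warnings_and_errors(log_lines):
    """Yield the message of each ECS-JSON log line at warn or error level,
    skipping anything that is not valid JSON."""
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if str(record.get("log.level", "")).lower() in ("warn", "error"):
            yield record.get("message")
```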

## What's Next?