GCP checksum mismatch errors after data plane upgrade to v1.1.32

Summary

Issue: After upgrading to data plane v1.1.32, GCP self-hosted deployments experience checksum mismatch errors in braintrust-api logs, causing traces to appear incomplete (spans stuck “in progress”), 500 Internal Server Error responses on the /logs3 endpoint, and generally unusable tracing.

Cause: Data plane v1.1.32 bundles AWS SDK v3.723+, which changed the default to enable CRC32 response checksum validation. GCP’s S3 compatibility layer does not return the checksum headers the SDK now expects, causing every object storage operation to fail with a checksum mismatch.

Resolution: Set two environment variables on braintrust-api to disable strict checksum validation, or upgrade to Helm chart v5.0.1+ which includes this fix automatically.

Symptoms

You may see one or more of the following after upgrading to data plane v1.1.32:

Checksum mismatch errors in braintrust-api logs:

Error: Checksum mismatch: expected "H4DoSA==" but received "JuEkhQ=="
in response header "x-amz-checksum-crc32c"

Traces with child spans stuck “in progress” that never complete
500 Internal Server Error responses with "Service":"api" on the /logs3 endpoint
Spans within traces appearing inconsistently or missing
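
To gauge how often the first symptom occurs, a small helper can count matching log lines. This is a minimal sketch: the function name count_checksum_mismatches is hypothetical, and it assumes the error text begins with "Checksum mismatch" as in the sample above.

```shell
# Count log lines that report a checksum mismatch.
# Reads log text on stdin and prints the number of matching lines.
count_checksum_mismatches() {
  # grep -c exits non-zero when nothing matches, so mask the status;
  # the count (possibly 0) is still printed.
  grep -c 'Checksum mismatch' || true
}

# Example usage (assumes the deployment name and namespace used elsewhere
# in this article; kubectl reads from one pod of the deployment):
#   kubectl logs deployment/braintrust-api -n braintrust --since=1h | count_checksum_mismatches
```

A non-zero count after the upgrade is consistent with this issue; a count of zero suggests the errors have another cause.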

Who is affected

This issue only affects deployments that meet all of these criteria:

Running on GCP
Using S3 compatibility mode for object storage
Not using native GCS auth (ENABLE_GCS_AUTH is not set to true)
Upgraded to data plane v1.1.32
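
The third criterion can be checked from the deployment's environment. The sketch below is one possible approach, not an official check: the function name gcs_auth_enabled is hypothetical, and it assumes the variable appears as a literal KEY=value line (values sourced from a Secret or ConfigMap would not match).

```shell
# Reads KEY=value lines on stdin and succeeds when native GCS auth
# is enabled (ENABLE_GCS_AUTH set to true, matched case-insensitively).
gcs_auth_enabled() {
  grep -qix 'ENABLE_GCS_AUTH=true'
}

# Example usage (assumes the deployment name and namespace used elsewhere
# in this article):
#   if kubectl set env deployment/braintrust-api -n braintrust --list | gcs_auth_enabled; then
#     echo "Native GCS auth is enabled: this issue should not apply"
#   else
#     echo "Native GCS auth is not enabled: check the remaining criteria"
#   fi
```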

Resolution Steps

Option 1: Upgrade Helm chart to v5.0.1+ (recommended)

Step 1: Update your Helm chart version

Upgrade to Helm chart version 5.0.1 or later, which includes the fix automatically.

# In your Helm values or Terraform Helm release configuration
version = "5.0.1"

Step 2: Apply the upgrade

helm repo update
helm upgrade braintrust braintrust/braintrust --version 5.0.1 -f values.yaml

Step 3: Verify the fix

Check braintrust-api logs to confirm the checksum mismatch errors have stopped. Send a test trace and verify that all spans complete successfully.
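
As part of verification, it can also help to confirm that the deployed chart version is at least 5.0.1. The helper below is a sketch under assumptions: the function name version_at_least is hypothetical, and it relies on the version sort option (-V) provided by GNU sort.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort) to find the smaller of the two.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage: read the chart version from the CHART column of
# `helm list -n braintrust` (namespace assumed), then compare it:
#   version_at_least "5.0.1" "5.0.1" && echo "Chart includes the fix"
```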

Option 2: Manually set environment variables

If you cannot upgrade the Helm chart immediately, set these two environment variables on the braintrust-api deployment.

Step 1: Add environment variables

Add the following to your braintrust-api configuration (via Helm extraEnvVars, Terraform, or your deployment manifest):

AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"

Both variables are required. Setting only one will not fully resolve the issue.
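
Because a partial configuration does not resolve the issue, it may be worth checking that both variables are present before restarting. This is a minimal sketch: the function name checksum_vars_set is hypothetical, and it assumes the variables appear as literal KEY=value lines in the listing.

```shell
# Reads KEY=value lines on stdin and succeeds only when both checksum
# variables are set to WHEN_REQUIRED.
checksum_vars_set() {
  env_lines=$(cat)
  for name in AWS_REQUEST_CHECKSUM_CALCULATION AWS_RESPONSE_CHECKSUM_VALIDATION; do
    # Fail as soon as one variable is missing or has a different value.
    printf '%s\n' "$env_lines" | grep -qx "${name}=WHEN_REQUIRED" || return 1
  done
}

# Example usage (assumes the deployment name and namespace used elsewhere
# in this article):
#   kubectl set env deployment/braintrust-api -n braintrust --list | checksum_vars_set \
#     && echo "Both checksum variables are set"
```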

Step 2: Restart the API pods

kubectl rollout restart deployment braintrust-api -n braintrust

Step 3: Verify the fix

Check braintrust-api logs for the checksum error. It should no longer appear. Send a test trace and verify all spans complete.

Do not roll back from v1.1.32 to an earlier data plane version. Database schema changes in v1.1.32 are not backward-compatible, and rolling back may cause additional data integrity issues.
