Resolved -
This incident has been resolved. We'll continue to monitor for any disruptions, and follow up with a detailed RCA.
Sep 3, 14:36 UTC
Update -
We are nearing full resolution, and will continue to keep this incident updated.
Sep 3, 14:32 UTC
Update -
We are continuing to work through our asynchronous work queue.
Sep 3, 14:18 UTC
Update -
We are nearly recovered, but will keep the incident in a monitoring state until all asynchronous work is fully stable.
Sep 3, 14:00 UTC
Update -
We are continuing to work through our asynchronous work queue.
Sep 3, 13:47 UTC
Update -
We are continuing to work through our asynchronous work queue.
Sep 3, 13:28 UTC
Update -
Our services are continuing to recover, including issuing any outstanding invoices and webhooks. We have not seen any elevated rate of API errors since initial recovery.
Sep 3, 13:17 UTC
Update -
We're continuing to see broad recovery, and there have been no API errors since 12:38 UTC. We are continuing to work to bring back asynchronous services.
Sep 3, 13:03 UTC
Update -
We're continuing to see broader recovery; API error rates have returned to normal.
Sep 3, 12:46 UTC
Monitoring -
We're seeing broader recovery and have seen no API errors since 12:38 UTC. We are now working to bring back asynchronous services.
Sep 3, 12:45 UTC
Update -
We are continuing to investigate the issue.
Sep 3, 12:32 UTC
Update -
We're continuing to focus on mitigating impact. Once again, we apologize for the disruption and will publish an RCA after the incident is resolved.
Sep 3, 12:15 UTC
Update -
We're seeing sustained partial recovery for write operations, but failures persist at a lower rate. We believe the mitigations we've put in place are helping, and we are continuing to pursue faster and more comprehensive mitigation strategies.
Note that the ingestion API is not failing and has not failed at any point during the incident. Once the incident is resolved, we do not expect any gaps in ingested event data, so no retries should be necessary.
Sep 3, 11:57 UTC
Update -
Although this incident is still active, we're seeing partial recovery for specific customers. We're continuing to treat this as top priority and working to mitigate the impact by running maintenance operations at our database layer.
Sep 3, 11:45 UTC
Update -
We are continuing to see elevated errors on write endpoints. We believe we understand the root cause, and are pursuing multiple parallel mitigation strategies to resolve the incident as quickly as possible.
Sep 3, 11:31 UTC
Update -
We're continuing to investigate and are actively pursuing mitigation strategies. We apologize for the disruption and will continue to post status updates here as we learn more.
Sep 3, 11:14 UTC
Update -
We are continuing to investigate this issue.
Sep 3, 10:51 UTC
Investigating -
We are currently investigating database issues that are causing API errors.
Sep 3, 10:46 UTC