Application down - 500 error

Incident Report for Knak

Postmortem

On April 3rd at 7:26 AM EDT, our application became unresponsive and was unable to serve requests. We began remediation as soon as we became aware of the issue and implemented a temporary fix by 8:25 AM EDT, at which point we believed the underlying cause had been resolved. On April 4th at 8:48 AM EDT, the application became unresponsive again, and we deployed another temporary fix by 9:03 AM EDT. Subsequent investigation found the root cause: temporary server operating logs had accumulated unintentionally and exhausted the disk space on the application server. We have since implemented a permanent fix to prevent further log accumulation, and we have observed no additional space consumption. No action is required from our customers regarding this incident.

Root Cause:
During runtime, our logging system was failing to send logs to our external logging tools, which prompted the server to fall back to logging locally on the server. Because the logging directory was on ephemeral storage, a new deployment would temporarily fix the issue. On April 3rd at 7:26 AM EDT, our temporary storage filled completely, leaving the server without any disk swap space and unable to respond to requests.
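
For illustration only, the fallback behaviour described above can be kept from filling a disk by capping the size of the local fallback file. The following is a minimal Python sketch and not our production code: `FailingShipper` and `FallbackHandler` are hypothetical names, the primary handler is assumed to raise when delivery fails, and the size limits are arbitrary.

```python
import logging
import logging.handlers


class FailingShipper(logging.Handler):
    """Hypothetical stand-in for an external log shipper that is down."""

    def emit(self, record):
        raise ConnectionError("external logging tool unreachable")


class FallbackHandler(logging.Handler):
    """Try the primary handler; on failure, write to a size-capped local file."""

    def __init__(self, primary, fallback_path, max_bytes=1_000_000, backups=2):
        super().__init__()
        self.primary = primary
        # RotatingFileHandler rolls the file over before it reaches max_bytes
        # and keeps at most `backups` old files, so local usage stays bounded.
        self.fallback = logging.handlers.RotatingFileHandler(
            fallback_path, maxBytes=max_bytes, backupCount=backups
        )

    def emit(self, record):
        try:
            self.primary.emit(record)
        except Exception:
            # Assumption: the primary raises on failure instead of swallowing it.
            self.fallback.emit(record)

    def close(self):
        self.fallback.close()
        super().close()
```

With these settings, local fallback logs would occupy at most roughly `max_bytes * (backups + 1)` bytes, however long the external tool is unavailable.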

Actions:

  1. We have fixed the issue with our logging so that it no longer falls back to logging on the server.
  2. We have added strict alerting on the server's temporary storage so that we are alerted before it runs out of space.
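
As a rough illustration of the kind of check behind such an alert, the Python sketch below compares disk usage on a path against a threshold. It is an assumption-laden example rather than our actual monitoring: the function names and the 80% threshold are hypothetical, and a real setup would send the result to an alerting system.

```python
import shutil


def exceeds_threshold(used, total, threshold):
    """Return True when the used fraction of a volume is at or above threshold."""
    if total <= 0:
        return False  # nothing meaningful to report for an empty volume
    return used / total >= threshold


def should_alert(path, threshold=0.80):
    """Check the volume that holds `path` (threshold is a hypothetical value)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return exceeds_threshold(usage.used, usage.total, threshold)
```

A scheduled job could call `should_alert` on the temporary storage path and notify the on-call engineer when it returns `True`.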
Posted Apr 04, 2025 - 15:37 EDT

Resolved

This incident has been resolved.
Posted Apr 03, 2025 - 09:30 EDT

Update

We are continuing to monitor for any further issues.
Posted Apr 03, 2025 - 08:57 EDT

Update

We are continuing to monitor for any further issues.
Posted Apr 03, 2025 - 08:29 EDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Apr 03, 2025 - 08:29 EDT

Investigating

We are currently investigating this issue.
Posted Apr 03, 2025 - 08:09 EDT
This incident affected: Knak App.