Knak outage

Incident Report for Knak

Postmortem

First Reported

Starting at 4:55 PM EST on January 18th, 2021, we saw a sudden spike in requests that caused our application instance to auto-scale. Because the spike was so abrupt, auto-scaling did not add capacity fast enough to meet the demand. Within moments, traffic went from fewer than 200 requests per minute to nearly 6,000. This caused issues with the application for approximately 10 minutes during the initial spike.

Issue

The issue was traced back to an email containing a specific transformed image, which caused a spike in requests on our end.

Although the vast majority of our images are served behind a Content Delivery Network (CDN) through AWS, edited images are currently served directly by the application. Of the attempts to load the image, 238,852 were successful and 24,873 failed.
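
To put those counts in proportion, the short calculation below derives the total number of attempts and the share that failed. It uses only the two figures reported above; the variable names are illustrative.

```python
# Figures taken from the report above.
successful = 238_852
failed = 24_873

# Total attempts to load the image, and the fraction that failed.
total = successful + failed
failure_rate = failed / total

print(f"total attempts: {total:,}")         # total attempts: 263,725
print(f"failure rate: {failure_rate:.1%}")  # failure rate: 9.4%
```

In other words, roughly one in eleven attempts to load the image failed.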

Path Forward

We have changed our infrastructure to serve edited images through a CDN, in much the same way as our uploaded images are served today, to avoid this type of issue in the future.
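
As a rough illustration of why this helps, the sketch below places a toy cache in front of a stand-in origin function. It is not our actual infrastructure: `CachingFront`, `render_edited_image`, and the example path are hypothetical names, and a real CDN also involves multiple edge locations, expiry times, and cache invalidation. The point is only that repeated requests for the same image reach the origin once.

```python
from typing import Callable, Dict


class CachingFront:
    """Toy stand-in for a CDN edge: caches origin responses by URL path."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many requests actually reached the origin

    def get(self, path: str) -> bytes:
        # Only call the origin on a cache miss; later requests reuse the copy.
        if path not in self._cache:
            self.origin_hits += 1
            self._cache[path] = self._origin(path)
        return self._cache[path]


def render_edited_image(path: str) -> bytes:
    # Hypothetical placeholder for the application serving an edited image.
    return f"image-bytes-for:{path}".encode()


cdn = CachingFront(render_edited_image)

# Simulate a burst of requests for the same edited image.
for _ in range(6000):
    cdn.get("/edited/example.png")

print(cdn.origin_hits)  # 1
```

With the cache in front, the application handles a single request for the image regardless of how many times it is loaded, so a burst of this kind no longer depends on how quickly auto-scaling reacts.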

Posted Jan 19, 2021 - 15:56 EST

Resolved

The Knak Application went down briefly (for 10 minutes).

Posted Jan 18, 2021 - 17:00 EST