Knak Tracking Pixel Degraded Performance

Incident Report for Knak

Postmortem

Quick Summary

Between January 9th and January 19th, Knak experienced an issue with our analytics tracking pixel affecting data collection for a subset of clients. This was caused by a system upgrade intended to improve performance and remove rate limits. While the upgrade successfully improved traffic capacity, it introduced a data formatting error that prevented tracking data from being saved to our database. The tracking pixel itself continued to function for end-users (email recipients), ensuring no broken images or errors were visible. We have since rolled back the change, restored partial data, and implemented new safeguards to prevent this from recurring.

What This Means

For clients who have the Knak Tracking Pixel enabled, we were unable to collect performance data between January 9th at 10:32 AM EST and January 19th at 5:26 PM EST.

While your emails were delivered successfully and the pixel image loaded correctly for your recipients, the "open" event data was not stored in our database. Consequently, analytics reports for campaigns sent or active during this specific window will not reflect the actual open rates or engagement metrics.

This did not impact downstream reporting from marketing automation platforms or any performance data Knak collects from those systems. This issue only affected assets with the “Knak Tracking Pixel” enabled in settings. No action is required from Knak clients.

Root Cause

In December, we identified that our previous tracking proxy service had a strict request limit that throttled data collection during high-volume campaigns. To support higher traffic and remove these limits, we migrated to a new, more scalable proxy service on January 9th.

The root cause of the incident was a data formatting mismatch between this new proxy service and our data processing software.

  • The Mismatch: The new service delivered tracking information (such as timestamps) in a different structural location than the previous service.
  • The Failure: When our processing software could not locate the timestamps in the expected place, it defaulted the values to the text string "Unknown."
  • The Data Loss: Our analytics database is designed to store timestamps strictly as numbers. When the system attempted to save the text "Unknown" into a number-only field, the database rejected the incoming data batches.

This resulted in a scenario where the tracking pixel successfully loaded for the end-user (the email recipient saw no errors), but the data was silently rejected by our database backend.
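
For readers who want to see the mechanism concretely, the minimal sketch below illustrates the failure mode using hypothetical field names and payload shapes; it is not Knak's actual code. A timestamp that moves to a new location in the payload falls back to the string "Unknown", which a numbers-only database field then rejects.

```python
# A minimal sketch of the failure mode described above, using hypothetical
# field names and payload shapes (not Knak's actual code).

old_payload = {"event": "open", "timestamp": 1736437920}            # old proxy: timestamp at top level
new_payload = {"event": "open", "meta": {"timestamp": 1736437920}}  # new proxy: timestamp nested elsewhere

def extract_timestamp(payload: dict):
    # The processing code still looked in the old location only,
    # so the new payload shape falls through to the default.
    return payload.get("timestamp", "Unknown")

def store_open_event(timestamp) -> None:
    # The analytics database stores timestamps strictly as numbers, so a
    # string value causes the incoming batch to be rejected.
    if not isinstance(timestamp, (int, float)):
        raise TypeError(f"timestamp must be numeric, got {timestamp!r}")
    # ... persist the row ...

store_open_event(extract_timestamp(old_payload))      # old shape: stored normally

try:
    store_open_event(extract_timestamp(new_payload))  # new shape: defaults to "Unknown"
except TypeError as exc:
    print(f"batch rejected: {exc}")                   # rejected silently in production; printed here
```
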

Timeline

All times are in Eastern Standard Time (EST)

  • January 9, 10:32 AM: The infrastructure upgrade to the new proxy service is deployed to remove rate limits. Data collection stops due to the formatting mismatch described above.
  • January 19, 3:38 PM: Our team identifies a gap in success logs for tracking jobs and immediately begins an investigation.
  • January 19, 4:06 PM: The engineering team confirms that while requests are hitting the server, data is not persisting into the analytics database.
  • January 19, 5:26 PM: Service Restored. We roll back the infrastructure to the previous proxy service. Valid data collection resumes immediately for all new requests.
  • January 20, 12:30 PM: The team identifies the specific database rejection error (String vs. Integer type mismatch).
  • January 20, 1:54 PM: A code fix is deployed to ensure timestamps can never default to an invalid format, preventing this specific failure mode in the future.
  • January 20, 2:01 PM: Engineering begins investigating methods to recover and backfill any partial data available from the outage window.

Learnings & Actions Taken

To prevent a recurrence, Knak is implementing the following changes to our engineering and monitoring protocols:

  1. Eliminating Silent Failures: We will update our code to prioritize error visibility. While the tracking pixel will always load for the user (to ensure the recipient experience is never impacted), our backend systems will soon aggressively log and flag when default values are used instead of real data.
  2. Volume-Based Alerting: We are implementing new alerts that trigger specifically when data flow into our database drops unexpectedly. Previously, our alerts focused on server errors (crashes). Since the server was technically "running" (just processing bad data), no alarm sounded. The new alerts will focus on data throughput.
  3. Strict Data Validation: We have added stricter validation logic to our ingestion pipeline to ensure that data types (like timestamps) are verified before they reach the database, preventing a single formatting error from rejecting a batch of data (a brief sketch of this approach follows this list).
  4. Partial Data Restoration: We have now begun the process of partially restoring some data between January 19th and January 20th.
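
As a rough illustration of the stricter validation described in item 3, the sketch below (with assumed names, not our production pipeline) shows how events with invalid timestamps can be quarantined and logged loudly before the batch reaches the database, so a single bad record no longer causes the rest of the batch to be rejected.

```python
# Illustrative sketch of pre-ingestion validation (assumed names, not production code).
import logging

logger = logging.getLogger("tracking.ingest")

def validate_batch(events: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a batch into valid events and quarantined events with bad timestamps."""
    valid, quarantined = [], []
    for event in events:
        ts = event.get("timestamp")
        if isinstance(ts, (int, float)) and ts > 0:
            valid.append(event)
        else:
            # Flag loudly instead of silently defaulting, so alerting can pick it up.
            logger.error("Invalid timestamp %r in tracking event; quarantining", ts)
            quarantined.append(event)
    return valid, quarantined

valid, quarantined = validate_batch([
    {"event": "open", "timestamp": 1736437920},
    {"event": "open", "timestamp": "Unknown"},
])
# `valid` is written to the database; `quarantined` is retained for review and backfill.
```
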
Posted Jan 21, 2026 - 10:24 EST

Resolved

This incident has been resolved.
Posted Jan 20, 2026 - 17:22 EST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jan 19, 2026 - 17:40 EST

Investigating

We are currently investigating an issue with the Knak tracking pixel.
Posted Jan 19, 2026 - 16:39 EST
This incident affected: Knak App.