Some content was showing as a white screen
Time of incident: November 24th, 4:49 am EST
For a period of 20 minutes starting at 4:49 am EST, some users saw a white screen when accessing Celero content.
We deployed a routine minor version update that included several fixes. One of its dependencies was not deployed to production, so some of our servers returned an error that resulted in a white screen for some users.
Who was affected?
Only some users were affected. Because we run on multiple servers, code is deployed gradually and verified immediately. As soon as we identified the issue, we began reverting to the previously deployed version, which took several minutes to complete.
Once the revert was complete (20 minutes after the initial deployment), all services were back to normal.
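The gradual deploy, verify, and revert pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual deployment tooling: the function name `rolling_deploy`, its callback parameters, and the server names are all hypothetical.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    servers: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy to one server at a time and verify it immediately.

    On the first failed check, revert every server updated so far.
    Returns True only if all servers were updated and passed verification.
    """
    updated: List[str] = []
    for server in servers:
        deploy(server)
        updated.append(server)
        if not is_healthy(server):
            # Revert in reverse order so the most recently updated
            # server is restored first.
            for done in reversed(updated):
                rollback(done)
            return False
    return True


if __name__ == "__main__":
    # Hypothetical fleet in which one server is missing a dependency.
    broken = {"web-2"}
    events = []
    ok = rolling_deploy(
        ["web-1", "web-2", "web-3"],
        deploy=lambda s: events.append(("deploy", s)),
        is_healthy=lambda s: s not in broken,
        rollback=lambda s: events.append(("rollback", s)),
    )
    print(ok, events)
```

In this sketch the rollout stops at the first unhealthy server, so servers later in the list never receive the faulty version. That is one way a gradual rollout can limit an issue to only some users.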
What did we learn?
We followed our internal process for identifying root causes so that issues like this do not recur.
As a result, we are making a few changes:
- We have made some changes to our staging deployment process to ensure that any dependencies get deployed together.
- Unrelated to the issue, we found an opportunity to improve our monitoring and will be adding another monitor for more fine-grained internal alerting.
- The team followed the process as intended and did everything they could to minimize the impact. Even so, we are taking extra precautions to prevent similar issues in the future.
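One way to ensure dependencies are deployed together is a pre-deployment check that compares what a release requires against what is present in the target environment. The sketch below is an assumption about how such a check might look, not a description of the actual process. The function names and the name-to-version dictionaries are hypothetical.

```python
from typing import Dict, List


def missing_dependencies(required: Dict[str, str], deployed: Dict[str, str]) -> List[str]:
    """Return names of required dependencies that are absent from the
    target environment or present at a different version."""
    return sorted(
        name
        for name, version in required.items()
        if deployed.get(name) != version
    )


def assert_release_complete(required: Dict[str, str], deployed: Dict[str, str]) -> None:
    """Raise RuntimeError if any required dependency is missing or mismatched."""
    missing = missing_dependencies(required, deployed)
    if missing:
        raise RuntimeError(
            "Refusing to deploy; missing or mismatched dependencies: "
            + ", ".join(missing)
        )
```

Calling `assert_release_complete` as a gate before the rollout starts would stop a release whose dependency was left behind before it reaches any production server.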