Between 05.06.2020 and 12.06.2020, checks using the async IIFE syntax had runs marked as passed when, in reality, they had not been executed correctly.
Impact
We identified 18 active checks that were affected.
Root Causes
Changes related to additional security measures for Browser checks changed the default behavior of the runner. These changes affected the way we handle promises which are not awaited or returned.
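
To illustrate the failure mode, here is a minimal sketch of a check written with async IIFE syntax and a runner that treats the check as finished once its synchronous part returns. The names `check`, `runCheckNaively` and `logs` are hypothetical and exist only for this illustration; this is not the actual runner code.

```javascript
// Collects log lines written by the check (hypothetical helper for this sketch).
const logs = [];

// A check using async IIFE syntax. The promise created by the IIFE is
// neither awaited nor returned, so the caller cannot observe it.
function check() {
  (async () => {
    await Promise.resolve(); // stands in for a network request or page action
    logs.push('assertions executed');
  })();
}

// Hypothetical runner: reports the result as soon as the synchronous
// part of the script has returned, without waiting for pending promises.
function runCheckNaively(script) {
  script();
  return { passed: true, logsAtExit: [...logs] };
}

const naiveResult = runCheckNaively(check);
// At this point the run is reported as passed, but the code after the
// first `await` has not run yet, so no assertions were executed.
```

This matches what the customer observed: the run is marked as passed, yet the logs that the check should have printed are missing.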
Resolution
Instead of exiting the process as soon as the execution block finished, we now let the Node.js process exit on its own once all pending promises, including those that are not awaited or returned, have been executed.
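
The difference between the two behaviors can be sketched by running the same check in two child processes, one that exits explicitly right after the synchronous block and one that is left to exit by itself. The helper `runInChild` and the variable names are assumptions made for this example, not the runner's real implementation; `spawnSync` and `process.execPath` are standard Node.js APIs.

```javascript
const { spawnSync } = require('child_process');

// A check using async IIFE syntax. The log line only appears if the
// process is still alive when the timer fires.
const checkSource = `
  (async () => {
    await new Promise((resolve) => setTimeout(resolve, 50));
    console.log('assertions executed');
  })();
`;

// Run a script in a fresh Node.js process and capture its exit code and stdout.
function runInChild(source) {
  const result = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' });
  return { status: result.status, stdout: result.stdout.trim() };
}

// Sketch of the old behavior: exit as soon as the synchronous block is done.
const exitedEarly = runInChild(checkSource + '\nprocess.exit(0);');

// Sketch of the new behavior: no explicit exit. Node.js exits by itself
// once no timers, I/O, or other pending work remain.
const exitedNaturally = runInChild(checkSource);
```

In this sketch both processes exit with code 0, but only the second one prints the log line. Note that Node.js stays alive only while there is pending work such as timers or I/O; a promise that never settles and has nothing scheduled behind it does not keep the process running.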
Detection
A customer contacted us after their checks didn't detect an outage they were having.
What Are We Doing About This?
- We pushed the fix on Friday, the same day the issue was reported.
- We added fixtures using async IIFE syntax to our test suite, and added checks using that syntax to our staging and production suites.
- We set up paging for these checks in case anything slips past our unit tests.
Timeline
05.06.2020
- 12:00 security changes were rolled out to half of the regions
08.06.2020
- 11:00 security changes were rolled out to all regions
12.06.2020
- 15:03 A customer informed us that their tests were passing without printing certain logs
- 15:13 We found the root cause of the issue and offered the customer a workaround
- 15:30 We implemented a quick fix for the issue
- 18:30 After talking to the customer, we decided that the issue had to be resolved ASAP
- 19:00 We started testing the fix
- 19:20 We pushed the fix to production and started observing the stats
- 20:00 We declared the incident resolved