June 16, 2021

How a Single Bug Can Trigger a Massive Outage

QA professionals know the importance of full test coverage for catching hidden software defects. The impact of undetected bugs that leak into production can range from inconsequential to nothing short of catastrophic. A recent Internet outage provides a real-world example of how a single bug can disrupt the global operation of digital business on the Internet.

The Guardian recently reported a massive Internet outage affecting many prominent websites, including Amazon, CNN, Hulu, and The New York Times, as well as its own online news site. The outage lasted for one hour, took down major websites, and left millions of online visitors with nothing more than an obscure message: “Error 503 service unavailable”. For major online retailers like Amazon, the financial impact is measured in thousands of dollars in lost business for each second of downtime.

The outage was traced to a failure in a Content Delivery Network (CDN) operated by Fastly, a heavyweight in the CDN arena with blue-chip customers and a 95% customer satisfaction rating. CDNs accelerate the delivery of web pages by caching content in data centers around the world and serving it to website visitors from the nearest location at the fastest possible speed. CDN solutions are offered by Akamai, Cloudflare, Amazon’s CloudFront and others.

The Internet blackout on the Fastly CDN was so far-reaching that it was initially thought to be a cyberattack. However, Fastly engineers quickly identified the source of the outage as a software defect and took immediate steps to restore service to their customers.

Fastly provided the following explanation for the root cause of the outage to The Guardian.

On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances … a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.

Quality assurance professionals will recognize this as a combinatorial testing problem. When testing software with different combinations and permutations of data values, missing just one of them can have unforeseen consequences. The best practice is to test every combination and permutation of both valid and invalid data values, so that errors in the code are identified before it’s released to production.
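As a rough illustration of what testing every combination means, the Python sketch below enumerates the full set of combinations for a few configuration parameters, including deliberately invalid values. The parameter names and values are hypothetical and are not taken from Fastly's configuration or from any specific product.

```python
from itertools import product

# Hypothetical configuration parameters; names and values are illustrative only.
PARAMETERS = {
    "cache_mode": ["on", "off", None],           # None stands in for a missing value
    "compression": ["gzip", "brotli", "bogus"],  # "bogus" is a deliberately invalid value
    "ttl_seconds": [0, 60, -1],                  # -1 is a deliberately invalid value
}


def all_combinations(parameters):
    """Return every combination of the parameter values as a list of dicts."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


cases = all_combinations(PARAMETERS)
print(len(cases))  # 3 * 3 * 3 = 27 combinations
```

Each dictionary in `cases` can then be fed to the code under test. The number of combinations is the product of the value counts for each parameter, so it grows quickly as parameters are added, which is why this kind of data is usually generated rather than written by hand.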

Maximizing Coverage with Synthetic Data

That’s easier said than done for QA organizations that rely solely on production data for testing. Production data can only provide the tester with data combinations that result from day-to-day execution of production code – the so-called happy path. In practice, that represents only about 30-50% of the possible data combinations and falls far short of providing full test coverage.
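One way to see the gap for a given application is to compare the combinations that actually occur in a production extract with the full combinatorial space. The sketch below is a minimal example of that comparison; the field names and sample rows are made up for illustration, and the resulting figure applies only to this toy data.

```python
from itertools import product


def combination_coverage(parameters, observed_rows):
    """Return (fraction of combinations observed, sorted list of missing combinations)."""
    names = list(parameters)
    all_combos = set(product(*(parameters[name] for name in names)))
    seen = {tuple(row[name] for name in names) for row in observed_rows}
    covered = seen & all_combos
    missing = sorted(all_combos - covered)
    return len(covered) / len(all_combos), missing


# Hypothetical value domains and a hypothetical production extract.
parameters = {
    "payment_method": ["card", "paypal"],
    "region": ["EU", "US"],
}
production_rows = [
    {"payment_method": "card", "region": "US"},
    {"payment_method": "card", "region": "EU"},
    {"payment_method": "card", "region": "US"},
]

coverage, missing = combination_coverage(parameters, production_rows)
print(coverage)  # 0.5
print(missing)   # [('paypal', 'EU'), ('paypal', 'US')]
```

The `missing` list shows exactly which combinations the production data never exercised, and those are the cases that would need to come from generated or hand-built test data.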

Many testers resort to manually creating test data to cover the less likely and invalid data values and combinations. However, manual data creation is labor intensive, time consuming, and not well suited for a continuous integration and delivery lifecycle.

The best way to cover all possible combinations of data for a given set of values is to generate synthetic test data in real time. GenRocket’s Test Data Automation platform makes it easy for testers to generate the data they need to provide full coverage in a matter of minutes.

Using GenRocket’s PermutationGen data generator, testers can define the data values for combinatorial testing and generate all possible combinations to produce a comprehensive dataset for testing. The test data is based on the data model for the application under test, ensuring the structure and relationships of data tables are intact. And by selecting the appropriate output data receiver, synthetic data can be formatted to match the target data environment.

Maximize Test Coverage to Maximize Software Quality

Here’s the takeaway from this real-world example: Without full test coverage, there’s no way to prevent software defects from leaking into the production environment. Reliance on production data for testing will only test the happy path for how software performs in the real world. Manual data creation is costly and cumbersome, and it is not the way to bridge the data gap. Real-time synthetic test data allows testers to design the data they need and generate the combinations required for full coverage.

The use of synthetic test data generation could have prevented the Internet outage described above. If your application environment is subject to a similar risk of downtime, or other operational problems related to software quality, then consider the use of synthetic data to maximize coverage and minimize the possibility of releasing bugs into production. Our experts in Test Data Automation can demonstrate a synthetic data solution to meet your toughest software testing challenge.