Intelligent Data Subsetting and Synthetic Data Masking – The Best of Both Worlds

by admin on Sep 14, 2023

For thorough software testing, dev and test teams often need both data subsetting and data masking. Many rely on costly, complex software platforms to mask production data and produce subset files, then manually build edge-case and negative-case data to reach complete test coverage.

There’s a better way. GenRocket offers a best-of-both-worlds approach: one platform that provides intelligent data subsetting, synthetic data masking, and synthetic data generation.

Instead of relying on traditional TDM platforms for test data coverage, teams can use GenRocket’s enterprise-scale synthetic test data automation platform as one cost-effective way to produce production data, augmented with synthetic data, in the volume, variety, and format needed for complete test coverage.

Let’s look at two aspects more closely: Intelligent Data Subsetting and Synthetic Data Masking.

What Is Data Subsetting?

Data subsetting involves selecting a representative subset of data from a larger dataset. This process provides a smaller, more manageable volume of data, which not only reduces storage costs but also accelerates the testing process. By choosing a subset that aligns with specific test scenarios, organizations can ensure that their tests are more meaningful and closely mimic real-world use cases.

The GenRocket Solution – Intelligent Data Subsetting

GenRocket’s Intelligent Data Subsetting solution enables teams to efficiently query and extract meaningful subsets from production databases for on-the-spot testing. It extracts data from source production databases, covering either all tables or a specific set of related tables.

Key features of GenRocket’s Intelligent Data Subsetting solution include:

  • Subsetting based on defined values, percentages, or specific numbers of rows.
  • Maintenance of referential integrity across different schemas.
  • Reduction of vast databases into manageable data subsets. For instance, a database holding transactions from all 50 states can be filtered down to just one or two states while preserving referential integrity.
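
As an illustration of the state-filtering example above, the sketch below subsets a parent table by value and then keeps only the child rows that still have a parent. It is a minimal sketch in plain Python, not GenRocket's implementation or API; the table layout, column names, and the `subset_by_state` and `orphaned_transactions` helpers are all assumptions made for this example.

```python
# Hypothetical in-memory tables standing in for a production database.
customers = [
    {"customer_id": 1, "state": "CA"},
    {"customer_id": 2, "state": "NY"},
    {"customer_id": 3, "state": "TX"},
    {"customer_id": 4, "state": "CA"},
]
transactions = [
    {"txn_id": 100, "customer_id": 1, "amount": 25.00},
    {"txn_id": 101, "customer_id": 2, "amount": 310.50},
    {"txn_id": 102, "customer_id": 3, "amount": 12.75},
    {"txn_id": 103, "customer_id": 4, "amount": 99.99},
    {"txn_id": 104, "customer_id": 1, "amount": 5.00},
]


def subset_by_state(customers, transactions, states):
    """Keep customers in `states` and only the transactions that reference them."""
    kept_customers = [c for c in customers if c["state"] in states]
    kept_ids = {c["customer_id"] for c in kept_customers}
    # Filtering child rows by the surviving parent keys preserves referential integrity.
    kept_transactions = [t for t in transactions if t["customer_id"] in kept_ids]
    return kept_customers, kept_transactions


def orphaned_transactions(customers, transactions):
    """Return transactions whose customer_id has no matching customer row."""
    ids = {c["customer_id"] for c in customers}
    return [t for t in transactions if t["customer_id"] not in ids]


sub_customers, sub_transactions = subset_by_state(customers, transactions, {"CA"})
print(len(sub_customers), "customers,", len(sub_transactions), "transactions")
print("orphans:", orphaned_transactions(sub_customers, sub_transactions))
```

Filtering the child table by the surviving parent keys, rather than sampling each table independently, is what keeps the subset free of orphaned rows.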

In an internal benchmark, GenRocket’s solution provisioned a subset containing 8 million records in less than two minutes.



Data Masking: Ensuring Data Privacy and Security

Data masking is the process of disguising original data to protect sensitive information while maintaining the data’s authenticity and usability. Industries subject to strict data protection regulations, such as finance and healthcare, must ensure that sensitive data remains secure and compliant with regulations like GDPR and CCPA.

The GenRocket Solution – Synthetic Data Masking

GenRocket’s Synthetic Data Masking dynamically masks sensitive database fields by replacing them with synthetic data in real time. This solution includes:

  • Field-level masking for SQL databases and various file formats.
  • Real-time replacement of sensitive data elements with 100% secure synthetic data.
  • Compliance with global data privacy regulations such as GDPR and CCPA.
  • Strong security: only metadata is accessed, leaving sensitive data untouched.
  • Rapid and efficient data masking compared to conventional techniques.
  • No need for data storage or reservation; fresh synthetic data is generated for every test run.
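
To make field-level synthetic replacement concrete, the sketch below swaps sensitive fields for freshly generated values chosen from the field name alone, so the original value is never read. It is a minimal sketch under stated assumptions, not GenRocket's generators or API: the `SENSITIVE_FIELDS` set, the name lists, and the `synthetic_value` and `mask_record` helpers are hypothetical.

```python
import random

# Hypothetical list of fields to mask; in practice this would come from metadata.
SENSITIVE_FIELDS = {"name", "ssn"}

FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor"]
LAST_NAMES = ["Reed", "Lane", "Cole", "Hart"]


def synthetic_value(field, rng):
    """Generate a replacement from the field name only; the real value is never read."""
    if field == "name":
        return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    if field == "ssn":
        # Area numbers 900-999 are not issued as SSNs, so the result cannot match a real one.
        return f"{rng.randint(900, 999)}-{rng.randint(10, 99)}-{rng.randint(1000, 9999)}"
    raise ValueError(f"no synthetic generator defined for field {field!r}")


def mask_record(record, rng):
    """Return a copy of `record` with every sensitive field replaced by synthetic data."""
    return {
        key: synthetic_value(key, rng) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


rng = random.Random()  # unseeded, so each test run gets fresh synthetic values
original = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 10.5}
masked = mask_record(original, rng)
print(masked)
```

Because each replacement is generated rather than derived from the original value, there is no transformation to reverse, which is the property that distinguishes this approach from obfuscating the real data.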

GenRocket supports a wide variety of databases, including Oracle, MS SQL Server, IBM DB2, PostgreSQL, and MySQL, and numerous file formats such as CSV, JSON, XML, and more.

The Power of Combining Intelligent Data Subsetting and Synthetic Data Masking

When combined, GenRocket’s Intelligent Data Subsetting and Synthetic Data Masking capabilities form a comprehensive solution that significantly shortens the time needed to provision data for testing. The GenRocket platform ensures that test teams always have fresh data sets at their disposal, eliminating the need for data reservation or refresh.

Furthermore, GenRocket enhances its offering with the ability to augment data subsets with additional synthetic data. Synthetic Data Augmentation allows teams to supplement masked production data with controlled and conditioned synthetic data, delivering a more comprehensive testing dataset. This enriched dataset enables both positive and negative data testing, edge case conditions, and scalable data volumes for load and performance testing.
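
The augmentation idea can be sketched as appending controlled synthetic rows to a masked subset so that edge and negative cases are present alongside the production-derived rows. This is a minimal sketch, not GenRocket's implementation; the row layout, the `synthetic` flag, the `EDGE_AMOUNTS` values, and the `augment_with_edge_cases` helper are assumptions made for this example.

```python
def augment_with_edge_cases(masked_rows, edge_amounts):
    """Append one synthetic transaction per edge amount to a masked subset."""
    next_id = max(row["txn_id"] for row in masked_rows) + 1
    # Flag existing rows so tests can tell production-derived rows from synthetic ones.
    augmented = [dict(row, synthetic=False) for row in masked_rows]
    for offset, amount in enumerate(edge_amounts):
        augmented.append({
            "txn_id": next_id + offset,
            # Reuse an existing customer key so the new rows keep referential integrity.
            "customer_id": masked_rows[0]["customer_id"],
            "amount": amount,
            "synthetic": True,
        })
    return augmented


# Hypothetical masked subset and edge values: zero, a negative amount, a very large amount.
masked_subset = [
    {"txn_id": 100, "customer_id": 1, "amount": 25.00},
    {"txn_id": 103, "customer_id": 4, "amount": 99.99},
]
EDGE_AMOUNTS = [0.00, -1.00, 9_999_999.99]

dataset = augment_with_edge_cases(masked_subset, EDGE_AMOUNTS)
for row in dataset:
    print(row)
```

The same loop could generate thousands of rows instead of three when the goal is load or performance testing rather than edge-case coverage.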

Simplified and Efficient Test Data Management

The GenRocket platform offers a highly streamlined process for test data provisioning, with features including:

  • Data model import through XTS, with referential integrity maintained.
  • A visual representation of the data model for effortless navigation and selection.
  • Micro-subsetting for crafting smaller, test-case specific subsets.
  • High-speed data delivery of about 5 million rows per minute, so the data for each test case arrives on demand in a matter of seconds.
  • Enhanced security through synthetic data masking, a high-speed synthetic data replacement process that is more secure than traditional obfuscation, which can be reverse engineered.

Key Takeaways

Where software quality and data security are paramount, GenRocket’s Test Data Automation solution offers an alternative to traditional TDM. By providing an integrated platform for synthetic data generation, subsetting, and masking, GenRocket gives dev and test teams datasets tailored to their specific needs, available on demand through a self-service platform.

Read the full article, which includes use cases for both healthcare and financial services.

