Many Hollywood screenplays based on true stories take the liberty of amalgamating two or more people into a single character. Making one character do the work that several did in real life tightens the story and makes it more compelling, so it's "a better version than the truth." Synthetic data testing for health plan configuration is built on a similar concept. Instead of making a copy of actual production data for testing, synthetic data is invented using random selection processes, so the testing data does not directly represent any single entity or record within the production data set.
Yet synthetic data is not random: Created well, it fully represents the richness of the solution being tested. Further, whereas production data copies are limited to real-world events, synthetic data allows health plan providers to create data that meets the precise criteria required to test the full scope of a transaction or application. That helps ensure complete configuration quality.
Payers also need an alternative to testing with actual member data. It's likely that more employers and regulators will limit the use of employees' protected health information (PHI) for testing and other purposes, even if the data is masked. Synthetic data enables payers to simultaneously protect PHI and create a pathway to more robust testing. Together, these factors build a strong case for payers to swiftly adopt synthetic data.
How to use synthetic data
The most common method of creating data for testing is making a copy of production data. However, those data dumps can be huge, while the information actually used in testing is a tiny subset of perhaps a few thousand records. Testing teams often cannot articulate their specific data needs until they are actively developing plans for a specific test cycle, so to head off repeated requests for data, the IT organization gives the team the full data dump. When using synthetic data, by contrast, a plan provider does not need to copy a million members and 10 years of claims history from live production for a test. Here is a more effective way for teams to identify their needs and approach test-data management:
- Connect: Recognizing how cases, process and data are connected helps streamline data requirements. Testing teams can better analyze what they are testing and why by identifying their test use cases, the test process they’ll use, the data required to execute the test and the metrics that signal success or failure.
- Capture: Instead of using complete member records, capture the information that’s important to executing the test, such as demographics, statistics and claim models.
- Create: Use that information to create a set of employer groups, members, providers and claims that look and act like production data — but are invented entities. The “likeness” of any given actual member to an invented member is not reliable or consistent.
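The capture-and-create steps above can be sketched in code. This is a minimal illustration, not a vendor schema: the `Member` fields, ID format, and attribute ranges are assumptions chosen for the example.

```python
import random
from dataclasses import dataclass

# Hypothetical member record: the fields here stand in for the captured
# attributes (demographics, group assignment) a real test plan would specify.
@dataclass
class Member:
    member_id: str
    age: int
    gender: str
    group_id: str

def create_member(rng: random.Random, group_id: str) -> Member:
    """Invent a member whose attributes are drawn at random, so the record
    resembles production data without copying any real person."""
    return Member(
        member_id=f"SYN-{rng.randrange(10**6):06d}",  # synthetic ID, never a real one
        age=rng.randint(0, 90),
        gender=rng.choice(["F", "M"]),
        group_id=group_id,
    )

rng = random.Random(42)  # seeded so the invented test population is reproducible
members = [create_member(rng, "GRP-001") for _ in range(1000)]
```

Seeding the generator is a deliberate choice: it keeps the invented population stable across test runs while still guaranteeing that no record maps back to a real member.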
These created entities are termed "unicorns." Each unicorn represents one perfect test use case, or several at once. Instead of searching for real-world data to fulfill the needs of each and every test, teams may build a unicorn member with the specific attributes that will trigger the exact test outcomes required in a single pass. Or a unicorn may embody multiple perfect circumstances that meet many different testing needs. If testing vision claims, for example, the unicorn will "have" glasses. If testing limits on physical therapy, the unicorn will have "visited" a physical therapist 10 times. Teams may change any of a unicorn's characteristics that have no bearing on the testing result and keep those that matter. They can evolve unicorns, making them single or married, childless or parents to eight children, all with different health conditions by virtue of the claims assigned to them.
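A unicorn builder might look like the sketch below. The claim types and the 10-visit physical-therapy limit are illustrative assumptions mirroring the examples above, not any real plan's rules.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    claim_type: str  # e.g. "vision", "physical_therapy"

@dataclass
class Member:
    member_id: str
    marital_status: str = "single"
    dependents: int = 0
    claims: List[Claim] = field(default_factory=list)

    def visit_count(self, claim_type: str) -> int:
        return sum(1 for c in self.claims if c.claim_type == claim_type)

def build_unicorn() -> Member:
    """One invented member engineered to trigger several test outcomes in a
    single pass: a vision claim plus a maxed-out physical-therapy history."""
    m = Member(member_id="UNI-0001")
    m.claims.append(Claim("vision"))  # exercises vision-claim processing
    m.claims.extend(Claim("physical_therapy") for _ in range(10))  # hits the assumed PT visit limit
    return m

unicorn = build_unicorn()
# Characteristics with no bearing on the result can be changed freely,
# evolving the same unicorn to serve other tests:
unicorn.marital_status = "married"
unicorn.dependents = 8
```

Because the claim history, not the demographics, drives these test outcomes, the same unicorn can be reshaped (married, eight dependents) without invalidating the vision or physical-therapy scenarios it was built for.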
All the data is real and valid — and assigned to the invented entities that nonetheless look and behave as though they are actual members culled from production data.