Many Hollywood screenplays based on true stories take the liberty of amalgamating two or more people into a single character. Making one character do the work that several did in real life tightens the story to make it more compelling so it’s “a better version than the truth.” Synthetic data testing for health plan configuration is built on a similar concept. Instead of making a copy of actual production data for testing, synthetic data is invented using random selection processes so that the test data does not directly represent any single entity or record within the production data set.
Yet synthetic data is not random: Created well, it fully represents the richness of the solution being tested. Further, whereas production data copies are limited to real-world events, synthetic data allows health plan providers to create data that meets the precise criteria required to test the full scope of a transaction or application. That helps ensure complete configuration quality.
Payers also need to find an alternative to testing with actual member data. It’s likely that more employers and regulators will limit the use of employees’ protected health information (PHI) for testing and other purposes, even if the data is masked. Synthetic data enables payers to simultaneously protect PHI and create a pathway to more robust testing. Together, these factors build a strong case for payers to adopt synthetic data swiftly.
The most common method of data creation for testing is making a copy of production data. However, data dumps can be huge, while the amount of information used in testing is but a tiny subset of a few thousand records. Testing teams often cannot articulate their specific data needs until they are actively developing plans for a specific test cycle. To fend off multiple requests for data, the IT organization gives the team the full data dump. But when using synthetic data, a plan provider does not need to copy a million members and 10 years of claims history from live production for a test. Here is a more effective way for teams to identify their needs and approach test-data management:
These created entities are termed “unicorns.” Each unicorn represents a perfect test use case or multiple use cases. Instead of searching for real-world data to fulfill the needs of each and every test, teams may build a unicorn member with the specific attributes that trigger the exact test outcomes required in a single pass, or give one unicorn several sets of circumstances that meet many different testing needs. If testing vision claims, for example, the unicorn will “have” glasses. If testing limits on physical therapy, the unicorn will have “visited” a therapist 10 times. Teams may change any of the unicorns’ characteristics that have no bearing on the testing result and keep those that are important to testing. They can evolve unicorns, making them single or married, childless or parents to eight children, all with different health conditions by virtue of the claims assigned to them.
All the data values are real and valid, yet they are assigned to invented entities that look and behave as though they were actual members culled from production data.
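As a rough illustration of the unicorn idea, the Python sketch below builds one invented member whose claims trigger the vision and physical therapy scenarios described above, then varies attributes that have no bearing on the test. Every name here (UnicornMember, Claim, build_unicorn, count_claims, the "UNICORN-001" identifier and the field names) is a hypothetical choice made for this example, not part of any payer system or product.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Claim:
    """One invented claim; the field names are assumptions for this sketch."""
    service_type: str   # e.g. "vision" or "physical_therapy"
    description: str


@dataclass(frozen=True)
class UnicornMember:
    """A synthetic member designed around the tests it must trigger."""
    member_id: str
    marital_status: str = "single"
    dependents: int = 0
    claims: Tuple[Claim, ...] = ()


def build_unicorn(member_id: str, pt_visits: int = 0,
                  has_glasses: bool = False) -> UnicornMember:
    """Assemble a unicorn with exactly the claims a test cycle needs."""
    claims = []
    if has_glasses:
        claims.append(Claim("vision", "eyeglasses"))
    for visit in range(1, pt_visits + 1):
        claims.append(Claim("physical_therapy", f"therapy visit {visit}"))
    return UnicornMember(member_id=member_id, claims=tuple(claims))


def count_claims(member: UnicornMember, service_type: str) -> int:
    """Count the member's claims of one service type."""
    return sum(1 for claim in member.claims
               if claim.service_type == service_type)


# One unicorn covers both the vision test and the therapy-limit test.
unicorn = build_unicorn("UNICORN-001", pt_visits=10, has_glasses=True)

# Change attributes that do not affect the test; the claims stay the same.
married_parent = replace(unicorn, marital_status="married", dependents=8)
```

Keeping the member immutable and deriving variants with replace is one possible design choice: the test-relevant claims cannot drift while the incidental attributes evolve.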
The unicorns must incorporate data that goes beyond that gleaned from the claims processing solution. Dozens of business process systems might be used in any given test cycle. Creating synthetic data that provides the related information across the enterprise suite of applications enables robust and comprehensive business process testing. For example, a test suite may need to represent not only a given claim, but also specialized pricing triggers, workflow controls, care management integration and supporting documentation systems such as images or laboratory results.
This process requires analyzing what’s important in each unique application, creating data that fits each application’s requirements and then making that data consumable in a way that addresses the data schema. Applications may include HIPAA privacy modules, an image management solution or a provider network management system, as well as lab and pharmacy connected data sets. To maintain the interrelationships among the application databases, testing teams should create the required data elements once, then use that data set to write to the different databases.
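The "create once, write to many" approach can be sketched as a single canonical record plus one small formatter per target application. This is a minimal illustration under stated assumptions: the three target systems, their field names and the placeholder identifiers are all invented for the example, and a real implementation would write to each application's actual schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CanonicalClaim:
    """Data elements created once, independent of any application schema."""
    member_id: str
    claim_id: str
    provider_id: str
    procedure_code: str   # placeholder value, not a real billing code
    image_ref: str


# Each formatter reshapes the same canonical record for one (hypothetical)
# application. Shared identifiers are copied, never regenerated.
def to_claims_system(c: CanonicalClaim) -> Dict[str, str]:
    return {"CLAIM_ID": c.claim_id, "MEMBER_ID": c.member_id,
            "PROVIDER_ID": c.provider_id, "PROC_CODE": c.procedure_code}


def to_image_system(c: CanonicalClaim) -> Dict[str, str]:
    return {"image_ref": c.image_ref, "claim_id": c.claim_id,
            "member_id": c.member_id}


def to_provider_network(c: CanonicalClaim) -> Dict[str, str]:
    return {"providerId": c.provider_id, "lastClaimId": c.claim_id}


FORMATTERS: Dict[str, Callable[[CanonicalClaim], Dict[str, str]]] = {
    "claims": to_claims_system,
    "imaging": to_image_system,
    "provider_network": to_provider_network,
}


def fan_out(claim: CanonicalClaim) -> Dict[str, Dict[str, str]]:
    """Produce one application-specific record per target system."""
    return {name: fmt(claim) for name, fmt in FORMATTERS.items()}


claim = CanonicalClaim(member_id="UNICORN-001", claim_id="CLM-0001",
                       provider_id="PRV-0042", procedure_code="PROC-LEG-01",
                       image_ref="IMG-0001")
records = fan_out(claim)
```

Because every application record derives from the same canonical object, the claim, member and provider identifiers agree across systems, which is what preserves the interrelationships among the application databases.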
All the designed data is stored for repeated use. Testing teams select the data they need, format it for a specific application and load the processes appropriate for the test cycle. The result is unicorn test data that better meets the organization’s diverse testing needs, with much smaller non-production data sets and more efficient and effective test execution cycles.
Teams will no longer have to search far and wide for data that matches their testing requirements. All necessary data will be built into a single unicorn member.
While a complete solution for creating and using synthetic data is still emerging, plans can start storyboarding for their adoption of synthetic data testing with the following steps:
Payers may start using synthetic data by cloning a single test case, such as a data set for a claim representing a broken leg, to test the benefits configuration for a physician joining a plan’s network. The cloning enables plans to instantly create multiples of the “same” data types, whether two or 200, ready to be tested in multiple passes of the same test. This means:
In the future, much of the synthetic data creation may be automated, using machine learning elements that leverage information gained from prior test iterations. Tools will extract claims data and similarly analyze non-core administrative systems to identify data types and interrelationships to create synthetic data that represents transactions across the enterprise. That eventually will result in a comprehensive pool of non-production-designed data, ready for testing any type of solution — including those invented in a payer’s imagination. Current production data represents the past. Using synthetic data, plans can test contract and physician performance under different configurations and business models, such as value-based care.
By letting testing teams create better-than-real-world characters, synthetic data helps payers improve testing efficiency, protect PHI and intelligently test business scenarios. Transitioning to synthetic data will add larger-than-life performance to payers’ testing capabilities, tightening the test case story to make it more compelling so it’s indeed a better version than the truth.
This article was written by Tom Newman, General Manager, Cognizant Optimization Software Products.