

Are Clinical Data Hubs Living Up to the Hype?


A robust and resilient data foundation can help life sciences organizations overcome the information sharing challenges that undermine clinical trials management. Here’s how to get the most from your data hubs, lakes and warehouses.

Great expense and effort go into clinical trial data management each year. You may be well aware of initiatives at your own organization to that end, but how effective are they?

It can be difficult to tell, but the first place to look is what we call the data foundation: the technology and processes that allow an organization to manage electronic data capture (EDC) and external data. Data hubs, data lakes and data warehouses are the buzzwords commonly used to describe this technology, but they are certainly not all created equal.

There are many challenges that can arise, and we’ve condensed three of the biggest ones below to help you make sure your data foundation is solid and ready for building.

Format frustrations

Consider the snowflake: each one has a reputation for being completely unique, yet to the naked eye they all look the same.

Clinical trials are similar: most people believe they all work the same way, but each one has nuances and special circumstances, and data acquisition can be equally diverse.

We implement data hubs to accommodate data coming from many different sources. But organizations often fail to build for wide variance in the types of data coming in for clinical studies, assuming instead that data from any given source will always be delivered the same way. That's not the reality. After the data hub is deployed, the organization must take on a vast amount of overhead to massage incoming data that doesn't meet the predefined criteria, an ongoing drain on time and resources.

So what is the answer? Ensure that your organization's data hub accommodates its data in a way that's flexible enough to support clinical trials, especially as they scale. For example, patient data varies heavily between studies, so organizations need to address this variance. One way is with metadata: attaching descriptive data to the incoming patient data, then using data hubs that read that metadata and adjust their acquisition and storage structures accordingly. This is a better approach than trying to force the data into rigid structures.
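To make the metadata idea concrete, here is a minimal sketch of metadata-driven ingestion. The study names, field names and date formats are hypothetical, not taken from any specific trial system; the point is that the per-study metadata, not a fixed schema, absorbs the variance in how patient data arrives.

```python
from datetime import datetime

# Hypothetical metadata describing how each study delivers patient data.
STUDY_METADATA = {
    "study_a": {"id_field": "subject_id", "visit_field": "visit_date", "date_format": "%Y-%m-%d"},
    "study_b": {"id_field": "patient_no", "visit_field": "seen_on", "date_format": "%d/%m/%Y"},
}

def normalize_record(study, record):
    """Use the study's metadata to map a raw record into a common shape."""
    meta = STUDY_METADATA[study]
    visit = datetime.strptime(record[meta["visit_field"]], meta["date_format"]).date()
    return {"patient_id": record[meta["id_field"]], "visit_date": visit.isoformat()}

# Two studies deliver the "same" visit in different shapes; the metadata
# absorbs the variance instead of a rigid fixed schema.
a = normalize_record("study_a", {"subject_id": "S-001", "visit_date": "2023-04-01"})
b = normalize_record("study_b", {"patient_no": "P-17", "seen_on": "01/04/2023"})
```

Onboarding a new study then means adding a metadata entry rather than reworking the ingestion code or storage structures.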

Operational data, on the other hand, mostly differs by organization. It has no industry standard to represent it, so we've helped clients by applying a variant of the common extract, transform, load (ETL) approach: extract, load, transform (ELT). Loading the data before transforming it allows us to accept incoming data in a more flexible way, since it doesn't limit the type of data that is ingested.
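A minimal ELT sketch, using Python's built-in sqlite3 as a stand-in for the data store: raw payloads are loaded as-is with no up-front schema for the operational fields, and reconciliation happens afterwards when building the curated view. The vendor names and field names are illustrative assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_ops (source TEXT, payload TEXT)")

# Load: accept whatever shape each source sends, untouched.
incoming = [
    ("vendor_a", {"site": "101", "queries_open": 4}),
    ("vendor_b", {"site_id": 102, "open_queries": "7"}),
]
conn.executemany(
    "INSERT INTO raw_ops VALUES (?, ?)",
    [(src, json.dumps(p)) for src, p in incoming],
)

# Transform: reconcile the differing shapes only when producing the curated view.
def open_queries_by_site(conn):
    result = {}
    for src, payload in conn.execute("SELECT source, payload FROM raw_ops"):
        p = json.loads(payload)
        site = str(p.get("site") or p.get("site_id"))
        result[site] = int(p.get("queries_open") or p.get("open_queries"))
    return result
```

Because transformation is deferred, a new vendor with yet another payload shape can start loading immediately; only the transform step needs updating.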

Regardless of the approach your organization takes to loading data, we suggest making sure incoming data can be accepted in a way that supports the changing demands of clinical trials.

Has your lake become a swamp?

There’s persistent confusion around data repositories. Data warehouse, data hub, data lake: with all of these names, it’s easy to mix them up. For example, the concept of a data lake sounds genuinely useful: store all of your data in a single place, regardless of format. That’s attractive, especially given the previous challenge of accepting data of various types.

But if you think of a lake where people have dumped things, like old cars, electronics, trash, etc., then you know that drinking directly from the shore could be dangerous. Like a dirty lake, you can’t use data directly from such a data store; you need to clean it before it’s safe to consume.

This is the reality of this type of data store, so it’s important to first know the type of data repository you have. Understand its benefits, but also understand the additional work or upkeep you will need to take on in order to effectively use the data that it manages. Here’s an example: audit trail history and data lineage are incredibly important for supporting critical actions such as clinical data review, centralized monitoring and decisions made from the data. A rich audit history doesn’t just help during audits and in defending decisions; it also significantly helps the organization trust the data hub as an authoritative source of truth.

If this is important to you — and, in managing data, it probably is — you need to make sure that your data storage foundation supports it. If it doesn’t, it’s time to either make a change or augment your solutions.

Second, you need to understand how your data hub enables you to consume the data it stores. If you’re spending IT’s time on consolidating all of your data into one place, but your reviewers are still reading from weeks-old spreadsheets, there’s a problem. A data repository should benefit the entire trial team, not just a technical data operations team. Again, you may need to change the type of data hub you have, or augment your storage with a visualization solution to improve decision-making by data review teams.

To see or not to see?

Did you know that recent research indicates that humans only perceive the “gist” of what we see, rather than all of the details?

So consider a medical monitor who has to review data from a dozen different sites over the course of a trial. If your organization has effectively implemented a data repository or data hub that provides the aforementioned visualization, there will be many opportunities to review the data over time. But with that frequent ability to review comes a frequent risk of missing adverse events, trends or red flags that require investigation, buried among the other information being updated.

An effective way to prevent missed risk indicators is to describe pieces of information as they appear to the reviewer, again by using metadata. We’ve had success using it to “flag” data of various types as it comes into the data hub.

For example, the monitor reviewing site data mentioned above would see metadata indicating information that is critical in some way, information that has changed, or even information that directly contradicts how data from other sites trends. Doing this proactively orients reviewers to the value that a data hub can and should provide.
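The flagging idea can be sketched as a small function applied to each record as it enters the hub. The vital-sign field, thresholds and flag names below are assumptions for illustration, not part of any standard; they stand in for the critical, changed and site-outlier categories described above.

```python
def flag_record(record, site_mean_sbp, critical_sbp=180, outlier_pct=0.25):
    """Attach review flags (metadata) to a vitals record as it enters the hub.

    All thresholds and field names are illustrative assumptions.
    """
    flags = []
    sbp = record["systolic_bp"]
    # Critical value: clinically notable on its own.
    if sbp >= critical_sbp:
        flags.append("critical-value")
    # Changed since the reviewer last saw this record.
    if record.get("changed_since_last_review"):
        flags.append("changed")
    # Diverges sharply from the trend across other sites.
    if site_mean_sbp and abs(sbp - site_mean_sbp) / site_mean_sbp > outlier_pct:
        flags.append("site-outlier")
    return {**record, "review_flags": flags}

flagged = flag_record(
    {"subject": "S-042", "systolic_bp": 190, "changed_since_last_review": True},
    site_mean_sbp=128,
)
```

A visualization layer can then surface these flags directly, so the reviewer sees the gist and the detail that matters at the same time.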

Figure 1

A foundation for future development

Your organization’s data technology landscape, regardless of approach, must serve your entire clinical study team. With careful consideration of the challenges illuminated above, your organization can build a data foundation that will sustain business success for a long time.

For more insights, please visit the Life Sciences section of our website, or contact us.
