

Information Architecture: Trolling in the Data Lake? Get Yourself a Fish-Finder


A semantically-rich information architecture can help businesses see hidden patterns in the murky and fast-proliferating data pools that surround people, organizations and devices.

You say you want to play in the world of Facebook, Instagram or Amazon, creating data-rich, customized user experiences that draw tens of millions of users? Do you dream of crushing industry giants with individualized, online recommendations the way Netflix and Pandora do?

If so, to quote the police chief in Jaws when he sees the famous shark, "You're gonna need a bigger boat," in this case to hold all the data you'll need. More importantly, you're going to need a smarter boat, one that can help you find the meaning in the giant lakes of data created by everything from social media to the evolving Internet of Things. In other words, you need a fish-finder for identifying the business insights hidden in the volumes of data continuously generated by people, products, processes and organizations. We call the data surrounding each of these a Code Halo™.

Size Matters

Consider what we call the "Trillion-Dollar Club," which consists of companies that together generated more than $1 trillion in market value over the last decade: Apple, Amazon, Google, Facebook, Netflix and Pandora. These businesses upended entire industries by analyzing and acting on the Code Halos generated "only" by the 10 billion devices connected to the Internet and mostly used by people.1

Coming soon, to an industry near you, are the billions of devices in the Internet of Things. When smart fitness wristbands, smart cars, jet engines and shipping pallets are all constantly and automatically generating information about their operation and their users' activities, the term "big data" will seem hopelessly quaint. As the number of connected devices grows ten-fold to 100 billion, data volume is expected to double every two years, reaching 44 zettabytes (44 trillion gigabytes) by 2020.2
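The arithmetic behind that forecast can be checked in a few lines. The sketch below assumes decimal (SI) units and a constant two-year doubling period; the function names are hypothetical, and the earlier-year figure is only what the doubling rate implies, not a number taken from the cited report.

```python
# Back-of-the-envelope check of the forecast quoted above.
# Assumes SI units: 1 zettabyte = 10**21 bytes, 1 gigabyte = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9

def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZETTABYTE // GIGABYTE

def implied_volume(target_zb, target_year, year, doubling_years=2):
    """Volume in an earlier year if data doubles every `doubling_years`."""
    doublings = (target_year - year) / doubling_years
    return target_zb / 2**doublings

print(zettabytes_to_gigabytes(44))     # 44000000000000, i.e. 44 trillion
print(implied_volume(44, 2020, 2014))  # 5.5 (zettabytes, three doublings earlier)
```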

This data growth is not only inevitable, but it is also essential to building and improving the all-important algorithms that deliver better, more personalized experiences for customers. It is this data that you will need to store, manage and extract meaning from, if you are to avoid an "extinction event." This data might fuel a mobile app that guides customers to parking spaces near your store rather than a competitor's, based on historical turnover at Internet-enabled parking meters. Or it might underpin a corporate app that orders inventory for neighborhood drugstores based on usage reports from local smart insulin monitors, combined with area Web searches for cold remedies.

Your mission (whether or not you accept it) is to not only manage the sheer bulk of data, but to also draw meaning from the bits and bytes. This requires going way beyond traditional data repositories to what we call the data lake. You won't be able to afford the time, effort and cost of loading all this data into a big data repository, nor could you easily find and use the data you need in it.

Semantic technology lets you build on and extend your data warehousing and big data investments to drive much more powerful insights from a much broader data set more quickly.

Jump in the Lake

Think of a data warehouse as a dusty, expensive building filled with papers in static file folders, all organized in a rigid classification system that was obsolete as soon as it was created. That's your classic data model and it won't let you fully exploit the Code Halos that you need to succeed.

Think instead of all the data from all your sources, internal and external, old and new, flowing into a massive "data lake." As the lake gets bigger and bigger, with more and different types of data, how do you identify and gather the data you need without going broke or getting lapped by your competitors?

The data has to tell you itself. What you need is the data equivalent of a fish-finder that can peer into the murky darkness of the data lake and tell you which ghostly image is an old sunken tree and which is a school of prized game fish.


Figure 1

This "fish-finder" for business insights is here, in the form of a smart, semantic model that captures the meaning of data (whether it is structured, unstructured or semi-structured), along with the related domain expertise. The building blocks for such a model are standards and technologies such as:

  • The Resource Description Framework (RDF), which organizes data in a graph structure, reducing development time and cost while delivering business value more quickly.

  • The Web Ontology Language (OWL), which provides a comprehensive model of data definitions and relationships that is human- and machine-readable.

  • The SPARQL Query Language, which is a SQL-like query language for semantic data that can leverage ontological relationships to ask smarter questions across multiple databases in a single query.

  • Inferencing, which makes it easier for users to construct queries by capturing and embedding expertise in the ontology model.
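How these building blocks fit together can be sketched in plain Python. This is a toy stand-in, not an RDF store, an OWL ontology or a SPARQL engine: facts are held as subject-predicate-object triples, a subclass relationship plays the role of the ontology, a small loop does the inferencing, and a pattern matcher stands in for the query language. All identifiers are hypothetical.

```python
# Toy illustration only. A real deployment would use an RDF store, an OWL
# ontology and a SPARQL engine. All names below are hypothetical.

# Data as a graph: each fact is a (subject, predicate, object) triple.
triples = {
    ("acme_corp", "is_a", "PayingCustomer"),
    ("jane_doe", "is_a", "InternalCustomer"),
    ("acme_corp", "bought", "xl_widget"),
    # Ontology-style statements: relationships between concepts.
    ("PayingCustomer", "subclass_of", "Customer"),
    ("InternalCustomer", "subclass_of", "Customer"),
}

def infer_types(facts):
    """Add the is_a triples implied by subclass_of until nothing new appears."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != "is_a":
                continue
            for (sub, p2, sup) in list(inferred):
                if p2 == "subclass_of" and sub == o:
                    new = (s, "is_a", sup)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

def match(facts, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

graph = infer_types(triples)
customers = [s for (s, _, _) in match(graph, predicate="is_a", obj="Customer")]
print(customers)  # ['acme_corp', 'jane_doe']
```

The user asks only for "Customer"; the expertise that both kinds of customer count is carried by the model rather than by the query, which is the point of the inferencing step.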

Give Me Meaning — or Else

Remember that what drives business-changing insights and user experiences isn't just data, but also meaning. We need to know what the data means and what it represents before we can use it in Code Halo algorithms that deliver business and user benefits.

For example, when a user, an application or a device searches for "customer," does the query come from the perspective of a shared services organization, in which the "customer" is an employee within the business? Or is it from the perspective of sales, in which the customer is an outsider who pays for a product or service? Without a semantic model to guide you to the right "customer," you're not only wasting your time, but you could also undermine the customer experience.
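A minimal sketch of that disambiguation, assuming the semantic model can be reduced to a lookup keyed by term and business perspective. The concept names and perspective labels are hypothetical, chosen only to mirror the two readings of "customer" above.

```python
# Hypothetical mapping from (term, business perspective) to a precise concept.
CONCEPTS = {
    ("customer", "shared_services"): "InternalCustomer",  # an employee served in-house
    ("customer", "sales"): "PayingCustomer",              # an outside buyer
}

def resolve(term, perspective):
    """Return the concept a term refers to from a given perspective."""
    key = (term.lower(), perspective)
    if key not in CONCEPTS:
        raise KeyError(f"No definition of {term!r} for perspective {perspective!r}")
    return CONCEPTS[key]

print(resolve("Customer", "sales"))            # PayingCustomer
print(resolve("customer", "shared_services"))  # InternalCustomer
```

Failing loudly on an unknown perspective is a deliberate choice in this sketch: guessing the wrong "customer" is the outcome the model is meant to prevent.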

An intelligent semantic model can deliver meaning and intelligence that empowers better decision-making. A conventional business intelligence system might describe "PPM" as "defective parts per million." That's a good start, but it doesn't deliver the full business meaning. Try, instead, a fuller semantic-enabled explanation, such as, "PPM, or defective parts per million, is used by our specialty components line to justify premium pricing for our XL line of products." That gives business users a richer idea of what the data means to them.

This semantic model also helps identify the data needed to craft more personalized customer experiences. An electric utility, for example, might use the model to find and combine a customer's name, address, account number and service area with data from his Web-connected thermostat and smartphone location to offer a smart-home service for adjusting the air-conditioning temperature 20 minutes before he arrives home.
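The utility scenario amounts to joining three sources on a shared key and applying one rule. The sketch below assumes each source can be keyed by account number and that the phone reports an estimated travel time home; the class names, fields and sample records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:            # from the billing system
    account_number: str
    name: str
    service_area: str

@dataclass
class Thermostat:         # from the Web-connected device
    account_number: str
    current_temp_f: float
    target_temp_f: float

@dataclass
class PhoneLocation:      # from the customer's smartphone
    account_number: str
    minutes_from_home: float

LEAD_MINUTES = 20  # start cooling this long before arrival, as in the example above

def cooling_commands(accounts, thermostats, locations):
    """Join the three sources on account number and list homes to start cooling."""
    stats = {t.account_number: t for t in thermostats}
    etas = {p.account_number: p.minutes_from_home for p in locations}
    commands = []
    for acct in accounts:
        stat = stats.get(acct.account_number)
        eta = etas.get(acct.account_number)
        if stat is None or eta is None:
            continue  # not enough data for this customer
        if eta <= LEAD_MINUTES and stat.current_temp_f > stat.target_temp_f:
            commands.append((acct.name, stat.target_temp_f))
    return commands

accounts = [Account("1001", "A. Rivera", "North"), Account("1002", "B. Chen", "South")]
thermostats = [Thermostat("1001", 80.0, 72.0), Thermostat("1002", 78.0, 72.0)]
locations = [PhoneLocation("1001", 18.0), PhoneLocation("1002", 45.0)]

print(cooling_commands(accounts, thermostats, locations))  # [('A. Rivera', 72.0)]
```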

A semantic model makes it easier to embed domain expertise — field-based insights into how your customers, products or markets work — into the data. An ad placement application on a music streaming site might, for example, "learn" that listeners who prefer classical or jazz respond more often to detailed, fact-oriented ads, while those who like popular music respond better to simpler, more emotional appeals. Now you're talking targeted ads, a win for both the advertiser and the consumer — if it's done right.

Start Small, Focus on the Achievable

Building this "fish-finder" — an intelligent semantic model that sits on top of your current information architecture — probably sounds daunting. But it can be done, starting with important but gradual changes:

  • Prioritize the onboarding of data by its ability to create truly individualized customer experiences or generate business-changing insights into market needs or operations.

  • Onboard data assets as they become available, without waiting for a specific use case. Those uses will emerge when you least expect them, and when they do, you'll need the data immediately. At a minimum, map the data to the semantic model for easier access.

  • Load the data without filtering or transforming it, since new data rules may override old rules. Filtering and transformation rules can be applied as the data is moved to an analytics engine or during query execution.

  • Model the data using familiar terminology. This makes it easier to change the model as needed without physically moving the data. Customize models for specific business groups, and encourage those groups to verify each model's accuracy and completeness.

  • Enable search mechanisms that make it easier for business users to see the data that is available and accessible.

  • Balance legal and compliance needs for security with the imperative to improve the customer experience through analytics.
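Two of the steps above, loading data unfiltered and modeling it in familiar terms, can be sketched together. The example assumes raw records arrive as JSON lines and that the semantic model can be reduced to a mapping from business terms to raw field names. The records, field names and `read` helper are hypothetical.

```python
import json

# Raw events are stored exactly as received (hypothetical sample records).
raw_lake = [
    '{"meter": "m-17", "occupied": true, "ts": "2014-06-01T09:00"}',
    '{"meter": "m-17", "occupied": false, "ts": "2014-06-01T09:40"}',
    '{"meter": "m-22", "occupied": true, "ts": "2014-06-01T09:05", "zone": "B"}',
]

# The model maps familiar business terms to raw field names.
# Changing this mapping changes what users see without moving any data.
MODEL = {"parking_meter": "meter", "in_use": "occupied", "observed_at": "ts"}

def read(lake, model, where=lambda row: True):
    """Parse raw records at query time and present them in business terms."""
    for line in lake:
        raw = json.loads(line)
        row = {term: raw.get(field) for term, field in model.items()}
        if where(row):
            yield row

# Filtering happens during the query, not during loading.
in_use = [r["parking_meter"] for r in read(raw_lake, MODEL, where=lambda r: r["in_use"])]
print(in_use)  # ['m-17', 'm-22']

# A later version of the model exposes a field the first one ignored, with no reload.
MODEL_V2 = {**MODEL, "pricing_zone": "zone"}
zoned = [
    (r["parking_meter"], r["pricing_zone"])
    for r in read(raw_lake, MODEL_V2, where=lambda r: r["pricing_zone"] is not None)
]
print(zoned)  # [('m-22', 'B')]
```

Because the lake keeps the original records, the "zone" field was still available when the second model asked for it; a filter applied at load time could have discarded it.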

We could go on and on with suggestions, but the most important one is: Don't wait. Start building your private semantic Web now to understand your customers, markets and industry before your competitors do. Don't be the last in your data lake to get a fish-finder to reap the insights that enable your organization to deliver the next great user experience, product or service.


1 For more on Code Halos and innovation, read "Code Rules: A Playbook for Managing at the Crossroads," and the book, "Code Halos: How the Digital Lives of People, Things and Organizations are Changing the Rules of Business," by Malcolm Frank, Paul Roehrig and Ben Pring, published by John Wiley & Sons, April 2014. 

2 "The Digital Universe in 2020," IDC and EMC, April 2014.
