In the technological age in which we live, the mantra seems to be that more data is always better. In pursuit of this, we require our teams and our technology to deliver the most data in the shortest possible timeframe. More data, delivered faster, is often seen as the best way to solve business problems.
But what if the solution doesn’t lie simply in more data, but in careful curation of the information we already have? Pause for a moment to consider if what you really need is more data, or if it’s a better way to process the data you already have. Acquiring more information for the sake of it misses the point of data acquisition.
Sometimes we get so distracted by the pursuit of more and better information that we ignore the information we already have. We sacrifice quality for sheer overwhelming quantity. We blindly apply the latest technology solution to the problem, thinking that will capture elusive lost revenue.
What if lack of data isn’t the problem? In this day and age, data acquisition might very well be the easy part. It’s what you do once you have the data that is mission critical to business success. Every step of the way, you need to ask yourself if you are effectively processing and using the data you already have. If you aren’t engaging in this crucial next step, your problem isn’t data quantity.
Next time you’re tempted to throw more data at the problem, consider: Have you done the hard work of cleaning and organizing the data you already have? Preparing real-time process data to actually be useful is hard, but necessary work.
There is a logical progression of transforming raw data to useful information and knowledge. It starts with Data Acquisition, but that’s just the beginning. Unfortunately, there’s no direct flight to Knowledge — you first have a substantial layover in Data Curation.
As someone who has been using operating data for over 35 years to solve asset performance problems, I’ve learned that before I reach for more data, I need to prepare the data I already have. At Power Factors we use the acronym CAST to describe the tedious but important data preparation phase of data processing. CAST stands for: Curate, Aggregate, Store and Transform.
After data is collected from the various data streams and stored in a time series database, it must be qualified before being consumed by the user. The act of validating, estimating and updating the raw data and storing it in a clean state is called data curation.
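To make the curation step concrete, here is a minimal sketch of that validate-estimate-update loop in Python with pandas. The column name, plausibility limits and gap-fill window are all illustrative assumptions, not Power Factors’ actual rules.

```python
import numpy as np
import pandas as pd

def curate(raw: pd.DataFrame) -> pd.DataFrame:
    """Validate, estimate and update a raw 1-minute power signal.

    Assumes a DataFrame with a DatetimeIndex and a 'power_kw' column;
    the limits and gap-fill window below are hypothetical.
    """
    clean = raw.copy()

    # Validate: flag readings outside physically plausible limits.
    bad = (clean["power_kw"] < 0) | (clean["power_kw"] > 2500)
    clean.loc[bad, "power_kw"] = np.nan

    # Estimate: fill short gaps by interpolation; leave long outages as NaN.
    clean["power_kw"] = clean["power_kw"].interpolate(limit=5)

    # Update: record which points were flagged or missing, so the
    # curated series stays auditable against the raw stream.
    clean["curated"] = bad | raw["power_kw"].isna()
    return clean
```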
Aggregation is the process of taking curated data and storing it in the location and format that data consumers need. A data platform should include a variety of data stores, including time series, structured and unstructured data stores.
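Building on the curation sketch above, an aggregation step might roll the curated one-minute signal up to the hourly grain a reporting consumer expects. The hourly target and the completeness measure are assumptions chosen for illustration.

```python
import pandas as pd

def aggregate_hourly(curated: pd.DataFrame) -> pd.DataFrame:
    """Roll curated 1-minute power data up to an hourly reporting grain."""
    hourly = curated["power_kw"].resample("1h").agg(["mean", "min", "max"])

    # Carry a simple completeness measure so consumers can judge coverage:
    # the fraction of the 60 expected 1-minute samples actually present.
    hourly["completeness"] = curated["power_kw"].resample("1h").count() / 60
    return hourly
```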
Unstructured time series data is useful for massive data collection tasks and simple event detection. Then, adding a contextual framework to the time series data makes it easier to find, compare and use. For example, with unstructured data only, if an inverter experiences a forced outage, very little is known: just the date and time the inverter stopped running. However, if context such as the make, model and serial number of the inverter and the type and frequency of historical inverter failures for that inverter class is added, users are equipped with a much richer data set for resolving the recurring problem.
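As a hypothetical illustration of that contextual join: a bare outage event carries only an asset id and a timestamp, and merging it against an asset registry attaches the make, model and serial number. All of the names and values below are invented, and a further join against failure history would add the frequency by inverter class in the same way.

```python
import pandas as pd

# Hypothetical asset registry: the contextual framework for the fleet.
assets = pd.DataFrame({
    "asset_id": ["INV-014", "INV-027"],
    "make": ["AcmePower", "AcmePower"],
    "model": ["X2500", "X2500"],
    "serial": ["A14-889", "A27-113"],
})

# A bare event from the unstructured time series: only an id and a time.
events = pd.DataFrame({
    "asset_id": ["INV-014"],
    "stopped_at": [pd.Timestamp("2024-06-01 13:42")],
})

# The join turns "INV-014 stopped at 13:42" into a record that also
# carries make, model and serial number.
enriched = events.merge(assets, on="asset_id", how="left")
print(enriched)
```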
Transforming the qualified data can take many forms. For example, an operational or commercial event can be fired based on predetermined trigger thresholds, or the data may be fed into a machine learning algorithm or used as a component of an advanced analytic.
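As a sketch of the event-firing case, the hypothetical check below raises an underperformance event whenever the hourly mean from the earlier aggregation sketch drops below a predetermined trigger threshold; the 100 kW value is invented.

```python
import pandas as pd

def detect_underperformance(hourly: pd.DataFrame,
                            threshold_kw: float = 100.0) -> pd.DataFrame:
    """Fire an operational event whenever mean hourly power drops below
    a predetermined trigger threshold (the 100 kW default is illustrative)."""
    breached = hourly["mean"] < threshold_kw
    events = hourly.loc[breached, ["mean"]].rename(columns={"mean": "power_kw"})
    events["event_type"] = "underperformance"
    return events.reset_index()
```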
Sure, I want more data, and more data can help. But not until I have first figured out how to efficiently manage and transform the data I already have. So next time someone says, “More data is always better,” enlighten them by responding, “…sure, once you’ve taken care of the data you already have.”
Steve Hanawalt is a founder and executive vice president at Power Factors.