Big Data is often described in terms of three V’s: volume, velocity and variety. Let's look at what each term means and how state and local governments can begin to tackle Big Data.
Start with volume. The hard disk drives that stored data in the first personal computers were minuscule compared with today’s drives. Storage capacities have grown because data volumes have grown, whether it’s the space required to run a modern operating system or the data sets used to map genomes. Multiterabyte hard disk drives now cost as little as $100, yet data such as high-definition video means users can fill those drives in short order.
The challenge isn’t just that the world generates lots of data; it generates data quickly, at high velocity and often in real time. Consider a modern IP video camera monitoring a bridge crossing. Software can analyze the incoming video and alert security professionals if a suspicious package that wasn’t in earlier frames appears on the bridge.
But to be effective, this analysis must happen as rapidly as the video data enters the system. In this case, velocity pertains not only to how quickly data is generated, but also to how quickly someone interprets and acts upon it.
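To make the bridge-camera example concrete, here is a minimal Python sketch of one simple technique, frame differencing: each frame is compared with the one before it as soon as it arrives, and an alert is raised when enough pixels change. It assumes frames are delivered as small grayscale grids of 0-255 values, and the thresholds (`pixel_delta`, `alert_fraction`) are hypothetical; a production video-analytics system would use real video decoding and far more robust detection.

```python
from typing import Iterable, Iterator, List, Optional

# Assumed frame format: rows of grayscale pixel values from 0 to 255.
Frame = List[List[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 30) -> float:
    """Return the fraction of pixels whose brightness changed by more than pixel_delta."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def monitor(frames: Iterable[Frame], alert_fraction: float = 0.05) -> Iterator[int]:
    """Yield the index of each frame that differs noticeably from the previous frame.

    Frames are examined one at a time as they arrive, so alerts keep pace
    with the incoming stream instead of waiting for a batch to accumulate.
    """
    prev: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if prev is not None and changed_fraction(prev, frame) > alert_fraction:
            yield index
        prev = frame


if __name__ == "__main__":
    empty = [[10] * 4 for _ in range(4)]          # hypothetical empty scene
    with_object = [row[:] for row in empty]
    with_object[1][1] = 200                      # a bright "package" appears
    with_object[1][2] = 200
    print(list(monitor([empty, empty, with_object, with_object])))
```

Because each frame is compared only with its predecessor, the object triggers one alert when it first appears (frame index 2) and not again while it stays put; comparing against a longer-lived background model is a common alternative.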
Another source of high-velocity data is social media. By one estimate, Twitter users generate nearly 100,000 tweets every minute, alongside almost 700,000 Facebook posts and more than 100 million emails. Somewhere in that deluge is information related to an agency’s mission, perhaps from citizens voicing their dissatisfaction or seeking assistance.
Social media data is also a good example of the variety of information that characterizes Big Data. Like roughly 80 to 90 percent of all data today, social media content is unstructured: it doesn’t arrive as tidy, easily searchable records. Unstructured data is often text-heavy and doesn’t fit neatly into relational tables. Its explosion in recent years has driven the Big Data movement.
Sensors are another massive source of unstructured and semistructured data. Researchers at HP Labs estimate that by 2030, 1 trillion sensors will be in use. These sensors monitor conditions in the physical world, such as weather, energy consumption and environmental surroundings, as well as in cyberspace. Depending on the application, sensors can generate multiple terabytes of data per day.
Structured data is what most people picture in a traditional database: customer relationship management (CRM) records, statistics or financial transactions. One of the best opportunities Big Data technologies offer is to bring structured and unstructured data together to reveal new insights.
Ironically, the challenge of Big Data actually begets even more data. Big Data (especially unstructured data) must be described in a way that software tools, such as business intelligence, analytics and query tools, can identify and ingest. That’s where metadata comes into play.
Metadata is data that describes data, making it discoverable across an enterprise infrastructure or even in the cloud. Agencies must manage metadata as well as the underlying data. The better they manage metadata, the more valuable their Big Data will be.
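As a rough illustration of how metadata makes data discoverable, the Python sketch below keeps a small catalog of descriptive records and searches them by keyword, without opening the underlying data sets at all. The field names, data set names and owners are all hypothetical; real agencies would typically rely on a dedicated data catalog or metadata standard rather than a hand-rolled list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Descriptive metadata for one data set (fields are illustrative assumptions)."""
    name: str
    description: str
    owner: str
    data_format: str      # e.g. "csv", "json", "mp4"
    structured: bool      # True for table-like data, False for video, text, etc.
    tags: List[str] = field(default_factory=list)


def find_datasets(catalog: List[DatasetMetadata], keyword: str) -> List[str]:
    """Return names of data sets whose description or tags mention the keyword."""
    keyword = keyword.lower()
    matches = []
    for entry in catalog:
        searchable = " ".join([entry.description, *entry.tags]).lower()
        if keyword in searchable:
            matches.append(entry.name)
    return matches


# Hypothetical catalog mixing structured and unstructured sources.
catalog = [
    DatasetMetadata("bridge_camera_video", "Video from the river bridge camera",
                    "Transportation", "mp4", False, ["video", "security"]),
    DatasetMetadata("service_requests", "Citizen service requests by district",
                    "311 Office", "csv", True, ["crm", "citizens"]),
    DatasetMetadata("agency_mentions", "Social media posts mentioning the agency",
                    "Communications", "json", False, ["social media", "citizens"]),
]

if __name__ == "__main__":
    print(find_datasets(catalog, "citizens"))  # ['service_requests', 'agency_mentions']
```

Note that the search finds both a structured CRM-style table and an unstructured social media feed, which hints at why well-managed metadata helps bring the two kinds of data together.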
Big Data is a complex issue for state and local governments to tackle, but the potential benefits are enormous. Download our free white paper, Proactive Planning for Big Data, for more information.