What is Big Data?

Big Data Definition

Big data consists of large, complex data sets that cannot be processed using traditional data-processing methods and software.

History

The first references to the term Big Data appeared in the 1990s. As the internet and personal computers became more mainstream, the amount of data collected began to accelerate. In the early 2000s the first social networking sites, such as Facebook, emerged. They were followed by the Internet of Things and by mobile devices such as cell phones, which further accelerated the growth in data collection.

The Three V’s of Big Data

In 2001, analyst Doug Laney, then at META Group (later acquired by Gartner), famously defined big data by three main properties, known as the three V’s of big data:

Volume

Big data inherently implies a large amount of data. Storing all of this data can be a challenge for organizations. However, with the rise of cloud data platforms, managing large data volumes has become much easier.

Velocity

Velocity is the rate at which data is received and then acted on. With the rise of big data, organizations are not only collecting more data; the speed at which data streams into an organization has also increased significantly.

Variety

Variety refers to the different types of data now available. Traditional datasets were formatted and structured, and fit neatly into relational databases. Big Data also incorporates unstructured and semi-structured data types, such as free text, images, and JSON documents, which require alternative processing methods.
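
As a rough illustration of what "alternative processing" can mean for semi-structured data, the sketch below flattens nested JSON records whose fields differ from one record to the next into rows with a shared set of columns. The event records, field names, and the `flatten` helper are hypothetical and chosen only for this example; real pipelines typically rely on dedicated tooling.

```python
import json

# Hypothetical semi-structured events: fields vary from record to record.
raw_events = [
    '{"user": "a1", "device": {"type": "mobile", "os": "ios"}, "tags": ["promo"]}',
    '{"user": "b2", "device": {"type": "desktop"}}',
]


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining nested keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


rows = [flatten(json.loads(line)) for line in raw_events]

# Use the union of all keys as columns so every row has the same shape;
# fields missing from a record become None.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Here `columns` holds the combined field names and `table` holds one list per event, which is the kind of shape a relational table or spreadsheet expects.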

Value and Veracity

Two further properties have emerged alongside the three V’s of Big Data: Value and Veracity.

Value

With so much data now collected, organizations can become paralyzed by the sheer quantity available. More data does not necessarily mean more value for the organization. Value refers to the worth of the data once it has been processed and analyzed. An organization reaps the benefits of its big data when it turns that data into actionable insights that contribute to outcomes such as profitability.

Veracity

How much do you trust the data your organization collects and analyzes? Everyone in the organization needs to trust the veracity of that data in order to act on the business decisions its insights suggest.
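
One way to make veracity concrete is to measure it. The sketch below computes two basic quality signals for a batch of records: the share of non-empty values per required field, and the number of repeated keys. The `quality_report` function, the sample orders, and the choice of checks are assumptions made for illustration, not a standard or complete method.

```python
def quality_report(records, required_fields, key_field):
    """Return simple completeness and duplicate counts for a list of dicts."""
    total = len(records)
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0

    for record in records:
        # Count required fields that are absent, None, or empty strings.
        for field in required_fields:
            if record.get(field) in (None, ""):
                missing[field] += 1
        # Count records whose key has already appeared.
        key = record.get(key_field)
        if key in seen:
            duplicates += 1
        seen.add(key)

    completeness = {
        field: (total - count) / total if total else 0.0
        for field, count in missing.items()
    }
    return {"rows": total, "completeness": completeness, "duplicates": duplicates}


# Hypothetical order records with one empty field, one null, one repeated ID.
orders = [
    {"order_id": 1, "customer": "a1", "amount": 20.0},
    {"order_id": 2, "customer": "", "amount": 35.5},
    {"order_id": 2, "customer": "b2", "amount": 35.5},
    {"order_id": 3, "customer": "c3", "amount": None},
]
report = quality_report(orders, ["customer", "amount"], "order_id")
```

Tracking numbers like these over time gives teams a shared, checkable basis for deciding whether a dataset is trustworthy enough to act on.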

Big Data Use Cases

Big data can give organizations a competitive advantage by helping them understand and optimize aspects of their business more quickly, including operations, customer service, product development, and sales and marketing tactics.

  1. Sales and Operations – Big data can be applied in a variety of ways to optimize operations, including sales forecasting and inventory management.
  2. Customer Service – Companies can improve customer relations by identifying patterns in the types and number of customer tickets, the likelihood of a customer churning, and the time to follow-up or resolution.
  3. Product Development and Usage – Knowing how customers interact with your product (for example, on mobile versus desktop) and when product usage peaks can help organizations create better user experiences.
  4. Marketing – Marketers can use both first- and third-party data to better predict product preferences, interests, software usage, and other buyer signals, and so create more personalized marketing experiences that can yield higher conversion rates.
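
To make the sales forecasting use case in item 1 more tangible, here is a deliberately simple sketch that forecasts the next period as the average of the most recent periods. The monthly figures and the `moving_average_forecast` helper are invented for illustration; production forecasting usually accounts for seasonality, trends, and many more inputs.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("window must be positive and no longer than the history")
    return sum(history[-window:]) / window


# Hypothetical monthly unit sales, oldest first.
monthly_units = [120, 135, 128, 150, 162, 158]
forecast = moving_average_forecast(monthly_units, window=3)
```

A smaller `window` reacts faster to recent changes, while a larger one smooths out noise; which is appropriate depends on how volatile the underlying sales are.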

Industries That Use Big Data

  • Energy – Oil and gas companies utilize Big Data to identify potential drilling locations.
  • Financial Services – Big Data has been widely adopted by financial institutions to better inform investment decisions and improve risk management.
  • Healthcare – With access to Big Data, healthcare providers can better assess their patients and the quality of care provided, ultimately improving patient outcomes.
  • Retail – Retailers can track their customers’ product usage and purchases to make recommendations based on each customer’s history, providing more relevant and personalized content.
  • Manufacturing – Manufacturers can utilize Big Data to improve production line efficiency and quality.

Challenges

  1. Collection and Management – Collecting data from many disparate sources in a variety of formats, and then integrating, cleaning, and preparing it, can be challenging and time consuming. If the process is not managed properly, it can greatly slow down speed to insights and thus reduce the overall value of the data.
  2. Storing – Prior to the cloud, storing Big Data was a much greater challenge for organizations. Today, Big Data is typically stored in a data lake rather than a data warehouse, because a data lake supports unstructured and semi-structured data types, whereas a traditional data warehouse is designed primarily for structured data.
  3. Sharing & Data Governance – Ensuring that everyone in an organization is using the same datasets and definitions can be a great challenge, and one that is only exacerbated by the size of Big Data datasets.
  4. Query Performance – Querying such large datasets can take a long time, and slow time-to-insight is a challenge commonly reported by analysts.
  5. Analysis – It can be difficult to make sure that the data and the analysis are aligned with, and actually answer, business needs.

Ultimately, all of these challenges can result in higher costs. Organizations must establish practices to collect, store, process, analyze, and derive insights from their big data in a cost-effective manner.
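
The Collection and Management challenge above can be sketched in miniature: two sources describe the same kind of entity in different formats and with different field names, and both must be mapped onto one schema and cleaned before analysis. The sources, field names, and target schema below are hypothetical and serve only to show the shape of the work.

```python
import csv
import io
import json

# Two hypothetical sources describing customers in different formats.
csv_source = "id,email,signup\n1,A@Example.com,2023-01-05\n2,b@example.com,2023-02-11\n"
json_source = (
    '[{"customer_id": 3, "contact": {"email": " c@example.com "}, '
    '"created": "2023-03-02"}]'
)


def from_csv(text):
    """Map CSV rows onto the common schema: id, email, signup_date."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "email": row["email"], "signup_date": row["signup"]}


def from_json(text):
    """Map nested JSON records onto the same common schema."""
    for item in json.loads(text):
        yield {
            "id": item["customer_id"],
            "email": item["contact"]["email"],
            "signup_date": item["created"],
        }


def clean(record):
    """Normalize values so records from both sources are comparable."""
    cleaned = dict(record)
    cleaned["email"] = cleaned["email"].strip().lower()
    return cleaned


customers = [
    clean(record)
    for record in list(from_csv(csv_source)) + list(from_json(json_source))
]
```

Every additional source needs its own mapping function and often its own cleaning rules, which is a large part of why this stage becomes time consuming as the number of sources grows.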

Best Practices

  • Define business objectives – Because of the massive amount and complexity of Big Data datasets, it is easy to get lost in the data. Prior to beginning a project, it is essential to map out the key business requirements and define a set of questions you intend to answer using the data and analysis. 
  • Identify data sources needed – The fact that data exists does not mean you should collect or analyze it. Once your business objectives are defined, map out what data and data sources are required.
  • Create a data strategy and governance policy – A well-defined analytics strategy will inform decision-making across the organization. An essential component of this strategy is a clear data governance framework for managing Big Data. Document how data will be collected, stored, processed, cleaned, prepared and accessed. In addition, create a plan for who will have access to the data and identify any staffing and technology requirements and gaps.
  • Select tools – Does your organization have all the tools necessary to properly and efficiently utilize Big Data? Where will data be stored? Which Business Intelligence tools will be used to analyze the data?
  • Solidify roadmap – Create a timeline with key deliverables mapped to business objectives. This helps everyone stay on track and manages expectations at both the staff and executive levels.
