What is Data Modeling?


Definition

Data modeling is the practice of defining how data is organized so that it can be physically structured to support analytical queries that answer specific business questions and yield business insights. Data models are both logical and physical, and they represent the structural elements of an integrated dataset formed from one or more data sources, including dimensions, hierarchies, entities, attributes and metrics.

Purpose

The purpose of data modeling is to represent, visually and physically, the structure of data in an integrated form that conforms to a common set of query-oriented elements, such as dimensions, hierarchies, attributes and metrics. Data models are typically produced at three levels: conceptual, logical and physical. The conceptual model focuses on data subjects (entities) and their relationships to each other; the logical model describes the data content and relationships in detail; and the physical model represents the actual data stored in tables for access and querying.
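
As a rough illustration of the physical level, the sketch below builds a small star schema in an in-memory SQLite database and runs one dimensional query against it. The table names (`dim_date`, `dim_product`, `fact_sales`), the columns and the sample rows are all hypothetical and chosen only for this example; a real physical model would follow the organization's own entities and metrics.

```python
import sqlite3

# Hypothetical star schema: one fact table (metrics) and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    revenue     REAL NOT NULL
);
""")

# Illustrative sample data only.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, 2024, 1), (20240210, 2024, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Hardware", "Widget"), (2, "Software", "App")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20240115, 1, 100.0), (20240115, 2, 50.0), (20240210, 1, 75.0)])


def revenue_by_category_and_month(connection):
    """Sum the revenue metric, grouped by a product attribute and a date hierarchy."""
    return connection.execute("""
        SELECT p.category, d.year, d.month, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


result = revenue_by_category_and_month(conn)
```

Here the dimension tables carry the attributes and hierarchies (category, year, month), while the fact table carries the metric (revenue), which mirrors the query-oriented elements described above.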

How is Data Modeling Used?

Data modeling is used to understand data visually and to structure it physically for consistent access, querying and comprehension.

The Benefits of a Good Data Model

The benefit of data modeling is that data is defined and structured according to its business context, so that the data made physically available can be used to generate meaningful insights and analytics through queries and processes that consistently organize the data according to the model.

Common Roles and Responsibilities Associated with Data Modeling

Roles important to the creation of data models are as follows:

  • Business Owner – There needs to be a business owner who understands the business needs for data and the subsequent reporting and analysis. This ensures accountability and actionability, as well as ownership of data quality and data utility based on the data model. The business owner and project sponsor are responsible for reviewing and approving the data model. For larger, enterprise-wide datasets, including data warehousing, a governance structure should be considered to ensure cross-functional engagement and ownership for all aspects of data acquisition, modeling and usage.
  • Data Modelers – Data Modelers are responsible for each type of data model: conceptual, logical and physical. Data Modelers may also be involved with defining specifications for data transformation and loading.
  • Technical Architect – The technical architect is responsible for logical and physical technical infrastructure and tools. The technical architect works to ensure the data model is physically able to be accessed, queried and analyzed.
  • Data Analyst / Business Analyst – A business analyst or, more recently, a data analyst is often responsible for defining the uses and use cases of the data, as well as providing design input on the data structure and on the queries to be performed and improved. Responsibilities also include owning the roadmap for how the data will be enhanced to address additional business questions and existing insight gaps.

Key Business Processes Associated with Data Modeling

The steps for developing and deploying data models are as follows:

  • Gathering Business Requirements – Business requirements, or features and user stories in the agile context, are defined to represent how the data is going to be used. They include the business questions to be addressed, the data subjects to acquire, the elements of the core dimensions needed to query effectively, and example output showing how query results should be presented.
  • Data Acquisition and Profiling – Data sources are identified, acquired and analyzed for relevance, accuracy, completeness and dimensionality.  
  • Data Modeling – Conceptual, logical and physical data models are created by the data modeler, then reviewed and approved jointly by business and technical teams.
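
The profiling step above can be sketched in a few lines. This is a minimal example under stated assumptions: the helper `profile_column`, the sample records and the chosen statistics (completeness, distinct count, most common value) are hypothetical and stand in for whatever profiling a given project actually requires.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple profiling statistics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    total = len(values)
    return {
        "rows": total,
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0][0] if present else None,
    }


# Illustrative sample source data.
sample = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "EMEA"},
    {"customer_id": 3, "region": None},
    {"customer_id": 4, "region": "APAC"},
]

region_profile = profile_column(sample, "region")
```

A low completeness score or an unexpectedly high distinct count for a candidate dimension is the kind of signal that would be raised before the model is designed around that column.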

Common Technologies Associated with Data Modeling

Technologies involved with data modeling are as follows:

  • Data Modeling applications – These applications make it easier to create, present and instantiate conceptual, logical and physical data models.
  • Semantic Layer – Semantic layer applications enable the development of a logical and physical data model for use by business intelligence and analytics applications. The semantic layer ensures that the data consumed by these applications is modeled once, modeled consistently and made easily accessible. Companies like AtScale also provide the capability to automate data query optimization via automated data aggregation.
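
To illustrate the general idea behind data aggregation as a query optimization, the sketch below pre-computes a summary keyed by one dimension so that later queries can be answered from the summary instead of rescanning the detail rows. It is a generic illustration, not a description of how AtScale or any other product implements aggregation; the function names and sample rows are hypothetical.

```python
from collections import defaultdict

# Illustrative detail-level rows (the kind of data a fact table would hold).
detail_rows = [
    {"region": "EMEA", "product": "Widget", "revenue": 100.0},
    {"region": "EMEA", "product": "App", "revenue": 50.0},
    {"region": "APAC", "product": "Widget", "revenue": 75.0},
]


def build_aggregate(rows, dimension, metric):
    """Pre-compute the sum of a metric for each value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


def total_from_detail(rows, dimension, value, metric):
    """Answer the same question by scanning every detail row."""
    return sum(row[metric] for row in rows if row[dimension] == value)


# Built once, then reused by every query that groups by region.
revenue_by_region = build_aggregate(detail_rows, "region", "revenue")
```

Both paths return the same answer; the aggregate simply trades storage and refresh work for faster lookups.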

Trends / Outlook for Data Modeling

Key trends in the Data Modeling arena are as follows:

  • JSON Data Modeling – JavaScript Object Notation (JSON) simplifies the exchange and storage of structured data and is now a widely used format for internet communications, whether between IoT devices, computers, web servers, or any combination thereof.
  • Data Model Automation – New and improved data modeling applications will rely on machine learning (ML)-assisted processes for developing data models automatically. This means end-to-end data automation, from collection and preparation to exploration and modeling, with the latter involving identification and deployment of the correct models. To this end, model management systems are currently being developed and refined to manage production data models that need updating.
  • Industry-Specific Models – Industry-specific regulators are starting to require that data models be designed and shared as common models, with transparency. Leading vendors offer industry-specific data models and frameworks with the requisite terminology, data structure designs, and reporting to help ease governance and compliance efforts.
  • Time-Series Data Modeling – Time-series databases (TSDBs) are designed for storing and querying data records associated with timestamps. These are ideal for use cases where the timing of event occurrences is the primary dimension of concern.
  • Data Lake Models – Data lakes were created as a response to the limitations of schema-dependent data warehouses. Because those warehouses were in many cases unable to support increasingly demanding performance and scaling requirements, the need arose for centralized repositories of both structured and unstructured data, capable of storing data unrestricted and untransformed.
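
The JSON and time-series trends above often meet in practice: timestamped events arrive as nested JSON and are then grouped along the time dimension. The sketch below is a minimal illustration using only the Python standard library; the event structure (`device`, `ts`, `reading.temp_c`) and the function `hourly_average` are hypothetical and not taken from any particular product or database.

```python
import json
from collections import defaultdict
from datetime import datetime

# Illustrative JSON payload: nested, timestamped sensor events.
raw = """[
  {"device": "sensor-1", "ts": "2024-01-15T10:05:00", "reading": {"temp_c": 20.0}},
  {"device": "sensor-1", "ts": "2024-01-15T10:45:00", "reading": {"temp_c": 22.0}},
  {"device": "sensor-1", "ts": "2024-01-15T11:10:00", "reading": {"temp_c": 25.0}}
]"""


def hourly_average(json_text):
    """Average the temperature reading per device and per hour."""
    buckets = defaultdict(list)
    for event in json.loads(json_text):
        ts = datetime.fromisoformat(event["ts"])
        # Truncate the timestamp to the start of its hour to form the time bucket.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        key = (event["device"], hour.isoformat())
        buckets[key].append(event["reading"]["temp_c"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


averages = hourly_average(raw)
```

Time acts as the primary dimension here, and the nested JSON fields are flattened into the attribute (`device`) and metric (`temp_c`) that the grouping needs.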

What is Agile Data Modeling?

Agile data modeling describes a simplified way of provisioning data models that allows business users to create their own models. This reduces or eliminates the need for data engineers to provision data, considerably expediting the data modeling process. With agile data modeling, not only can existing queries be answered quickly and consistently, but the time savings also open the door to a dramatic expansion of the company’s data exploration and insight generation.

Requirements of Agile Data Modeling

Traditionally, data had to be tagged manually with the company’s definition of what type of data it is and what it is used for. Agile data modeling gives users a much deeper understanding of the data. More information encoded into the model, along with the appropriate UX for conveying that information, means faster and more accurate representations of use cases. If all of your data is tagged with this level of granularity, it helps ensure interoperability, and data can be mixed and matched to build robust data models and drive valuable business insights.

Agile data modeling helps an organization stay competitive with fast, agile big data analytics. However, successful agile data modeling requires a detailed understanding of the data: statistics on the data, the databases involved, the load on those shared resources, the use cases and intent of data consumers, security constraints, and so on. Analysts therefore need platforms that are both operational in scale and flexible enough to support the investigative nature of their jobs.

AtScale and Data Modeling

AtScale’s semantic layer enables business users, data analysts and data modelers to easily and rapidly create a multidimensional data model for consistent consumption by multiple business intelligence and analytics applications. AtScale also provides automated data aggregation to significantly improve data query speed.
