What is data modeling?
Data modeling is the process of creating a visual representation or a blueprint that defines the information collection and management systems of any organization. This blueprint or data model helps different stakeholders, like data analysts, scientists, and engineers, to create a unified view of the organization’s data. The model outlines what data the business collects, the relationship between different datasets, and the methods that will be used to store and analyze the data.
Why is data modeling important?
Organizations today collect a large amount of data from many different sources. However, raw data is not enough. You need to analyze data for actionable insights that can guide you to make profitable business decisions. Accurate data analysis needs efficient data collection, storage, and processing. There are several database technologies and data processing tools, and different datasets require different tools for efficient analysis.
Data modeling gives you a chance to understand your data and make the right technology choices to store and manage this data. In the same way an architect designs a blueprint before constructing a house, business stakeholders design a data model before they engineer database solutions for their organization.
Data modeling brings the following benefits:
- Reduces errors in database software development
- Facilitates speed and efficiency of database design and creation
- Creates consistency in data documentation and system design across the organization
- Facilitates communication between data engineers and business intelligence teams
What are the types of data models?
Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Analysts and stakeholders create several different types of data models during the data design stage. The following are three main types of data models:
Conceptual data model
Conceptual data models give a big picture view of data. They explain the following:
- What data the system contains
- Data attributes and conditions or constraints on the data
- What business rules the data relates to
- How the data is best organized
- Security and data integrity requirements
The business stakeholders and analysts typically create the conceptual model. It is a simple diagrammatic representation that does not follow formal data modeling rules. What matters is that it helps both technical and nontechnical stakeholders to share a common vision and agree on the purpose, scope, and design of their data project.
Example of conceptual data models
For example, the conceptual data model for an auto dealership might show the data entities like this:
- A Showrooms entity that represents information about the different outlets the dealership has
- A Cars entity that represents the several cars the dealership currently stocks
- A Customers entity that represents all the customers who have made a purchase in the dealership
- A Sales entity that represents the information about the actual sale
- A Salesperson entity that represents the information about all the salespeople who work for the dealership
This conceptual model would also include business requirements, such as the following:
- Every car must belong to a specific showroom.
- Every sale must have at least one salesperson and one customer associated with it.
- Every car must have a brand name and product number.
- Every customer must provide their phone number and email address.
Conceptual models thus act as a bridge between the business rules and the underlying physical database management system (DBMS). Conceptual data models are also called domain models.
Logical data model
Logical data models map the conceptual data classes to technical data structures. They give more details about the data concepts and complex data relationships that were identified in the conceptual data model, such as these:
- Data types of the various attributes (for example, string or number)
- Relationships between the data entities
- Primary attributes or key fields in the data
Data architects and analysts work together to create the logical model. They follow one of several formal data modeling systems to create the representation. Sometimes agile teams might choose to skip this step and move from conceptual to physical models directly. However, these models are useful for designing large databases, called data warehouses, and for designing automatic reporting systems.
Example of logical data models
In our auto dealership example, the logical data model would expand the conceptual model and take a deeper look at the data classes as follows:
- The Showrooms entity has fields such as name and location as text data and a phone number as numerical data.
- The Customers entity has a field email address with the format [email protected] or [email protected]. The field name can be no more than 100 characters long.
- The Sales entity has a customer’s name and a salesperson’s name as fields, along with the date of sale as a date data type and the amount as a decimal data type.
Logical models thus act as a bridge between the conceptual data model and the underlying technology and database language that developers use to create the database. However, they are technology agnostic, and you can implement them in any database language. Data engineers and stakeholders typically make technology decisions after they have created a logical data model.
Physical data model
Physical data models map the logical data models to a specific DBMS technology and use the software’s terminology. For example, they give details about the following:
- Data field types as represented in the DBMS
- Data relationships as represented in the DBMS
- Additional details, such as performance tuning
Data engineers create the physical model before final design implementation. They also follow formal data modeling techniques to make sure that they have covered all aspects of the design.
Example of physical data models
Suppose that the auto dealership decided to create a data archive in Amazon S3 Glacier Flexible Retrieval. Their physical data model describes the following specifications:
- In Sales, the sale amount is a float data type, and the date of sale is a timestamp data type.
- In Customers, the customer name is a string data type.
- In S3 Glacier Flexible Retrieval terminology, a vault is the geographical location of your data.
Your physical data model also includes additional details such as which AWS Region you will create your vault in. The physical data model thus acts as a bridge between the logical data model and the final technology implementation.
What are the types of data modeling techniques?
Data modeling techniques are the different methods that you can use to create different data models. The approaches have evolved over time as the result of innovations in database concepts and data governance. The following are the main types of data modeling:
Hierarchical data modeling
In hierarchical data modeling, you can represent the relationships between the various data elements in a tree-like format. Hierarchical data models represent one-to-many relationships, with parents or root data classes mapping to several children.
In the auto dealership example, the parent class Showrooms would have both entities Cars and Salespeople as children because one showroom has several cars and salespeople working in it.
Graph data modeling
Hierarchical data modeling has evolved over time into graph data modeling. Graph data models represent data relationships that treat entities equally. Entities can link to each other in one-to-many or many-to-many relationships without any concept of parent or child.
For example, one showroom can have several salespeople, and one salesperson can also work at several showrooms if their shifts vary by location.
Relational data modeling
Relational data modeling is a popular modeling approach that visualizes data classes as tables. Different data tables join or link together by using keys that represent the real-world entity relationship. You can use relational database technology to store structured data, and a relational data model is a useful method to represent your relational database structure.
For example, the auto dealership would have relational data models that represent the Salespeople table and Cars table, as shown here:
Salesperson ID | Name |
1 | Jane |
2 | John |
Car ID | Car Brand |
C1 | XYZ |
C2 | ABC |
Salesperson ID and Car ID are primary keys that uniquely identify individual real-world entities. In the showroom table, these primary keys act as foreign keys that link the data segments.
Showroom ID | Showroom name | Salesperson ID | Car ID |
S1 | NY Showroom | 1 | C1 |
In relational databases, the primary and foreign keys work together to show the data relationship. The preceding table demonstrates that showrooms can have salespeople and cars.
Entity-relationship data modeling
Entity-relationship (ER) data modeling uses formal diagrams to represent the relationships between entities in a database. Data architects use several ER modeling tools to represent data.
Object-oriented data modeling
Object-oriented programming uses data structures called objects to store data. These data objects are software abstractions of real-world entities. For example, in an object-oriented data model, the auto dealership would have data objects such as Customers with attributes like name, address, and phone number. You would store the customer data so that every real-world customer is represented as a customer data object.
Object-oriented data models overcome many of the limitations of relational data models and are popular in multimedia databases.
Dimensional data modeling
Modern enterprise computing uses data warehouse technology to store large quantities of data for analytics. You can use dimensional data modeling projects for high-speed data storage and retrieval from a data warehouse. Dimensional models use duplication or redundant data and prioritize performance over using less space for data storage.
For example, in dimensional data models, the auto dealership has dimensions such as Car, Showroom, and Time. The Car dimension has attributes like name and brand, but the Showroom dimension has hierarchies like state, city, street name, and showroom name.
What is the data modeling process?
The data modeling process follows a sequence of steps that you must perform repetitively until you create a comprehensive data model. In any organization, various stakeholders come together to create a complete data view. Although the steps vary based on the type of data modeling, the following is a general overview.
Step 1: Identify entities and their properties
Identify all the entities in your data model. Each entity should be logically distinct from all other entities and can represent people, places, things, concepts, or events. Each entity is distinct because it has one or more unique properties. You can think of entities as nouns and attributes as adjectives in your data model.
Step 2: Identify the relationships between entities
The relationships between the different entities are at the heart of data modeling. Business rules initially define these relationships at a conceptual level. You can think of relationships as the verbs in your data model. For instance, the salesperson sells many cars, or the showroom employs many salespeople.
Step 3: Identify the data modeling technique
After you conceptually understand your entities and their relationships, you can determine the data modeling technique that best suits your use case. For example, you might use relational data modeling for structured data but dimensional data modeling for unstructured data.
Step 4: Optimize and iterate
You can optimize your data model further to suit your technology and performance requirements. For example, if you plan to use Amazon Aurora and a structured query language (SQL), you will put your entities directly into tables and specify relationships by using foreign keys. By contrast, if you choose to use Amazon DynamoDB, you will need to think about access patterns before you model your table. Because DynamoDB prioritizes speed, you first determine how you will access your data and then model your data in the form it will be accessed.
You will typically revisit these steps repeatedly as your technology and requirements change over time.
How can AWS help with data modeling?
You can also use AWS Amplify DataStore for faster and easier data modeling to build mobile and web applications. It has a visual and code-based interface to define your data model with relationships, which will accelerate your application development.
Get started with data modeling on AWS by creating a free account today.