Are you new to data modeling, an analytics engineer trying to improve your craft, or an experienced data engineer who hasn’t used conceptual data models very much? Whatever your experience, I will share how to make the best use of conceptual data models to add clarity and help ensure the success of your next data project.
A conceptual data model is a technology-agnostic, high-level, abstract representation of the data an organization uses, or intends to use, in its business operations. It provides a big-picture view of an organization's data requirements without diving into technical details of how the data will be stored or manipulated. The goal of the conceptual model is to create a shared understanding of the business by capturing the essential concepts of a business or organization.
So, what do “shared understanding” and “essential concepts” mean? A shared understanding of the business ensures that everyone works from the same definitions of the essential concepts, including what each one includes and excludes. Essential concepts are the roles, people, places, or processes that are an integral part of the business model. Remember to think big, start small, and scale fast as outlined in my self-service BI article.
When modeling data, conceptual data models are the first of three primary types of data models. Those three are conceptual, logical, and physical. Stay tuned for my next article where we’ll define logical and physical models and discuss the key differences.
Two primary artifacts arise from the process of creating a conceptual data model—the actual CDM diagram and the definitions of the essential concepts. These artifacts will be continuously modified and used to communicate the shared understanding of the business.
The real value of the conceptual data model is that it:
Links the business drivers, strategic goals, and strategies with business questions, facts, and qualifiers
Creates a shared understanding between technology and the business, enabling clear communications and fostering debate on the essentials
Improves clarity and precision of the data model going forward
Creates a verifiable context and a boundary to the scope of the problem
Figure 1 - Transactional CDM with only entities and relationships
The simple diagram in Figure 1 demonstrates the essential components of a hotel reservation system including people (customers and guests), physical buildings (hotel), sub-divisions of the property (rooms), and the intersection of room and guest (reservation).
The cardinality of the relationships between entities is not defined in Figure 1. It only defines the direction of the relationships. Some relationships are bidirectional, meaning many-to-many. While cardinality is important, it is usually not a detail or complexity that impacts the conceptual models. The fact that the relationship is present is the essential aspect.
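To make the "entities and relationships only" idea concrete, here is a minimal sketch of how a diagram like Figure 1 could be written down as data. The entity names come from the hotel example; the specific relationship pairs, their directions, and the helper names `unknown_entities` and `related` are my assumptions for illustration, not a transcription of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    bidirectional: bool = False  # True reads as many-to-many; no other cardinality is stored


# Entities named in the hotel reservation example.
ENTITIES = {"Customer", "Guest", "Hotel", "Room", "Reservation"}

# Assumed relationships, for illustration only.
RELATIONSHIPS = [
    Relationship("Customer", "Reservation"),
    Relationship("Guest", "Reservation", bidirectional=True),
    Relationship("Reservation", "Room"),
    Relationship("Hotel", "Room"),
]


def unknown_entities(entities, relationships):
    """Return names used in a relationship that are not declared as entities."""
    used = {r.source for r in relationships} | {r.target for r in relationships}
    return sorted(used - entities)


def related(entity, relationships):
    """Return every entity connected to `entity`, ignoring direction."""
    linked = set()
    for r in relationships:
        if r.source == entity:
            linked.add(r.target)
        elif r.target == entity:
            linked.add(r.source)
    return sorted(linked)
```

Calling `related("Reservation", RELATIONSHIPS)` lists the entities a reservation touches, which is the level of detail a conceptual model cares about: that the relationship exists, not how many rows sit on each side.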
Figure 2 - Transactional CDM with entities, relationships, and identifiers
Figure 2 demonstrates a slightly different style with the addition of critical identifiers for the entities, such as the room and confirmation number. However, there are no data types. While the relationships show cardinality, they lack descriptors. The visuals of a conceptual model are much less rigid than those of logical and physical models, which are rooted in formal languages and formal definitions.
The goal of a CDM is to capture the essence of the business and communicate that with a broad audience. So, a technical audience may not object to a CDM that aligns closely with formal data diagramming. In contrast, a non-technical business-oriented audience may want images of hotels, stick figures for customers, and stories about relationships.
The main takeaways are that:
1. The audience matters
2. There is no single best format for the diagram
3. The diagram should capture the bird’s-eye view of the business, the essentials.
We are still missing the business definitions that constitute our conceptual data model. These definitions describe each entity and each relationship. Here is a quick example of what those may look like.
Figure 3 - Definitions of the entities and relationships
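Because the diagram and the definitions are two halves of the same model, it can help to check that every concept on the diagram actually has a definition. The sketch below is one possible way to do that. The definition wording is hypothetical and written only for this example (it is not taken from Figure 3), and the "Guest" entry is left out on purpose to show what the check reports.

```python
# Concepts that appear on the diagram: entities plus one named relationship.
CONCEPTS = [
    "Customer",
    "Guest",
    "Hotel",
    "Room",
    "Reservation",
    "Customer makes Reservation",
]

# Hypothetical business definitions; the wording is illustrative only.
DEFINITIONS = {
    "Customer": "The person or organization that books and pays for a stay.",
    "Hotel": "A physical property that offers rooms for reservation.",
    "Room": "A bookable sub-division of a hotel.",
    "Reservation": "An agreement to hold a room for a guest for specific dates.",
    "Customer makes Reservation": "A customer may book one or more reservations.",
}


def undefined_concepts(concepts, definitions):
    """Return the concepts that have no definition, or only a blank one."""
    return [c for c in concepts if not definitions.get(c, "").strip()]
```

Here `undefined_concepts(CONCEPTS, DEFINITIONS)` reports "Guest", a prompt to go back to the stakeholders and agree on what a guest is and how it differs from a customer.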
Last but certainly not least, let’s discuss the validation of our shared understanding. Our business stakeholders should be able to read the definitions, view the diagram, which is typically only a single page, and answer their key business questions. Conceptual data models should be bounded or “regularized to the page.”1 Better stated: if it doesn’t make the page, it doesn’t make the business.
Collaboration is vital to making this process valuable and productive; conceptual data models are nothing if not a collaboration with the business stakeholders. Using an iterative process that involves the stakeholders throughout the process ensures that any changes to the model are small mid-course adjustments. It also ensures that the CDM reflects the business goals and that model validation is simple and easy to complete.
The business objective will dictate how the conceptual data modeling process is conducted. Are you building a transactional model for a mobile application? How about an analytics model for a line of business? Or a warehouse model meant to serve as an enterprise data warehouse?
Remember, the conceptual data model is independent of the underlying data platform technology, but its use varies greatly and will impact the modeling process. Here are a few types of conceptual model examples.
Developing a CDM for a transactional system usually requires understanding the unique requirements for speed, transaction integrity, customer experience, scalability, and ease of use. These types of systems are very common in online retail, reservations, inventory, and even financial services. See Figure 1 and Figure 2 for an example of a hotel.
It’s crucial to understand that all conceptual data models focus on entities like customer, room, and reservation. They also identify key attributes like customer ID and confirmation number, as well as the business interactions between the entities. They don’t focus on actual tables, column names, data types, or PK/FK relationships. The entities, attributes, and interactions are all described in the business definitions; see the example in Figure 3.
Figure 4 - CDM for analytics
The CDM for an analytical use case often focuses on measuring the business process with quantitative and qualitative measures. The entities that we are focusing on here are aggregates and categories. For this reason, it is common for the analytical CDM to look like a star schema with facts and dimensions.
Recall, the business purpose defines the activities and the output of the model. It is common for analytical scenarios to develop a matrix of facts and qualifiers, unsurprisingly called a Fact Qualifier Matrix (FQM). The FQM defines three concepts:
Measure - the aggregate that our business needs to understand the business process
Qualifier - the categories, grouping, and criteria for the measures
Intersection - the connection that shows which qualifiers apply to which measure
Figure 5 - Fact Qualifier Matrix
Creating an FQM is not difficult, and it will inform and validate your conceptual data model. To create an FQM, simply ask your stakeholders:
What measures do you use to understand the business?
How do you qualify, group, or categorize the measures?
How granular does the measure need to be? For example, do you calculate sales hourly, daily, or monthly?
For each measure, does the qualifier apply to that measure?
As you can see in our example, a non-technical person can easily understand our diagram and FQM. They can see that the system will measure revenue over time, by hotel, by room, and by customer. The FQM also serves as a valuable tool to verify and validate the model.
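If you want to keep the FQM somewhere more durable than a whiteboard, a small data structure is enough. This is a minimal sketch, assuming a plain mapping from each measure to the qualifiers that apply to it. "Revenue" and its four qualifiers come from the example above; "Occupancy Rate" is a hypothetical second measure added only to show an empty intersection, and the function names are my own.

```python
# Qualifiers (columns of the matrix), as named in the hotel example.
QUALIFIERS = ["Time", "Hotel", "Room", "Customer"]

# Measures (rows) mapped to the qualifiers that apply to them.
FQM = {
    "Revenue": {"Time", "Hotel", "Room", "Customer"},
    "Occupancy Rate": {"Time", "Hotel", "Room"},  # hypothetical measure
}


def applies(fqm, measure, qualifier):
    """Intersection test: does this qualifier apply to this measure?"""
    return qualifier in fqm[measure]


def unknown_qualifiers(fqm, qualifiers):
    """Return qualifiers used by a measure but missing from the agreed list."""
    known = set(qualifiers)
    return {m: sorted(q - known) for m, q in fqm.items() if q - known}


def render(fqm, qualifiers):
    """Render the matrix as plain text, marking each intersection with an X."""
    width = max(len(m) for m in fqm)
    lines = [" " * width + " | " + " | ".join(qualifiers)]
    for measure, quals in fqm.items():
        cells = [("X" if q in quals else "").center(len(q)) for q in qualifiers]
        lines.append(measure.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)
```

Printing `render(FQM, QUALIFIERS)` gives a text table you can paste into a document for stakeholder review, and `unknown_qualifiers` catches a measure that references a category nobody has defined yet.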
Enterprise CDMs introduce additional challenges due to the increased number of stakeholders, lines of business, and the complexity of larger organizations. But that’s not a problem because we have “subject areas.”
Subject areas enable you to subdivide your model into separate logical groups or domains. Data modeling is an iterative process—even more so when using subject areas. To effectively use subject areas:
Engage the business leaders and stakeholders to identify common subject areas.
Prioritize which subject areas should be addressed and when.
For each subject area, repeat the process of identifying entities, attributes, and relationships.
Once the CDM for a specific subject area is complete, logical and physical modeling can start. It is not necessary to complete the entire enterprise model before moving on to logical and physical modeling.
As you can see in Figure 5, the model focuses on the reservation subject area with links to the other subject areas like marketing and payment. Using subject areas in this way helps you maintain scope and ensures all stakeholders see how their concerns are being addressed.
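The subject-area steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the reservation entities come from the hotel example, while the entities inside the marketing and payment areas, the priority order, and the helper names `linked_areas` and `next_area` are hypothetical.

```python
# Subject areas mapped to the entities they contain.
SUBJECT_AREAS = {
    "Reservation": {"Customer", "Guest", "Hotel", "Room", "Reservation"},
    "Marketing": {"Campaign", "Customer"},   # hypothetical contents
    "Payment": {"Payment", "Reservation"},   # hypothetical contents
}

# Order agreed with stakeholders (assumed for this example).
PRIORITY = ["Reservation", "Payment", "Marketing"]


def linked_areas(focus, areas):
    """Return the other subject areas that share an entity with `focus`."""
    links = {}
    for name, entities in areas.items():
        if name == focus:
            continue
        shared = areas[focus] & entities
        if shared:
            links[name] = sorted(shared)
    return links


def next_area(priority, completed):
    """Return the highest-priority subject area not yet modeled, or None."""
    for name in priority:
        if name not in completed:
            return name
    return None
```

Shared entities such as customer are the natural links between subject areas, so `linked_areas("Reservation", SUBJECT_AREAS)` shows where the reservation model has to agree with its neighbors. `next_area(PRIORITY, {"Reservation"})` then tells you which area to pick up once the first one has moved on to logical and physical modeling.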
If you are the type of human that reads the ending of a book first, then this section is for you. Here are some key takeaways:
Conceptual Data Models provide the big-picture view of an organization's data requirements without diving into technical details—maintain your scope.
CDMs capture the essence of the business problem and will align with future logical and physical data models.
The primary goal of the CDM is to create a shared understanding between technology and the business team, enabling clear communications and fostering debate on the essential concepts.
Definitions of entities, relationships, and core concepts are part of the conceptual data model. The model is not just another diagram.
Represent the CDM visually and convey its contents to your stakeholders using the communication methods and style that they find most engaging.
Building your conceptual data model is the first and arguably the most important step in delivering a successful data initiative. Often, we’re in a hurry to implement the physical model, but sometimes you need to slow down to speed up. Building trust and confidence with your business stakeholders will ensure you deliver better and faster data models in the future.
So, on your next data modeling project, try it out. Create a conceptual data model, gain the trust and confidence of your stakeholders, and deliver real business value.
ThoughtSpot can help you turn your data models into accessible, AI-Powered analytics—helping you bring data into the boardroom and empowering business users to answer ad-hoc queries using natural language search. See for yourself when you start a 30-day free trial.
1. “[Charles] Darwin was no [Ben] Franklin, adding assorted considerations for days. Despite the seriousness with which he approached this life-changing choice, Darwin made up his mind exactly when his notes reached the bottom of the diary sheet. He was regularizing to the page. This is reminiscent of both Early Stopping and the Lasso: anything that doesn’t make the page doesn’t make the decision.”
Christian, Brian; Griffiths, Tom. Algorithms to Live By (p. 168). Henry Holt and Co. Kindle Edition.