avatarYulia Kosarenko

Summary

The article outlines common mistakes made when creating Entity Relationship Diagrams (ERDs) and provides guidance on how to avoid them to ensure clear and effective data models.

Abstract

Data modelling is crucial for various professionals working with data, and even simplistic ERDs can greatly aid in data analysis and understanding relationships between data groups. The article, based on the author's experience teaching business analytics, identifies typical ERD errors, such as using cryptic names, pluralizing entity names, including too many words in names, confusing entities with categories or states, omitting identifying or time attributes, mixing up attributes with values, establishing one-directional relationships, misplacing attributes in parent-child links, neglecting one-to-many relationships, confusing metrics with attributes, and mixing up modelling notations. It emphasizes the importance of singular entity names, clear and concise naming conventions, proper categorization, and the correct representation of relationships and attributes to create effective and error-free ERDs.

Opinions

  • The author stresses that descriptive and simple entity names are crucial for models that require minimal explanation.
  • Entity names should be singular as each entity represents a category of similar things.
  • Entity names should be concise, avoiding unnecessary words like "entity," "object," or "data."
  • The author advises against creating separate entities for categories when attributes can suffice.
  • The article suggests that an entity's state should be represented as an attribute rather than as separate entities.
  • Identifying attributes are deemed essential for distinguishing between instances of the same entity.
  • Time attributes are highlighted as critical for tracking events or transactions.
  • The author cautions against including possible attribute values on an ERD, advocating for the use of code tables instead.
  • Relationships in ERDs should be bidirectional, with clear cardinality indications.
  • In one-to-many relationships, the child entity should contain the link to the parent entity.
  • Metrics should not be confused with attributes, as metrics are calculated and can change, whereas attributes are stable characteristics.
  • Consistency in modelling notation is recommended to improve clarity and reuse of diagrams across an organization.

Common ERD Mistakes

How to create entity relationship diagrams that make sense

All images by the author

Data modelling is not only for data architects and software engineers. Anyone who works with data, including data analysts, business analysts, and business intelligence designers, needs to understand and be able to create at least simple data models.

Even simplistic entity relationship diagrams (ERDs) can help support analysis and data discovery, identify crucial relationships and clarify important groups of data.

Sometimes, a conceptual model is sufficient, and on other occasions, a starter logical model with a few attributes will work best to support business & data analysis activities. See this article and video if you want to learn more.

As I taught cohort after cohort of students in a business analytics program, I saw them struggle with the same questions, and make the same mistakes over and over when creating their first ERDs. This article is a summary of typical ERD mistakes and how to avoid them.

Before we get there, let’s recap the main elements of the diagram.

ERD Building Blocks

Entity: A person, thing, event or concept about which the business keeps data.

Examples: customer, product, order, supplier, employee, department, interaction.

Attribute: a distinct characteristic of an entity for which data is maintained.

Examples: name, address and cell number could be the attributes of a customer entity; product entity attributes could be product name, price, colour and size.

Relationship: an association between entities that represents how the entities relate to each other.

Examples: a product is supplied by a supplier; an order contains products; a department is composed of employees.

Next, let’s review typical mistakes as they relate to each of the ERD building blocks: entities, attributes, and relationships.

1. Cryptic entity names.

The main purpose of a conceptual data model is to understand the main groups of data and their relationships, therefore, descriptive names are important.

The best model is the one that requires little commentary. Always aim to give data entities simple and clear names that would be relevant and easily understood by business stakeholders.

2. Plural entity names.

An entity is a category of similar things or concepts and thus must be singular. Every instance of an entity is one thing that belongs to the category.

Take, for example, an entity called “Product”. Instances of this entity represent different products sold e.g. hammer, saw, and screwdriver. Each instance represents one product. Data about multiple products e.g. a product catalog will consist of multiple instances of the entity “Product”.

3. Too many words in entity names

The entity name should reflect the essence of the object it represents. Avoid additional superfluous nouns such as entity, object, record, information, or data.

Use:

  • “Customer” instead of “Customer_Information”
  • “Order” instead of “OrderTracking”
  • “Product” instead of “Product_entity”
  • “Invoice” instead of “InvoiceAttributes”

4. Confusing entities and categories

Similar things (entities) will have many common characteristics (attributes). For example, each product can be characterized by a product identifier, product name, model, and price.

Often we need to categorize similar objects such as using product categories: outerwear, loungewear, shoes, and athletic clothing.

This does not imply that each of these product categories should be a new entity. If they share many similar attributes and we manage the data about products of all categories consistently, there is no need to multiply data entities.

Data models should be as simple as it is reasonable.

Moreover, key data entities can often be categorized in more than one way. For example, products can be categorized based on:

  • the purpose of the product (outerwear, lounge wear, shoes and athletic clothing)
  • target consumers (babies, children, adult clothing)
  • season (winter vs summer)
  • price range (mass market vs luxury)

It will make no sense to manage all these categories and subcategories as separate entities. Rather, product categories should be treated as attributes and used as dimensions when grouping and aggregating data for analysis.

5. Confusing entities and states

Key business entities often have a life cycle with distinct states. For example, an order may progress through the following states as part of its life cycle:

  • new
  • approved
  • pending
  • cancelled
  • completed

Just like in the previous example with categories, the state is a characteristic of an order. At any time, an order may be in only one state. This makes it appropriate to model state as an attribute.

6. Missing identifying attributes

Most entities require an identifier, a name, or both. This is how we will distinguish the data about each instance of the same entity when storing business data.

How can you tell one customer from another? By their name and customer number.

How will you find a correct record in the product catalog? By a product ID, product name or model.

If you want to refer to a specific location, how will you do it? Does each location have a name?

Always consider how individual instances of the same entity will be identified. Missing unique identifiers is one of the most frequent root causes of data issues.

7. Missing time attributes

When modelling entities that represent events or transactions, the time attributes must be present.

For example, when customers are placing orders, it’s essential to track order dates. When a customer is making a credit card purchase in a store, the date and time of the credit card transactions are captured as part of the Payment entity. When a call center receives and responds to calls, the date and time of each call are critical attributes for tracking the interactions.

Date and time attributes are necessary for tracking operational metrics, automating business flows, and are frequently used as search criteria.

8. Confusing attributes with values

This is a true rookie mistake, and it is rooted in our desire to capture what we know — in this case, possible attribute values.

When creating data models, learn to focus on the characteristics you need to track, not the values of these characteristics.

Thus, as in previous examples, indicate on the model that each product has a type, or each order has a status. Permissible values of such categorical attributes do not belong on an ERD, as they represent data — in this case, reference data.

A traditional way to capture permissible values is in code tables that will consist of a category identifier and its description, as shown in the examples below.

If you are in doubt, just ask yourself this question: What if there were twenty permissible values? or a hundred? or a thousand? Would you have a thousand different attributes in this entity? This mental exercise will usually settle the argument.

9. One-directional relationship

In data modelling, we consider both directions of relationships between data entities. More often than not, the relationships (their cardinalities) are not symmetrical, e.g. differ depending on the direction.

The most frequent relationship in data models is one-to-many, or so-called hierarchical (parent & child) relationship.

For example, a college student must earn multiple credits while studying at a college, however, each credit belongs to one and only one student.

Each course will require the completion of multiple assignments, however, each assignment is associated with one specific course.

Each diploma program is administered by one and only one college, however, each college will administer multiple programs.

Always validate that the ERD indicates cardinality on both sides of a relationship.

10. Incorrect placement of attributes linking parent and child

A parent and child relationship (one-to-many association) requires that we link the parent and child entity, and there is only one correct way to do it: the child has an attribute linking it to the parent.

Consider an example of a customer placing one or more orders:

The parent entity (customer) can be associated with multiple child entities (orders).

A child entity (order) is associated with one parent (customer).

Therefore, each order entity must have a link to the parent — the customer and not the other way around.

If you get confused, ask yourself — what if this customer placed a thousand orders? or a million? what would the data look like?

11. Missing one-to-many relationships

This point expands on the previous tip. A one-to-many relationship exists between two entities. Instead, it is a mistake to try and “squeeze” the information about “many” into “one” as in the example below.

Here is a helpful tip: if you are tempted to name an attribute using a plural noun, it is likely that a one-to-many relationship is hiding there. Pull out the child entity and make it separate.

12. Confusing metrics and attributes

It is in our human nature to summarize things. When we model data, it is hard not to think about the next steps — how will we analyze this data and report the results.

This causes another frequent mistake — confusing metrics with entity attributes.

A metric is a result of a calculation or summary statistics; it can be calculated at different times, for different criteria, and with different results.

For example, a student’s average grade will change with each new course completion. An employee’s length of tenure will increase with time.

Thus, a metric is different from an attribute which is a relatively stable characteristic of an entity instance and not the result of a calculation dependent on data in other entities.

Metrics belong in reports, dashboards and business intelligence outputs, therefore may be included in dimensional models, but they do not belong in relational data models such as ERDs.

13. Mixing up modelling notations

Finally, any modelling requires consistency, in particular in using modelling notations.

There are several ERD notations available, and most organizations apply discipline to ensure that everyone uses the same notation to improve shareability, clarity, and reuse across the organization.

Always confirm the accepted modelling notation and then use it.

Do not mix up different notations in one diagram.

For more consistency and clarity tips, refer to this article.

These ERD mistakes are also summarized in my video.

For more data management and analysis knowledge and skills check out my new course Data Management & Analytics Fundamentals.

Website | Contact | Training | Mentorship | Courses | Job Interview Clinic | Books | Downloads | LinkedIn | YouTube | Instagram | Twitter

Data
Technology
Database
Data Analysis
Data Modeling
Recommended from ReadMedium