Relational vs Non-Relational Databases, Querying Data, and What Product Managers Actually Need to Know

Technical Knowledge for Product Managers

Welcome to part five of a series of articles covering the technical knowledge that every Product Manager ought to know. Today I’m going to be talking about what databases are, how they work, and why product managers should understand them.

If you’d like to learn why I think this knowledge is important, then head back to my article where I discussed WHY it’s important to work on technical skills in Product Management. The other articles in the series are also linked at the bottom of this one, alongside an extended “Data” reading list.

Becoming a (more) Technical Product Manager, Part I: Diagnosing the Problem

Why become more technical? And what do I need to learn?

medium.com

And don’t forget to subscribe to email updates if you want to be notified when the next article is published!

What do Product Managers Actually Need to Know about Databases?

In my opinion, Product Managers should have a basic understanding of how databases work, including:

The difference between relational and non-relational databases

I’ll cover this in some detail below. This is pretty fundamental knowledge.

The basics of data modelling and database design

When setting up new features or business units, you might need to be involved with decisions on database design, something that requires both business and technical considerations.

How to write and interpret SQL queries

There is some debate within the PM community whether this is a “must have” skill for Product Managers or just a nice-to-have. It is certainly the case that this might not be a task you need to do regularly. Some organisations might already have established excellent pre-made dashboards with the required views, or an analytics setup such that the data visualisation tools remove most of the need for writing queries. You might even have a specialist Product Analyst to help you get your hands ont he data you need.

However, it’s generally an advantage to be familiar with available data sets and able to query on them directly; it will give you additional speed and flexibility when you have an urgent question or need additional data to support prioritisation. You never know when your go-to ‘data guy’ is on vacation. Besides, basic SQL fluency tends to be the best route to really understanding how relational databases work.

How to work with data engineers and developers to ensure data integrity and security.

For example — defining which personal data is sensitive and liasing between business and tech to define an approach for anonymising such data.

How to use data to inform product decisions and strategy

As a PM, you also need to be able to set goals and metrics that encourage the desired outcome, and incentivise the right behaviours.

You will also need to use data to make arguments, prioritise your roadmap, and validate hypotheses.

At a bare minimum, this means understanding basic statistical concepts. However, this topic is far too broad to cover in just one article, so I’ve included some book recommendations on this topic below.

I’m a little biased on this subject, as I used to work as a data analyst in larger tech company. However, difference in influencing power when a PM understands data and knows how to use it to their advantage is just HUGE. Once you’ve sat through a meeting with somebody trying to pitch a feature using unclear and poorly-labelled graphs, whilst completely missing an obvious confounding factor in their analysis… It’s painful. Don’t be like that.

When do Product Managers Interact with Databases?

Here are some of the key touchpoints:

Defining data requirements for new products and features
Collaborating with data engineers and developers to design and implement data structures
Monitoring and analyzing data to measure product performance and user behavior
Integrating third-party data sources or APIs into the product

So, without further ado, let’s cover the basics you need to succeed!

What is a Database?

A database is an organized way of storing data. It’s likea digital filing cabinet for information that can be accessed, managed, and updated as needed. Databases can be small or large, simple or complex, and store a wide range of data types, such as text, numbers, images, and videos.

What is a Database Management System?

A Database Management System (BBMS) is an intermediary between end-users and the database, enabling users to manage and access the data stored there. It’s the engine that powers a database, providing tools for creating, modifying, and querying data. Some common DBMS include Oracle, MySQL, Microsoft SQL Server, and MongoDB.

Relational vs Non-Relational Databases

There are two main types of databases: relational and non-relational. Relational databases organize data into tables that have a predefined structure, with rows representing records and columns representing fields. Non-relational databases, on the other hand, store data in a more flexible and dynamic way, often using document or key-value stores. The choice between relational and non-relational databases depends on the type of data being stored and the specific use case.

Relational Databases (SQL databases)

Relational databases store data in a tabular, structured way. In each table, there is atleast one column. SQL is the programming language used with relational databases.

Some of the most commonly used relational databases are

PostgreSQL
Microsoft SQL Server
MySQL
Oracle

You should use a SQL databases when:

You need highly structured data across multiple tables
Data is related — when one user updates a record, every instance of the data refreshes. You don’t need to change multiple files.
You don’t plan to frequently change database structure
Accurate results for complex queries
Data integrity and security

They scale vertically — as you store more data, you need to increase computing power.

Non-relational Databases (NoSQL databases)

These are databases that don’t use a relational data model. They don’t store data in a tabular formula; there are no tables, rows, columns or relationships between different data points.

Some of the most commonly used non-relational databases are

MongoDB
Redis
HBase
BigTable
Apache Cassandra
Google Cloud Bigtable
Amazon DynamoDB

Four main types of non-relational databases are

Column-oriented databases — similar to relational databases, data stored in cells grouped in columns
Key-value stores — data storied in dictionaries in key-value pair collections. Unique keys point to a specific value.
Document stores use documents to hold and encode data. Typically the document is a json file, but documents within a DB can have different data types.
Graph databases — complex and can handle large sets of data. They represent data on a graph, which shows how different sets of data relate to each other. ie OrientDB

You should use a NoSQL database when:

Need for high scalability — they scale horizontally
Less costly initial implementation and flexibility in schema
You need fast and continous availability for big data

Examples of use:

Netflix — uses SimpleDB, HBase and Cassandra in combination. This allows Netflix to scale the operation infinitely
Facebook Messenger — created Cassandra to help index messages, using a person’s ID as part of a primary key
Uber — used Riak to power their app

Cheat Sheet: The Most Important SQL Queries

I believe that — in most cases — the best way to understand how the data behind your product is structured is to actually query on the database.

Product Managers should be familiar with at least the following SQL queries:

SELECT — retrieves data from one or more tables. The query below retrieves all columns and rows from the customers table.

SELECT * FROM customers;

WHERE — Filters data based on specific criteria. The query below retrieves all columns and rows from the customer table where the country is Switzerland.

SELECT * FROM customers WHERE country='Switzerland';

JOIN — Combines data from two or more tables. This query retrieves the order ID, customer name, and order date from the “orders” table and the “customers” table, where the customer ID matches. Learn more about the different types of joins here.

SELECT orders.order_id, customers.customer_name, orders.order_date
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;

GROUP BY — Groups data based on one or more columns. This query groups customers by country and returns the count of customers in each country. Learn about the other aggregate functions such as avg and sum here.

SELECT country, COUNT(*) AS count
FROM customers
GROUP BY country;

ORDER BY — Sorts data in ascending or descending order. This query retrieves all columns and rows from the products table, sorted by the highest price first.

SELECT * FROM products
ORDER BY price DESC;

This is a tiny glimse into the querying power SQL can provide. I would really encourage you have a go yourself, if you have access to a dataset at work. Otherwise there are plenty of (free!) courses online that will offer you a chance to practice your skills on test data. There is no way better to learn than by doing!

When I wanted to play around with data analysis in R, I found a dataset of powerlifting competition results provided by OpenPowerlifting to test out linear regression. Having some existing context/knowledge about the data (I’m a competitive powerlifter!), I was able to learn a new skill whilst being engaged in a topic that interests me.

This is my challenge to you to take the next step and strengthen your data skills today ;)

Want to improve your technical knowledge? I’ll be sharing articles every week. Here are the ones published so far:

Don’t forget to subscribe for email notifications to be updated when the next article in the series goes live!