avatarKrystyna Waterhouse

Summary

The provided content discusses the importance of understanding databases, particularly the differences between relational and non-relational databases, for Product Managers, and emphasizes the need for basic SQL querying skills to enhance data-driven decision-making.

Abstract

The article is part of a series aimed at enhancing the technical knowledge of Product Managers, focusing on databases. It underscores the necessity for Product Managers to grasp the fundamentals of how databases work, including the distinction between relational and non-relational databases. The article argues that Product Managers should be familiar with data modeling, database design, and especially the skill of writing and interpreting SQL queries, despite some debate in the field about the necessity of these skills. It also highlights the importance of collaborating with data engineers and developers to ensure data integrity and security, and the use of data to inform product decisions and strategy. The author, drawing from personal experience as a data analyst, advocates for the power of data understanding in influencing product management outcomes and provides a cheat sheet for essential SQL queries, encouraging hands-on practice.

Opinions

  • The author believes that Product Managers should have a foundational understanding of databases to effectively work with data engineers and developers, and to make informed product decisions.
  • There is an emphasis on the debate within the Product Management community regarding the necessity of SQL skills, with the author leaning towards these skills being advantageous

Relational vs Non-Relational Databases, Querying Data, and What Product Managers Actually Need to Know

Technical Knowledge for Product Managers

Welcome to part five of a series of articles covering the technical knowledge that every Product Manager ought to know. Today I’m going to be talking about what databases are, how they work, and why product managers should understand them.

If you’d like to learn why I think this knowledge is important, then head back to my article where I discussed WHY it’s important to work on technical skills in Product Management. The other articles in the series are also linked at the bottom of this one, alongside an extended “Data” reading list.

And don’t forget to subscribe to email updates if you want to be notified when the next article is published!

What do Product Managers Actually Need to Know about Databases?

In my opinion, Product Managers should have a basic understanding of how databases work, including:

The difference between relational and non-relational databases

I’ll cover this in some detail below. This is pretty fundamental knowledge.

The basics of data modelling and database design

When setting up new features or business units, you might need to be involved with decisions on database design, something that requires both business and technical considerations.

How to write and interpret SQL queries

There is some debate within the PM community whether this is a “must have” skill for Product Managers or just a nice-to-have. It is certainly the case that this might not be a task you need to do regularly. Some organisations might already have established excellent pre-made dashboards with the required views, or an analytics setup such that the data visualisation tools remove most of the need for writing queries. You might even have a specialist Product Analyst to help you get your hands ont he data you need.

However, it’s generally an advantage to be familiar with available data sets and able to query on them directly; it will give you additional speed and flexibility when you have an urgent question or need additional data to support prioritisation. You never know when your go-to ‘data guy’ is on vacation. Besides, basic SQL fluency tends to be the best route to really understanding how relational databases work.

How to work with data engineers and developers to ensure data integrity and security.

For example — defining which personal data is sensitive and liasing between business and tech to define an approach for anonymising such data.

How to use data to inform product decisions and strategy

As a PM, you also need to be able to set goals and metrics that encourage the desired outcome, and incentivise the right behaviours.

You will also need to use data to make arguments, prioritise your roadmap, and validate hypotheses.

At a bare minimum, this means understanding basic statistical concepts. However, this topic is far too broad to cover in just one article, so I’ve included some book recommendations on this topic below.

I’m a little biased on this subject, as I used to work as a data analyst in larger tech company. However, difference in influencing power when a PM understands data and knows how to use it to their advantage is just HUGE. Once you’ve sat through a meeting with somebody trying to pitch a feature using unclear and poorly-labelled graphs, whilst completely missing an obvious confounding factor in their analysis… It’s painful. Don’t be like that.

Photo by Myriam Jessier on Unsplash

When do Product Managers Interact with Databases?

Here are some of the key touchpoints:

  1. Defining data requirements for new products and features
  2. Collaborating with data engineers and developers to design and implement data structures
  3. Monitoring and analyzing data to measure product performance and user behavior
  4. Integrating third-party data sources or APIs into the product

So, without further ado, let’s cover the basics you need to succeed!

What is a Database?

A database is an organized way of storing data. It’s likea digital filing cabinet for information that can be accessed, managed, and updated as needed. Databases can be small or large, simple or complex, and store a wide range of data types, such as text, numbers, images, and videos.

What is a Database Management System?

A Database Management System (BBMS) is an intermediary between end-users and the database, enabling users to manage and access the data stored there. It’s the engine that powers a database, providing tools for creating, modifying, and querying data. Some common DBMS include Oracle, MySQL, Microsoft SQL Server, and MongoDB.

Relational vs Non-Relational Databases

There are two main types of databases: relational and non-relational. Relational databases organize data into tables that have a predefined structure, with rows representing records and columns representing fields. Non-relational databases, on the other hand, store data in a more flexible and dynamic way, often using document or key-value stores. The choice between relational and non-relational databases depends on the type of data being stored and the specific use case.

Relational Databases (SQL databases)

Relational databases store data in a tabular, structured way. In each table, there is atleast one column. SQL is the programming language used with relational databases.

Some of the most commonly used relational databases are

  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Oracle

You should use a SQL databases when:

  • You need highly structured data across multiple tables
  • Data is related — when one user updates a record, every instance of the data refreshes. You don’t need to change multiple files.
  • You don’t plan to frequently change database structure
  • Accurate results for complex queries
  • Data integrity and security

They scale vertically — as you store more data, you need to increase computing power.

Non-relational Databases (NoSQL databases)

These are databases that don’t use a relational data model. They don’t store data in a tabular formula; there are no tables, rows, columns or relationships between different data points.

Some of the most commonly used non-relational databases are

  • MongoDB
  • Redis
  • HBase
  • BigTable
  • Apache Cassandra
  • Google Cloud Bigtable
  • Amazon DynamoDB

Four main types of non-relational databases are

  • Column-oriented databases — similar to relational databases, data stored in cells grouped in columns
  • Key-value stores — data storied in dictionaries in key-value pair collections. Unique keys point to a specific value.
  • Document stores use documents to hold and encode data. Typically the document is a json file, but documents within a DB can have different data types.
  • Graph databases — complex and can handle large sets of data. They represent data on a graph, which shows how different sets of data relate to each other. ie OrientDB

You should use a NoSQL database when:

  • Need for high scalability — they scale horizontally
  • Less costly initial implementation and flexibility in schema
  • You need fast and continous availability for big data

Examples of use:

  • Netflix — uses SimpleDB, HBase and Cassandra in combination. This allows Netflix to scale the operation infinitely
  • Facebook Messenger — created Cassandra to help index messages, using a person’s ID as part of a primary key
  • Uber — used Riak to power their app
Photo by charlesdeluvio on Unsplash

Cheat Sheet: The Most Important SQL Queries

I believe that — in most cases — the best way to understand how the data behind your product is structured is to actually query on the database.

Product Managers should be familiar with at least the following SQL queries:

SELECT — retrieves data from one or more tables. The query below retrieves all columns and rows from the customers table.

SELECT * FROM customers;

WHERE — Filters data based on specific criteria. The query below retrieves all columns and rows from the customer table where the country is Switzerland.

SELECT * FROM customers WHERE country='Switzerland';

JOIN — Combines data from two or more tables. This query retrieves the order ID, customer name, and order date from the “orders” table and the “customers” table, where the customer ID matches. Learn more about the different types of joins here.

SELECT orders.order_id, customers.customer_name, orders.order_date
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;

GROUP BY — Groups data based on one or more columns. This query groups customers by country and returns the count of customers in each country. Learn about the other aggregate functions such as avg and sum here.

SELECT country, COUNT(*) AS count
FROM customers
GROUP BY country;

ORDER BY — Sorts data in ascending or descending order. This query retrieves all columns and rows from the products table, sorted by the highest price first.

SELECT * FROM products
ORDER BY price DESC;

This is a tiny glimse into the querying power SQL can provide. I would really encourage you have a go yourself, if you have access to a dataset at work. Otherwise there are plenty of (free!) courses online that will offer you a chance to practice your skills on test data. There is no way better to learn than by doing!

When I wanted to play around with data analysis in R, I found a dataset of powerlifting competition results provided by OpenPowerlifting to test out linear regression. Having some existing context/knowledge about the data (I’m a competitive powerlifter!), I was able to learn a new skill whilst being engaged in a topic that interests me.

This is my challenge to you to take the next step and strengthen your data skills today ;)

Want to improve your technical knowledge? I’ll be sharing articles every week. Here are the ones published so far:

  1. What Is The Internet?
  2. Tech Stacks and Programming Languages
  3. Cloud Computing
  4. APIs
  5. Databases ← You are here

Don’t forget to subscribe for email notifications to be updated when the next article in the series goes live!

Further reading

Data Analytics
Product Management
Tech
Self Improvement
Sql
Recommended from ReadMedium