5 must-read books for data engineers
If you are interested in data engineering, here are five books worth reading and keeping on your bookshelf. I start with the most elementary one, a classic that gives you a good understanding of data science and data engineering, and then move on to more advanced books.
The first book is Data Science from Scratch by Joel Grus. It provides a broad introduction to data science, including chapters that touch on data engineering topics such as data cleaning and preparation, data storage, and data processing. Because it is simple and easy to read, it is a perfect first book if you want to build a career in data.

The second book is Data Engineering with Python by Paul Crickard. This book provides a strong foundation in data modeling techniques and pipelining. It begins by introducing the basics of data engineering. It then covers the frameworks and tools necessary for building data pipelines to handle large datasets. Throughout the book, you will learn how to transform, clean, and perform analytics on your data, as well as how to create data pipelines and work with datasets of varying complexity. Additionally, you will learn how to design the architectures on which you can implement data pipelines using real-world examples.
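
To give a flavor of the transform-and-clean work that kind of pipeline involves, here is a minimal sketch using only the Python standard library. It is not taken from the book: the sample CSV, the function names (extract, transform, load), and the in-memory list standing in for a database are all my own assumptions for illustration.

```python
import csv
import io

# Hypothetical raw input: a small CSV with messy values (illustrative only).
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,
3, carol ,7
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Strip whitespace, normalize names, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows, target):
    """Append cleaned rows to a list that stands in for a real database."""
    target.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
```

A real pipeline would swap the list for a database or object store and add scheduling and monitoring, but the extract, transform, load shape stays the same.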

Now is a good time to venture into big data. The third book is Big Data: Principles and Best Practices of Scalable Real-time Data Systems by Nathan Marz and James Warren. Built around the Lambda Architecture, it covers the principles and best practices of building scalable, real-time data systems, including data ingestion, data processing, and data storage. The book suits readers with a technical background who want to build large-scale data systems. It covers a range of technologies and approaches, including distributed systems, data pipelines, and data storage systems, and gives practical guidance on how to design and implement them. It also includes case studies and examples that illustrate the key concepts and techniques.

The next stop is a classic: Designing Data-Intensive Applications by Martin Kleppmann. This book covers the design of data systems, including topics such as data modeling, data storage, data processing, and data consistency. It suits readers from a range of technical backgrounds who are interested in data system design, and it gives a broad overview of the field, covering databases, data pipelines, and distributed systems.
The book is organized into three parts: Foundations of Data Systems, Distributed Data, and Derived Data. Foundations of Data Systems covers data models and query languages (relational, document, and graph), storage engines and indexing, and data encoding. Distributed Data covers replication, partitioning, transactions, and consistency and consensus. Derived Data covers batch processing, stream processing, and how to integrate multiple data systems into one coherent application.
Throughout the book, the author provides practical guidance and examples to illustrate key concepts and techniques. The book also includes case studies and examples from a range of industries, including finance, e-commerce, and social media.

Finally, if you want to become an effective data engineer, you need to know data warehousing, so the last book is the classic on data warehousing best practices: The Data Warehouse Toolkit by Ralph Kimball and Margy Ross. It is a comprehensive guide to dimensional modeling, the design approach the book is best known for. The third edition opens with a primer on data warehousing and dimensional modeling and an overview of the Kimball techniques. It then works through a series of industry case studies, such as retail sales, inventory, and order management, each introducing new modeling patterns. The closing chapters cover the Kimball project lifecycle and the design of ETL systems. The book suits readers with a technical background who want to build and maintain data warehouses, and its many examples and case studies illustrate the key concepts and techniques.
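
If dimensional modeling is new to you, the core idea is a central fact table of measurements surrounded by descriptive dimension tables, often called a star schema. Below is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical and chosen only for illustration; they are not an example from the book.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Two dimension tables (who/what/when) and one fact table (the measurements).
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month TEXT
);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    sales_amount REAL
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Espresso", "Coffee"), (2, "Green Tea", "Tea")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", "2024-01"), (20240102, "2024-01-02", "2024-01")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20240101, 1, 2, 6.0), (20240101, 2, 1, 2.5), (20240102, 1, 3, 9.0)],
)

# A typical analytical query: join the fact table to a dimension and aggregate.
sales_by_category = conn.execute("""
SELECT p.category, SUM(f.sales_amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```

The payoff of this layout is that most reporting questions become the same kind of query: filter and group by dimension attributes, then sum the facts.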

