My Personal Top 5 Data Science Books Set to Publish in 2024
5 Data Science Books I Can't Wait to Dive into in 2024
In this article, I share five data science books scheduled for publication in 2024 that I am looking forward to reading. The selection focuses on two areas: software engineering for data science and practical work with large language models (LLMs).
Together, these upcoming releases offer a roadmap for data science practitioners who want to keep up with how the field is changing in 2024.

Table of Contents:
- Software Engineering for Data Scientists
- Build a Large Language Model (From Scratch)
- Hands-On Large Language Models
- LLMs in Production
- Designing Large Language Model Applications
1. Software Engineering for Data Scientists

The first book is Software Engineering for Data Scientists by Catherine Nelson, expected to be released in August 2024. Data science happens in code, so writing reproducible, robust, scalable code is key to a data science project’s success, and it is essential for anyone working with production code. This practical book bridges the gap between data science and software engineering, clearly explaining how to apply best practices from software engineering to data science.
Examples are in Python and draw on popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics you need (and that are often missing from introductory data science or coding classes), including how to:
- Understand data structures and object-oriented programming
- Clearly and skillfully document your code
- Package and share your code
- Integrate data science code with a larger codebase
- Write APIs
- Create secure code
- Apply best practices to common tasks such as testing, error handling, and logging
- Work more effectively with software engineers
- Write more efficient, maintainable, and robust code in Python
- Put your data science projects into production
- And more
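To make a few of these topics concrete, here is a minimal sketch of what testing, error handling, and logging can look like in a small data science helper. It is not taken from the book; the function `column_mean`, the logger name, and the test are hypothetical names I made up for illustration.

```python
import logging
import math

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("ds_example")


def column_mean(values):
    """Return the mean of a list of numbers, skipping NaN entries."""
    if not values:
        # Fail early with a clear message instead of a ZeroDivisionError later.
        raise ValueError("values must not be empty")
    clean = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    if not clean:
        raise ValueError("values contains only NaN entries")
    if len(clean) < len(values):
        # Log data-quality issues so they are visible in production runs.
        logger.warning("Skipped %d NaN value(s)", len(values) - len(clean))
    return sum(clean) / len(clean)


def test_column_mean():
    """A small unit test covering the normal, NaN, and error paths."""
    assert column_mean([1.0, 2.0, 3.0]) == 2.0
    assert column_mean([1.0, float("nan"), 3.0]) == 2.0
    try:
        column_mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    test_column_mean()
    print("all checks passed")
```

In a real project the test would usually live in its own file and be run by a test runner, but the idea is the same: validate inputs, log what you skip, and check each path with a test.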
2. Build a Large Language Model (From Scratch)

The second book is Build a Large Language Model (from Scratch) by Sebastian Raschka. It is available through the Manning Early Access Program (MEAP), and at the time of writing, two of its eight chapters have been published.
This book is a one-of-a-kind guide to building your own working LLM. In it, machine learning expert and author Sebastian Raschka reveals how LLMs work under the hood, tearing the lid off the Generative AI black box. The book is filled with practical insights into constructing LLMs, including building a data-loading pipeline, assembling their internal building blocks, and finetuning techniques. As you go, you’ll gradually turn your base model into a text classifier and then into a chatbot that follows your conversational instructions.
Build a Large Language Model (from Scratch) teaches you how to:
- Plan and code all the parts of an LLM
- Prepare a dataset suitable for LLM training
- Finetune LLMs for text classification and with your own data
- Use human feedback to ensure your LLM follows instructions
- Load pretrained weights into an LLM
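As a rough illustration of what "preparing a dataset for LLM training" can involve, here is a minimal sketch of turning text into input/target pairs for next-token prediction with a sliding window. This is my own simplified example and not the book's code: the whitespace tokenizer and the helper names `build_vocab` and `sliding_windows` are assumptions, and real pipelines typically use subword tokenizers and batched tensors.

```python
def build_vocab(text):
    """Split text on whitespace and map each unique token to an integer ID."""
    tokens = text.split()
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    return tokens, vocab


def sliding_windows(token_ids, context_length, stride=1):
    """Create (inputs, targets) pairs where targets are inputs shifted by one token."""
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


text = "the cat sat on the mat"
tokens, vocab = build_vocab(text)
token_ids = [vocab[t] for t in tokens]
pairs = sliding_windows(token_ids, context_length=3)

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

Each target sequence is the input sequence moved one position to the right, so at every position the model is trained to predict the token that comes next.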
3. Hands-On Large Language Models

The third book is Hands-On Large Language Models by Jay Alammar and Maarten Grootendorst, expected to be released by December 2024. AI has acquired startling new language capabilities in just the past few years.
Driven by the rapid advances in deep learning, language AI systems can write and understand text better than ever before. This trend enables the rise of new features, products, and entire industries. With this book, Python developers will learn the practical tools and concepts they need to use these capabilities today.
You’ll learn how to use pre-trained large language models for use cases like copywriting and summarization; create semantic search systems that go beyond keyword matching; build systems that classify and cluster text to make sense of large collections of documents; and use existing libraries and pre-trained models for text classification, search, and clustering.
This book also shows you how to:
- Build advanced LLM pipelines to cluster text documents and explore the topics they belong to
- Build semantic search engines that go beyond keyword search with methods like dense retrieval and re-rankers
- Learn various use cases where these models can provide value
- Understand the architecture of underlying Transformer models like BERT and GPT
- Get a deeper understanding of how LLMs are trained
- Understand how different fine-tuning methods optimize LLMs for specific applications (generative model fine-tuning, contrastive fine-tuning, in-context learning, etc.)
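To give a feel for how semantic search with dense retrieval differs from keyword search, here is a minimal sketch that ranks documents by cosine similarity between embedding vectors. It is not from the book. The three-dimensional vectors are made up by hand for illustration; in a real system they would come from an embedding model, and the helper names `cosine_similarity` and `retrieve` are my own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-made toy embeddings (assumption: a real model would produce these).
doc_vecs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "gift cards": [0.5, 0.2, 0.7],
}
# Pretend embedding of the query "how do I get my money back?"
query_vec = [0.8, 0.2, 0.1]

for doc_id, score in retrieve(query_vec, doc_vecs):
    print(f"{doc_id}: {score:.3f}")
```

The query shares no keywords with "refund policy", yet that document ranks first because its vector points in nearly the same direction. A re-ranker, as mentioned in the list above, would then re-score these top candidates with a more expensive model.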
4. LLMs in Production

The fourth book is LLMs in Production by Christopher Brousseau and Matthew Sharp. It is also available through the Manning Early Access Program (MEAP), and at the time of writing, five of its eleven chapters have been published.
Large Language Models (LLMs) are the foundation of AI tools like ChatGPT, LLAMA, and Bard. This practical book offers clear, example-rich explanations of how LLMs work, how you can interact with them, and how to integrate LLMs into your applications. In LLMs in Production, you will:
- Grasp the fundamentals of LLMs and the technology behind them
- Evaluate when to use a premade LLM and when to build your own
- Efficiently scale up an ML platform to handle the needs of LLMs
- Train LLM foundation models and finetune an existing LLM
- Deploy LLMs to the cloud and edge devices using complex architectures like RLHF
- Build applications leveraging the strengths of LLMs while mitigating their weaknesses
5. Designing Large Language Model Applications

The last book I am looking forward to in 2024 is Designing Large Language Model Applications by Suhas Pai, which is expected to be released in December 2024. Transformer-based language models are powerful tools for solving a variety of language tasks and represent a phase shift in the field of natural language processing. However, the transition from demos and prototypes to full-fledged applications has been slow. With this book, you’ll learn the tools, techniques, and playbooks for building useful products that incorporate the power of language models.
Experienced ML researcher Suhas Pai provides practical advice on dealing with commonly observed failure modes and counteracting the current limitations of state-of-the-art models. You’ll take a comprehensive deep dive into the Transformer architecture and its variants. You’ll also get up to date with the taxonomy of language models, which can offer insight into which models are better at which tasks.
You’ll learn:
- Clever ways to deal with failure modes of current state-of-the-art language models, and methods to exploit their strengths for building useful products
- How to develop an intuition about the Transformer architecture and the impact of each architectural decision
- Ways to adapt pretrained language models to your domain and use cases
- How to select a language model for your domain and task from among the choices available, and how to deal with the build-versus-buy conundrum
- Effective fine-tuning and parameter-efficient fine-tuning, as well as few-shot and zero-shot learning techniques
- How to interface language models with external tools and integrate them into an existing software ecosystem
