2023/2024 Best AI Papers — Must Read
From “Sparks of Artificial General Intelligence” in LLMs to General Purpose Robotics, these papers are a must read for anyone into AI
If you are into AI, you might also want to read my recent blog posts on AI developments, tools, and code!
Let’s dive into it! Here are the papers I found most inspiring & impactful in 2023.
#1 Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Paper link
- Summary: Fascinating first insights inside GPT-4's mind. “GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT.”
#2 Textbooks Are All You Need
- Paper link
- Summary: This paper attempts to answer the question, ‘What are the minimum ingredients required to achieve strong emergent capabilities?’ This work is a natural follow-up to the GPT-4 findings, where very large models display capabilities that compete with top-tier experts in almost all textual domains. The answer, succinctly, is that even a small volume of high-quality, textbook-like data can lead to very strong reasoning capabilities. The models phi-1 and phi-1.5 (1.3 billion parameters each), and then phi-2 (2.7 billion parameters), all derive from this research.
#3 Alpaca: A Strong, Replicable Instruction-Following Model
- Paper link
- Summary: The development of open-source Large Language Models (LLMs) surged dramatically after this work. Previously, it was believed that creating viable LLMs required millions of dollars and extensive computing clusters. However, Stanford demonstrated that 52,000 AI-generated instruction-following examples (sourced from GPT-3.5) were enough to fine-tune a model that mimics the behavior of larger models. A crucial prerequisite, though, is a strong base model. Alpaca, for instance, is fine-tuned from LLaMA 7B, and the whole process cost less than $600! Meta’s LLaMA, and later LLaMA 2, models have fueled open-source LLM fine-tuning ever since.
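To make the recipe concrete, here is a minimal sketch of how one instruction record can be turned into a training string. The template wording is modeled on the prompt format published with Alpaca, but treat it, the `build_training_text` helper, and the example record as illustrative assumptions rather than the exact Stanford code.

```python
# Prompt wording modeled on the Alpaca format (assumption: no optional "input" field).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_text(record: dict) -> str:
    """Join the formatted prompt and the target answer into one training string."""
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"])
    return prompt + record["output"]


# Hypothetical record, in the same instruction/output style as the 52,000 examples.
example = {
    "instruction": "Give three tips for staying healthy.",
    "output": "Eat a balanced diet, exercise regularly, and get enough sleep.",
}

training_text = build_training_text(example)
print(training_text)
```

During supervised fine-tuning, the base model simply learns to continue each prompt with the corresponding response.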
#4 Segment Anything
- Paper link
- Summary: Meta released the “largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks”. This is a big deal mostly because it can identify all “objects” in an image and generate masks accordingly, out of the box! (no fine-tuning needed).
- Applications: Once you have the mask of an object, you can easily manipulate the image (manually or via an API) while focusing on that specific object, e.g., fashion virtual try-on, object counting, precise prompt-based editing, and so on! Limitless!
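As a toy illustration of what you can do once masks are available, the sketch below counts objects, measures their area, and isolates one object from a tiny grayscale image. It does not call the Segment Anything model itself: the masks are hand-written stand-ins for its output, and `mask_area` and `keep_object` are hypothetical helper names.

```python
from typing import List

Mask = List[List[bool]]   # True where a pixel belongs to the object
Image = List[List[int]]   # grayscale pixel values


def mask_area(mask: Mask) -> int:
    """Number of pixels covered by the mask."""
    return sum(1 for row in mask for px in row if px)


def keep_object(image: Image, mask: Mask, background: int = 0) -> Image:
    """Keep the pixels inside the mask and replace the rest with `background`."""
    return [
        [px if inside else background for px, inside in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Hand-written masks standing in for a segmentation model's output.
masks = [
    [[True, True, False], [False, False, False], [False, False, False]],
    [[False, False, False], [False, True, True], [False, True, True]],
]

object_count = len(masks)                   # object counting
areas = [mask_area(m) for m in masks]       # size of each object in pixels
isolated = keep_object(image, masks[1])     # edit only the second object
print(object_count, areas, isolated)
```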
#5 Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Paper link
- Summary: This is a fine-tuning game changer! Direct Preference Optimization (DPO) offers a simpler, more efficient method for fine-tuning language models to align with human preferences. Instead of first training a separate reward model and then running reinforcement learning, DPO optimizes a straightforward classification loss directly on pairs of preferred and rejected answers, avoiding sampling from the model during training and heavy hyperparameter tuning. This makes DPO more stable and lightweight while remaining effective in tasks like sentiment control and summarization. DPO represents a significant advancement in fine-tuning LMs, being both time-saving and resource-efficient. It can replace the reinforcement learning step of traditional pipelines such as reinforcement learning from human feedback (RLHF)!
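The "straightforward classification loss" can be sketched in a few lines. This is a minimal single-pair version, assuming you already have the summed log-probabilities of the preferred and rejected answers under both the model being trained and a frozen reference model; the function name, argument names, and `beta=0.1` default are illustrative choices, not values taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more (in log-probability) the policy likes each answer than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_gain - rejected_gain)
    # Numerically stable -log(sigmoid(x)) = log(1 + exp(-x)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Policy has shifted probability toward the preferred answer: the loss is lower.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral, improved)
```

In practice the loss is averaged over a batch and back-propagated through the policy's log-probabilities only; the reference model stays frozen.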
#6 QLoRA: Efficient Finetuning of Quantized LLMs
- Paper link
- Summary: With LLaMA models and the Alpaca approach, we still faced a challenge: how can we fine-tune these large language models (LLMs) on a single machine? This is where LoRA and then QLoRA come into play! QLoRA is an efficient fine-tuning method that makes it possible to train models as large as 65B parameters on a single 48GB GPU. It keeps the base model frozen in 4-bit quantized form and trains only small Low-Rank Adapters on top, significantly reducing memory usage. The paper’s best model family, Guanaco, reaches 99.3% of ChatGPT’s performance level on the Vicuna benchmark with far fewer resources. The approach makes it practical to fine-tune a wide range of models, and the paper also finds that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation for chatbots. QLoRA’s findings, models, and code are publicly released.
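A quick back-of-the-envelope calculation shows why 4-bit weights plus low-rank adapters matter. The sketch below only counts weight storage (it ignores quantization constants, activations, and optimizer state), and the layer size and adapter rank are illustrative assumptions, not the paper's configuration.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter replaces a d_out x d_in update with two thin matrices (rank x d_in and d_out x rank)."""
    return rank * (d_in + d_out)


n_params = 65e9
fp16_gb = weight_memory_gb(n_params, 16)   # 16-bit weights
nf4_gb = weight_memory_gb(n_params, 4)     # 4-bit quantized weights

# Hypothetical 8192 x 8192 projection layer with a rank-64 adapter.
full_layer = 8192 * 8192
adapter = lora_trainable_params(8192, 8192, rank=64)

print(f"16-bit weights: {fp16_gb} GB, 4-bit weights: {nf4_gb} GB")
print(f"trainable fraction for this layer: {adapter / full_layer:.4%}")
```

Under these assumptions, the frozen 65B weights shrink from 130 GB to 32.5 GB, and the adapter trains roughly 1.6% as many parameters as the layer it modifies.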
#7 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Paper link
- Summary: This is the ChatGPT moment for Robotics! The study explores integrating vision-language models trained on vast Internet data into robotic control, enhancing generalization and emergent semantic reasoning. In a nutshell, it enables general-purpose robots that generalize to new objects and instructions better than earlier specialized models!
#8 FunSearch: Mathematical discoveries from program search with large language models
- Paper link
- Summary: Leveraging LLM hallucinations to discover new algorithmic solutions! The method pairs a pretrained large language model with a systematic evaluator that filters out incorrect ideas. By searching for programs that solve a problem rather than for the solutions directly, it led to new discoveries in extremal combinatorics and in algorithmic problems.
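The propose-then-evaluate idea can be sketched with a toy loop. This is not the FunSearch implementation: the `propose` function is a random stand-in for the LLM, the search is a plain greedy hill-climber rather than the paper's evolutionary procedure, and the target task (recovering a hidden linear function) is a made-up example.

```python
import random

# Candidate programs are real Python source; only two constants vary in this toy.
TEMPLATE = "def solve(x):\n    return {a} * x + {b}\n"


def propose(parent, rng):
    """Stand-in for the LLM: return a slightly modified variant of the parent program."""
    a, b = parent
    return (a + rng.choice([-1, 0, 1]), b + rng.choice([-1, 0, 1]))


def evaluate(params):
    """Systematic evaluator: execute the candidate program and score it (higher is better)."""
    namespace = {}
    exec(TEMPLATE.format(a=params[0], b=params[1]), namespace)
    solve = namespace["solve"]
    # Hidden target behavior for this toy task: 3 * x + 2.
    return -sum(abs(solve(x) - (3 * x + 2)) for x in range(5))


def search(iterations=200, seed=0):
    """Keep a candidate only if the evaluator scores it strictly higher than the current best."""
    rng = random.Random(seed)
    best = (0, 0)
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best_program, best_score = search()
print(TEMPLATE.format(a=best_program[0], b=best_program[1]), best_score)
```

The key design point carried over from the summary is that wrong or "hallucinated" proposals are harmless: the evaluator discards anything that does not score better, so only verified improvements survive.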
#9 GNoME: Scaling deep learning for materials discovery
- Paper link
- Summary: This paper is on the level of AlphaFold, one of the most impactful AI papers for drug discovery. This new paper by DeepMind shows how large-scale deep learning significantly advances the discovery of new materials, identifying over 2.2 million new stable structures and expanding the set of known stable materials roughly tenfold.
- Applications: It opens the way to a wide range of applications! To name a few: clean energy technology (better solar cells, batteries, and so on), materials with unique quantum properties, nanotechnology, electronics (better sensors, displays, and lighting), and the aerospace and automotive industries (materials with better strength-to-weight ratios could lead to lighter, more fuel-efficient vehicles and aircraft).
What’s Next?
These papers had a tremendous impact on our understanding of AI as well as on day-to-day development. Obviously, this list isn’t exhaustive. I’ll keep updating it, as it still lacks papers in a few areas (e.g., music generation, video, multimodality, and so on).
Feel free to add your favorite papers in the comment section :)!






