Summary

The website presents a curated list of the most impactful AI research papers from 2023, highlighting significant advancements in AI capabilities, including language models, robotics, and materials science.

Abstract

The website article titled "2023/2024 Best AI Papers — Must Read" provides an overview of groundbreaking AI research papers that have shaped the field in 2023. These papers cover a range of topics from the early experiments with GPT-4, indicating sparks of Artificial General Intelligence (AGI), to the development of instruction-following models like Alpaca, which democratize access to powerful AI tools. The article also discusses advancements in image segmentation with "Segment anything," enabling zero-shot transfer to new tasks, and the introduction of Direct Preference Optimization for more efficient language model fine-tuning. Furthermore, it highlights QLoRA's role in enabling efficient finetuning of quantized LLMs on a single GPU and RT-2's integration of vision-language models into robotic control, marking a significant leap for general-purpose robotics. The article emphasizes the discovery of new mathematical algorithms through program search with large language models, as seen in FunSearch, and the transformative potential of GNoME in materials discovery, expanding the known materials database.

Opinions

The author believes that the papers listed represent the most inspiring and impactful AI research of 2023, signaling significant progress in the field.
The article suggests that even with a small volume of high-quality data, as seen with the Phi models, very strong reasoning capabilities can be achieved, challenging the notion that large datasets are always necessary.
There is an opinion that Alpaca's approach to using AI-generated instructions can lead to models that mimic the behavior of much larger models, which could disrupt the traditional cost and resource barriers in AI development.
The "Segment anything" model is considered a major advancement due to its ability to identify and manipulate objects in images without the need for fine-tuning.
Direct Preference Optimization is presented as a superior alternative to traditional fine-tuning methods like reinforcement learning from human feedback (RLHF), offering simplicity and efficiency.
QLoRA's method for finetuning large language models is seen as a breakthrough for accessibility, allowing training on a single GPU, which was previously unthinkable for models of such scale.
RT-2's success in transferring web knowledge to robotic control is viewed as a pivotal moment for the field of robotics, potentially leading to more general and capable robotic systems.
FunSearch's method of leveraging LLM "hallucinations" is regarded as an innovative approach to problem-solving, leading to new discoveries in mathematics and algorithms.
The impact of GNoME on materials discovery is compared to that of AlphaFold on drug discovery, indicating its potential to revolutionize various industries by enabling the discovery of new materials with desirable properties.
The author acknowledges the non-exhaustive nature of the list and invites readers to contribute their favorite papers, suggesting an ongoing commitment to tracking and sharing AI advancements.

2023/2024 Best AI Papers — Must Read

From “Sparks of Artificial General Intelligence” in LLMs to General Purpose Robotics, these papers are a must read for anyone into AI


Let’s dive into it! Here are the papers I found most inspiring & impactful in 2023.

#1 Sparks of Artificial General Intelligence: Early experiments with GPT-4

Paper link
Summary: Fascinating first insights inside GPT-4's mind. “GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT.”

#2 Textbooks Are All You Need

Paper link
Summary: This paper attempts to answer the question, ‘What are the minimum ingredients required to achieve strong emergent capabilities?’ It is a natural follow-up to the GPT-4 work, where very large models display capabilities that compete with top-tier experts in almost all textual domains. The answer, succinctly, is that even a small volume of high-quality data can lead to very strong reasoning capabilities. The models Phi-1 and Phi-1.5 (1.3 billion parameters each) and then Phi-2 (2.7 billion parameters) all derive from this research.

#3 Alpaca: A Strong, Replicable Instruction-Following Model

Paper link
Summary: The development of open-source Large Language Models (LLMs) surged after this work. Previously, it was believed that creating viable LLMs required millions of dollars and extensive computing clusters. However, Stanford demonstrated that 52,000 AI-generated instructions (sourced from GPT-3.5) were enough to fine-tune a model that mimics the behavior of much larger models. A crucial prerequisite, though, is a strong base model. Alpaca, for instance, is a fine-tuned version of LLaMA 7B that was reproduced for less than $600! Meta’s LLaMA, and then LLaMA 2, models have fueled open-source LLM fine-tuning ever since.

#4 Segment anything

Paper link
Summary: Meta released the “largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks”. This is a big deal mostly because it can identify all “objects” in an image and generate masks accordingly, out of the box (no fine-tuning needed)!
Applications: Once you have the mask of an object, you can easily manipulate the image (manually or via API) while focusing on that specific object, e.g., fashion virtual try-on, object counting, precise prompt-based editing, and so on. Limitless!
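
To make the "manipulate the image through a mask" idea concrete, here is a minimal sketch in plain Python. It does not call the Segment Anything model or its API; the tiny grayscale image, the hand-written binary mask, and the helper names apply_edit and mask_area are all hypothetical and only illustrate what you can do once a mask is available.

```python
def apply_edit(image, mask, edit):
    """Return a copy of image where edit() is applied only to pixels inside the mask."""
    return [
        [edit(pixel) if inside else pixel for pixel, inside in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def mask_area(mask):
    """Count the pixels that belong to the masked object."""
    return sum(sum(1 for inside in row if inside) for row in mask)


# Hypothetical 3x3 grayscale image (values 0..255).
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Hypothetical binary mask, standing in for one mask returned by a segmentation model.
object_mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]

# With one mask per detected object, object counting is just the number of masks.
masks = [object_mask]
object_count = len(masks)

# Brighten only the masked object and leave the background untouched.
brightened = apply_edit(image, object_mask, lambda p: min(p + 100, 255))
```

The same pattern (select pixels with a mask, then edit only those pixels) is what sits underneath use cases such as virtual try-on or targeted editing, with a real model supplying the masks.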

#5 Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper link
Summary: This is a fine-tuning game changer! Direct Preference Optimization (DPO) offers a simpler, more efficient way to fine-tune unsupervised language models so that they align with human preferences. Instead of first training a separate reward model and then running reinforcement learning, DPO optimizes the language model directly on preference pairs with a straightforward classification loss, avoiding sampling from the model during fine-tuning and extensive hyperparameter tuning. This makes DPO stable and lightweight while remaining effective on tasks like sentiment control and summarization. It is a time-saving, resource-efficient alternative to traditional reinforcement learning from human feedback (RLHF) pipelines!
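
To show what the "straightforward classification loss" looks like, here is a minimal sketch of the DPO objective for a single preference pair, written with the standard library only. The function name, the argument names, and the default beta=0.1 are illustrative assumptions; the inputs are the summed log-probabilities of the preferred (chosen) and dispreferred (rejected) responses under the model being tuned and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable way.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# If the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# If the policy raises the chosen response and lowers the rejected one,
# the loss drops below log(2).
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

In practice the log-probabilities come from batched forward passes of both models and the loss is averaged over many pairs; this sketch only shows the per-pair arithmetic.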

#6 QLoRA: Efficient Finetuning of Quantized LLMs

Paper link
Summary: With LLaMA models and the Alpaca approach, we still faced a challenge: how can we fine-tune these large language models (LLMs) on a single machine? This is where LoRA and then QLoRA come into play! QLoRA is an efficient finetuning method that makes it possible to train models as large as 65B parameters on a single 48GB GPU. It keeps the base model frozen in 4-bit quantized form and trains only small Low Rank Adapters on top of it, which significantly reduces memory usage. QLoRA’s best model family, Guanaco, nearly matches ChatGPT’s performance on the Vicuna benchmark with far fewer resources. The paper fine-tunes a wide range of models and shows that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation for assessing chatbots. QLoRA’s findings, models, and code are publicly released.
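
A back-of-the-envelope calculation helps explain why 4-bit quantization plus low-rank adapters matters. The sketch below only counts memory for the model weights and the number of trainable adapter parameters; it ignores activations, optimizer state, and quantization constants, so treat the numbers as rough lower bounds. The layer shape (8192 x 8192) and the rank (64) are illustrative assumptions, not values taken from the paper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a low-rank adapter B (d_out x rank) @ A (rank x d_in)."""
    return rank * (d_out + d_in)


params_65b = 65e9

fp16_gb = weight_memory_gb(params_65b, 16)  # 130.0 GB for 16-bit weights
nf4_gb = weight_memory_gb(params_65b, 4)    # 32.5 GB for 4-bit weights

# Hypothetical square projection layer and adapter rank.
d = 8192
rank = 64
full_layer_params = d * d
adapter_params = lora_param_count(d, d, rank)
trainable_fraction = adapter_params / full_layer_params  # 1/64 of the layer
```

At 16 bits the weights alone exceed any single GPU available today, while at 4 bits they fit under 48 GB with room left for the small adapters that actually receive gradients.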

#7 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Paper link
Summary: This is the ChatGPT moment for Robotics! The study explores integrating vision-language models trained on vast Internet data into robotic control, enhancing generalization and emergent semantic reasoning. In a nutshell, it enables general-purpose robotics that generalizes better than specialized models!

#8 FunSearch: Mathematical discoveries from program search with large language models

Paper link
Summary: Leveraging LLM hallucinations to discover new algorithmic solutions! The method pairs a pretrained large language model, which proposes candidate programs, with a systematic evaluator, which scores them and filters out the wrong ones. By searching for programs that solve a problem rather than for direct solutions, it led to groundbreaking discoveries in extremal combinatorics and algorithmic problems.
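
The propose-and-evaluate idea can be illustrated with a toy loop. This is not DeepMind's implementation: a random mutation function stands in for the LLM, the "programs" are tiny lambda expressions built from a hypothetical template, and the task (matching 3x + 7 on a few integers) is made up for the example. What it keeps from the description above is the structure: candidates are generated freely, and only the evaluator decides what survives.

```python
import random

# Hypothetical program template; a real system would let the LLM write free-form code.
TEMPLATE = "lambda x: {a} * x + {b}"


def evaluate(program_src):
    """Systematic evaluator: score a candidate program (higher is better, 0 is perfect)."""
    candidate = eval(program_src)  # acceptable here: sources are produced by this script only
    return -sum(abs(candidate(x) - (3 * x + 7)) for x in range(10))


def propose(parent, rng):
    """Stand-in for the LLM: return a slightly modified version of a parent program."""
    a, b = parent
    return (a + rng.choice([-1, 0, 1]), b + rng.choice([-1, 0, 1]))


def search(iterations=500, pool_size=5, seed=0):
    """Keep a small pool of the best programs and repeatedly mutate one of them."""
    rng = random.Random(seed)
    start = (0, 0)
    pool = [(start, evaluate(TEMPLATE.format(a=start[0], b=start[1])))]
    for _ in range(iterations):
        parent, _parent_score = rng.choice(pool)
        child = propose(parent, rng)
        score = evaluate(TEMPLATE.format(a=child[0], b=child[1]))
        pool.append((child, score))
        # Only the evaluator's score decides which candidates stay in the pool.
        pool = sorted(pool, key=lambda item: item[1], reverse=True)[:pool_size]
    return pool[0]


best_program, best_score = search()
```

Because wrong or useless proposals are simply scored low and dropped, the generator is allowed to be "creative", which is the sense in which hallucinations become useful.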

#9 GNoME: Scaling deep learning for materials discovery

Paper link
Summary: This paper is on the level of AlphaFold, one of the most impactful AI papers for drug discovery. This new DeepMind paper shows how large-scale deep learning significantly advances materials discovery, identifying over 2.2 million new stable structures and expanding the set of known stable materials roughly tenfold.
Applications: It opens the way to a wide range of applications! To name a few: clean energy technology (better solar cells, batteries, and so on), materials with unique quantum properties, nanotechnology, electronics (better sensors, displays, and lighting), and the aerospace and automotive industries (better strength-to-weight ratios could lead to lighter, more fuel-efficient vehicles and aircraft).

What’s Next?

These papers had a tremendous impact on our understanding of AI as well as on day-to-day development. Obviously, this list isn’t exhaustive. I’ll keep updating it, since it still lacks papers in a few areas (e.g., music generation, video, multimodality, and so on).

Feel free to add your favorite papers in the comment section :)!