AlphaDev: A Revolutionary AI System That Discovers Faster Sorting Algorithms

AlphaDev’s Breakthrough in Sorting Algorithms
DeepMind’s artificial intelligence system, AlphaDev, has discovered faster sorting algorithms. These algorithms outperform those refined by scientists and engineers over decades, marking a milestone in the evolution of computing. The details of the discovery are outlined in a paper published in Nature.
The Increasing Demand for Computation and Energy
As we continue to push the boundaries of digital society, the demand for computation and energy use is increasing. For the past five decades, we have relied on improvements in hardware to keep up with this demand. However, as microchips approach their physical limits, improving the code that runs on them becomes increasingly critical. This is particularly important for the algorithms that make up the code running trillions of times a day.
AlphaDev’s Role in Discovering Enhanced Algorithms
AlphaDev, an artificial intelligence (AI) system that uses reinforcement learning, has been instrumental in discovering enhanced computer science algorithms. One of its most notable achievements is the discovery of a faster algorithm for sorting, a method for ordering data. Sorting algorithms are ubiquitous, underpinning everything from ranking online search results and social posts to how data is processed on computers and phones. Discovering better algorithms with AI could change how we program computers and affect many aspects of our increasingly digital society.
Open-Sourcing the New Sorting Algorithms
To make these new sorting algorithms accessible to the wider community, DeepMind has open-sourced them in the main C++ library, LLVM libc++. This is the first change to this part of the sorting library in over a decade and the first time an algorithm designed through reinforcement learning has been added.
AlphaDev’s Unique Approach to Algorithm Discovery
AlphaDev’s approach to discovering these faster algorithms was unique. Instead of refining existing algorithms, it started from scratch and worked directly with the computer’s assembly instructions. Assembly instructions are used to create the binary code that computers execute. Developers typically write in high-level languages such as C++, but this code must be translated into ‘low-level’ assembly instructions before a computer can run it.
The Potential of Low-Level Coding
DeepMind believes many improvements exist at this lower level that may be difficult to discover in a higher-level coding language. Computer storage and operations are more flexible at this level, so there are significantly more potential improvements that could impact speed and energy usage.
The Assembly Game: A Training Environment for AlphaDev
To train AlphaDev to uncover new algorithms, DeepMind transformed sorting into a single-player ‘assembly game.’ In this game, at each turn, AlphaDev observes the algorithm it has generated and the information contained in the central processing unit (CPU). Then it plays a move by choosing an instruction to add to the algorithm.
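The turn structure described above can be sketched as a toy game. This is only an illustration under stated assumptions: the `AssemblyGame` class and the single `cmpswap` instruction are hypothetical stand-ins invented for this example, whereas AlphaDev works with real assembly instructions and CPU state.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction set: "cmpswap a b" swaps registers a and b
# when they are out of order. Real AlphaDev uses actual assembly instructions.
INSTRUCTIONS = [("cmpswap", a, b) for a in range(3) for b in range(3) if a < b]


@dataclass
class AssemblyGame:
    """State = the program built so far; a move appends one instruction."""
    program: list = field(default_factory=list)

    def legal_moves(self):
        return list(INSTRUCTIONS)

    def play(self, instruction):
        # A move returns a new state whose program is one instruction longer.
        return AssemblyGame(self.program + [instruction])

    def run(self, values):
        # Execute the program on a copy of the input "registers".
        regs = list(values)
        for _, a, b in self.program:
            if regs[a] > regs[b]:
                regs[a], regs[b] = regs[b], regs[a]
        return regs


game = AssemblyGame()
for move in [("cmpswap", 0, 1), ("cmpswap", 1, 2), ("cmpswap", 0, 1)]:
    game = game.play(move)
print(game.run([3, 1, 2]))  # prints [1, 2, 3]
```

Each call to `play` corresponds to one turn: the agent looks at the program so far and appends a single instruction.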
The Challenges of the Assembly Game
The assembly game is incredibly challenging because AlphaDev has to efficiently search through many possible combinations of instructions to find an algorithm that can sort and is faster than the current best one. The number of possible combinations of instructions is similar to the number of particles in the universe or the number of possible combinations of moves in games of chess and Go. A single wrong move can invalidate the entire algorithm.
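A quick calculation shows why the search space grows so fast. The figure of 1,000 choices per step is an assumption picked for illustration, not a number from the paper, and `count_programs` is a helper named for this example.

```python
def count_programs(num_instructions: int, max_length: int) -> int:
    """Count instruction sequences of length 1..max_length."""
    return sum(num_instructions ** n for n in range(1, max_length + 1))


# Assumed, illustrative number of instruction choices at each step.
choices_per_step = 1000

for length in (5, 10, 20, 30):
    total = count_programs(choices_per_step, length)
    # The digit count gives the order of magnitude of the search space.
    print(f"programs up to length {length}: about 10^{len(str(total)) - 1}")
```

Because the count is exponential in program length, exhaustive search is not an option even for short programs, which is why a learned search strategy is needed.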
AlphaDev’s Success and Novel Approaches
Despite these challenges, AlphaDev discovered faster sorting algorithms that led to improvements in the LLVM libc++ sorting library. The new algorithms were up to 70% faster for shorter sequences and about 1.7% faster for sequences exceeding 250,000 elements. But AlphaDev didn’t just find faster algorithms; it also uncovered novel approaches. Its sorting algorithms contain new sequences of instructions.
The Impact of New Instruction Sequences
The new sequences of instructions discovered by AlphaDev save a single instruction each time they’re applied. This can have a significant impact because these algorithms are used trillions of times a day.
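To make the idea of a short, fixed sequence of steps concrete, here is a minimal sketch of a three-element sort built from compare-and-exchange operations. It is a textbook sorting network written in Python for illustration, not the instruction sequence AlphaDev found; the name `sort3` is chosen for this example.

```python
from itertools import permutations


def sort3(a, b, c):
    """Sort three values with a fixed sequence of compare-exchange steps."""
    a, b = min(a, b), max(a, b)  # order the first pair
    b, c = min(b, c), max(b, c)  # move the largest value to the end
    a, b = min(a, b), max(a, b)  # order the remaining pair
    return a, b, c


# Check the routine on every ordering of three distinct values.
assert all(sort3(*p) == (1, 2, 3) for p in permutations((1, 2, 3)))
```

In a routine this small, every operation counts, so removing even one step from a sequence that runs extremely often is a measurable saving.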
AlphaDev’s Application to Hashing Algorithms
After its success with sorting algorithms, DeepMind tested whether AlphaDev could generalize and improve a different computer science algorithm: hashing. Hashing is a fundamental algorithm in computing used to retrieve, store, and compress data.
When applied to the 9–16 bytes range of the hashing function, the algorithm that AlphaDev discovered was 30% faster.
This is a significant improvement because hashing is fundamental in many computer systems, including databases and file systems.
Reinforcement Learning and the Assembly Game
AlphaDev’s approach to discovering these new algorithms is a testament to the power of reinforcement learning. Reinforcement learning is a type of machine learning where an agent learns to make decisions by acting in an environment to achieve a goal. The agent receives rewards or penalties for its actions and uses this feedback to improve future decisions. In the case of AlphaDev, the goal was to find the fastest sorting and hashing algorithms, and the environment was the assembly game.
The Complexity of the Assembly Game Environment
The assembly game is a complex environment in which AlphaDev has to make a series of decisions (i.e., choosing assembly instructions) to build an algorithm. The reward reflects both whether the resulting algorithm sorts or hashes data correctly and how fast it does so. By playing this game millions of times, AlphaDev learned which sequences of assembly instructions lead to the fastest algorithms.
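A reward of this kind can be sketched as follows. This is a simplified illustration under assumptions: programs are lists of hypothetical compare-and-swap steps, program length stands in for measured speed, and the `length_penalty` weight is arbitrary. It is not the reward function from the paper.

```python
from itertools import permutations


def run(program, values):
    """Apply each (a, b) compare-and-swap step to a copy of the input."""
    regs = list(values)
    for a, b in program:
        if regs[a] > regs[b]:
            regs[a], regs[b] = regs[b], regs[a]
    return regs


def reward(program, n=3, length_penalty=0.1):
    """Fraction of test inputs sorted correctly, minus a cost per instruction."""
    tests = list(permutations(range(n)))
    correct = sum(run(program, t) == sorted(t) for t in tests)
    return correct / len(tests) - length_penalty * len(program)


full = [(0, 1), (1, 2), (0, 1)]         # sorts every input
padded = full + [(0, 1)]                # still correct, one wasted step
print(reward(full) > reward(padded))    # prints True
print(reward(full) > reward([]))        # prints True
```

The shorter correct program scores higher than the padded one, and both beat the empty program, which sorts only the input that was already in order.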
The Challenge of Exploration and Exploitation in Reinforcement Learning
One of the key challenges in reinforcement learning is the trade-off between exploration and exploitation. In the early stages of learning, AlphaDev needs to explore a wide range of possible algorithms to understand the environment. However, as it learns more about the environment, it must exploit this knowledge to find the fastest algorithms.
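The trade-off can be illustrated with epsilon-greedy action selection on a toy multi-armed bandit. This is a deliberately simpler, swapped-in technique used only to show the idea; it is not how AlphaDev searches, and the arm means and decay schedule are assumptions made for this example.

```python
import random


def epsilon_greedy(true_means, steps=5000, eps_start=1.0, eps_end=0.05, seed=0):
    """Pull bandit arms, exploring often early and exploiting more later."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for t in range(steps):
        # Linearly decay epsilon from eps_start to eps_end.
        epsilon = eps_start + (eps_end - eps_start) * t / (steps - 1)
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit
        observed = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (observed - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.0, 0.5, 1.0])
print(counts)
```

Early pulls are spread across all arms; as epsilon shrinks, most pulls go to the arm with the highest estimated reward.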
Monte Carlo Tree Search: Balancing Exploration and Exploitation
DeepMind built AlphaDev on AlphaZero, the reinforcement learning approach it previously applied to games such as chess and Go. AlphaZero pairs a neural network with Monte Carlo tree search: the network suggests promising instructions and estimates how good a partial algorithm is, and the search uses those estimates to decide which continuations to examine more deeply. Instructions that have looked good so far are visited more often, while rarely tried ones still receive some attention. This way, the agent can gradually improve its strategy while continuing to explore the game.
Handling Large Action Spaces with Transformer
Another challenge in reinforcement learning is dealing with large action spaces. At each step of the assembly game, AlphaDev has to choose from thousands of possible assembly instructions. To handle this large action space, DeepMind used a Transformer, a neural network architecture that has been highly successful in natural language processing. The Transformer helps AlphaDev understand the context of the current algorithm and choose the most promising assembly instruction at each step.
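The core operation of a Transformer is self-attention, which lets the representation of each instruction take every other instruction in the program into account. The sketch below is a bare-bones version in pure Python that assumes identity projections in place of learned query, key, and value matrices; the instruction embeddings are made-up numbers, and none of this reflects AlphaDev's actual network.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity projections."""
    d = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Score the query against every position, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all position vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, embeddings)) for j in range(d)]
        )
    return outputs


# Hypothetical 2-dimensional embeddings for a three-instruction program.
program_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(program_embeddings)
print(len(context), len(context[0]))  # prints 3 2
```

In a full model, a context-aware representation like this would feed a policy head that scores each candidate instruction.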
AlphaDev’s Generalization and Real-World Impact
AlphaDev has demonstrated its ability to generalize and discover new algorithms with real-world impact by optimizing and launching improved sorting and hashing algorithms. AlphaDev is a step towards developing general-purpose AI tools that could help optimize the entire computing ecosystem and solve other problems that will benefit society.
Conclusion: The Future of AI and Computing
AlphaDev’s discovery of faster sorting and hashing algorithms is a significant milestone in artificial intelligence and computer science. It shows that AI can make meaningful contributions to areas that have traditionally been the domain of human experts. As we continue to develop and refine AI systems like AlphaDev, we can look forward to more breakthroughs that will push the boundaries of what is possible in computing.
