Learn A* (A-star) Algorithm in Python — Code An AI to Play a Game
In this article, we are going to discuss a planning algorithm that’s still used widely in the industry (eg in robotics), has great theoretical guarantees, and can be used as a baseline for many other more complex algorithms (ie reinforcement learning). Say hello to A* :)
(Pss — My video version of this article is now available on youtube)
Simply put, A* is an algorithm for finding the shortest path between some start node and end node.

A* Application Examples
A node can represent states, like states in a game, with the end-state being the winning state. The edges between the nodes represent the moves between the nodes (game states). Together, the nodes and edges make a graph. In this case, A* applied to this graph would find the quickest way to win in the least amount of moves.
A node could also be an intersection in Manhatten (like a 2d Cartesian grid). And the edges could be the roads between each intersection. Our house might be located at point (0,0) and the grocery store at node/coordinate (10,10). In this case, A* would be like google maps and find the min distance route for us to get to the grocery store. Nice!
What A* Does
A* can find the min cost path fast because it has the help of a little thing called a heuristic. A heuristic *is not* the thing we are trying to minimize but rather it gives us a hint of which where we might want to explore in our graph next.
A* is basically Dijkstra’s algorithm if the heuristic is equal (eg 0) for all nodes. Remember that Dijkstra basically just always explores the nodes with the least depth. All A* does is add a heuristic hint to that consideration for where to explore next. Where A* really shines is when the heuristic gives us a really good hint of where we want to go. For example, a heuristic hint for each intersection node n in our driving example could be the Euclidean distance from n to our grocery store (at (10,10)). This will help us find the fastest route in fewer tries, which means Google maps loading faster for us. Nice! Said another way, whether we use Dijkstra or A*, we’ll eventually get to the goal state and will have found the min-dist path. But if we are interested in how many steps our algorithm takes to find that path, we should consider A*.
The A* Algorithm
The algorithm is relatively straightforward. Say I’m the start node. Assign me a distance of 0. Look at all my neighbors. If their total distance (my distance + edge distance to neighbor) is less than the cost the neighbor has previously been assigned (each neighbor starts at an infinite distance), assign the neighbor's parent to me, and assign the neighbor's distance to the new total distance. Keep a to-explore list of all the nodes we want to explore eventually. If the neighbor doesn’t exist in the to-explore set, add it.
Once we are done with the neighbors, let's find a new node to explore from the to-explore list. Up to this point, our algorithm looks exactly like Dijkstra's. The difference is how we choose the next node to explore. In Dijkstra’s, we’d pick the node with the least distance. In A*, we choose the node with the least distance PLUS heuristic cost. This will allow us to choose a better node that is ideally closer to where we are trying to get to — our end state. Nice!
A little technicality. Skip it if you want. But in case you’re curious…. since we’re not keeping track of nodes we’ve already visited/explored, how do we keep from looping? Well if the edge weight is 0, this could happen. So edge weight has to be greater than 0. As long as we have this, we won’t go right back to a node we were just at because the cost will always be greater to go right back. (Note the heuristic value is not used at all in knowing which parent to update and if we should add to the explore list. So heuristic can be 0 or negative. It doesn’t matter). However, it is possible to add a node to the to-explore more than once. We want this actually. As an exercise, try to think of a case when we might want this.
A* Example With Code — 8 Tile Problem
Ok, so now we understand conceptually what is happening. Let’s implement it to build an AI that can play a game!
Game Setup: Given some initial settings of a 9-tiled board (8 of them being non-empty), we want to move the tiles so that they end up in some final configuration. We can slide the tiles around by moving a tile into the empty spot.

Game Goal: We win the game when the board matches some goal configuration. We want to find a way to get our current board into the goal board configuration in as few moves as possible.
We are going to represent our board in code as a matrix of numbers.
Let's implement a helper function that will allow us to manipulate the matrix. I’m using Python type hints for readability.







