Learn A* (A-star) Algorithm in Python — Code An AI to Play a Game

In this article, we are going to discuss a planning algorithm that’s still used widely in the industry (eg in robotics), has great theoretical guarantees, and can be used as a baseline for many other more complex algorithms (ie reinforcement learning). Say hello to A* :)

(Pss — My video version of this article is now available on youtube)

Simply put, A* is an algorithm for finding the shortest path between some start node and end node.

A-star finding the min path (around an obstacle) between some start node and end node. Source: wikipedia

A* Application Examples

A node can represent states, like states in a game, with the end-state being the winning state. The edges between the nodes represent the moves between the nodes (game states). Together, the nodes and edges make a graph. In this case, A* applied to this graph would find the quickest way to win in the least amount of moves.

A node could also be an intersection in Manhatten (like a 2d Cartesian grid). And the edges could be the roads between each intersection. Our house might be located at point (0,0) and the grocery store at node/coordinate (10,10). In this case, A* would be like google maps and find the min distance route for us to get to the grocery store. Nice!

What A* Does

A* can find the min cost path fast because it has the help of a little thing called a heuristic. A heuristic *is not* the thing we are trying to minimize but rather it gives us a hint of which where we might want to explore in our graph next.

A* is basically Dijkstra’s algorithm if the heuristic is equal (eg 0) for all nodes. Remember that Dijkstra basically just always explores the nodes with the least depth. All A* does is add a heuristic hint to that consideration for where to explore next. Where A* really shines is when the heuristic gives us a really good hint of where we want to go. For example, a heuristic hint for each intersection node n in our driving example could be the Euclidean distance from n to our grocery store (at (10,10)). This will help us find the fastest route in fewer tries, which means Google maps loading faster for us. Nice! Said another way, whether we use Dijkstra or A*, we’ll eventually get to the goal state and will have found the min-dist path. But if we are interested in how many steps our algorithm takes to find that path, we should consider A*.

The A* Algorithm

The algorithm is relatively straightforward. Say I’m the start node. Assign me a distance of 0. Look at all my neighbors. If their total distance (my distance + edge distance to neighbor) is less than the cost the neighbor has previously been assigned (each neighbor starts at an infinite distance), assign the neighbor's parent to me, and assign the neighbor's distance to the new total distance. Keep a to-explore list of all the nodes we want to explore eventually. If the neighbor doesn’t exist in the to-explore set, add it.

Once we are done with the neighbors, let's find a new node to explore from the to-explore list. Up to this point, our algorithm looks exactly like Dijkstra's. The difference is how we choose the next node to explore. In Dijkstra’s, we’d pick the node with the least distance. In A*, we choose the node with the least distance PLUS heuristic cost. This will allow us to choose a better node that is ideally closer to where we are trying to get to — our end state. Nice!

A little technicality. Skip it if you want. But in case you’re curious…. since we’re not keeping track of nodes we’ve already visited/explored, how do we keep from looping? Well if the edge weight is 0, this could happen. So edge weight has to be greater than 0. As long as we have this, we won’t go right back to a node we were just at because the cost will always be greater to go right back. (Note the heuristic value is not used at all in knowing which parent to update and if we should add to the explore list. So heuristic can be 0 or negative. It doesn’t matter). However, it is possible to add a node to the to-explore more than once. We want this actually. As an exercise, try to think of a case when we might want this.

A* Example With Code — 8 Tile Problem

Ok, so now we understand conceptually what is happening. Let’s implement it to build an AI that can play a game!

Game Setup: Given some initial settings of a 9-tiled board (8 of them being non-empty), we want to move the tiles so that they end up in some final configuration. We can slide the tiles around by moving a tile into the empty spot.

Wooden 8-tile game. Source: www.aliexpress.com

Game Goal: We win the game when the board matches some goal configuration. We want to find a way to get our current board into the goal board configuration in as few moves as possible.

We are going to represent our board in code as a matrix of numbers.

Let's implement a helper function that will allow us to manipulate the matrix. I’m using Python type hints for readability.

Now let's implement a function to get the adjacent matrices to a given matrix. Basically, the board state after we move the empty tile up one, down one, left, and right.

Cool! We are almost ready to implement the a-star algorithm. Let's just create a few helper classes that will make our code more readable.

The PriorityQueue is like a list that always keeps the highest priority item on top. We implemented it with a min heap which means we can pop the top priority item with log(n) time. Nice! Are you ready for the a-star? Let's start with some pseudo code before we get into the implementation details.

to-explore = PriorityQueue([start_node])
while more nodes to explore:
      # get node with lowest heuristic + distance
      current = to-explore.pop() 
      if current == end_node:
         return end_node

      for n in get_neighbors(current):
         new_dist = current.dist + dist(current, n)
         if new_dist < n.dist:
           n.dist = new_dist
           n.parent = current
           if n not in to-explore:
              to-explore.add(n)

Cool. Spend a couple of minutes really understanding this pseudo code because the actual implementation is going to have a few more details but the same basic algorithm. Here are a couple of implementation details to be aware of. The first is that theoretically, we have the full graph ready for us before we start. However, since our search space is really big (in the case of the 8-board it's 9!=362,880), we are going to lazily build our graph. That is, we only create a new node when we first see it and add it to our graph. I tried to keep the algorithm as problem-agnostic as possible, but for simplicity let some problem-specific details seek in. Hopefully, if you wanted to use this algorithm for another problem, it would require minimal modifications. Also, you’ll notice that we are passing a function “heuristic” to the a-star function (which we can do in Python — Nice!). This allows us to use whatever heuristic we want. We’ll default it to the uniform heuristic which always returns 0. Recall from our discussion above that this heuristic reduces the problem to Dikstraj’s algorithm.

There it is! Let’s add a few tests to make sure it is indeed finding the least moves to win the game (I have verified these are indeed the least-move-answers). We can also print out our solution path.

Ok, now if you run this, you’ll see printed out how many iterations of our algorithm it took to find the min-depth path. So now let's implement a few other heuristics. Stop for a sec. Think about what kind of heuristic you would use to guide the search. If you think you have a good idea, go ahead and implement it! If the search completes in fewer iterations, then you chose a good heuristic! We are going to implement two heuristics to guide our algorithm. The first one is called “misplaced-tile-cost” which takes in two matrices and returns the number of tiles that are different between matching indexes in the arrays.

The next one we implement is “Manhatten-distance” which is a sum of the distance of values in the first matrix and their respective distance in the second matrix.

Problem depth is the min-path to solve the problem (higher means a harder problem). Algorithm iterations are the number of iterations the algorithm took to find that min-path (lower is better).

For example, game which has a min move count (depth) of 12 to solve:
- Uniform Heuristic took 1171 tries
- Misplaced Tile Cost took 85 tries
- Manhatten Distance took 17 tries

These results show that a-star with Manhatten-distance performs better than a-star with misplaced tile cost. And the worst of all is a star with the uniform heuristic (ie Dijkstra's). This should make sense since Manhattan distance is the most informative heuristic! Nice!

Code

You can find the entire source code to the project at my GitHub here.

Conclusion

So, finally, we are done! We have covered the a-star algorithm, some use cases, why we need it, and what it does and we have even implemented it as an AI to solve a real game! Nice! What else will you use A* for? So many possibilities! I wish you the best of luck in your endeavors. If you have applied A* to other problems with success, please let me know in the comment section below :) Also, if you enjoyed this blog and would like to see others on similar topics of AI and ML, please let me know in the comments; I would love to write about it. Till next time, stay awesome!