Sequentially Indexing Permutations: A Linear Algorithm for Computing Lexicographic Rank

Recently I wrote an optimal solver for the Rubik’s Cube that can solve any scrambled cube in 20 moves or fewer. Check it out if you’re interested. Anyway, solving the puzzle programmatically involves creating pattern databases that hold hundreds of millions of values, namely the number of twists required to solve subsets of the cube, like the number of twists necessary to solve the eight corners. Because such large datasets can easily exhaust system resources, the values in the databases are stored sequentially in plain arrays. Indexing into those databases means calculating sequential indexes for permutations, essentially generating a minimal perfect hash for a set of pieces of the puzzle.
In this article I’ll go over some algorithms for calculating sequential indexes for permutations, also known as the lexicographic rank for you math nerds out there. In a nutshell, the algorithm takes a permutation of a set of things and converts it into an integer. I’ll start with a simple algorithm that’s quadratic in complexity, and then present an equivalent linear version. Lastly, I’ll touch on indexing partial permutations.
This piece is intended for programmers. There are plenty of resources that describe in detail the math behind ranking permutations, but this writing focuses on implementing and optimizing code. The examples are in C++, and will compile with g++ using the 2017 standard. I’m also going to reference the Rubik’s Cube a few times since it’s such a great illustration of permutation groups; you should know what a Rubik’s Cube is and how it moves.
A Brief Refresher on Permutations
A permutation is an arrangement of a set of numbers or objects. Unlike a combination, the order of a permutation is important, so (0 1 2) is different than (2 1 0). There are two types of permutations.
- Permutations with replacement. For example, the PIN number of a debit card may have repeating digits.
- Permutations without replacement, like a group of people in line. A line has an intrinsic order to it, and no person can be in line twice.
Indexing permutations with replacement is trivial, so I’ll only be covering the latter. From here on out, “permutation” refers to a permutation without replacement.
Given a set of n items, there are n! possible permutations of that set. So, for example, the set {0 1 2} can be permuted 3! = 3 * 2 * 1 = 6 ways. Those 6 permutations and their lexicographic ranks — the thing we’re after — are: 0: (0 1 2) ;1: (0 2 1) ; 2: (1 0 2) ; 3: (1 2 0) ; 4: (2 0 1) ; 5: (2 1 0) .
For a set of n items picked k ways, there are nPk (read “n pick k”) possible permutations. That’s calculated as n!/(n-k)!. As an example, given a set of 4 things, 2 of them can be picked 4!/(4–2)! = 24 / 2 = 12 ways: (0 1); (0 2); (0 3); (1 0); (1 2); (1 3); (2 0); (2 1); (2 3); (3 0); (3 1); (3 2). Those are ranked 0–11, respectively.
The Lehmer Code
Calculating sequential indexes for permutations is done by computing the Lehmer code of the permutation, and then converting that Lehmer code to a base-10 number. Before diving into that process, let’s take a minute to consider a base-10 number, say 3120. The number can be broken up into constituent parts: 3*10³ + 1*10² + 2*10¹ + 0*10⁰ = 3120. Like that base-10 number, a Lehmer code is just a sequence of digits; however, each digit has a different base. Technically, it’s a mixed-radix numeral system known as a factorial number system. Those same digits, 3, 1, 2, 0, have respective bases of 3!, 2!, 1!, and 0! in a factorial number system. That in mind, converting the Lehmer code 3120 to base 10 is simple: 3*3! + 1*2! + 2*1! + 0*0!.
Computing the Index of a Permutation in O(n²) Time
There’s a straight-forward quadratic algorithm on Wikipedia for calculating the Lehmer code of a permutation, but it’s rather inefficient for a computer since it involves splicing items out of an array. An equivalent algorithm that’s easy to compute is as follow. For each element of the permutation, the associated digit of that permutation’s Lehmer code is that element less the number of permutation elements to its left that are less than it (Korf, et. al.). I know that’s a bit rough to digest, so let’s go through a step-by-step example and calculate the Lehmer code for the permutation (2 0 1).
- The first element of the permutation is 2. There are no elements to its left, so the first digit of the Lehmer code is 2.
- The second element is 0. None of the elements to its left are less than it, so the second digit of the Lehmer code is 0.
- And the last element is 1. It has one element to its left that is less than it, so the final digit of the Lehmer code is (1–1) 0.
(2 0 1) thus has a Lehmer code of 200. Converting that to base-10 (2*2! + 0*1! + 0*0!) yields 4. Indeed, (2 0 1) is the fourth (zero-indexed) permutation of the set {0 1 2} , lexicographically ordered.
It’s worth pointing out that this algorithm works provided the set of numbers in the permutation are zero-based. That is, it will work for the set {0 1 2}, but the set {5 6 7} would need to be translated.
Now, since English is hard to read and code is easy, here’s how that looks in C++.




