In the previous post, we saw how to solve the all-pairs shortest path problem using the Floyd-Warshall algorithm. We also explored several ways to improve the algorithm’s performance by using parallelism and vectorization.
However, if you think about the results of the Floyd-Warshall algorithm, you will quickly realise the interesting moment – we know the distances (values of shortest paths) between vertexes in a graph but we don’t know the “routes” i.e., we don’t know what vertexes contribute to each shortest path, and we can’t rebuild these “routes” from the results we have.
In this post, I invite you to join me and extend the Floyd-Warshall algorithm. This time, we are going to make sure we can calculate the distance and rebuild the “route.”
Before diving into code and benchmarks, let’s revise a theory of how the Floyd-Warshall algorithm works.
Here is a textual description of the algorithm:
We start by representing a graph (G
) of size N
as matrix (W
) of size N x N
where every cell W[i,j]
contains the weight of an edge from vertex i
to vertex j
(see Picture 1). In cases when there is no edge between vertexes, the cell is set to a special NO_EDGE
value (shown as dark cells in Picture 1).
From now on, we say – W[i,j]
contains a distance between vertexes i
and j
.
Next, we take a vertex k
and iterate through all pairs of vertexes W[i,j]
checking if distance i ⭢ k ⭢ j
is smaller than the currently known distance between i
and j
.
The smallest value is stored in W[i,j]
and step #3 is repeated for the next k
until all vertexes of G
have been used as k
.
Here is the pseudo-code:
algorithm FloydWarshall(W) do
for k = 0 to N - 1 do
for i = 0 to N - 1 do
for j = 0 to N - 1 do
W[i,j] = min(W[i,j], W[i,k] + W[k,j])
end for
end for
end for
end algorithm
When done, every cell W[i,j]
of matrix W
either contains a distance of the shortest path between vertexes i
and j
or a NO_EDGE
value, if there is no path between them.
As I mentioned in the beginning – having only this information makes it impossible to reconstruct the routes between these vertexes. So… what should we do to be able to reconstruct these routes?
Well… it may sound obvious but… we need to collect more data!
The essence of the Floyd-Warshall algorithm is a compartment of known distance W[i,j]
with a distance of potentially new path from i
to j
through intermediate vertex k
.
I drag so much attention to this detail because it is the key to how we can preserve route information. Let’s take the previous graph of 5 vertexes (see Picture 2) and execute the Floyd-Warshall algorithm on it.
Initially, we know about all graph edges, which gives us the following paths: 0 ⭢ 1 :2
, 0 ⭢ 4 :10
, 1 ⭢ 3 :1
, 2 ⭢ 4 :1
, 3 ⭢ 2 :1
and 3 ⭢ 4 :3
.
I use the “from” ⭢ “to” :”distance” format from the previous post to write down paths.
- Author’s note
We also know there are no edges leading to vertex 0
, so executing an algorithm for k = 0
makes no sense. It is also obvious, there is a single edge (0 ⭢ 1
) leading to from vertex 0
to vertex 1
, which also makes the execution of all i != 0
(the i
is “from” here) quite meaningless and because vertex 1
has edges with 2
and 4
, it also makes no sense to execute algorithms for any j
which isn’t 2
or 4
(the j
is “to” here).
That is why our first step will be to execute an algorithm for k = 1
, i = 0
and j = 2,4
.
Step |
Path |
Comment |
---|---|---|
1 |
0 ⭢ 1 ⭢ 2 |
Path found. Distance = 3 (was nothing) |
2 |
0 ⭢ 1 ⭢ 4 |
Path found. Distance = 8 (was 10). |
We have found two paths: a new path (0 ⭢ 1 ⭢ 2
) and a shortcut (0 ⭢ 1 ⭢ 4
). Both go through vertex 1
. If we don’t store this information (the fact we got to 2
and 4
through 1
) somewhere right now, it will be lost forever (and that is quite the opposite of what we want).
So what should we do? We will update matrix W
with new values (see Picture 3a) and also store the value of k
(which is k = 1
) in cells R[0,2]
and R[0,4]
of a new matrix R
of same size as matrix W
but initialized with NO_EDGE
values (see Picture 3b).
Right now, don’t focus on what exactly matrix R
is. Let’s just keep going and execute the algorithm for the next k = 2
.
Here, we will do the same analysis (to understand what combinations make sense to execute) as we did for k = 1,
but this time, we will use matrix W
instead of graph G
. Look at the matrix W
, especially in column #2 and row #2 (Picture 4).
Here you can see, there are two paths to vertex 2
from vertex 0
and from vertex 1
(column #2), and two paths from vertex 2
to vertex 3
and to vertex 4
(row #2). Knowing that, it makes sense to execute the algorithm only for combinations of k = 2
, i = 0,1
and j = 3,4
.
Step |
Path |
Comment |
---|---|---|
1 |
0 ⭢ 2 ⭢ 3 |
Path found. Distance = 4 (was nothing) |
2 |
0 ⭢ 2 ⭢ 4 |
Path found. Distance = 6 (was 8) |
3 |
1 ⭢ 2 ⭢ 3 |
Path found. Distance = 2 (was nothing) |
4 |
1 ⭢ 2 ⭢ 4 |
Path found. Distance = 4 (was 6) |
As we have already done previously, we are updating cells W[0,3]
, W[0,4]
, W[1,3]
, W[1,4]
with new distances and cells R[0,3]
, R[0,4]
, R[1,3]
and R[1,4]
with k = 2
(see Picture 5).
There is only k = 3
left to process (because there are no edges leading from vertex 4
to any other vertex in the graph).
Again, let’s take a look at the matrix W
(Picture 6).
According to the matrix W
, there are three paths to vertex 3
from vertexes 0
, 1
and 2
(column #3), and there is one path from vertex 3
to vertex 4
(row #3). Therefore, we have the following paths to process:
Step |
Path |
Comment |
---|---|---|
1 |
0 ⭢ 3 ⭢ 4 |
Path found. Distance = 5 (was 6) |
2 |
1 ⭢ 3 ⭢ 4 |
Path found. Distance = 3 (was 4) |
3 |
2 ⭢ 3 ⭢ 4 |
Path found. Distance = 2 (was 3) |
This was the last iteration of the algorithm. All that is left is to update cells W[0,4]
, W[1,4]
, W[2,4]
with new distances and set cells R[0,4]
, R[1,4]
, R[2,4]
to 3
.
Here is what we have at the end of the algorithm (Picture 7).
As we know from the previous post, matrix W
now contains all pairs of shortest paths in graph G
. But what is stored inside matrix R
?
Every time we found a new shortest path, we were updating the corresponding cell of matrix R
with the current value of k
. While at first, this action might look mysterious it has a very simple meaning – we were storing the vertex, from which we reached the destination vertex: i ⭢ k ⭢ j
(here we are reaching j
directly from k
).
This moment is crucial. Because knowing a vertex we came from allows us to rebuild the route by moving backward (like a lobster) from the vertex j
(the “to”) to vertex i
(the “from”).
Here is a textual description of the algorithm to rebuild the route from i
to j
:
X
.z
from cell R[i,j]
.z
is NO_EDGE
, then the route from i
to j
is found and we should proceed to step #7.z
to a dynamic array X
.R[i,z]
into z
.i
to X.j
to X
.X
contains the route from i
to j
.Please note, the above algorithm works only for existing paths. It also isn’t perfect from a performance point of view and for sure can be optimized. However, in the scope of this post, it is specifically described in such a way as to make it easier to understand and reproduce on a sheet of paper.
- Author’s note
In pseudo-code, it looks even simpler:
algorithm RebuildRoute(i, j, R)
x = array()
z = R[i,j]
while (z ne NO_EDGE) do
x.prepend(z)
z = R[i,z]
end while
x.prepend(i)
x.append(j)
return x
end algorithm
Let’s try it on our graph G
and rebuild a route from vertex 0
to vertex 4
(see Picture 8).
Here is a textual description of the above illustration:
We start by reading the value from R[0,4]
, which results in 3
. According to the algorithm, this means the route goes to vertex 4
from vertex 3
(shown in BLUE).
Because the value of R[0,4]
isn’t NO_EDGE
, we proceed and read the value of R[0,3]
which results in 2
(shown in GREEN).
Again, because the value of R[0,3]
isn’t NO_EDGE
, we proceed and read the value of R[0,2]
which results in 1
(shown in RED).
At last, we read a value of R[0,1]
, which results in the NO_EDGE
value, meaning, there are no more vertexes except 0
and 4
which contribute to the route. Therefore, the resulting route is: 0 ⭢ 1 ⭢ 2 ⭢ 3 ⭢ 4
which if you look at the graph indeed corresponds to the shortest path from vertex 0
to vertex 4
.
A thoughtful reader might ask:
How can we be sure that all data we read from matrix R
belongs to the same path?
- Thoughtful reader
It is a very good question. We are sure because we update matrix R
with a new value when we update the shortest path value in matrix W
. So in the end, every row of matrix R
contains data related to shortest paths. No more, no less.
Now, we are done with the theory, it is implementation time.
In the previous post, besides implementing an original version of the Floyd-Warshall algorithm, we have also integrated various optimizations: support for sparse graphs, parallelism, and vectorization, and in the end, we also combined them all.
There is no reason to repeat all of this today. However, I would like to show you how to integrate route memorization into original and vectorized (with support for sparse graphs) versions of the algorithm.
It might be hard to believe but to integrate route memorization into the original version of the algorithms, all we need to do is to:
Extend function signature to include R
matrix as a separate parameter – int[] routes
.
Save k to routes
every time the shortest path is changed (lines: 2 and 14).
That is it. Just one and a half lines of code:
public void BaselineWithRoutes(
int[] matrix, int[] routes, int sz)
{
for (var k = 0; k < sz; ++k)
{
for (var i = 0; i < sz; ++i)
{
for (var j = 0; j < sz; ++j)
{
var distance = matrix[i * sz + k] + matrix[k * sz + j];
if (matrix[i * sz + j] > distance)
{
matrix[i * sz + j] = distance;
routes[i * sz + j] = k;
}
}
}
}
}
Integration into a vectorized (with support for sparse graphs) version takes a bit more effort to complete, however, the conceptual steps are the same:
Extend function signature to include R
matrix as a separate parameter – int[] routes
.
On each iteration, initialize a new vector of k
values (line: 6).
Save k
values vector to routes
every time the shortest path is changed (lines: 31-32).
Update a non-vectorized part of the algorithm in the same way as we updated the original algorithm (line: 41).
public void SpartialVectorOptimisationsWithRoutes(
int[] matrix, int[] routes, int sz)
{
for (var k = 0; k < sz; ++k)
{
var k_vec = new Vector<int>(k);
for (var i = 0; i < sz; ++i)
{
if (matrix[i * sz + k] == Constants.NO_EDGE)
{
continue;
}
var ik_vec = new Vector<int>(matrix[i * sz + k]);
var j = 0;
for (; j < sz - Vector<int>.Count; j += Vector<int>.Count)
{
var ij_vec = new Vector<int>(matrix, i * sz + j);
var ikj_vec = new Vector<int>(matrix, k * sz + j) + ik_vec;
var lt_vec = Vector.LessThan(ij_vec, ikj_vec);
if (lt_vec == new Vector<int>(-1))
{
continue;
}
var r_vec = Vector.ConditionalSelect(lt_vec, ij_vec, ikj_vec);
r_vec.CopyTo(matrix, i * sz + j);
var ro_vec = new Vector<int>(routes, i * sz + j);
var rk_vec = Vector.ConditionalSelect(lt_vec, ro_vec, k_vec);
rk_vec.CopyTo(routes, i * sz + j);
}
for (; j < sz; ++j)
{
var distance = matrix[i * sz + k] + matrix[k * sz + j];
if (matrix[i * sz + j] > distance)
{
matrix[i * sz + j] = distance;
routes[i * sz + j] = k;
}
}
}
}
}
A small reminder – Vector.ConditionalSelect
operation returns a new vector where values are equal to the smaller of two corresponding values of input vectors, i.e., if the value of vector lt_vec
is equal to -1
, then the operation selects a value from ij_vec
, otherwise, it selects a value from k_vec
.
- Author’s note
Collecting route information seems reasonable enough because… well why not? Especially when it is so easy to integrate into existing algorithms. However, let’s see how practical it is to integrate it by default.
Here are two sets of benchmarks: with and without (both execute original and vectorized versions of the algorithm). All benchmarks were executed by BenchmarkDotNet v0.13.1 (.NET 6.0.4 x64) on a machine equipped with an Intel Core i5-6200U CPU 2.30GHz (Skylake) processor and running Windows 10.
All experiments were executed on the predefined set of graphs used in the previous post: 300, 600, 1200, 2400, and 4800 vertexes.
The source code and experimental graphs are available in the repository on GitHub. The experimental graphs can be found in the Data/Sparse-Graphs.zip
directory. All benchmarks in this post are implemented in the APSP02.cs file.
Below are the benchmark results where Baseline
and BaselineWithRoutes
methods correspond to the original version of the algorithm and SpartialVectorOptimisations
and SpartialVectorOptimisationsWithRoutes
methods correspond to a vectorized (with support for sparse graphs) version of the algorithm.
Method |
Size |
Mean (ms) |
Error (ms) |
StdDev (ms) |
---|---|---|---|---|
Baseline |
300 |
40.233 |
0.0572 |
0.0477 |
BaselineWithRoutes |
300 |
40.349 |
0.1284 |
0.1201 |
SpartialVectorOptimisations |
300 |
4.472 |
0.0168 |
0.0140 |
SpartialVectorOptimisationsWithRoutes |
300 |
4.517 |
0.0218 |
0.0193 |
Baseline |
600 |
324.637 |
5.6161 |
4.6897 |
BaselineWithRoutes |
600 |
321.173 |
1.4339 |
1.2711 |
SpartialVectorOptimisations |
600 |
32.133 |
0.2151 |
0.1679 |
SpartialVectorOptimisationsWithRoutes |
600 |
34.646 |
0.1286 |
0.1073 |
Baseline |
1200 |
2,656.024 |
6.9640 |
5.8153 |
BaselineWithRoutes |
1200 |
2,657.883 |
8.8814 |
7.4164 |
SpartialVectorOptimisations |
1200 |
361.435 |
2.5877 |
2.2940 |
SpartialVectorOptimisationsWithRoutes |
1200 |
381.625 |
3.6975 |
3.2777 |
Baseline |
2400 |
21,059.519 |
38.2291 |
33.8891 |
BaselineWithRoutes |
2400 |
20,954.852 |
56.4719 |
50.0609 |
SpartialVectorOptimisations |
2400 |
3,029.488 |
12.1528 |
11.3677 |
SpartialVectorOptimisationsWithRoutes |
2400 |
3,079.006 |
8.6125 |
7.1918 |
Baseline |
4800 |
164,869.803 |
547.6675 |
427.5828 |
BaselineWithRoutes |
4800 |
184,305.980 |
210.9535 |
164.6986 |
SpartialVectorOptimisations |
4800 |
21,882.379 |
200.2820 |
177.5448 |
SpartialVectorOptimisationsWithRoutes |
4800 |
21,004.612 |
79.8752 |
70.8073 |
Benchmark results aren’t simple to interpret. If you take a closer look, you will find a combination when route collection actually made the algorithm run faster or without any significant effect at all.
This looks very confusing (and if it is – I strongly advise you to listen to this show with Denis Bakhvalov and Andrey Akinshin to better understand what tricky things can affect measurements). My best take here is that “confusing” behavior is caused by great CPU cache performance (because graphs aren’t large enough to saturate caches). Partially, this theory is based on the “bold” sample where we can see significant degradation on the largest graph. However, I didn’t verify it.
In general, the benchmark shows that if we aren’t talking about extremely high-performance scenarios and huge graphs, it is okay to integrate route memorization into both algorithms by default (but keep in mind, it will double memory consumption because we need to allocate two matrices W
and R
instead of one).
The only thing left is an implementation of the route rebuild algorithm.
Implementation of route rebuild algorithms in C# is straightforward and almost completely reflects the previously demonstrated pseudo-code (we use LinkedList<T>
to represent dynamic array):
public static IEnumerable<int> RebuildWithLinkedList(
int[] routes, int sz, int i, int j)
{
var x = new LinkedList<int>();
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
x.AddFirst(z);
z = routes[i * sz + z];
}
x.AddFirst(i);
x.AddLast(j);
return x;
}
The above algorithm isn’t perfect and definitely can be improved. The simplest improvement we can make is to preallocate an array of sz
size and fill it in reverse order:
public static IEnumerable<int> RebuildWithArray(
int[] routes, int sz, int i, int j)
{
var x = new int[sz];
var y = sz - 1;
// Fill array from the end
x[y--] = j;
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
x[y--] = z;
z = routes[i * sz + z];
}
x[y] = i;
// Create an array segment from 'y' to the end of the array
//
// This is required because 'sz' (and therefore length of the array x)
// equals to the number of vertices in the graph
//
// So, in cases when route doesn't span through
// all vertices - we need return a segment of the array
return new ArraySegment<int>(x, y, sz - y);
}
While this “optimization” reduces the number of allocations to one, it doesn’t necessarily make the algorithm “faster” or “allocates less memory” than the previous one. The point here is if we need to have a route ordered from i
to j
, we always will have to “reverse” the data we retrieve from the matrix R
, and there is no way to escape it.
But if we can present data in reverse order, then we can make use of LINQ and avoid any unnecessary allocations:
public static IEnumerable<int> RebuildWithReverseYield(
int[] routes, int sz, int i, int j)
{
yield return j;
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
yield return z;
z = routes[i * sz + z];
}
yield return i;
}
There can be many more variants of how this code can be “changed” or “improved.” The important moment here is to remember – any “change” has downsides in terms of memory or CPU cycles.
Feel free to experiment! The possibilities are almost limitless!
You can find implementations of all route algorithms on GitHub in the Routes.cs file.
In this post, we had a deep dive into the theory behind the Floyd-Warshall algorithm and have extended it with the possibility of memorizing shortest paths “routes.”Then we have updated C# implementations (original and vectorized) from the previous post. In the end, we have implemented a few versions of the algorithm to rebuild the “routes”.
We have repeated a lot, but at the same time, I hope you have found something new and interesting in this one. This isn’t the end of the all-pairs shortest paths problem. This is just the beginning.
I hope you have enjoyed the reading, and see you next time!