T-SQL: Breadth-First shortest-route search
Wednesday, June 18th, 2008A while ago I came across the problem of determining the shortest route in a many-to-many self-join table. The linker table consists of two ID columns to link nodes together and is called `vertices`. This represents an unweighted and undirected node graph.
In an unweighted undirected graph there can be no heuristic searching for there is no clue (weight or direction) which possible leaf node gets you to the target quickest. Therefore an uninformed search is the only option and for this I utilized the breadth-first search algorithm.
The breadth-first algorithm visits the leafnodes per generation first and not by branch first. In practice this is the optimal method most of the time.
T-SQL Implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | CREATE PROCEDURE BFS( @FROM INT, @TO INT ) AS BEGIN DECLARE @Nodes TABLE (Generation INT, p INT, r INT, UNIQUE(p, r)) DECLARE @Generation INT SELECT @Generation = 0 INSERT @Nodes ( Generation, p ) SELECT @Generation, @FROM WHILE @@ROWCOUNT > 0 AND NOT EXISTS (SELECT * FROM @Nodes WHERE p = @TO) BEGIN SELECT @Generation = @Generation + 1 INSERT @Nodes ( Generation, p, r ) SELECT @Generation, bid, aid FROM vertices WHERE aid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1) AND bid NOT IN (SELECT p FROM @Nodes) UNION SELECT @Generation, aid, bid FROM vertices WHERE bid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1) AND aid NOT IN (SELECT p FROM @Nodes) END -- Backtracing method: Traces the route back from target to start DECLARE @Backtrace TABLE ( p INT ) INSERT @Backtrace VALUES(@TO) WHILE @Generation > 0 BEGIN DELETE FROM @Nodes WHERE Generation = @Generation AND p NOT IN(SELECT p FROM @Backtrace) INSERT @Backtrace ( p ) SELECT DISTINCT r FROM @Nodes WHERE Generation = @Generation SELECT @Generation = @Generation - 1 END END SELECT * FROM @Nodes ORDER BY Generation, r, p |
The procedure returns the following columns:
- Generation: The amount of generations of leaf node expansion
- P: The progressive node ID
- R: The regressive node ID
The regressive ID is in fact the node’s parent and the progressive ID its child. This way we can obtain the (outward) direction in the result set. Results can represent multiple shortest routes as long as they both have the minimum amount of steps (Generations) necessary.
If we would want to obtain the shortest route(s) between two nodes with ID’s 776 and 777 this would be a valid result.
Execution of the Stored Procedure:
EXEC BFS @FROM = 776, @TO = 777
The result set:
| Generation | p | r |
|---|---|---|
| 0 | 776 | |
| 1 | 2881 | 776 |
| 1 | 3198 | 776 |
| 2 | 3362 | 2881 |
| 2 | 1582 | 3198 |
| 3 | 1579 | 1582 |
| 3 | 1262 | 3362 |
| 4 | 777 | 1262 |
| 4 | 777 | 1579 |
As you can see this result set proposes two shortest routes between nodes 776 and 777.
In practice this technique has its limitations. It is not heuristic so brute force is required. With each generation that the leafs expand there is an exponential amount of nodes to be dealt with. Luckily there are ways to chop in two the exponential strain this procedure poses for the DB. One of these is the bidirectional search method. Alas, I will not disclose that particular implementation.
Go code or something