T-SQL: Breadth-First shortest-route search
A while ago I came across the problem of determining the shortest route in a many-to-many self-join table. The linker table consists of two ID columns to link nodes together and is called `vertices`. This represents an unweighted and undirected node graph.
In an unweighted undirected graph there can be no heuristic searching for there is no clue (weight or direction) which possible leaf node gets you to the target quickest. Therefore an uninformed search is the only option and for this I utilized the breadth-first search algorithm.
The breadth-first algorithm visits the leafnodes per generation first and not by branch first. In practice this is the optimal method most of the time.
T-SQL Implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | CREATE PROCEDURE BFS( @FROM INT, @TO INT ) AS BEGIN DECLARE @Nodes TABLE (Generation INT, p INT, r INT, UNIQUE(p, r)) DECLARE @Generation INT SELECT @Generation = 0 INSERT @Nodes ( Generation, p ) SELECT @Generation, @FROM WHILE @@ROWCOUNT > 0 AND NOT EXISTS (SELECT * FROM @Nodes WHERE p = @TO) BEGIN SELECT @Generation = @Generation + 1 INSERT @Nodes ( Generation, p, r ) SELECT @Generation, bid, aid FROM vertices WHERE aid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1) AND bid NOT IN (SELECT p FROM @Nodes) UNION SELECT @Generation, aid, bid FROM vertices WHERE bid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1) AND aid NOT IN (SELECT p FROM @Nodes) END -- Backtracing method: Traces the route back from target to start DECLARE @Backtrace TABLE ( p INT ) INSERT @Backtrace VALUES(@TO) WHILE @Generation > 0 BEGIN DELETE FROM @Nodes WHERE Generation = @Generation AND p NOT IN(SELECT p FROM @Backtrace) INSERT @Backtrace ( p ) SELECT DISTINCT r FROM @Nodes WHERE Generation = @Generation SELECT @Generation = @Generation - 1 END END SELECT * FROM @Nodes ORDER BY Generation, r, p |
The procedure returns the following columns:
- Generation: The amount of generations of leaf node expansion
- P: The progressive node ID
- R: The regressive node ID
The regressive ID is in fact the node’s parent and the progressive ID its child. This way we can obtain the (outward) direction in the result set. Results can represent multiple shortest routes as long as they both have the minimum amount of steps (Generations) necessary.
If we would want to obtain the shortest route(s) between two nodes with ID’s 776 and 777 this would be a valid result.
Execution of the Stored Procedure:
EXEC BFS @FROM = 776, @TO = 777
The result set:
| Generation | p | r |
|---|---|---|
| 0 | 776 | |
| 1 | 2881 | 776 |
| 1 | 3198 | 776 |
| 2 | 3362 | 2881 |
| 2 | 1582 | 3198 |
| 3 | 1579 | 1582 |
| 3 | 1262 | 3362 |
| 4 | 777 | 1262 |
| 4 | 777 | 1579 |
As you can see this result set proposes two shortest routes between nodes 776 and 777.
In practice this technique has its limitations. It is not heuristic so brute force is required. With each generation that the leafs expand there is an exponential amount of nodes to be dealt with. Luckily there are ways to chop in two the exponential strain this procedure poses for the DB. One of these is the bidirectional search method. Alas, I will not disclose that particular implementation.
Go code or something
Tags: algorithm, Breadth-First, shortest route, Stored Procedure, T-SQL
[...] links: [wikipedia] [boost]Useful examples/applets: [Breadth First Traversal] [lupihno.de] [T-SQL: BFS Implementation]Finding Strongly Connected Components in a graph The easiest algorithm I find to solve this [...]
July 12th, 2008 at 6:20 pmif i had to iterate through your results table, to list out the actual 2 shortest paths like.
path 1: 776,2881,3362,1262,777
path 2: 776,3198,1582,1579,777
what would be the most optimum SQL?
thanks
July 29th, 2008 at 11:37 am@john krasinski:
I’m going to figure that one out the coming week… I’ll be back shortly with a solution.
August 16th, 2008 at 7:46 pmThis is quite a up-to-date info. I think I’ll share it on Digg.
April 15th, 2009 at 5:31 pmYou know, the thing about SQL is, that there is virtually nothing that can replace it.
Does anyone know if a substitute exists for sql? I mean besides MS SQL and Oracle and all that jazz. Thanks.
May 1st, 2009 at 3:37 am