Posts Tagged ‘Breadth-First’

T-SQL: Breadth-First shortest-route search

Wednesday, June 18th, 2008

A while ago I came across the problem of determining the shortest route in a many-to-many self-join table. The linker table consists of two ID columns to link nodes together and is called `vertices`. This represents an unweighted and undirected node graph.

In an unweighted undirected graph there can be no heuristic searching for there is no clue (weight or direction) which possible leaf node gets you to the target quickest. Therefore an uninformed search is the only option and for this I utilized the breadth-first search algorithm.

The breadth-first algorithm visits the leafnodes per generation first and not by branch first. In practice this is the optimal method most of the time.

T-SQL Implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
CREATE PROCEDURE BFS(
	@FROM INT,
	@TO INT
)
AS
 
BEGIN
	DECLARE @Nodes TABLE (Generation INT, p INT, r INT, UNIQUE(p, r))
	DECLARE @Generation INT
 
	SELECT @Generation = 0
 
	INSERT @Nodes
		(
			Generation,
			p
		)
	SELECT	@Generation,
			@FROM
 
	WHILE @@ROWCOUNT > 0 AND NOT EXISTS (SELECT * FROM @Nodes WHERE p = @TO)
		BEGIN
			SELECT @Generation = @Generation + 1
 
			INSERT	@Nodes
				(
					Generation,
					p,
					r
				)
 
			SELECT	@Generation,
					bid,
					aid
			FROM	vertices
			WHERE	aid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1)
				AND bid NOT IN (SELECT p FROM @Nodes)
			UNION
			SELECT	@Generation,
				aid,
				bid
			FROM	vertices
			WHERE	bid IN (SELECT p FROM @Nodes WHERE Generation = @Generation - 1)
				AND aid NOT IN (SELECT p FROM @Nodes)
 
		END
 
	-- Backtracing method: Traces the route back from target to start
 
	DECLARE @Backtrace TABLE
	( p INT )
 
	INSERT @Backtrace VALUES(@TO)
 
	WHILE @Generation > 0
		BEGIN
			DELETE FROM @Nodes
			WHERE Generation = @Generation
				AND p NOT IN(SELECT p FROM @Backtrace)
 
			INSERT @Backtrace
				( p )
 
			SELECT DISTINCT r
			FROM @Nodes
			WHERE Generation = @Generation
 
			SELECT @Generation = @Generation - 1
		END
END
 
SELECT * FROM @Nodes ORDER BY Generation, r, p

The procedure returns the following columns:

  • Generation: The amount of generations of leaf node expansion
  • P: The progressive node ID
  • R: The regressive node ID

The regressive ID is in fact the node’s parent and the progressive ID its child. This way we can obtain the (outward) direction in the result set. Results can represent multiple shortest routes as long as they both have the minimum amount of steps (Generations) necessary.

If we would want to obtain the shortest route(s) between two nodes with ID’s 776 and 777 this would be a valid result.

Execution of the Stored Procedure:

EXEC BFS @FROM = 776,  @TO = 777

The result set:

Generation p r
0 776
1 2881 776
1 3198 776
2 3362 2881
2 1582 3198
3 1579 1582
3 1262 3362
4 777 1262
4 777 1579

As you can see this result set proposes two shortest routes between nodes 776 and 777.

In practice this technique has its limitations. It is not heuristic so brute force is required. With each generation that the leafs expand there is an exponential amount of nodes to be dealt with. Luckily there are ways to chop in two the exponential strain this procedure poses for the DB. One of these is the bidirectional search method. Alas, I will not disclose that particular implementation.

Go code or something ;)