A few weeks back I was working on a solution in Visual Studio with about 8 projects or so in it. I attempted to add a reference from Project B to Project A and was greeted with this nice little dialog box:
I had accidentally selected the wrong project reference when adding a reference to one of the project, which caused a circular dependency. Effectively, what I did was this:
Which project would you build first? Of course that was a rhetorical question, please don't answer it. This got me thinking, what algorithm is Visual Studio using to detect circular dependencies? My visual way of thinking immediately settled on a topological sort. Picture the projects in your solution as nodes on a graph. The dependencies, or edges, between projects are directed in such a way that we can visually determine a build order. In the event where our edges do not cause a cycle in the graph we have what is known as a directed acyclic graph or DAG for short.
Sorting a DAG isn't all that difficult really, and this is exactly what a topological sort does. What we want to do is linearize our nodes in such a way where nodes with dependencies are visited after the nodes in which they depend on. You might assume from that rather vague description that a DAG might have more than 1 topological sort, which is true.
Note: In my examples I will be using the edges of a DAG to indicate a "depends on" type of relationship. You can the read the picture above (the one illustrating a circular dependency) as "A depends on B; B depends on C; C depends on A".
There are a few ways to pull off a topological sort. On such way involves identifying nodes without incoming edges and following their outbound edges to other nodes, adding each visited node to some sort of list along the way. Another way involves performing a depth-first search of each node in the graph for any node not previously visited. Adding each node to some list after all outbound edges (dependencies) have also been visited.
I feel the depth-first traversal better fits my interpretation of edges in this example and more beautifully expresses itself in a recursive manner. Since we follow each edge and visit each node only once, this algorithm will run in O(n) where n is the number of nodes in our DAG. To summarize, we will linearly order nodes in a DAG and the algorithm will run in linear time. Sounds great doesn't it? Let's see some pseudo code:
function Visit(Node n, List l)
if n has been visited
return
mark n as visited
for each node c with an edge from n to c
Visit(c, l)
add node n to list l
end
It's easy to spot the depth-first traversal in this pseudo code. The idea is to travel as far down the chain of dependencies until you reach a point where the chain ends. Then, before you wind back up the call stack, add the currently visited node to some list. Nodes which are depended upon will be in the listed at an index less than the nodes that depend upon them. Here is our pseudo code in C#:
public class Node<T>
{
public T ID { get; private set; }
public bool Visited { get; set; }
public List<Node<T>> Dependencies { get; private set; }
public Node(T ID)
{
this.ID = ID;
Dependencies = new List<Node<T>>();
}
}
private static void Visit<T>(Node<T> n, List<Node<T>> list)
{
if (n.Visited)
{
return;
}
n.Visited = true;
foreach (Node<T> dependency in n.Dependencies)
{
Visit(dependency, list);
}
list.Add(n);
return;
}
Since our code works under the assumption that each nodes' Visited property is set to false, we'll need to make sure this assumption is met prior to sorting the DAG. The Visit function also needs to be called for each node in the DAG. Let's introduce a function to kick start the topological sort:
public static List<Node> PerformTopoSort(List<Node<T>> nodes)
{
var list = new List<Node<T>>();
nodes.ForEach(n => n.Visited = false);
nodes.ForEach(n => Visit(n, list));
return list;
}
That's pretty straight forward. Mark each node such that it has not been visited yet, and then visit each node. The resulting list will contain the proposed topological sort, which remember is essentially a post-order listing of the nodes in our depth-first traversal. Here is how you can invoke the topological sort:
// create our "projects"
var consoleApplication = new Node<string>("consoleApp");
var businessLayer = new Node<string>("businessLayer");
var dataLayer = new Node<string>("dataLayer");
// set up the "references"
consoleApplication.Dependencies.Add(businessLayer);
businessLayer.Dependencies.Add(dataLayer);
// perform a topological sort of our projects
var nodes = new List<Node<string>>() { consoleApplication, businessLayer, dataLayer };
List<Node<string>> topoSort = PerformTopoSort(nodes);
// display the results of the topological sort
Console.WriteLine("Proposed topological sort:");
foreach (Node<string> n intopoSort)
{
Console .WriteLine(n.ID);
// prints "dataLayer, businessLayer, consoleApp"
}
Remember that I'm trying to mimic how Visual Studio detects circular dependencies when adding references to projects within a solution. I'm sure this isn't exactly how Visual Studio does it, but its a possibility. In hindsight, a linked list would have served this implementation better than just a List, but that's a minor issue.
Something missing from our implementation is the detection of cycles in a DAG. Perhaps we could detect cycles when we encounter a node we have previously visited. While that notion is on the right track, it is slightly naive. A node on our graph could be depended upon by two different nodes, not necessarily causing a cycle. We need to make the distinction between encountering a previously visited node due to the situation just mentioned, and encountering a previously visited node from the same root. To accomplish this we need to modify the Visit function to keep track of our root node across traversals. We will also extend the Node class to keep a reference to which root node was used when the node was last visited:
public class Node<T>
{
public T ID { get; private set; }
public bool Visited { get; set; }
public List<Node<T>> Dependencies { get; private set; }
public Node<T> Root { get; set; } // keep track of the root for each node
public Node(T ID)
{
this.ID = ID;
Dependencies = new List<Node<T>>();
}
}
private static bool Visit<T>(Node<T> n, List<Node<T>> list, Node<T> root)
{
if (n.Visited)
{
return n.Root != root;
}
n.Visited = true;
n.Root = root;
foreach (Node<T> dependency in n.Dependencies)
{
if (!Visit(dependency, list, root) && n != root)
{
throw new Exception("Cycle detected from " + n.ID + " to " + dependency.ID);
}
}
list.Add(n);
return true;
}
Now we can keep track of the root node during our depth-first traversal. When we encounter a previously visited node, we can check to see if we visited it with the same root. If that's the case, a circular dependency is the result. Since Visit takes another parameter, we have to modify the call site just a bit to account for the root node:
public static List<Node> PerformTopoSort(List<Node<T>> nodes)
{
var list = new List<Node<T>>();
nodes.ForEach(n => n.Visited = false);
nodes.ForEach(n => Visit(n, list, n)); // added n as the root
return list;
}
To test this new functionality, we can create a cycle in our DAG (which really wouldn't be a DAG anymore):
dataLayer.Dependencies.Add(businessLayer); // causes a cycle in our DAG
When you run the code with the cycle added, we can expect an exception to be thrown:
Is this how Visual Studio detects and prevents cycles in project references? Maybe, but I'm not sure. It doesn't matter much anyway since this was a good review of topological sorting. Something missing from this implementation is that it does not propose all possible topological sorts if the DAG has more than 1. While that would be fun to add, I'll save it for another time...
Comments