← Back to Systems & Networking
Documentation

Systems & Networking

Distributed systems and computer networks are fundamentally graph problems. Machines are nodes, connections are edges, and almost every interesting question — how do packets reach their destination, how do services discover each other, how does a build system order its work — reduces to a graph algorithm.

Networks as Graphs

A computer network is a weighted directed graph: routers and hosts are vertices, physical or logical links are edges, and edge weights encode latency, bandwidth, or cost. Every routing protocol — OSPF, IS-IS, BGP — maintains a view of this graph and uses it to compute forwarding tables. The internet is itself a graph of autonomous systems, and BGP path selection is essentially a policy-constrained shortest-path computation over this AS graph. Understanding adjacency lists versus adjacency matrices matters here: a router with 100,000 BGP routes stores them in a compact trie, not a matrix, because the graph is sparse and prefix matching must be fast.

Routing and Shortest Path

OSPF (Open Shortest Path First) runs Dijkstra's algorithm on each router using a priority queue to find the shortest path to every other node in the network. Each router floods link-state advertisements so that all routers converge on the same graph, then independently compute identical shortest-path trees. In data center networks, equal-cost multipath (ECMP) extends Dijkstra by tracking multiple shortest paths and hashing flows across them for load balancing. Bellman-Ford is used in distance-vector protocols like RIP because it works in a distributed setting where no single node has the full graph. The choice between Dijkstra and Bellman-Ford in a routing protocol is a direct trade-off between convergence speed and topology awareness.

Dependency Graphs in Build Systems and Pipelines

Build systems like Bazel, Make, and Gradle model build targets as nodes in a directed acyclic graph (DAG), where edges represent dependencies. Topological sort determines the order in which targets must be built, and parallel build systems walk the DAG's frontier — nodes with no unbuilt dependencies — to maximize CPU utilization. CI/CD pipelines use the same DAG model: stages depend on earlier stages, and the pipeline engine uses topological ordering to schedule jobs on available workers. If a dependency cycle exists in a build graph, the build system must detect it (using DFS cycle detection) and report an error before attempting any compilation. Tools like Buck and Pants cache build artifacts keyed by a hash of the subgraph, so only the affected subgraph is rebuilt after a change.

Graph Traversal in Service Mesh and Observability

In a microservices architecture, services call each other in patterns that form a call graph — a directed graph where an edge from A to B means A makes HTTP or RPC calls to B. Distributed tracing systems like Jaeger and Zipkin reconstruct this call graph from span data, using DFS or BFS to render flame graphs and identify bottlenecks. Service meshes like Istio use the service graph to enforce traffic policies, circuit breaking, and retries at the proxy layer. When a request fails, the observability system traces the path through the call graph to find the root cause — essentially a graph search problem on live telemetry data. Understanding which services form strongly connected components tells operators which sets of services can create cascading failures.

When Graphs Break: Cycles and Deadlocks

Deadlock in a system with multiple resources and multiple processes can be modeled as a cycle in a resource allocation graph, where nodes are processes and resources, and edges represent holds and waits. Deadlock detection algorithms use DFS to find cycles in this graph; if a cycle exists, at least one process must be rolled back or killed. In distributed databases, deadlock detection requires building and searching a wait-for graph across multiple nodes, which is expensive and typically done periodically rather than continuously. Lock ordering — assigning a global order to locks and always acquiring them in that order — is a graph-coloring approach that prevents cycles from forming. These are not theoretical concerns: MySQL's InnoDB deadlock detector runs continuously and rolls back the transaction with the least undo cost when it finds a cycle.