Graphs (Silver)
What's Covered
Fundamental properties/definitions of graphs
Writing graphs in code
Adjacency matrices
Adjacency lists
Tradeoffs
Recursive DFS traversal of graphs
DFS to visit all nodes in one connected component
Floodfill to find all connected components
Prerequisites
Required
Writing functions
Time complexity analysis
Optional
Prior experience with graphs or recursion
What's Explicitly Omitted
BFS with queues (and correspondingly, iterative DFS with stacks), since these topics are not required for silver but rather gold. However, if you code in Python, the overhead of calls in recursive DFS may require you to consider these iterative options
Path algorithms for weighted graphs, spanning trees, and anything beyond
Graphs are best understood graphically (as you might expect); admittedly, yes, this is less rigorous than the approach a graph theory textbook might take, but as far as understanding goes for our intents and purposes, this is superior. This means we begin our study of graphs with a visual representation of what a graph is. You can.follow along by drawing graphs out on paper (as you would in a contest), or using https://csacademy.com/app/graph_editor/, an exceptional website for visualizing graphs.
A graph is essentially a diagram with two types of visual components, nodes and edges, and we note some special properties of each as sub-points:
Nodes (sometimes called vertices in geometry): these are just dots
To refer to nodes, we often number them for convenience. If there are
N
nodes, we could zero-index them like0...N-1
or one-index them like1...N
. Different people have different preferences, but I personally prefer zero-indexing.
Edges (sometimes called segments in geometry): these are just lines that each connect a pair of distinct dots (nodes)
To refer to edges, we can name their two endpoints (the pair of nodes they connect), but similar to nodes,
M
edges might be numbered by zero-index like0...M-1
or one-index like1...M
.Two nodes directly connected to one another by a single edge are called adjacent (or directly connected). Two nodes that can reach one another by some sequence of two or more intermediate edges are called indirectly connected.
Taking connectivity one step further, it can be proven that all graphs are a union of one or several connected components, each of whose nodes are all directly or indirectly connected to one another.
Sometimes edges have a specific direction connecting one node to another, in which case they are directed edges, indicated visually by arrows as you will see in our examples. Edges without specific directions are also known as undirected edges.
Sometimes edges will have numbers, or weights, written on top of them to indicate properties like distance; these are called weighted edges. Edges without weights are unweighted edges.
Although rare, there could be multiple edges between some pairs of nodes
Just based on intuition from these properties alone, we can already begin drawing some examples. Take some time to digest these, and try to spot the properties mentioned above. (I will source these rather than drawing them from scratch; for credit, a simple Google search of "node graph example" should suffice):
But why even use graphs? Graphs can be powerful representations of entities in problems and how they are associated with one another (see the caption under the last image above). For example, an urban city might have homes, shops, and schools as nodes and the streets/roads interconnecting them as edges.
Now, we look at how to write graphs in code. To do this, note that the main challenge with graphs is keeping track of which nodes are adjacent and the weight/direction of the edges directly connecting them. Although there are numerous approaches, the two most common by far are adjacency matrices and adjacency lists.
Imagine a graph G
with N
nodes and M
edges.
We could represent G
with just a single N x N
2D array (think of this like a grid, or more formally, a matrix), where G[i][j]
is just a boolean representing whether or not node i
is adjacent/directly connected to node j
-- that is, whether or not there exists an edge from i
to j
(which we would call edge ij
). This is an adjacency matrix.
If G
is undirected (meaning that all of its M
edges are undirected), then for all node pairs i,j
, it follows that G[i][j] = G[j][i]
(put another way, ij
and ji
would be two names for the same undirected edge). If this is the case, note that we could just force without loss of generality that i < j
(or vice versa), and then not define G[j][i]
. But this is a trivial optimization and really not worth the inconvenience, so when we set up our graph in the undirected case, we usually just set G[i][j]
and G[j][i]
to the same boolean.
If G
is directed (meaning that some of its M
edges are directed), then for some node pairs i,j
, directed edge ij
exists and directed edge ji
does not (or vice versa). Luckily, this is very easy to handle: we could just say, for example, that G[i][j] = 1
but G[j][i] = 0
. As such, in the directed case, we just set up G[i][j]
and G[j][i]
independently.
Last updated