An intro to Data Structures: Graphs and their traversal algorithms by @meetzaveri

July 23rd, 2018

The vast majority of algorithms of interest operate on data. Therefore, there are particular ways of organizing data that play a critical role in the design and analysis of algorithms. From that, we can say that data structures are simply ways of organizing data.

They are either **linear or non-linear**. Arrays and linked lists are examples of linear data structures. On the other hand, graphs and trees are forms of non-linear data structures.

- For example, one common data structure is the **list** or **array**, which is an ordered sequence of values. Here’s a list of numbers: 0, 1, 1, 2, 3, 5, 8, 13. The concept of a list isn’t particular to one language, and it’s also used outside of programming in everyday life: wish lists, shopping lists, and so on.

Algorithms are recipes for logical execution of the problem. They are not the same as data structures. Algorithms are usually “better” if they work faster or more efficiently (using less time, memory, or both).

But here in this article, it’s all about **looking into non-linear data structures: graphs**

A graph is a system in which there are potentially multiple ways to get from an arbitrary point, A, to another arbitrary point, B.

A graph is normally defined as a pair of sets (V, E). V is a set of arbitrary objects called vertices or nodes, and E is a set of pairs of vertices, which we call edges or (more rarely) arcs. In an undirected graph, the edges are unordered pairs, or just sets of two vertices. I usually write **u v** instead of {u, v} to denote the undirected edge between u and v.

In a directed graph, the edges are ordered pairs of vertices. I will use **u → v** instead of (u, v) to denote the directed edge from u to v, and likewise for all directed edges in this article.

Graphs can also be undirected or directed, cyclic or acyclic (acyclicity matters mostly for directed graphs), and weighted or unweighted.

To visit every node or vertex of a connected component, traversal algorithms are used. You can cover the whole graph easily by iterating through all of its vertices and running the traversal on each vertex that is still unvisited when examined.
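This component-by-component sweep can be sketched in Python. A minimal sketch, assuming the graph is stored as an adjacency-list dict; the function name and sample structure are illustrative, not from the original article:

```python
def connected_components(graph):
    """Return a list of connected components, each a set of vertices.

    `graph` is an adjacency-list dict: {vertex: [neighbors, ...]}.
    """
    visited = set()
    components = []
    for start in graph:                  # examine every vertex
        if start in visited:
            continue                     # already part of some component
        component = set()
        stack = [start]                  # iterative DFS from this vertex
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(w for w in graph[v] if w not in component)
        visited |= component
        components.append(component)
    return components
```

Running it on a graph with two pieces, e.g. `{1: [2], 2: [1], 3: []}`, yields one component containing 1 and 2 and another containing only 3.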

Two algorithms are generally used for the traversal of a graph: Depth first search (DFS) and Breadth first search (BFS).

Depth-first Search (DFS) is an algorithm for searching a graph or tree data structure. The algorithm starts at the root (top) node of a tree and goes as far as it can down a given branch (path), and then backtracks until it finds an unexplored path, and then explores it. The algorithm does this until the entire graph has been explored.

Many problems in computer science can be thought of in terms of graphs. For example, analyzing networks, mapping routes, scheduling, and finding spanning trees are graph problems. To analyze these problems, graph search algorithms like depth-first search are useful.

The simplest pseudo-code appears below, but first, an intuition:

Depth-first search is a common way that many people naturally use when solving problems like mazes.

First, we select a path in the maze (for the sake of this example, let’s choose a path according to some rule we lay out ahead of time) and we follow it until we hit a dead end or reach the end of the maze. If a given path doesn’t work, we backtrack and take an alternative path from a past junction, and try that path.

To turn this into a graph traversal algorithm, we basically replace “child” with “neighbor”. But to prevent infinite loops, we only want to visit each vertex once. Just like in BFS, we can use marks to keep track of the vertices that have already been visited, so we don’t visit them again. Also, just like in BFS, we can use this search to build a spanning tree with certain useful properties.

```
dfs(vertex v)
{
    visit(v);
    for each neighbor w of v
        if w is unvisited
        {
            dfs(w);
            add edge vw to tree T
        }
}
```

**Here’s the Python implementation using recursion:**
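A minimal recursive sketch, assuming the graph is stored as a dict mapping each vertex to a list of its neighbors (the vertex names are illustrative):

```python
def dfs(graph, v, visited=None):
    """Recursive depth-first search; returns vertices in visit order.

    `graph` maps each vertex to a list of its neighbors.
    """
    if visited is None:
        visited = []
    visited.append(v)                   # visit(v)
    for w in graph[v]:                  # for each neighbor w of v
        if w not in visited:            # if w is unvisited
            dfs(graph, w, visited)
    return visited

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(dfs(graph, 'a'))  # -> ['a', 'b', 'd', 'c']
```

Note how the traversal goes as deep as it can (`a → b → d`) before backtracking to pick up `c`, exactly as the pseudo-code above describes.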

This was a basic overview of how DFS works. If you want to dive deeper, there is some great stuff available all over the internet and also on Medium.

Breadth-first Search (BFS) starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a ‘search key’), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.

The only catch here is that, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid processing a node more than once, we use a boolean visited array. For simplicity, it is assumed that all vertices are reachable from the starting vertex.

What we do in a BFS is a simple step-by-step process:

- Start from a vertex **S**. Let this vertex be at what is called… “Level 0”.
- Find all the other vertices that are immediately accessible from this starting vertex **S**, that is, only a single edge away (the adjacent vertices).
- Mark these adjacent vertices as being at “Level 1”.
- You might come back to the same vertex due to a loop or a ring in the graph. If this happens, your BFS will take **∞** time. So, only go to those vertices that do not yet have their “Level” set to some value.
- Mark the parent vertex of the current vertex, i.e., the vertex from which you accessed the current vertex. Do this for all the vertices at Level 1.
- Now, find all the vertices that are a single edge away from the vertices at “Level 1”. This new set of vertices will be at “Level 2”.
- Repeat this process until you run out of graph.

See this — https://www.programiz.com/dsa/graph-bfs

**Python Implementation Using Queue**
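The level-by-level steps above can be sketched with a FIFO queue. A minimal sketch, assuming the same adjacency-list dict representation as before (names are illustrative):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search from `start`; returns vertices in visit order."""
    visited = {start}                  # plays the role of the boolean visited array
    order = []
    queue = deque([start])             # FIFO queue drives level-by-level visits
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:       # skip vertices whose "Level" is already set
                visited.add(w)
                queue.append(w)
    return order

graph = {'s': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': []}
print(bfs(graph, 's'))  # -> ['s', 'a', 'b', 'c']
```

Here `s` is Level 0, `a` and `b` are Level 1, and `c` is Level 2; marking vertices as visited when they are enqueued is what prevents the infinite loop described above.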

An *undirected graph* is a graph in which edges have no orientation. The edge (*x*, *y*) is identical to the edge (*y*, *x*). That is, they are not ordered pairs, but unordered pairs — i.e., sets of two vertices {*x*, *y*} (or 2-multisets in the case of loops). The maximum number of edges in an undirected graph without a loop is *n*(*n* − 1)/2.

For any undirected edge {u, v}, we call u a neighbor of v and vice versa. The degree of a node is its number of neighbors. In directed graphs, we have two kinds of neighbors: for any directed edge **u → v**, we call u a predecessor of v and v a successor of u.

Among the many properties of graphs, two are important for a great number of applications: connectivity and acyclicity. Both are based on the notion of a path.

A cyclic graph is a graph containing at least one cycle. **Also remember that a cyclic graph cannot be a tree: a tree has exactly one path between any two nodes, which is why a traversal such as DFS or BFS visits each of its nodes only once.**

An acyclic graph is a graph without cycles (a cycle is a complete circuit). When following the graph from node to node, you will never visit the same node twice.

A directed acyclic graph (DAG) is a directed graph that contains no cycles.

A rooted tree is a special case of a directed acyclic graph.

The parts of the example graph are:

- **Vertex set** = {1, 2, 3, 4, 5, 6, 7} (the vertices are integers).
- **Edge set** = {(1,2), (1,3), (2,4), (2,5), (3,6), (4,7), (5,7), (6,7)}.

A directed acyclic graph has a **topological ordering**: the nodes can be ordered so that, for every directed edge u → v, u comes before v in the ordering. A DAG has a unique topological ordering if and only if it has a directed path containing all the nodes; in this case the ordering is the same as the order in which the nodes appear in the path.
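A topological ordering can be computed with Kahn’s algorithm: repeatedly remove a vertex with no incoming edges. A sketch, assuming the DAG is stored as a dict mapping each vertex to its successors (using the vertex and edge sets listed above):

```python
from collections import deque

def topological_order(graph):
    """Kahn's algorithm: return a topological ordering of a DAG.

    `graph` maps each vertex to the list of its successors.
    Raises ValueError if the graph contains a cycle.
    """
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1           # count incoming edges
    queue = deque(v for v in graph if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()            # v has no remaining predecessors
        order.append(v)
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle; no topological ordering exists")
    return order

dag = {1: [2, 3], 2: [4, 5], 3: [6], 4: [7], 5: [7], 6: [7], 7: []}
print(topological_order(dag))
```

For the edge set above, every valid ordering starts at 1 and ends at 7, since 1 has no incoming edges and 7 has no outgoing ones.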

In computer science, a closely related structure is the *wait-for graph*, used to detect deadlocks: a directed edge records that one process has to *wait for* a resource held by another, and a cycle in this graph indicates a deadlock. If the wait-for graph is a DAG, no deadlock exists.

**How can we detect cycles in a graph?**

As it turns out, the reason the depth-first search algorithm is particularly good at detecting cycles is that it is efficient at finding back edges, as we saw when looking at DFS earlier in this article.
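A sketch of DFS-based cycle detection for a directed graph: a back edge is an edge pointing to a vertex that is still on the current recursion stack (marked “gray” below), and such an edge is exactly what closes a cycle. The coloring scheme and names are a common convention, assumed here for illustration:

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph by looking for back edges via DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited / on current path / finished
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY                 # v is now on the recursion stack
        for w in graph[v]:
            if color[w] == GRAY:        # back edge: w is an ancestor of v
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK                # fully explored, leave the stack
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

print(has_cycle({1: [2], 2: [3], 3: [1]}))  # -> True
print(has_cycle({1: [2], 2: [3], 3: []}))   # -> False
```

Note that edges to “black” (fully explored) vertices are harmless; only gray targets indicate a cycle.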

Carrying out graph algorithms using the representation of graphs by lists of edges, or by adjacency lists, can be cumbersome if there are many edges in the graph. To simplify computation, graphs can be represented using matrices. Two types of matrices commonly used to represent graphs will be presented here. One is based on the adjacency of vertices, and the other is based on incidence of vertices and edges.

Given an adjacency matrix, we can decide in Θ(1) time whether two vertices are connected by an edge just by looking in the appropriate slot in the matrix. We can also list all the neighbors of a vertex in Θ(V) time by scanning the corresponding row (or column).

**Understanding the adjacency matrix from the image above…**

Let’s understand how the adjacency matrix is constructed from the image above. For simplicity, I have discussed this only for vertex “a”. The same applies to every vertex.

Due to the page limit, I have not included the analysis for **a → d** and the remaining edges.
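Since the original image isn’t reproduced here, a minimal sketch of building an adjacency matrix from an edge list; the vertex names and edges are hypothetical, not the ones from the image:

```python
def adjacency_matrix(vertices, edges, directed=False):
    """Build an adjacency matrix (list of lists of 0/1) from an edge list."""
    index = {v: i for i, v in enumerate(vertices)}   # vertex -> row/column
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:                 # undirected: store both directions
            matrix[index[v]][index[u]] = 1
    return matrix

vertices = ['a', 'b', 'c', 'd']
edges = [('a', 'b'), ('a', 'd'), ('b', 'c')]
m = adjacency_matrix(vertices, edges)
# The Θ(1) edge test mentioned above: are 'a' and 'd' connected?
print(m[0][3])  # -> 1
```

Scanning row 0 of `m` lists all neighbors of `a` in Θ(V) time, matching the costs described above.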

**And what about adjacency list?**

The adjacency list data structure should immediately remind you of hash tables with chaining; the two data structures are identical.

An adjacency list is an array of linked lists, one list per vertex. Each linked list stores the neighbors of the corresponding vertex.

**For example, for vertex ****a**** there are edges leading to its neighbors ****b, d, and e****, so its linked list contains the vertices that are connected to it via an edge.**

Reminder → For undirected graphs, each edge (u, v) is stored twice, once in u’s neighbor list and once in v’s neighbor list; for directed graphs, each edge is stored only once.
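A minimal sketch of building an adjacency list from an edge list, using Python lists in place of linked lists (the vertex names mirror the **a → b, d, e** example above):

```python
from collections import defaultdict

def adjacency_list(edges, directed=False):
    """Build an adjacency list: a dict mapping each vertex to its neighbors."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:               # undirected edges are stored twice
            adj[v].append(u)
    return dict(adj)

adj = adjacency_list([('a', 'b'), ('a', 'd'), ('a', 'e'), ('b', 'c')])
print(adj['a'])  # -> ['b', 'd', 'e']
```

Note the reminder above in action: the undirected edge (a, b) appears both in `adj['a']` and in `adj['b']`.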

A weighted graph (or weighted digraph) is a graph (or digraph) with numbers assigned to its edges. These numbers are called weights or costs. Interest in such graphs is motivated by numerous real-world applications, such as finding the shortest path between two points in a transportation or communication network, or the **traveling salesman problem**.
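The shortest-path application can be sketched with Dijkstra’s algorithm over a weighted adjacency list; the representation (lists of `(neighbor, weight)` pairs) and the sample graph are assumptions for illustration:

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's shortest-path algorithm for non-negative edge weights.

    `graph` maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from `source`.
    """
    dist = {source: 0}
    heap = [(0, source)]                 # (distance, vertex) priority queue
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue                     # stale queue entry, already improved
        for w, weight in graph[v]:
            nd = d + weight
            if nd < dist.get(w, float('inf')):
                dist[w] = nd             # found a shorter route to w
                heapq.heappush(heap, (nd, w))
    return dist

roads = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
print(dijkstra(roads, 'a'))  # -> {'a': 0, 'b': 1, 'c': 3}
```

The direct edge a → c costs 4, but the route through b costs only 1 + 2 = 3, which is what the algorithm reports.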

Google Maps is just one big graph, where **edges represent streets and vertices represent crossings**.

Graph theory underlies the Internet. It’s very heavily used in networking code (building routing tables, etc.), but it also comes up in all kinds of places, like building an internet search engine or a social media platform.

- All trees are graphs. Not all graphs are trees.
- A tree is a kind of graph, but only if it’s connected. For instance, the graph consisting of the vertices A and B and no edges is not a tree, although it is an acyclic graph.
- A single vertex is also considered a tree (no cycles, vacuously connected). So two unconnected vertices makes a forest of two trees.
- A tree is a special kind of graph in which there are never multiple paths: there is always exactly one way to get from A to B, for all possible choices of A and B.

Resources:

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms