Applied Graphical Network Analysis using Python
Let us try to understand how we use the graph data structure to analyze networks using the NetworkX library.

Networks are everywhere. Networks, or graphs, are a set of objects (called nodes) having some relationship with each other (called edges). But how can graphs be used in our day-to-day analysis? I will explain the real-world usage of network analysis using two examples. Don’t miss out on the case study at the end! I promise you, it is an interesting one!
Firstly, consider the current challenge our world is facing: the Covid-19 pandemic. Suppose we have a network comprising various airports across the world, as shown below. We can find the airports with the most connections and identify them as hotspots.

Secondly, let us consider a Twitter network. We can find the most influential people by identifying important nodes using network analysis. This is how many marketing agencies approach the problem of reachability and spread.
Network analysis has applications across many fields, such as social networks, financial networks, biological networks, transportation networks, and many more. We will be using the NetworkX library to create graphs in this series of articles. In this part, let us try to understand the basics of network analysis.
Basics of NetworkX API
As I already mentioned, networks (called ‘graphs’ in mathematics) are composed of nodes and edges. These nodes and edges can carry metadata too. Let us consider an example of two friends, A and B, who met on May 31st, 2020. Let us create a network now.
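A minimal sketch of how the two-friend network could be built with NetworkX. The attribute name `date` and its string format are assumptions chosen for illustration.

```python
import networkx as nx

# Create an empty undirected graph.
G = nx.Graph()

# Add the two friends as nodes.
G.add_node("A")
G.add_node("B")

# Add their meeting as an edge. The `date` attribute name is illustrative;
# any keyword argument is stored as edge metadata.
G.add_edge("A", "B", date="2020-05-31")

print(G.number_of_nodes(), G.number_of_edges())
```

Adding an edge also creates any missing endpoint nodes, so the two `add_node` calls are optional here and only make the steps explicit.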

Querying Data
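Once a graph exists, its nodes, edges, and metadata can be read back. The sketch below rebuilds the two-friend network so that it runs on its own; the `date` attribute name is an assumption for illustration.

```python
import networkx as nx

# Rebuild the small network: two friends A and B who met on May 31st, 2020.
G = nx.Graph()
G.add_edge("A", "B", date="2020-05-31")

# List the nodes and the edges (with their metadata dictionaries).
nodes = list(G.nodes())
edges = list(G.edges(data=True))

# Check whether a particular edge exists and read its metadata.
are_connected = G.has_edge("A", "B")
meeting_date = G.edges["A", "B"]["date"]

print(nodes)
print(edges)
print(are_connected, meeting_date)
```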

Types of Graphs

Undirected Networks: Edges have no direction. Example: a Facebook social graph (A sends a friend request to B, and both become friends).
Directed Networks: Edges have direction. Example: a Twitter social graph (A follows B, but B does not necessarily follow A).
Weighted Networks: Not all relationships (edges) are equal. Some have more weight. Example: consider a network of employees in an organization, where the weights on the edges represent the number of emails sent to each other. In the above image, A sent 6 emails to B, and so on.
Signed Networks: Some networks carry information about friendship and enmity, which is stored on the relationship as an attribute. Example: on a website named ‘Epinions’, people can declare a connection as either a friend or a foe.
Multi Networks: In some networks, there may exist more than one edge between the same pair of nodes. Example: consider the relationship network shown in the image.
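The graph types listed above map onto NetworkX classes and edge attributes roughly as follows. This is a sketch: the attribute names `sign` and `relation` and the relation labels are assumptions for illustration, while `weight` is the attribute name NetworkX algorithms conventionally look for.

```python
import networkx as nx

# Undirected: the friendship holds in both directions.
undirected = nx.Graph()
undirected.add_edge("A", "B")

# Directed: A follows B, but B does not follow A.
directed = nx.DiGraph()
directed.add_edge("A", "B")

# Weighted: the edge between A and B carries a weight of 6 emails.
weighted = nx.Graph()
weighted.add_edge("A", "B", weight=6)

# Signed: friend or foe stored as an edge attribute (hypothetical name `sign`).
signed = nx.Graph()
signed.add_edge("A", "B", sign="+")
signed.add_edge("A", "C", sign="-")

# Multi: more than one edge between the same pair of nodes.
multi = nx.MultiGraph()
multi.add_edge("A", "B", relation="friend")
multi.add_edge("A", "B", relation="coworker")

print(undirected.has_edge("B", "A"), directed.has_edge("B", "A"))
print(weighted["A"]["B"]["weight"], multi.number_of_edges("A", "B"))
```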
Important Nodes
Consider the following network. Which node do you think is important?

You may answer that the central node is important. How did you arrive at this answer? You see that the central node has more neighbors than any other node.
Degree Centrality
It is one of the many metrics used to evaluate the importance of a node, and is simply defined as the number of neighbors a node has divided by the number of neighbors it could possibly have.

Examples of nodes with high degree centrality are:
1. Twitter broadcasters (who have many followers)
2. Disease super-spreaders
3. Airport hubs (Abu Dhabi, New York, and London)

As expected, the degree centrality of node 1 is the maximum. The neighbors() method returns the neighbors of a particular node.
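A small sketch of computing degree centrality, assuming a star-shaped network in which node 1 is connected to four other nodes. The node labels are made up for illustration.

```python
import networkx as nx

# A star-shaped network: node 1 sits in the centre.
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5)])

# Degree centrality = neighbors a node has / neighbors it could possibly have.
# Node 1 has 4 of 4 possible neighbors; every other node has 1 of 4.
centrality = nx.degree_centrality(G)

# The node with the highest degree centrality.
most_central = max(centrality, key=centrality.get)

# neighbors() returns an iterator, so wrap it to see the values.
neighbors_of_1 = sorted(G.neighbors(1))

print(centrality)
print(most_central, neighbors_of_1)
```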
Betweenness Centrality
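‘All shortest paths’ is the set of shortest paths between every pair of nodes in a given network. The betweenness centrality of a node is defined as the fraction of all shortest paths between other pairs of nodes that pass through that node.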

This metric captures a different view of the importance of a node. Instead of finding the nodes with the most neighbors, it finds the bottleneck nodes in a network. We can find the betweenness centrality of the nodes of a graph G using the nx.betweenness_centrality(G) function. Consider the metro rail map of Hyderabad city shown below. The nodes with higher betweenness centrality are circled.

By now you probably see the intuition behind betweenness centrality: the nodes that lie on the most shortest paths between other nodes are treated as the most important.
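To see the bottleneck idea in code, here is a sketch on a made-up network: two triangles joined through a single bridge node. The node numbering is an assumption for illustration.

```python
import networkx as nx

# Two tightly knit groups (0-1-2 and 4-5-6) joined only through node 3.
G = nx.Graph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2),   # first triangle
    (2, 3), (3, 4),           # bridge through node 3
    (4, 5), (4, 6), (5, 6),   # second triangle
])

# Normalized betweenness centrality for every node.
betweenness = nx.betweenness_centrality(G)

# Node 3 lies on every shortest path between the two groups.
bottleneck = max(betweenness, key=betweenness.get)

print(bottleneck, betweenness[bottleneck])
print(G.degree(3), G.degree(2))
```

Node 3 has only two neighbors, fewer than nodes 2 and 4, yet it has the highest betweenness centrality, which is the difference between the two metrics.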
Case Study
A question may pop up in your mind while reading this article: what does a dataset look like when we want to build a network from it? Until now, we have explicitly created nodes and added edges between them. But that's not the case in real life. Let us take an edge file and try to build a network.
Mark Sageman studied the lives of some terrorists who belonged to the Hamburg Cell (which was behind the 9/11 attacks). He found that the most common factor driving them was the social ties within their cell. His analysis recorded the social ties among the terrorists. The data is not 100% complete and accurate.

The first two columns denote the persons who are connected. Two numbers follow: the first signifies the strength of the connection (5 = strong connection, 1 = weak connection), and the last column denotes the level to which the connection has been verified by government officials (1 = confirmed, 3 = possible and unconfirmed connections). The last two columns will not be used in our analysis.
We can convert the above edge file into a graph using the code below.
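A minimal sketch of reading such an edge file, assuming it is whitespace-separated with one connection per line. The in-memory text and the names `person_a`, `person_b`, and `person_c` are placeholders that stand in for the real file, not actual rows from the dataset.

```python
import io
import networkx as nx

# Placeholder stand-in for the edge file: name, name, strength, verification.
# With a real file, open(path) would take the place of io.StringIO(...).
edge_file = io.StringIO(
    "person_a person_b 5 1\n"
    "person_a person_c 1 3\n"
    "person_b person_c 5 1\n"
)

G = nx.Graph()
for line in edge_file:
    fields = line.split()
    if len(fields) < 2:
        continue  # skip blank or malformed lines
    # Only the first two columns (the connected persons) are used.
    G.add_edge(fields[0], fields[1])

print(G.number_of_nodes(), G.number_of_edges())
```

If the strength and verification columns were needed later, they could be stored as edge attributes in the same `add_edge` call.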