Saiteja Kura

Summary

The article introduces applied graphical network analysis using Python's NetworkX library, illustrates its applications in various fields, and explains key concepts such as types of graphs, important nodes, and centrality measures.

Abstract

The article "Applied Graphical Network Analysis using Python" delves into the practical use of graph data structures for network analysis, emphasizing the versatility of the NetworkX library in Python. It outlines the relevance of networks in everyday analysis, exemplifying their use in understanding the spread of Covid-19 through airport connections and identifying influential individuals in Twitter networks. The piece discusses different types of graphs, including undirected, directed, weighted, signed, and multi-networks, and highlights the importance of certain nodes based on centrality measures like degree and betweenness centrality. A case study on the social ties among terrorists involved in the 9/11 attacks further demonstrates the real-world applicability of network analysis. The article concludes by promising future discussions on network connectivity and robustness, inviting readers to explore the NetworkX API for their own analyses.

Opinions

  • The author believes that network analysis is crucial for understanding complex systems across various domains, such as social networks, financial systems, and transportation networks.
  • The article suggests that identifying important nodes, such as hotspots in airport networks or influential people on Twitter, can provide valuable insights for strategic decision-making.
  • The author posits that betweenness centrality is a key metric for identifying bottleneck nodes in a network, which are critical for efficient communication and information flow.
  • The case study on terrorists' social ties implies that network analysis can reveal the underlying structure of covert organizations and potentially aid in counter-terrorism efforts.
  • The author expresses enthusiasm about the potential of network analysis and encourages readers to engage with the NetworkX library to further explore this field.

Applied Graphical Network Analysis using Python

Let us try to understand how to use the graph data structure to analyze networks with the NetworkX library.

Designed using Canva.

Networks are everywhere. A network, or graph, is a set of objects (called nodes) that have relationships with each other (called edges). But how can graphs be used in our day-to-day analysis? I will explain the real-world use of network analysis with two examples. Don't miss out on the case study at the end! I promise you, it is an interesting one!

First, consider the challenge our world is currently facing: the Covid-19 pandemic. Suppose we have a network of airports across the world, as shown below. We can find the airports with the most connections and identify them as hotspots.

Great Circle Mapper / Public domain

Second, consider a Twitter network. We can find the most influential people by identifying the important nodes with network analysis. Many marketing agencies approach the problem of reach and spread this way today.

Network analysis clearly has applications across many fields, such as social networks, financial networks, biological networks, and transportation networks. We will use the NetworkX library to create graphs throughout this series of articles. In this part, let us try to understand the basics of network analysis.

Basics of the NetworkX API

As mentioned above, networks (called graphs in mathematics) consist of nodes and edges, and both nodes and edges can carry metadata. Consider two friends, A and B, who met on May 31st, 2020. Let us create a network for them.
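The original code for this step is not reproduced here, so the following is only a minimal sketch of how it could look in NetworkX. The attribute names `kind` and `met_on`, and storing the date as a `datetime.date`, are illustrative assumptions rather than the author's choices.

```python
import datetime

import networkx as nx

# Create an empty undirected graph.
G = nx.Graph()

# Nodes can carry metadata as keyword attributes ("kind" is a hypothetical attribute).
G.add_node("A", kind="person")
G.add_node("B", kind="person")

# Edges can carry metadata too ("met_on" is a hypothetical attribute name).
G.add_edge("A", "B", met_on=datetime.date(2020, 5, 31))

print(G.number_of_nodes(), G.number_of_edges())  # 2 1
```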

Querying Data
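Once a graph exists, its nodes, edges, and metadata can be read back. The sketch below rebuilds the two-friend network so it runs on its own; as before, the attribute names `kind` and `met_on` are hypothetical.

```python
import datetime

import networkx as nx

# Rebuild the two-friend network so this snippet is self-contained.
G = nx.Graph()
G.add_node("A", kind="person")
G.add_node("B", kind="person")
G.add_edge("A", "B", met_on=datetime.date(2020, 5, 31))

# Nodes, with and without their attribute dictionaries.
print(list(G.nodes))             # ['A', 'B']
print(list(G.nodes(data=True)))  # [('A', {'kind': 'person'}), ('B', {'kind': 'person'})]

# Edges together with their attribute dictionaries.
print(list(G.edges(data=True)))

# A single attribute of one edge; in an undirected graph the endpoint order does not matter.
print(G.edges["B", "A"]["met_on"])  # 2020-05-31
```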

Types of Graphs

Undirected Networks: Edges have no direction. Example: a Facebook social graph (A sends a friend request to B, and both become friends).
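In NetworkX, an undirected network can be represented with `nx.Graph`. A minimal sketch using the A and B of the example: adding the edge once makes the relationship visible from both ends.

```python
import networkx as nx

# Undirected: one edge represents a symmetric friendship.
friends = nx.Graph()
friends.add_edge("A", "B")

print(friends.has_edge("A", "B"))  # True
print(friends.has_edge("B", "A"))  # True
```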

Directed Networks: Edges have direction. Example: a Twitter social graph (A follows B, but B does not necessarily follow A).
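A directed network can be sketched with `nx.DiGraph`. Here an edge from A to B is read as "A follows B"; the node names are just those of the example.

```python
import networkx as nx

# Directed: an edge u -> v means "u follows v".
follows = nx.DiGraph()
follows.add_edge("A", "B")

print(follows.has_edge("A", "B"))       # True
print(follows.has_edge("B", "A"))       # False: B does not follow A back
print(list(follows.successors("A")))    # ['B'] (accounts A follows)
print(list(follows.predecessors("B")))  # ['A'] (B's followers)
```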

Weighted Networks: Not all relationships (edges) are equal; some carry more weight. Example: consider a network of employees in an organization, where the weight on each edge is the number of emails the two employees sent each other. In the above image, A sent 6 emails to B, and so on.
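Edge weights are usually stored in the `weight` attribute, which many NetworkX functions read by default. In this sketch only the A-B weight of 6 comes from the example; the other two edges are hypothetical. It also treats email counts as undirected, so a `DiGraph` would be needed to separate sender from receiver.

```python
import networkx as nx

emails = nx.Graph()
emails.add_edge("A", "B", weight=6)  # from the example: 6 emails between A and B
emails.add_edge("A", "C", weight=2)  # hypothetical
emails.add_edge("B", "C", weight=1)  # hypothetical

print(emails["A"]["B"]["weight"])  # 6

# Weighted degree: the total number of emails each employee is involved in.
print(dict(emails.degree(weight="weight")))  # {'A': 8, 'B': 7, 'C': 3}
```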

Signed Networks: Some networks record friendship or enmity as an attribute of each relationship. Example: on the website Epinions, people can declare a connection as either a friend or a foe.
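NetworkX has no special signed-graph class, so one common approach is to store the sign as an edge attribute. In this sketch the attribute name `sign`, the `"+"`/`"-"` values, node C, and the use of a `DiGraph` (since one user declares something about another) are all assumptions.

```python
import networkx as nx

# An edge u -> v records how u declared v; "sign" is an illustrative attribute name.
trust = nx.DiGraph()
trust.add_edge("A", "B", sign="+")  # A declares B a friend
trust.add_edge("A", "C", sign="-")  # A declares C a foe (hypothetical node)

friends_of_a = [n for n in trust.successors("A") if trust["A"][n]["sign"] == "+"]
foes_of_a = [n for n in trust.successors("A") if trust["A"][n]["sign"] == "-"]
print(friends_of_a, foes_of_a)  # ['B'] ['C']
```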

Multi Networks: In some networks, more than one edge can exist between the same pair of nodes. Example: consider the relationship network shown in the image.
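Parallel edges need `nx.MultiGraph` (or `nx.MultiDiGraph`). In a plain `nx.Graph`, a second `add_edge("A", "B", ...)` would only update the existing edge. The `relation` attribute and its values below are hypothetical.

```python
import networkx as nx

relations = nx.MultiGraph()
# Two parallel edges between the same pair of nodes, each with its own data.
relations.add_edge("A", "B", relation="family")
relations.add_edge("A", "B", relation="coworker")

print(relations.number_of_edges("A", "B"))  # 2
# get_edge_data returns {edge_key: attribute_dict} for all parallel edges.
print([d["relation"] for d in relations.get_edge_data("A", "B").values()])  # ['family', 'coworker']
```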

Important Nodes

Consider the following network. Which node do you think is important?

You may answer that the central node is important. How did you arrive at this answer? The central node has more neighbors than any other node.

Degree Centrality

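It is one of many metrics used to evaluate the importance of a node, and is defined as

$$C_{deg}(v) = \frac{d_v}{|N| - 1}$$

where d_v is the degree of node v (its number of neighbors) and N is the set of nodes in the network.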

Examples of nodes with high degree centrality are:

  1. Twitter broadcasters (accounts with many followers)
  2. Disease super-spreaders
  3. Airport hubs (Abu Dhabi, New York, and London)

As expected, node 1 has the highest degree centrality. The neighbors() method returns the neighbors of a given node.
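The network from the original figure is not reproduced here, so the sketch below uses a small hypothetical graph in which node 1 plays the central role. `nx.degree_centrality` divides each node's degree by the number of nodes minus one.

```python
import networkx as nx

# Hypothetical stand-in for the figure: node 1 is linked to every other node.
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3)])

# Degree divided by (number of nodes - 1) = 5.
dc = nx.degree_centrality(G)
print(dc)  # {1: 1.0, 2: 0.4, 3: 0.4, 4: 0.2, 5: 0.2, 6: 0.2}

print(max(dc, key=dc.get))     # 1
print(sorted(G.neighbors(1)))  # [2, 3, 4, 5, 6]
```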

Betweenness Centrality

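The set of all shortest paths contains the shortest paths between every pair of nodes in a network. The betweenness centrality of a node v is defined as

$$C_{btw}(v) = \sum_{s, t \in N \setminus \{v\},\ s \neq t} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$$

where σ_{s,t} is the number of shortest paths between nodes s and t, and σ_{s,t}(v) is the number of those paths that pass through v.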
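To make the formula concrete, here is a deliberately brute-force sketch that enumerates every shortest path with `nx.all_shortest_paths`. The helper name `betweenness_by_definition` is hypothetical. It assumes a connected, undirected graph and counts each unordered pair {s, t} once, which matches what `nx.betweenness_centrality(G, normalized=False)` reports for undirected graphs. The toy graph includes pairs with two equally short paths, so the fractional terms of the formula show up.

```python
import itertools

import networkx as nx


def betweenness_by_definition(G):
    """Unnormalized betweenness of every node in a connected, undirected graph,
    computed straight from the definition (each unordered pair counted once)."""
    bc = {v: 0.0 for v in G}
    for s, t in itertools.combinations(G.nodes, 2):
        # sigma_st: all shortest s-t paths (raises NetworkXNoPath if G is disconnected).
        paths = list(nx.all_shortest_paths(G, s, t))
        for v in G:
            if v == s or v == t:
                continue
            # sigma_st(v): how many of those paths pass through v.
            through_v = sum(1 for path in paths if v in path)
            bc[v] += through_v / len(paths)
    return bc


# A 4-cycle 0-1-2-3-0 with an extra node 4 hanging off node 2.
G = nx.cycle_graph(4)
G.add_edge(2, 4)

print(betweenness_by_definition(G))  # {0: 0.5, 1: 1.0, 2: 3.5, 3: 1.0, 4: 0.0}
```

This is only for checking the formula on small graphs; NetworkX's own `nx.betweenness_centrality` uses Brandes' algorithm, which is far faster.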

This metric captures a different view of a node's importance. Instead of finding nodes with many neighbors, it finds the bottleneck nodes in a network. We can compute the betweenness centrality of every node in a graph G with nx.betweenness_centrality(G). Consider the metro rail map of Hyderabad shown below. The nodes with higher betweenness centrality are circled.
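The difference between the two measures is easiest to see on a graph with an obvious bottleneck. This sketch uses NetworkX's built-in `nx.barbell_graph(4, 1)` as a stand-in for the metro map: two 4-node cliques (nodes 0-3 and 5-8) joined through a single bridge node, 4. The bridge has only two neighbors, but every path between the two cliques runs through it.

```python
import networkx as nx

# Left clique 0-3, bridge node 4, right clique 5-8.
G = nx.barbell_graph(4, 1)

dc = nx.degree_centrality(G)
bc = nx.betweenness_centrality(G)  # normalized by default

print(G.degree(3), G.degree(4), G.degree(5))  # 4 2 4
print(max(bc, key=bc.get))  # 4: few neighbors, yet the highest betweenness
```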

This should give you the intuition behind betweenness centrality: it treats nodes that lie on the shortest paths between many other pairs of nodes as the most important.

Case Study

A question may pop up in your mind while reading this article: what does a real dataset for building a network look like? Until now, we have explicitly created nodes and added edges between them, but that is not how it works in real life. Let us take an edge file and build a network from it.

Marc Sageman studied the lives of terrorists who belonged to the Hamburg Cell (the group behind the 9/11 attacks). He found that the most common factor driving them was the social ties within their cell. His analysis captured the social ties among these terrorists. The data is not 100% complete or accurate.

The first two columns denote the two people who are connected, the third column gives the strength of the connection (5 = strong connection, 1 = weak connection), and the last column denotes the level to which the connection has been verified by government officials (1 = confirmed, 3 = possible and unconfirmed). The last two columns will not be used in our analysis.

We can convert the above edge file into a graph using the code below.
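The edge file itself is not reproduced here, so the rows below are hypothetical placeholders in the same four-column layout. The comma delimiter is also an assumption (names contain spaces, so whitespace cannot be the separator). `nx.parse_edgelist` reads lines from a list; for a real file, `nx.read_edgelist(path, ...)` accepts the same `delimiter` and `data` arguments.

```python
import networkx as nx

# Hypothetical placeholder rows: person, person, strength (5 = strong, 1 = weak), verification level.
rows = [
    "Person A,Person B,5,1",
    "Person A,Person C,1,3",
    "Person B,Person D,5,1",
]

# Keep the two numeric columns as typed edge attributes, even though the analysis ignores them.
G = nx.parse_edgelist(rows, delimiter=",", data=(("strength", int), ("level", int)))

print(G.number_of_nodes(), G.number_of_edges())  # 4 3
print(G["Person A"]["Person B"])  # {'strength': 5, 'level': 1}
```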

Now let us find the top three nodes having the highest degree centrality and betweenness centrality.
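One way to pick the top three nodes is to sort each centrality dictionary by value. The helper `top_n` below is hypothetical. Because the Hamburg Cell edge file is not included, this sketch runs on Zachary's karate club network, which ships with NetworkX as `nx.karate_club_graph()`. It is a stand-in only, so its results say nothing about the case study.

```python
import heapq

import networkx as nx


def top_n(centrality, n=3):
    """Return the n (node, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, centrality.items(), key=lambda item: item[1])


# Stand-in graph; substitute the graph built from the edge file.
G = nx.karate_club_graph()

print(top_n(nx.degree_centrality(G)))
print(top_n(nx.betweenness_centrality(G)))
```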

Similarly, we can find the top three nodes by betweenness centrality. They are highlighted in the graph shown below. Without these nodes, the network would become disconnected.

You may argue that even without the two nodes circled in red (shown in purple), the graph stays largely connected, since the node named Imad Eddin Barakat Yarkas still connects it. However, without the red-circled nodes, a message from the center of the network would take longer to reach each node. Nodes with high betweenness centrality act as connectors that deliver messages along the shortest possible paths. This is why the nodes circled in red have the highest betweenness centrality of all nodes.
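Claims like these can be checked on any graph by removing candidate nodes and comparing the result. The helper `removal_impact` below is a hypothetical sketch that reports connectivity, the number of connected components, and, when still connected, the average shortest-path length. Using average path length as a proxy for message transmission time is an assumption. The two toy graphs stand in for the Hamburg Cell network, which is not reproduced here.

```python
import networkx as nx


def removal_impact(G, nodes):
    """Summarize how G changes when `nodes` are removed (G itself is not modified)."""
    H = G.copy()
    H.remove_nodes_from(nodes)
    connected = nx.is_connected(H)
    return {
        "connected": connected,
        "components": nx.number_connected_components(H),
        # Average shortest-path length is only defined for a connected graph.
        "avg_path_length": nx.average_shortest_path_length(H) if connected else None,
    }


# Bridge case: removing the single bridge node (4) splits the graph in two.
barbell = nx.barbell_graph(4, 1)
print(removal_impact(barbell, [4]))  # {'connected': False, 'components': 2, 'avg_path_length': None}

# Shortcut case: an 8-cycle plus node 8 linking opposite nodes 0 and 4.
G = nx.cycle_graph(8)
G.add_edges_from([(8, 0), (8, 4)])
print(nx.average_shortest_path_length(G))  # about 2.17
print(removal_impact(G, [8]))  # still connected, but the average path grows to about 2.29
```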

Conclusion

In this article, we saw why network analysis matters across various fields and covered the basics of the NetworkX API. We also looked at several real-life situations where network analysis is applied. In the next part, we will discuss network connectivity and robustness. Stay tuned!

Thanks for reading. Do feel free to share feedback.
