Introduction to Graph Neural Networks

The next big thing in Deep Learning

Graph Neural Networks (GNNs) are the next big thing.

As you might know, Deep Learning is good at capturing hidden patterns of Euclidean data (images, text, videos). But let me ask, what happens when data comes from non-Euclidean domains, represented as graphs with complex relationships between objects?

This is where GNNs kick in, they’re experiencing a huge increase in popularity in domains like social networks, recommendation systems, even life sciences!

What is a graph

A graph is a data structure, and it consists of two parts: edges and nodes (or vertices). So, formally, you’ll have a set G as follows.

The edges can be directed if there exist directional dependencies between the vertices, otherwise they’re undirected.

A graph can be represented with its adjacency matrix, A,which will have dimension n x n,where n is the number of nodes. Moreover, you might encounter a node with a set of features (imagine a user profile): so, if a node has f features,thenthe node will be represented with its feature matrix X, with dimension n x f.

Graphs tend to cause problems with the existing Machine Learning algorithms. This is due to the complexity of this data structure. Machine Learning algorithms prefer simpler and structured data, like images. In general, Graphs don’t have a fixed form, the nodes are unordered and they variate in number and number of neighbors (this is why especially CNNs don’t perform well on graphs).

Deep Learning for Graphs

Since the pre-existing algorithms don’t work for Graphs, we introduce the idea of Node embedding.

Node embedding consists in mapping nodes to a d-dimensional embedding space, so that similar nodes in the graph are embedded close to each other. So, the task is to map nodes so that similarity in the embedding space approximates similarity in the network.

Taking u and v as two nodes, we can define x_u and x_v as the two feature vectors associated.

We can now define the encoder function ENC(u) and ENC(v) to convert x_u and x_v in z_u and z_v,in the d-dimensional embedding space.

The encoder function

We need to address three important aspects to define this encoder function.

Locality
Information aggregation
Computation (putting together multiple layers)

The locality can be achieved with a computational graph, which basically is a construction of all the possible connections of a node, thus achieving as well valuable feature information.

We can now start to aggregate, using neural networks.

Forward propagation

How does the information in the input go to the output?

Defining (X_A) the feature vector of node A, we need three steps.

Initialization of the activation units

General structure of a layer in the network

Where B and W are basically trainable weight matrices, and we’re averaging all the neighbors of node v, then the previous layer embedding of node v multiplied with a bias B is added. Finally, we introduce non-linearity via σ.

Last layer

Embedding after K layers of neighborhood aggregation.

We can now feed the embeddings into any loss function and run stochastic gradient descent to train the weight parameters.

Obviously, the training can be supervised or unsupervised:

Supervised: train model for a supervised task like node classification
Unsupervised: here, you only use the graph structure, based on the fact that similar nodes have similar embeddings. One unsupervised loss function can be a loss based on node proximity in the graph, or random walks.

Applications of Graph Neural Networks

Probably a lot of people don’t realize it, but Graphs and graph-structured data are everywhere, and there are different tasks GNNs are able to solve:

Node classification: determine the labeling of samples (the nodes) by looking at the labels of their neighbors, usually it’s a semi-supervised approach (only a part of the graph is labeled).
Graph Classification: this is much like image classification, it consists of classifying an entire graph into different categories (determining whether a protein is an enzyme or not in bioinformatics, categorizing documents in NLP, or social network analysis).
Graph clustering: refers to the clustering of data in the form of graphs.
Link prediction: to understand the relationship between entities in graphs, and it also tries to predict whether there’s a connection between two entities. Just imagine the social network domain, and social interactions, or recommender system problems.

Applications can be found literally anywhere, there are a lot, and here’s some of them.

GNNs in Computer Vision

Here the applications are growing every year. For example, in object detection, GNNs are used to calculate RoI features; in interaction detection, GNN is message-passing tools between humans and objects.

Other applications include human-object interaction, few-shot image classification, and way more.

GNNs in Natural Language Processing

A classic application of GNNs in NLP is Text Classification. GNNs utilize the inter-relations of documents or words to infer document labels. Another application is Relation Extraction, the task of extracting semantic relations from the text, which usually occur between two or more entities (it’s achieved using Graph Convolutional Networks, or Graph Attention Networks).

GNNs in life sciences

We can perform GNN-based reasoning about objects, relations, and physics in an effective way. Interaction networks can be trained to reason about the interactions of objects in a complex physical system, like collision dynamics.

Chemists can use GNNs to research the graph structure of molecules or compounds. In these graphs, nodes are atoms, and edges are chemical bonds. Just think about AlphaFold by DeepMind.

Conclusion

Graph Neural Networks are becoming very powerful and they increased a lot in popularity in those years. They most likely will become the next big thing in many domains, just like CNNs for Computer Vision.

Originally published at https://www.alessandroai.com on September 6, 2021.

Summarize