Prince Canuma

Summary

The provided content discusses Deep Belief Networks (DBNs), their importance in deep learning, and their components, including Boltzmann Machines and Restricted Boltzmann Machines (RBMs), as well as their applications and implementation details.

Abstract

Deep Belief Networks (DBNs) are a class of generative models with deep architectures that play a significant role in deep learning. They consist of multiple layers of Restricted Boltzmann Machines (RBMs), which are stochastic models capable of learning complex probability distributions over their input data. The article outlines the structure and function of Boltzmann Machines, emphasizing the connection restrictions in RBMs that make them more efficient than their fully connected counterparts. It also delves into the training process, including the use of Contrastive Divergence (CD) algorithms, and the ability of DBNs to perform both unsupervised and supervised tasks. The text highlights the flexibility of DBNs in feature learning and classification, and provides resources for further study, including code implementations and references to academic papers.

Opinions

  • The author suggests that DBNs are powerful tools in machine learning, capable of tasks such as image generation and classification.
  • It is noted that RBMs, while less commonly used in their standalone form today, were instrumental in the development of deep learning architectures.
  • The article implies that the layer-by-layer pre-training of DBNs in an unsupervised manner, followed by fine-tuning with back-propagation, is an effective approach for various applications.
  • The author expresses that the energy-based models, like RBMs and DBNs, provide a unified framework for representing many machine learning algorithms.
  • There is an acknowledgment of the historical significance of Geoffrey Hinton's contributions to the field, as he is credited with inventing DBNs and popularizing RBMs.
  • The text conveys that while Deep Boltzmann Machines (DBMs) have some advantages over DBNs, they also present challenges, such as difficulty in generating samples.
  • The author provides a balanced view by mentioning that despite the effectiveness of DBNs, other deep learning models like CNNs and LSTMs are also widely used for various tasks.

What Are RBMs, Deep Belief Networks and Why Are They Important to Deep Learning?

In this article, we are going to take a look at what DBNs are and where we can use them.


A Deep Belief Network (DBN) is a powerful generative model that uses a deep architecture, and in this article we are going to learn all about it. Don’t worry, this is not related to ‘The Secret’ or church, even though it involves ‘Deep Belief’, I promise!

After you read this article, you will understand what a Deep Belief Network is, how it works, where to apply it and how to code your own.

Here is an overview of the points we are going to address:

  • What is a Boltzmann Machine?
  • Restricted Boltzmann Machine
  • Deep Belief Network
  • Deep Boltzmann Machine vs Deep Belief Network

What is a Boltzmann machine?

To give you a bit of background, Boltzmann machines are named after the Boltzmann distribution (also known as the Gibbs distribution), which also underlies Energy-Based Models (EBMs). It is an integral part of Statistical Mechanics and describes how the probability of a system being in a given state depends on the energy of that state and on the temperature of the system. Boltzmann machines were invented in 1985 by Geoffrey Hinton and Terry Sejnowski.
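As a minimal sketch of the Boltzmann distribution, assuming a toy system with four discrete states whose energies are made up for illustration, the snippet below computes P(state) = exp(-E/T) / Z. It shows the two properties the paragraph alludes to: lower-energy states are more probable, and a higher temperature T flattens the distribution.

```python
import numpy as np

def boltzmann_probs(energies, temperature=1.0):
    """Return P(state) = exp(-E / T) / Z for each state's energy E."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()          # shift by the max for numerical stability (cancels out below)
    weights = np.exp(logits)
    return weights / weights.sum()  # normalizing by the sum plays the role of Z

# Hypothetical energies for four states of a toy system
energies = [0.0, 1.0, 2.0, 3.0]
print(boltzmann_probs(energies, temperature=1.0))   # lower energy -> higher probability
print(boltzmann_probs(energies, temperature=10.0))  # higher temperature -> flatter distribution
```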

There are no output nodes! This may seem strange, but it reflects how these models work: their units are stochastic, and there is no typical 1 or 0 output whose error against a label is minimized with Stochastic Gradient Descent. Instead, they learn the patterns in the data itself, and this is what makes them so special!

One thing to note: unlike normal neural networks, which don’t have any connections between the input nodes, a Boltzmann Machine has connections among the input nodes. We can see from the image that all the nodes are connected to all other nodes, irrespective of whether they are input or hidden nodes. This allows them to share information among themselves and self-generate subsequent data. We only measure what’s on the visible nodes, not what’s on the hidden nodes. When the input is provided, they learn parameters that capture the patterns and correlations in the data. This is why they are called generative models and fall into the class of unsupervised deep learning.
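To illustrate a fully connected Boltzmann machine, here is a small sketch assuming binary units, symmetric random weights with no self-connections and the common energy E(s) = -1/2 s^T W s - b^T s; the sizes, the weights and the `energy` helper are hypothetical. Because we only observe the visible nodes, the probability of a visible pattern is obtained by summing over every possible hidden state, which is feasible here only because the network is tiny.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 3, 2
n = n_visible + n_hidden

# Fully connected: every unit connects to every other unit, visible or hidden.
# Weights are symmetric with a zero diagonal (no self-connections); values are placeholders.
W = rng.normal(scale=0.5, size=(n, n))
W = np.triu(W, k=1)
W = W + W.T
b = rng.normal(scale=0.1, size=n)

def energy(s):
    """E(s) = -1/2 * s^T W s - b^T s for a binary state vector s."""
    return -0.5 * s @ W @ s - b @ s

# Enumerate all 2^n joint states and their unnormalized probabilities exp(-E)
states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
unnorm = np.exp(-np.array([energy(s) for s in states]))
Z = unnorm.sum()  # partition function

# Marginal over the visible units only: sum out the hidden units
p_visible = {}
for s, u in zip(states, unnorm):
    v = tuple(int(x) for x in s[:n_visible])
    p_visible[v] = p_visible.get(v, 0.0) + u / Z

for v, p in sorted(p_visible.items()):
    print(v, round(p, 4))
```

Enumerating all 2^n joint states quickly becomes infeasible as the network grows, which is part of why exact learning in general Boltzmann machines is so expensive.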

Restricted Boltzmann machine

RBMs are two-layer, generative, stochastic building blocks that can learn a probability distribution over their set of input features (e.g. image pixels).

Note: first, they are rarely used nowadays, if at all, and second, they aren’t neural networks in the usual sense; they are used as building blocks (more on this in the next section).

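RBMs were first introduced by Paul Smolensky in 1986 under the name Harmonium and were later popularized by Geoffrey Hinton, whose fast training algorithms made them practical. They have many use cases, such as dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modelling.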

As the name implies, RBMs are a variant of Boltzmann machines with one difference: their neurons must form a bipartite graph, which means there are no connections between nodes within the same group (visible or hidden). This makes them easy to implement and more efficient to train than Boltzmann Machines.

In particular, this connection restriction allows RBMs to use more efficient and sophisticated training algorithms than the ones available for BMs, such as the gradient-based contrastive divergence algorithm.

In simpler terms, this means that we have fewer connections.
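To put numbers on "fewer connections", here is a quick count assuming a 28x28 binary image (784 visible units) and 500 hidden units; both sizes are only examples.

```python
def boltzmann_machine_connections(n_visible, n_hidden):
    """Fully connected BM: every pair of distinct units shares one symmetric weight."""
    n = n_visible + n_hidden
    return n * (n - 1) // 2

def rbm_connections(n_visible, n_hidden):
    """RBM: bipartite graph, so only visible-hidden pairs are connected."""
    return n_visible * n_hidden

# Example sizes: a 28x28 binary image (784 visible units) and 500 hidden units
print(boltzmann_machine_connections(784, 500))  # 823686
print(rbm_connections(784, 500))                # 392000
```

The difference is exactly the visible-visible and hidden-hidden pairs that the bipartite restriction removes.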

As shown in the figure above, RBMs hold two sets of random variables (also called neurons): a layer of visible variables/nodes (the layer where the inputs go) that represent the observable data, and a layer of hidden variables that capture the dependencies between the visible variables (modelling the probability distribution of the features).

Forward pass

Figure: an RBM forward pass, shown as an example without data and an example using actual data.

Backward pass

Figure: an RBM backward pass, shown as an example without data and an example with actual data.

An RBM is a stochastic building block (layer): its units take binary states that are sampled from probabilities rather than computed deterministically. The weights are randomly initialized, and then we perform alternating Gibbs sampling: all of the units in one layer are updated in parallel given the current states of the units in the other layer, and this is repeated until the system is sampling from its equilibrium distribution.
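A minimal sketch of alternating Gibbs sampling in a binary RBM, assuming the bias notation used in the equations that follow (a for visible units, b for hidden units), small random initial weights and a fixed number of steps as a stand-in for "until the system reaches equilibrium". The `gibbs_chain` helper is hypothetical, not taken from a particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(v0, W, a, b, n_steps, rng):
    """Alternating (block) Gibbs sampling in a binary RBM.

    Each step samples all hidden units in parallel given the visible units,
    then all visible units in parallel given the hidden units.
    """
    v = v0.copy()
    h = np.zeros(W.shape[1])
    for _ in range(n_steps):
        p_h = sigmoid(b + v @ W)                          # P(h_j = 1 | v) for every hidden unit
        h = (rng.random(p_h.shape) < p_h).astype(float)   # sample binary hidden states
        p_v = sigmoid(a + h @ W.T)                        # P(v_i = 1 | h) for every visible unit
        v = (rng.random(p_v.shape) < p_v).astype(float)   # sample binary visible states
    return v, h

rng = np.random.default_rng(42)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))  # small random initial weights
a = np.zeros(n_visible)  # visible biases
b = np.zeros(n_hidden)   # hidden biases

v0 = rng.integers(0, 2, size=n_visible).astype(float)  # random starting visible state
v, h = gibbs_chain(v0, W, a, b, n_steps=1000, rng=rng)
print(v, h)
```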

Now, given a randomly selected training image $\mathbf{v}$, the binary state $h_j$ of each hidden unit $j$ is set to 1 with probability:

$$P(h_j = 1 \mid \mathbf{v}) = g\Big(b_j + \sum_i v_i W_{ij}\Big) \qquad (12)$$

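Here $g(x) = 1/(1 + \exp(-x))$ is the logistic sigmoid function. Because the hidden units are conditionally independent given $\mathbf{v}$, an unbiased sample of each $h_j$ is easy to obtain, and therefore $\langle v_i h_j \rangle_{\text{data}}$ (the expected value of $v_i h_j$ when the visible units are set to the training data) can be computed easily.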

$W_{ij}$ represents the symmetric interaction term between visible unit $i$ and hidden unit $j$, while $b_j$ and $a_i$ are the bias terms for the hidden units and visible units respectively.

Since there are no direct connections between visible units in an RBM, it is also very easy to obtain an unbiased sample of the state of a visible unit given a hidden vector $\mathbf{h}$:

$$P(v_i = 1 \mid \mathbf{h}) = g\Big(a_i + \sum_j h_j W_{ij}\Big) \qquad (13)$$
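Equations (12) and (13) translate almost directly into code. The sketch below assumes NumPy, weights stored as a (visible x hidden) matrix W, and a small hand-picked example so the numbers can be checked by hand; the function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    """Equation (12): P(h_j = 1 | v) = g(b_j + sum_i v_i W_ij), for all j at once."""
    return sigmoid(b + v @ W)

def p_v_given_h(h, W, a):
    """Equation (13): P(v_i = 1 | h) = g(a_i + sum_j h_j W_ij), for all i at once."""
    return sigmoid(a + h @ W.T)

# Small hand-picked example: 3 visible units, 2 hidden units
W = np.array([[1.0, -1.0],
              [0.5,  0.5],
              [-1.0, 2.0]])
a = np.zeros(3)            # visible biases
b = np.array([0.0, 0.5])   # hidden biases

v = np.array([1.0, 0.0, 1.0])
print(p_h_given_v(v, W, b))   # g([0.0, 1.5]), roughly [0.5, 0.8176]

h = np.array([1.0, 0.0])
print(p_v_given_h(h, W, a))   # g([1.0, 0.5, -1.0])
```

Because each call handles a whole layer at once, the same functions work for a batch of vectors stacked as rows.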

However, computing $\langle v_i h_j \rangle_{\text{model}}$ is much more difficult. It can be done by starting from any random state of the visible units and performing alternating Gibbs sampling for a very long time. Because this is impractical given its long run-times, the Contrastive Divergence (CD) method is used instead.

Contrastive Divergence (CD)

Since running Gibbs sampling to equilibrium is slow, the Contrastive Divergence (CD) algorithm is used instead. In this method, the visible units are initialized with training data. Then the binary hidden unit states are computed according to equation (12). After determining the binary hidden unit states, the $v_i$ values are recomputed according to equation (13), which gives a reconstruction of the data. Finally, the hidden unit activation probabilities are computed from this reconstruction, and these values of the hidden and visible units are used to approximate $\langle v_i h_j \rangle_{\text{model}}$.
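Here is a minimal sketch of CD-1 training following these steps, assuming binary units, full-batch updates and a tiny made-up dataset of two repeating patterns. As in many practical implementations, the reconstruction uses the probabilities from equation (13) rather than sampled binary values; the helper names and hyperparameters are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 update on a batch v0 of shape (batch, n_visible); updates W, a, b in place."""
    # Positive phase: hidden probabilities with the visible units set to the data (eq. 12)
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # binary hidden states
    # Reconstruction: recompute the visible units from the binary hidden states (eq. 13)
    v1 = sigmoid(a + h0 @ W.T)
    # Hidden probabilities driven by the reconstruction
    p_h1 = sigmoid(b + v1 @ W)

    n = v0.shape[0]
    positive = v0.T @ p_h0 / n    # <v_i h_j>_data, i.e. Positive(e_ij)
    negative = v1.T @ p_h1 / n    # Negative(e_ij), the CD-1 stand-in for <v_i h_j>_model
    W += lr * (positive - negative)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)

def reconstruction_error(v, W, a, b):
    """Mean squared error between the data and its mean-field reconstruction."""
    v_rec = sigmoid(a + sigmoid(b + v @ W) @ W.T)
    return float(((v - v_rec) ** 2).sum(axis=1).mean())

# Toy binary dataset: two repeating patterns (illustrative only)
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 4, dtype=float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 3))
a = np.zeros(6)
b = np.zeros(3)

before = reconstruction_error(data, W, a, b)
for epoch in range(2000):
    cd1_update(data, W, a, b, lr=0.1, rng=rng)
after = reconstruction_error(data, W, a, b)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

The reconstruction error is only a rough progress signal: CD-1 does not directly minimize it.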

Figure 3: Computation steps in CD1 method.

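$\text{Positive}(e_{ij})$ corresponds to computing $\langle v_i h_j \rangle_{\text{data}}$ for the connection $e_{ij}$.

$\text{Negative}(e_{ij})$ corresponds to computing the same product on the reconstruction of the data for the connection $e_{ij}$. The weight of that connection is then updated in proportion to their difference, $\Delta W_{ij} = \epsilon\,\big(\text{Positive}(e_{ij}) - \text{Negative}(e_{ij})\big)$, where $\epsilon$ is the learning rate.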

Although CD1 does not compute the exact gradient, its results are acceptable in practice. Repeating the Gibbs sampling step k times gives the CDk method, where the parameter k is the number of Gibbs sampling steps. CDk is more expensive to run but approximates the gradient more accurately.
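Extending the idea to CDk only changes the negative phase: instead of a single Gibbs step away from the data, the chain runs k steps. This is a sketch under simple assumptions (binary units, NumPy, a random batch made up for illustration); the `cdk_statistics` and `sample` helpers are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    """Draw binary states with P(state = 1) = p, elementwise."""
    return (rng.random(p.shape) < p).astype(float)

def cdk_statistics(v0, W, a, b, k, rng):
    """Return (positive, negative) statistics for CD-k on a batch v0.

    The negative phase runs k alternating Gibbs steps starting from the data.
    k = 1 is CD-1; larger k moves the negative statistics closer to <v_i h_j>_model.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = v0.shape[0]
    p_h0 = sigmoid(b + v0 @ W)
    positive = v0.T @ p_h0 / n           # Positive(e_ij) = <v_i h_j>_data

    h = sample(p_h0, rng)
    for step in range(k):
        p_v = sigmoid(a + h @ W.T)
        # Sample binary visible states on intermediate steps; keep probabilities on the last one
        v = p_v if step == k - 1 else sample(p_v, rng)
        p_h = sigmoid(b + v @ W)
        h = sample(p_h, rng)
    negative = v.T @ p_h / n             # Negative(e_ij) after k Gibbs steps
    return positive, negative

rng = np.random.default_rng(0)
v0 = (rng.random((10, 6)) < 0.5).astype(float)   # hypothetical binary batch
W = rng.normal(scale=0.1, size=(6, 3))
a, b = np.zeros(6), np.zeros(3)

positive, negative = cdk_statistics(v0, W, a, b, k=5, rng=rng)
W += 0.1 * (positive - negative)   # CD-k weight update with a learning rate of 0.1
```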

This method is great at learning features that are very good at modelling/reconstructing the input data.

Let’s say you take a binary matrix representing an image of a handwritten digit (e.g. the number 6), flatten it into a binary vector and feed it to a trained RBM. Using its trained weights, the model will be able to find low-energy states compatible with that image. If you give it an image that is not a handwritten digit, the model will not be able to find low-energy states compatible with it.
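A common way to make "low-energy states compatible with an image" measurable is the RBM's free energy, which sums out the hidden units: F(v) = -Σ_i a_i v_i - Σ_j log(1 + exp(b_j + Σ_i v_i W_ij)), with P(v) proportional to exp(-F(v)). The sketch below flattens a hypothetical 4x4 binary image into a vector and scores it. The weights are random stand-ins, so the value only becomes meaningful after training, when we would expect digit-like inputs to get lower free energy than unrelated ones.

```python
import numpy as np

def free_energy(v, W, a, b):
    """F(v) = -a.v - sum_j log(1 + exp(b_j + v.W[:, j])).

    P(v) is proportional to exp(-F(v)), so inputs the model finds
    compatible with its training data get a lower free energy.
    """
    return -v @ a - np.logaddexp(0.0, b + v @ W).sum()

# A hypothetical 4x4 binary "image", flattened into a 16-dimensional vector
image = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 1, 1, 1]], dtype=float)
v = image.reshape(-1)

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(16, 3))   # stand-ins for trained weights
a = np.zeros(16)
b = np.zeros(3)
print(free_energy(v, W, a, b))
```

`np.logaddexp(0, x)` computes log(1 + exp(x)) without overflowing for large x.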

So what is this energy?

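An energy function assigns a single number (the energy) to each configuration of the system's variables. In an RBM it is a function of the visible and hidden states, parameterized by the model weights and biases, and training adjusts those parameters so that configurations resembling the training data get low energy and therefore high probability.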

We use energy functions as a unified framework for representing many machine learning algorithms (models).
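For the binary RBMs discussed above, the standard energy function is E(v, h) = -Σ_i a_i v_i - Σ_j b_j h_j - Σ_ij v_i W_ij h_j, and the joint probability of a configuration is proportional to exp(-E). The sketch below evaluates it for a small hand-picked configuration (all values are hypothetical) to show that a lower energy means a more probable configuration.

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

W = np.array([[1.0, -1.0],
              [0.5,  0.5],
              [-1.0, 2.0]])
a = np.array([0.1, 0.2, 0.3])   # visible biases
b = np.array([0.5, -0.5])       # hidden biases

v = np.array([1.0, 0.0, 1.0])
print(rbm_energy(v, np.array([1.0, 1.0]), W, a, b))  # -(0.4) - 0.0 - 1.0, about -1.4
print(rbm_energy(v, np.zeros(2), W, a, b))           # about -0.4, so h = [1, 1] is the more probable pairing
```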

Deep belief network

A Deep Belief Network (DBN) is a powerful generative model that uses a deep architecture built from a stack of multiple Restricted Boltzmann Machines (RBMs).

Each RBM performs a non-linear transformation on its input vectors (much like a layer of a vanilla neural network) and produces output vectors that serve as the input to the next RBM in the sequence.
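The stacking can be sketched as a loop in which each RBM's hidden probabilities become the next RBM's input. The layer sizes and random weights are placeholders, and `propagate_up` is a hypothetical helper rather than a library call.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate_up(v, layers):
    """Pass inputs through a stack of RBMs; each layer's output feeds the next layer."""
    x = v
    for W, b in layers:
        x = sigmoid(b + x @ W)   # non-linear transformation: P(h = 1 | x) for this RBM
    return x

rng = np.random.default_rng(0)
sizes = [16, 8, 4]   # hypothetical layer sizes: 16 inputs, then 8 and 4 hidden units
layers = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

batch = (rng.random((5, 16)) < 0.5).astype(float)   # 5 random binary input vectors
features = propagate_up(batch, layers)
print(features.shape)  # (5, 4)
```

Adding a layer to the DBN amounts to appending another (W, b) pair to the list.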

This gives DBNs a lot of flexibility and makes them easy to extend.

Being generative models, DBNs can be used in either an unsupervised or a supervised setting. This means DBNs can do feature learning/extraction as well as classification, both of which are used in many applications (more on this in the applications section).

More precisely, for feature learning we pre-train the RBMs that form a DBN layer by layer in an unsupervised manner, and then we use back-propagation (i.e. gradient descent) to fine-tune the network on a small labelled dataset for classification and other tasks.
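A minimal sketch of greedy layer-wise pre-training, assuming binary RBMs trained with CD-1 on a made-up dataset. One common choice, used here, is to feed each layer's hidden probabilities (rather than sampled states) to the next RBM; `train_rbm` and `pretrain_dbn` are illustrative names, and the epochs and learning rate are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one binary RBM with CD-1 and return (W, a, b)."""
    n, n_visible = data.shape
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(b + data @ W)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        v1 = sigmoid(a + h0 @ W.T)                     # reconstruction
        p_h1 = sigmoid(b + v1 @ W)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / n    # Positive - Negative
        a += lr * (data - v1).mean(axis=0)
        b += lr * (p_h0 - p_h1).mean(axis=0)
    return W, a, b

def pretrain_dbn(data, layer_sizes, epochs=200, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the previous layer's output."""
    rng = np.random.default_rng(seed)
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden, epochs, lr, rng)
        layers.append((W, b))       # keep the weights and hidden biases for the stack
        x = sigmoid(b + x @ W)      # hidden probabilities become the next RBM's input
    return layers

rng = np.random.default_rng(123)
data = (rng.random((20, 8)) < 0.5).astype(float)   # hypothetical binary training data
layers = pretrain_dbn(data, layer_sizes=[6, 4])
print([W.shape for W, _ in layers])  # [(8, 6), (6, 4)]
```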

Architecture & Fine-tuning

As we already know from most neural networks, whether CNNs, LSTMs, Transformers, etc., pre-training helps a network generalise better, and we can slightly adjust the pre-trained weights for many downstream tasks (e.g. binary classification, multi-class classification, etc.) using a small dataset.
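Fine-tuning can be sketched as ordinary back-propagation starting from the pre-trained weights. In the snippet below, W1 is a random stand-in for weights that would come from RBM pre-training, a new logistic output layer is added for a hypothetical binary classification task, and the tiny labelled dataset is made up; it only shows the mechanics, not a recommended setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Small labelled toy dataset (hypothetical): label 1 when the first half of the units is "on"
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)

# Stand-ins for weights pre-trained with an RBM (in practice they come from pre-training)
W1 = rng.normal(scale=0.1, size=(6, 3))
b1 = np.zeros(3)
# New output layer added for the downstream binary classification task
w2 = np.zeros(3)
b2 = 0.0

def forward(X):
    hidden = sigmoid(X @ W1 + b1)
    return hidden, sigmoid(hidden @ w2 + b2)

def bce(p, y):
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

lr = 0.5
loss_before = bce(forward(X)[1], y)
for _ in range(500):
    hidden, p = forward(X)
    # Back-propagation of the binary cross-entropy loss
    d_out = (p - y) / len(y)                  # gradient w.r.t. the output logits
    grad_w2 = hidden.T @ d_out
    grad_b2 = d_out.sum()
    d_hidden = np.outer(d_out, w2) * hidden * (1 - hidden)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)
    # Gradient descent: the new output layer learns and the pre-trained layer is adjusted too
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
loss_after = bce(forward(X)[1], y)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

In a real DBN the stand-in W1 would be replaced by the stacked RBM weights, and with more layers the same chain rule is applied layer by layer.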

Applications

Here are some of the tasks that this family of networks can be used for:

  1. Image generation
  2. Image classification
  3. Video recognition
  4. Motion capture
  5. Natural language understanding and speech processing. For a detailed description, check out the paper by the creator of DBNs himself, Geoffrey Hinton.

Deep Boltzmann Machine

After DBNs, another model called the Deep Boltzmann Machine (DBM) was created, which trains better and achieves a lower loss, although it has some issues, such as being hard to generate samples from.

A DBM is a deep generative model, typically illustrated with three layers. It is similar to a Deep Belief Network, but while a DBN has undirected connections only between its top two layers and directed, top-down connections between the layers below, a DBM has entirely undirected connections.

Now that we are equipped with the theory, it is time to dive into the implementation details.

Code implementation

If you are looking for a plug-and-play implementation of a DBN that also gives you lots of flexibility, check out:

If you are looking for a DIY, step-by-step tutorial from scratch, check out:

Conclusion

Deep Belief Networks are a family of deep architecture networks that use stacks of Restricted Boltzmann Machines as building blocks. DBNs can be used both in an unsupervised setting, for tasks such as image generation, and in a supervised setting, for tasks such as image classification, and they take full advantage of techniques such as unsupervised pre-training and fine-tuning on a downstream task.

Acknowledgements

Special thanks to Ms. Esther M Dzitiro for suggesting the topic of this article.

References

Lecture 12C: Restricted Boltzmann Machines (for a more detailed explanation)

Lecture 12D: An example of Contrastive Divergence Learning

Gibbs sampling

https://cedar.buffalo.edu/~srihari/CSE676/20.4-DeepBoltzmann.pdf

https://www.cs.toronto.edu/~hinton/absps/fastnc

http://www.robotics.stanford.edu/~ang/papers/icml09-ConvolutionalDeepBeliefNetworks

https://www.cs.toronto.edu/~hinton/absps/ruhijournal.pdf

https://astrostatistics.psu.edu/su14/lectures/CosPop14-2-2-BayesComp-2.pdf

A Tutorial on Energy-Based Learning

Loss Functions for Energy-Based Models With Applications to Object Recognition
