Rahul Bhadani

Stat Stories: Normalizing Flows as an Application of Variable Transformation

Generative Models for Tractable Distributions

Lake Arrowhead in California, Picture by the Author

In previous episodes of the Stat Stories series, I discussed variable transformation as a way to generate new distributions. That discussion, which covered both univariate and multivariate distributions, leads naturally to Normalizing Flows.

I recommend reading the discussion on variable transformation for generating new distributions as a prerequisite to understanding Normalizing Flows.

Introduction

A major challenge in statistical machine learning is modeling a probability distribution when we are given only samples from it. Normalizing Flows were first introduced by Tabak and Vanden-Eijnden [2010] and Tabak and Turner [2013] in the context of clustering, classification, and density estimation.

Definition: A Normalizing Flow is a transformation of a simple probability distribution, such as a uniform distribution, into a complicated one, such as a distribution whose random samples are cat images, obtained by applying a sequence of invertible transformations.

By selecting a simple initial density function and then chaining together a number of parametrized, invertible, and differentiable transformations, we can obtain new families of distributions. This way, we can obtain samples corresponding to new densities.
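The chaining idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's method: the two maps (an affine map and an exponential map), the helper names `affine`, `exponential`, and `push_through`, and the standard normal base density are all assumptions chosen because the result, a log-normal distribution, is easy to verify by hand.

```python
import math
import random

# Each step is a pair (forward map, log|derivative|) for a 1-D invertible map.
# Both maps are illustrative choices, not taken from the article.
def affine(a, b):
    return (lambda x: a * x + b, lambda x: math.log(abs(a)))

def exponential():
    return (lambda x: math.exp(x), lambda x: x)  # log|d/dx e^x| = x

def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def push_through(x, steps):
    """Apply the chain to a base sample x; return (u, log f_U(u))."""
    log_density = std_normal_logpdf(x)
    for forward, log_abs_derivative in steps:
        # The density shrinks wherever the map stretches space.
        log_density -= log_abs_derivative(x)
        x = forward(x)
    return x, log_density

chain = [affine(0.5, 1.0), exponential()]  # u = exp(0.5 * x + 1)
random.seed(0)
samples = [push_through(random.gauss(0.0, 1.0), chain) for _ in range(5)]
for u, log_density in samples:
    print(f"u = {u:.4f}, log f_U(u) = {log_density:.4f}")
```

Each transformation contributes one log-derivative term, so adding another invertible step to `chain` extends both the sampler and the density without any other change.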

One thing to note is that in Normalizing Flows the transformation is parametrized, whereas the transformations in my initial discussion (https://rahulbhadani.medium.com/stat-stories-variable-transformation-to-generate-new-distributions-d4607cb32c30) did not contain any parameters. However, the idea stays the same.

Let’s look at the formula of the variable transformation once again:

Equation 1. Transformation formula for a multivariate distribution (created by the Author)
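In LaTeX, and assuming the standard multivariate change-of-variables result with the symbols defined just below (taking the Jacobian to be that of g⁻¹ is an assumption about the original notation), the formula can be written as:

```latex
f_U(\mathbf{u}) = f_X\!\left(g^{-1}(\mathbf{u})\right)
                  \left|\det J_{g^{-1}}(\mathbf{u})\right|,
\qquad \mathbf{u} = g(\mathbf{x})
```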

where U is the multivariate random vector of the new distribution, X is the multivariate random vector of the initial distribution, and J is the Jacobian. In the context of Normalizing Flows, the new density function fᵤ is called the pushforward, and g is called the generator. The movement from the initial simple density to the final complicated density is called the generative direction. The inverse function g⁻¹ moves in the opposite direction, called the normalizing direction, because it maps the complicated distribution back to the simple one. That is why the overall process of transformation is called Normalizing Flows. To generate a data point corresponding to U, draw x from the initial distribution and apply the transformation u = g(x).
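To make the two directions concrete, here is a small sketch with a hypothetical linear generator g(x) = A x + b on the plane. The matrix `A`, the shift `b`, and all function names are illustrative assumptions; a linear map is used only because its inverse and Jacobian determinant can be written by hand.

```python
import math

# Hypothetical generator g(x) = A x + b on R^2; A and b are illustrative values.
A = [[2.0, 0.0], [1.0, 1.5]]
b = [1.0, -1.0]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

def g(x):
    """Generative direction: base sample x -> data-space point u."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1]]

def g_inverse(u):
    """Normalizing direction: data-space point u -> base-space point x."""
    d0, d1 = u[0] - b[0], u[1] - b[1]
    return [(A[1][1] * d0 - A[0][1] * d1) / det_A,
            (-A[1][0] * d0 + A[0][0] * d1) / det_A]

def base_density(x):
    """f_X: standard bivariate normal density."""
    return math.exp(-0.5 * (x[0] ** 2 + x[1] ** 2)) / (2.0 * math.pi)

def pushforward_density(u):
    """f_U(u) = f_X(g^{-1}(u)) * |det J_{g^{-1}}(u)|; here the last factor is 1/|det A|."""
    return base_density(g_inverse(u)) / abs(det_A)

x = [0.3, -1.2]
u = g(x)
x_back = g_inverse(u)
print("u =", u)
print("round-trip error =", max(abs(p - q) for p, q in zip(x, x_back)))
print("f_U(u) =", pushforward_density(u))
```

Sampling only needs `g`, while evaluating the density of a given point only needs `g_inverse` and the Jacobian determinant, which is why invertibility is essential.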

For a more detailed and formal approach to the definition of Normalizing Flows, I recommend looking at Normalizing Flows: An Introduction and Review of Current Methods (https://arxiv.org/pdf/1908.09257.pdf).

Applications of Normalizing Flows

While other generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have achieved impressive results on difficult tasks such as learning distributions of images and other complicated datasets, they do not allow exact evaluation of the probability density of new data points. In this respect, Normalizing Flows prove to be elegant. The method supports density estimation and sampling as well as variational inference.

Density Estimation and Sampling

Consider a transformation u = g(x; θ), i.e., g is parametrized by the parameter vector θ. The initial probability density function fₓ is parametrized by a vector φ, i.e., fₓ(x | φ). If we have sample points 𝓓 from the desired distribution F_U, then we can perform maximum-likelihood estimation of the parameters Θ = (θ, φ) as follows:

Equation 2: log-likelihood estimation (created by the Author)
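In the notation used above, and assuming 𝓓 consists of N independent sample points u⁽¹⁾, …, u⁽ᴺ⁾ (the index N is an assumption), the log-likelihood follows from applying the transformation formula to each point and taking logarithms:

```latex
\log p(\mathcal{D} \mid \Theta)
  = \sum_{i=1}^{N} \log f_U\!\left(\mathbf{u}^{(i)} \mid \Theta\right)
  = \sum_{i=1}^{N} \left[
      \log f_X\!\left(g^{-1}(\mathbf{u}^{(i)}; \theta) \mid \phi\right)
      + \log \left|\det J_{g^{-1}}\!\left(\mathbf{u}^{(i)}; \theta\right)\right|
    \right]
```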

During neural network training, the parameters evolve to maximize the log-likelihood. Other loss functions, such as an adversarial loss, are also possible; the choice depends on the application.
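The following sketch shows what "parameters evolve to maximize the log-likelihood" means for the simplest possible flow. Everything here is an illustrative assumption: a one-dimensional affine flow u = exp(s)·x + b with a fixed standard normal base, synthetic Gaussian data standing in for 𝓓, and plain gradient descent with hand-derived gradients instead of a neural network and an automatic differentiation library.

```python
import math
import random

random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(500)]  # stand-in for the samples D

# Flow u = exp(s) * x + b with a standard normal base; theta = (s, b) is learnable.
s, b = 0.0, 0.0
learning_rate = 0.05

def negative_log_likelihood(s, b):
    """Mean of -log f_U(u) over the data, using the change-of-variables formula."""
    total = 0.0
    for u in data:
        z = (u - b) * math.exp(-s)  # normalizing direction g^{-1}(u)
        # -log f_X(z) is the first two terms; -log|dz/du| equals s
        total += 0.5 * z * z + 0.5 * math.log(2.0 * math.pi) + s
    return total / len(data)

for step in range(1500):
    inverse_scale = math.exp(-s)
    grad_s = grad_b = 0.0
    for u in data:
        z = (u - b) * inverse_scale
        grad_s += 1.0 - z * z          # derivative of the per-point loss w.r.t. s
        grad_b += -z * inverse_scale   # derivative of the per-point loss w.r.t. b
    s -= learning_rate * grad_s / len(data)
    b -= learning_rate * grad_b / len(data)

print(f"learned shift b = {b:.3f}, learned scale exp(s) = {math.exp(s):.3f}")
print(f"final negative log-likelihood = {negative_log_likelihood(s, b):.4f}")
```

For this toy flow the maximum-likelihood solution is known in closed form (the sample mean and the sample standard deviation), which makes it a convenient sanity check before moving to learnable splines.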

I will discuss variational inference separately, in a broader context, in an upcoming article. Be sure to subscribe to my email list to receive a notification about that. In the meantime, let’s look at some code using Python.

Example

For the example, I will use the Flowtorch library, which can be installed using:

pip install flowtorch

While I derived the transformed density function by hand in my previous articles, here we can use Flowtorch’s implementation of Normalizing Flows to learn the transformation and perform density estimation.

Let’s look at samples from a dataset of two concentric circles:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Two noisy concentric circles, standardized to zero mean and unit variance
n_samples = 1000
X, y = datasets.make_circles(n_samples=n_samples, factor=0.5, noise=0.05)
X = StandardScaler().fit_transform(X)

plt.title(r'Samples from $p(x_1,x_2)$')
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.scatter(X[:,0], X[:,1], alpha=0.5)
plt.show()

Samples from concentric circle dataset: Joint distribution (created by the Author)

# Kernel density estimates of the two marginals.
# sns.kdeplot replaces the deprecated sns.distplot(..., hist=False, kde=True).
plt.subplot(1, 2, 1)
sns.kdeplot(X[:,0], linewidth=2)
plt.title(r'$p(x_1)$')
plt.subplot(1, 2, 2)
sns.kdeplot(X[:,1], linewidth=2)
plt.title(r'$p(x_2)$')
plt.show()

Marginal distribution (created by the Author)

We can learn the transformation with the bijector bij.Spline. The knots and derivatives of the spline act as parameters that can be learned using stochastic gradient descent:

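import torch
import flowtorch.bijectors as bij
import flowtorch.distributions as dist

# Base distribution: a standard normal in two dimensions
dist_x = torch.distributions.Independent(
  torch.distributions.Normal(torch.zeros(2), torch.ones(2)),
  1
)
# Learnable spline bijector and the resulting flow distribution
bijector = bij.Spline()
dist_y = dist.Flow(dist_x, bijector)

optimizer = torch.optim.Adam(dist_y.parameters(), lr=1e-2)
steps = 5000
X = torch.tensor(X, dtype=torch.float32)
for step in range(steps):
    optimizer.zero_grad()
    # Loss is the negative mean log-likelihood of the data under the flow
    loss = -dist_y.log_prob(X).mean()
    loss.backward()
    optimizer.step()

    if step % 200 == 0:
        print('step: {}, loss: {}'.format(step, loss.item()))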

Now, we can plot samples from the transformed distribution after learning:

# Draw 1000 samples from the learned flow
X_flow = dist_y.sample(torch.Size([1000,])).detach().numpy()

plt.title(r'Joint Distribution')
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.scatter(X[:,0], X[:,1], label='data', alpha=0.5)
plt.scatter(X_flow[:,0], X_flow[:,1], color='firebrick', label='flow', alpha=0.5)
plt.legend()
plt.show()

Samples from newly learned transformed distribution are shown using red dots. (created by the Author)

We can also plot the learned marginal distributions:

# X is a torch tensor after training; convert it back to NumPy for seaborn
X_np = np.asarray(X)

# sns.kdeplot replaces the deprecated sns.distplot(..., hist=False, kde=True)
plt.subplot(1, 2, 1)
sns.kdeplot(X_np[:,0], linewidth=2, label='data')
sns.kdeplot(X_flow[:,0], linewidth=2, color='firebrick', label='flow')
plt.title(r'$p(x_1)$')
plt.legend()
plt.subplot(1, 2, 2)
sns.kdeplot(X_np[:,1], linewidth=2, label='data')
sns.kdeplot(X_flow[:,1], linewidth=2, color='firebrick', label='flow')
plt.title(r'$p(x_2)$')
plt.legend()
plt.show()

Learned marginal distribution (created by the Author)

From the plot, the learned marginal distributions seem close to those of the data. Of course, we can do even better, but that’s for later.
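A visual comparison can be backed by a number. One option, not used in the original notebook, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of two samples. The helper `ks_statistic` below is a hypothetical pure-Python sketch, and the Gaussian samples are stand-ins for a data marginal and a flow marginal.

```python
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        value = min(a[i], b[j])
        # Advance both pointers past every observation <= value
        while i < len(a) and a[i] <= value:
            i += 1
        while j < len(b) and b[j] <= value:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap

random.seed(0)
data_like = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a data marginal
good_fit = [random.gauss(0.0, 1.0) for _ in range(1000)]   # same distribution
poor_fit = [random.gauss(1.0, 1.0) for _ in range(1000)]   # shifted distribution
print("KS, matching distributions:", ks_statistic(data_like, good_fit))
print("KS, shifted distribution:  ", ks_statistic(data_like, poor_fit))
```

With the article's variables, the same function could be applied to `X[:,0].tolist()` and `X_flow[:,0].tolist()`; values near 0 indicate similar marginals, while values near 1 indicate almost no overlap. Note that matching marginals do not guarantee that the joint distribution has been learned.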

Several other libraries implement the Normalizing Flows method, such as normflows and ProbFlow. In addition, I found the following resources helpful:

  1. https://gowrishankar.info/blog/normalizing-flows-a-practical-guide-using-tensorflow-probability/
  2. https://github.com/LukasRinder/normalizing-flows
  3. https://probflow.readthedocs.io/en/latest/examples/normalizing_flows.html
  4. https://github.com/VincentStimper/normalizing-flows
  5. https://github.com/tatsy/normalizing-flows-pytorch
  6. https://vishakh.me/posts/normalizing_flows/
  7. https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial11/NF_image_modeling.html
  8. https://gebob19.github.io/normalizing-flows/

Conclusion

This article provides a concise introduction to the Normalizing Flows method, starting from variable transformation for generating new distributions. Applications of this statistical method combined with neural networks range from fake image generation to anomaly detection and the discovery of novel molecules and materials. I recommend that readers check the resources mentioned above to gain a deeper understanding of Normalizing Flows. In future articles, I will present new developments in Normalizing Flows.

The notebook associated with the Python code above can be obtained here: https://github.com/rahulbhadani/medium.com/blob/ec92a9bc7b2aa165df630ed5e268ec58fc0716a2/10_09_2022/normflow.ipynb

References

  1. Clustering and classification through normalizing flows in feature space https://www.researchgate.net/profile/Martin-Cadeiras/publication/220385824_Clustering_and_Classification_through_Normalizing_Flows_in_Feature_Space/links/54da12330cf2464758204dbb/Clustering-and-Classification-through-Normalizing-Flows-in-Feature-Space.pdf
  2. A family of non-parametric density estimation algorithms https://ri.conicet.gov.ar/bitstream/handle/11336/8930/CONICET_Digital_Nro.12124.pdf?sequence=1
  3. Kobyzev, I., Prince, S. J., & Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intelligence, 43(11), 3964–3979.

I hope this article was helpful to you in getting started with an exciting topic in statistics and data science.
