/a> as the most robust model (<a href="https://arxiv.org/abs/1706.06083">Madry et al.</a>). This fact highlights just how far away we are from robust recognition models – even for simple handwritten digits.</p><p id="c3ca">In our <a href="https://arxiv.org/abs/1805.09190">recent paper</a>, we introduce a new concept to classify images robustly. The idea is very simple: if an image is classified as a seven, then it should contain roughly two lines – one shorter, one longer – that touch each other at one end. That’s a generative way to think about digits, which is pretty natural for humans and which allows us to easily spot the signal (the lines) even amidst large amounts of noise and perturbations. Having such a model should make it easy to classify the adversarial examples featured above into the correct class. Learning a generative model of digits (say zeros) is pretty straightforward (using a <a href="https://arxiv.org/abs/1606.05908">Variational Autoencoder</a>) and, in a nutshell, works as follows: we start from a latent space of nuisance variables (which might capture things like thickness or tilt of the digit and are learnt from the data) and generate an image from them using a neural network. We then show the network examples of handwritten zeros and train it to produce similar ones. At the end of training, the network has learnt the natural variations of handwritten zeros:</p><figure id="9127"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Y6O2le5_-9PLg_n4iWN_6w.png"><figcaption>A generative model of zeros learns the typical variations of handwritten digits (right side).</figcaption></figure><p id="3e0c">We learn such a generative model for each digit. Then, when a new input comes along, we check which digit model can best approximate the new input. This procedure is typically called <i>analysis-by-synthesis</i>, because we <i>analyse</i> the content of the image according to the model that can best <i>synthesise</i> it. 
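</p><p>As a rough illustration (not the model from the paper), the classify-by-best-synthesis rule can be sketched in Python with NumPy. The sketch makes two loud assumptions: each per-digit Variational Autoencoder is swapped for a much simpler linear generative model (PCA), and toy images of horizontal and vertical bars stand in for handwritten digits. All function names and the toy data are hypothetical and chosen only for illustration.</p>
<pre>
```python
import numpy as np


def fit_class_model(images, n_latent=5):
    """Fit a linear generative model (PCA) to the flattened images of one class.

    Assumption: PCA is a simple stand-in for the per-class VAE described above.
    """
    mean = images.mean(axis=0)
    # Rows of vt are the main directions of variation within the class.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:n_latent]


def reconstruction_error(x, model):
    """Squared error between x and its best synthesis under one class model."""
    mean, components = model
    latent = components @ (x - mean)          # analyse: infer the latent code
    synthesis = mean + components.T @ latent  # synthesise an image from it
    return float(np.sum((x - synthesis) ** 2))


def classify(x, models):
    """Return the class whose model synthesises x best, plus all errors."""
    errors = {label: reconstruction_error(x, m) for label, m in models.items()}
    return min(errors, key=errors.get), errors


def make_bars(rng, orientation, n=200, size=8):
    """Hypothetical toy data: one bright bar per image plus pixel noise."""
    images = np.zeros((n, size, size))
    for img in images:  # each img is a view, so writes land in images
        pos = rng.integers(1, size - 1)
        if orientation == "horizontal":
            img[pos, :] = 1.0
        else:
            img[:, pos] = 1.0
    images += 0.1 * rng.standard_normal(images.shape)
    return images.reshape(n, -1)


rng = np.random.default_rng(0)
# One generative model per class, each trained only on its own class.
models = {name: fit_class_model(make_bars(rng, name))
          for name in ("horizontal", "vertical")}

test_image = make_bars(rng, "vertical", n=1)[0]
label, errors = classify(test_image, models)
print(label, errors)
```
</pre>
<p>The design choice to note is that the decision comes from comparing how well each class model can explain the input, not from a single pass from image to label. An input that no class model can synthesise well yields a large error for every class, which is one simple way to flag low confidence.</p>
<p>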
Standard feedforward networks, on the other hand, have no feedback mechanism to check whether the input image really resembles the inferred class:</p><figure id="e38b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*qfe00YnTC58Up5hOmVuC8g.png"><figcaption>Feedforward networks directly go from image to class and have no way to check that the classification makes sense. Our analysis-by-synthesis model checks what image features are present and classifies according to which class makes most sense.</figcaption></figure><p id="f1e5">That’s really the key difference: feedforward networks have no way to check their predictions; you have to trust them. Our analysis-by-synthesis model, on the other hand, checks whether certain image features are really present in the input before jumping to a conclusion.</p><p id="031b">We do not need a perfect generative model for this procedure to work. Our model of handwritten digits is certainly not perfect: look at the blurry edges. Nonetheless, our model can classify handwritten digits with high accuracy (99.0%) and its decisions make a lot of sense to humans. For example, the model will always signal low confidence on noise images, because they don’t look like any of the digits it has seen before. The images closest to noise that the analysis-by-synthesis model still classifies as digits with high confidence make a lot of sense to humans:</p><figure id="5507"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*fjhRFQkEFDMWuwwFv2tEaQ.png"><figcaption>We tried to synthesise unrecognisable images that are still classified as zeros with high confidence by our analysis-by-synthesis model. This is the best we got.</figcaption></figure><p id="b7c5">In the current state-of-the-art model by Madry et al., we found that minimal perturbations of clean digits are often sufficient to derail the model’s classification. Doing the same for our analysis-by-synthesis model yields strikingly different results:</p><figure id="f6b0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aedBhqczyEb_pd4y9ubzEg.png"><figcaption>Adversarial examples for the analysis-by-synthesis model. Can you guess what the original number was?</figcaption></figure><p id="7e30">Note that the perturbations make a lot of sense to humans and that it is sometimes difficult to decide which class the image should be assigned to. That’s exactly what we expect to happen for a robust classification model.</p><p id="5452">Our model has several other notable features. For example, the decisions of the analysis-by-synthesis model are much easier to interpret, as one can directly see which features sway the model towards a particular decision. 
In addition, we can even derive some lower bounds on its robustness.</p><p id="7ce5">The analysis-by-synthesis model does not quite match human perception yet and there is still a long way to go (see the full analysis in our <a href="https://arxiv.org/abs/1805.09190">manuscript</a>). Nonetheless, we believe these results are extremely encouraging and we hope that our work will pave the way towards a new class of classification models that are accurate, robust and interpretable. We still have to learn a lot about these new models, not least how to make inference more efficient and how to scale them to more complex data sets (like CIFAR or ImageNet). We are working hard to answer these questions and are looking forward to sharing more results with you in the future.</p><h2 id="7aaa">Towards the first adversarially robust neural network model on MNIST</h2><p id="284d">Lukas Schott, Jonas Rauber, Matthias Bethge, Wieland Brendel
arXiv:1805.09190</p></article></body>