PhD, Carnegie Mellon University Chief Scientist, UnifyID
Let’s begin with a simple introduction to the world of adversarial inputs. These are inputs to a machine learning classifier that have been shrewdly perturbed in such a way that the changes are damn near invisible to the naked eye, yet can fool the classifier into predicting either an arbitrary wrong class (untargeted) or a specific wrong class (targeted).
There are two defining images that come to my mind when I think of this field at large. The first one is the classic Panda-to-Nematode image from here.
The second one is the image below, which provides a geometrical perspective on where these adversarial inputs actually reside.
Where I work, harnessing adversarial examples in a non-computer-vision setting for dataset augmentation (to increase both robustness and generalizability) forms a key part of our pipeline. In this regard, we have disseminated a few humble attempts such as "Vulnerability of deep learning-based gait biometric recognition to adversarial perturbations", "On grey-box adversarial attacks and transfer learning" and "On Lyapunov exponents and adversarial perturbations".
Recently, while dabbling with the idea of using interpolated style transfer to generate mutually adversarial pairs of images, I chanced upon the fuzziness surrounding one of the more fundamental questions of machine learning: what constitutes a true label, and how do machine learning companies offering commercial off-the-shelf (OTS) APIs define it?
1: We describe an experiment that used style-transferred images to induce mis-classification in a specific, popular commercial off-the-shelf (OTS) API. (I use the Watson Visual Recognition V3 API, version 2016-05-20, for all the results shown here.)
2: The style-transferred images achieved an adversarial attack success rate of 97.5% (195 out of 200).
3: The goal is not to proclaim a new black-box attack recipe or to berate the commercial API used, but merely to highlight the fuzziness surrounding what constitutes a true label or a true tag. This follows from a simple observation: when interpolated style transfer is used to generate mutually adversarial pairs, the 'raw image' that is adversarially perturbed is not necessarily a naturally occurring image; it is itself a style-transferred image.
4: Pitch the idea of using interpolated style transfer as a recipe for generating mutually adversarial pairs that can be used for model regularization, as well as for generating challenging co-class images as inputs to training pipelines for Siamese-net-like embedding deep nets trained with triplet-loss cost functions.
5: Pitch the idea of using the interpolation weight as the new semantic epsilon in here:
With this prelude in tow, the deep dive now begins.
Let’s start by focusing on the figure below:
What we see is the journey of the image of a cat getting style-transferred into a ‘pattern-style-image’ using the arbitrary image stylization Magenta project (Ghiasi et al., 2017) for interpolation weights monotonically increasing from 0 to 1 (from left to right). As seen, with the raw image (interpolation weight w=0) or style-transferred images with low interpolation weights (up to w=0.1) as inputs, the commercial OTS classification API has, as expected, correctly classified the image as a cat with high confidence scores (0.97 to 0.99). When we increase the interpolation weight slightly to w=0.15, we see a dramatic change in the inferred label landscape. The top guessed classes change from feline, cat and carnivore to cellophane, moth and invertebrate.
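To make the role of the interpolation weight concrete, here is a minimal sketch. It rests on an assumption that is my reading of the Magenta stylization script rather than something stated above: that w linearly blends the style embedding of the content image itself with the embedding of the style image, so that w=0 leaves the raw image untouched and w=1 applies the full style. The function name `blend_style` and the toy vectors are hypothetical.

```python
def blend_style(identity_params, style_params, w):
    """Linearly blend two style embeddings for an interpolation weight w in [0, 1].

    Assumption: identity_params is the embedding of the content image itself
    and style_params is the embedding of the style (pattern) image.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("interpolation weight must lie in [0, 1]")
    if len(identity_params) != len(style_params):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - w) * c + w * s for c, s in zip(identity_params, style_params)]


# Toy 3-dimensional embeddings, made up purely for illustration.
content_embedding = [0.2, -1.0, 0.5]
pattern_embedding = [1.0, 0.0, -0.5]

# The weights discussed in the text: raw image, last "cat", first "cellophane", full style.
for w in (0.0, 0.1, 0.15, 1.0):
    print(w, blend_style(content_embedding, pattern_embedding, w))
```

Under this reading, moving from w=0.1 to w=0.15 shifts the embedding by only 5% of the distance between the two endpoints, which is what makes the label flip so striking.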
While the two images (w=0.1 and w=0.15) are virtually indistinguishable to the naked eye and are merely 0.03 apart in terms of the structural similarity distance (that is, 1 minus the structural similarity index of Wang et al., 2004) and 0.125 apart in terms of the infinity-norm distance, the labels assigned to the two images by the black-box classifier turn out to be wildly different.
Thus, we refer to this pair as constituting a mutually adversarial pair with regard to the black-box classifier and the distance metric used. The local texture-based features that the classifier might have learned have perhaps coaxed it into making an erroneous classification, while the image still clearly looks like that of a cat. A natural question now emerges: did the artistically style-transferred, synthetically generated image (with w=0.1) deserve to be classified as a cat in the first place? This is akin to another related question: what is the normative expected class when the input is a real-world figurine rather than an animate being? This brings us to the figure below.
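For readers who want to reproduce this kind of comparison, the two distances can be sketched as follows. This is a simplified illustration, not the code behind the numbers above: images are assumed to be flat sequences of grayscale intensities in [0, 1], and the SSIM here is computed from whole-image statistics rather than with the sliding window of the original formulation, so its values will differ from those of a full implementation. All function names are hypothetical.

```python
def linf_distance(x, y):
    """Largest absolute per-pixel difference between two equally sized images."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    return max(abs(a - b) for a, b in zip(x, y))


def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image statistics (simplification: no sliding window)."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilising constants from the SSIM definition (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_distance(x, y, data_range=1.0):
    """Structural similarity distance: 1 minus the structural similarity index."""
    return 1.0 - global_ssim(x, y, data_range)


# Two tiny made-up "images" that differ in a single pixel.
image_a = [0.0, 0.5, 1.0, 0.25]
image_b = [0.125, 0.5, 1.0, 0.25]
print(linf_distance(image_a, image_b), ssim_distance(image_a, image_b))
```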
Here, we see an input image that is literally that of an artistic cat figurine, and it is nonetheless classified as a cat with a high confidence score (0.89). (The image was sourced from here. We find this specific shopping portal to be an especially good source of such figurine art examples.)
Specifics of the experimental procedure:
It is indeed legitimate to ask if the cat example discussed above was idiosyncratically chosen. In order to assuage those concerns, we did the following experiment.
The main question behind the experiment was as follows:
Is it indeed the case that images that are style-transferred with a globally low interpolation weight result in mis-classifications? To answer this, we extracted 200 randomly chosen cat images from the Kaggle Dogs and Cats dataset. We resized all of them to 299 x 299 and style-transferred each one of them using the same style image, extracted from the DTD dataset (Cimpoi et al., 2014), with the style transfer algorithm detailed in Ghiasi et al. (2017). The figure below showcases this with a specific example.
In order to ensure that the images still looked ‘cat-like’ the interpolation weight was set to a low value of 0.125.
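The selection step can be sketched as below. This is an illustration under assumptions, not the script used for the experiment: it assumes cat images can be recognised by a `cat.` filename prefix, and the seed, the helper name `sample_cat_images` and the synthetic listing are all hypothetical. Resizing and stylization are left out; only the settings quoted in the text are recorded as constants.

```python
import random

TARGET_SIZE = (299, 299)      # resize target stated in the text
INTERPOLATION_WEIGHT = 0.125  # low weight chosen so images still look cat-like


def sample_cat_images(filenames, k=200, seed=0):
    """Pick k distinct cat images at random, reproducibly for a given seed."""
    # Assumption: cat images are identifiable by a "cat." filename prefix.
    cats = sorted(name for name in filenames if name.startswith("cat."))
    if len(cats) < k:
        raise ValueError(f"need at least {k} cat images, found {len(cats)}")
    return random.Random(seed).sample(cats, k)


# Stand-in for a real directory listing (synthetic names for illustration).
listing = [f"cat.{i}.jpg" for i in range(1000)] + [f"dog.{i}.jpg" for i in range(1000)]
chosen = sample_cat_images(listing)
print(len(chosen), chosen[:3])
```

Fixing the seed makes the sample repeatable, which matters when the same 200 images have to be sent to the API both raw and stylized.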
One can sift through all the raw images and the style transferred images as a gif animation here below.
Now, both the raw images and the style-transferred images were classified using the Watson Visual Recognition V3 API, version 2016-05-20.
The Accept-Language header string that sets the language of the output class names was set to en.
The owners query array was set to the default option (IBM).
The classifier-ids parameter was left at its default, which requires no training and, per the documentation, "Returns classes from thousands of general tags". The threshold query parameter, which represents the minimum score a class must have to be returned, was set to 0.5.
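The settings above can be collected in one place as in the sketch below. It makes no network call and deliberately omits the service URL and credentials. The exact parameter spellings (for instance `classifier_ids` with an underscore) and the response shape with `class` and `score` keys are my assumptions, and both helper names are hypothetical.

```python
def build_classify_request(threshold=0.5):
    """Return (headers, query parameters) mirroring the settings described in the text."""
    headers = {"Accept-Language": "en"}  # language of the returned class names
    params = {
        "version": "2016-05-20",      # API version date used for all results
        "owners": "IBM",              # default option: IBM's pre-trained classifiers
        "classifier_ids": "default",  # assumed spelling; general tags, no training
        "threshold": threshold,       # minimum score for a class to be returned
    }
    return headers, params


def classes_above_threshold(classes, threshold=0.5):
    """Mimic the threshold locally: keep classes scoring at least the threshold, best first."""
    kept = [c for c in classes if c["score"] >= threshold]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


headers, params = build_classify_request()

# Hypothetical response fragment, for illustration only.
toy_classes = [
    {"class": "feline", "score": 0.97},
    {"class": "fabric", "score": 0.41},
    {"class": "cat", "score": 0.99},
]
print([c["class"] for c in classes_above_threshold(toy_classes)])
```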
The results are covered in the forthcoming section.
In the figure above, we see the counts of the most probable classes that the API returned. As seen, the top 4 classes that encompassed more than 50% of the test images were crazy quilt, camouflage, mosaic and patchwork.
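A tally of this kind takes only a few lines. The sketch below uses made-up labels, not the experiment's data, and the helper name `top_class_summary` is hypothetical; it counts the top-scoring class per image and reports what share of the images the n most frequent classes cover.

```python
from collections import Counter


def top_class_summary(top_labels, n=4):
    """Return the n most frequent top-1 labels and the share of images they cover."""
    if not top_labels:
        raise ValueError("no labels to summarise")
    counts = Counter(top_labels)
    most_common = counts.most_common(n)
    share = sum(count for _, count in most_common) / len(top_labels)
    return most_common, share


# Made-up top-1 labels for 16 images (illustration only, not the real results).
toy_labels = (
    ["crazy quilt"] * 5 + ["camouflage"] * 4 + ["mosaic"] * 3
    + ["patchwork"] * 2 + ["cellophane", "cat"]
)
most_common, share = top_class_summary(toy_labels)
print(most_common, share)
```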
In the figure below, we see the scores as well as the histogram of scores related to the 200 classification trials.
As seen, we have an overwhelmingly large number of cases where the mis-classifications were made with high confidence scores associated. In the figure below, we see the 5 images that the API classified correctly.
Now, in this figure, we see 6 randomly chosen examples of style-transferred images that were classified incorrectly.
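The headline figure follows directly from these counts: 5 correct classifications out of 200 leave 195 mis-classifications, or 195/200 = 97.5%. The sketch below spells out that arithmetic and makes explicit the judgment call hiding in it, namely which returned labels count as correct. The accepted set used here ("cat", "feline") is my assumption, the placeholder labels are not real API output, and the function name is hypothetical.

```python
def attack_success_rate(top_labels, accepted_labels):
    """Fraction of images whose top label is NOT among the accepted 'true' labels."""
    if not top_labels:
        raise ValueError("no classification results given")
    accepted = {label.lower() for label in accepted_labels}
    misses = sum(1 for label in top_labels if label.lower() not in accepted)
    return misses / len(top_labels)


# Placeholder labels reproducing only the counts reported in the text:
# 195 mis-classified images and 5 correctly classified ones.
toy_top_labels = ["not-a-cat"] * 195 + ["cat"] * 5

rate = attack_success_rate(toy_top_labels, accepted_labels={"cat", "feline"})
print(f"{rate:.1%}")
```

Widening or narrowing the accepted set changes the reported rate, which is precisely the fuzziness about true labels that this post is concerned with.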
Conclusion and Future Work
Due to limitations on API usage for free-tier users, we could not extend the experiment to larger datasets, which is our immediate goal. Besides this, another question that we would like to explore is the choice of the style image. We selected an image from the texture dataset for 2 reasons. The first was that a pre-trained style transfer model was readily available. The second was a hunch that texture would in fact be the right aspect of the image to perturb in order to induce a mis-classification.
As stated in the prelude, our intention is not to proclaim a new black-box attack or to berate the commercial API.
Besides showcasing the potential of looking at style transfer as an adversarial example generating technique, we also wanted to draw attention to the inherent fuzziness that surrounds the definition of what constitutes an image class/category or ‘tags’ in the case of such APIs and what entails an image mis-classification.
The API that we used describes the technology as: "Watson Visual Recognition’s category-specific models enable you to analyze images for scenes, objects, faces, colors, foods, and other content." As for the specific API documentation, it states that when used with pre-trained models (in lieu of a custom-trained classifier), the API "Returns classes from thousands of general tags."
On a concluding note, we would like to remark that we also ascertained the efficacy of these style-transfer-based black-box attacks using the universal adversarial images for different deep nets from Moosavi-Dezfooli et al. as the style image, the results of which we plan to disseminate in the full version of this work.
(This work will be presented at the CV-COPS workshop @ CVPR-2018)
M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3606–3613. IEEE, 2014.
G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv preprint arXiv:1705.06830, 2017.
S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. https://arxiv.org/abs/1610.08401
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.