Abstract

The visual object category reports of artificial neural networks (ANNs) are notoriously sensitive to tiny, adversarial image perturbations. Because human category reports (aka human percepts) are thought to be insensitive to those same small-norm perturbations – and locally stable in general – this argues that ANNs are incomplete scientific models of human visual perception. Consistent with this, we show that when small-norm image perturbations are generated by standard ANN models, human object category percepts are indeed highly stable. However, in this very same "human-presumed-stable" regime, we find that robustified ANNs reliably discover low-norm image perturbations that strongly disrupt human percepts. These previously undetectable human perceptual disruptions are massive in amplitude, approaching the same level of sensitivity seen in robustified ANNs. Further, we show that robustified ANNs support precise perceptual state interventions: they guide the construction of low-norm image perturbations that strongly alter human category percepts toward specific prescribed percepts. In sum, these contemporary models of biological visual processing are now accurate enough to guide strong and precise interventions on human perception.


How robust is human category perception?

Human category perception has some degree of robustness to small changes in a given image: if one changes the content of an image by a very tiny amount, a person's belief about the category content of the image will remain the same. At some point, however, one can change the pixel content of that image enough (e.g., by adding a lot of random noise) that the person will change their belief.

...
Images can be thought of as points in pixel space. Around any given image, there is some "ball" of pixel perturbations where the category percept of that image will remain the same. Shown are some random perturbations of an example image from the ε≤30 ball (under the ℓ2 norm).
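
To make the "ε≤30 ball" concrete, here is a minimal sketch (not the authors' code) of drawing a random perturbation of a given ℓ2 norm around an image; the function name and the dummy image are purely illustrative.

```python
# Sketch: sample a random perturbation of l2 norm eps around an image and
# clip back to the valid [0, 1] pixel range (clipping can only shrink the norm).
import torch

def random_l2_perturbation(image: torch.Tensor, eps: float) -> torch.Tensor:
    """image: float tensor in [0, 1], e.g. shape (3, 224, 224)."""
    direction = torch.randn_like(image)                      # random direction
    delta = eps * direction / direction.flatten().norm(p=2)  # rescale to norm eps
    return (image + delta).clamp(0.0, 1.0)

# Example: a point from the eps <= 30 ball around a dummy image.
image = torch.rand(3, 224, 224)
perturbed = random_l2_perturbation(image, eps=30.0)
print((perturbed - image).flatten().norm(p=2))               # <= 30
```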

But precisely how small can such a human-category-changing pixel perturbation be? How does this size compare to the size needed for artificial neural networks (ANNs)? We sought to answer these questions in this work.

Our approach

Finding the smallest pixel perturbation that can change a category report is not straightforward. The main difficulty is that it is practically impossible to check every possible perturbation for even a single image, because pixel space is so high-dimensional.

Given an ANN, this difficulty can be overcome if one can rapidly query the model or has "white box" access to its image-to-category report function: one can simply deploy efficient search algorithms over the set of pixel perturbations.
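
As a concrete, hypothetical illustration of such a white-box search, the sketch below runs an ℓ2-constrained projected-gradient search against a PyTorch classifier. The function name, step size, and iteration count are placeholders, not the procedure used in the paper.

```python
# Hedged sketch of a white-box, gradient-guided search for a small l2 perturbation
# that changes a model's category report. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_l2(model, image, label, eps=30.0, step=3.0, iters=40):
    """image: (1, 3, 224, 224) float tensor in [0, 1]; label: (1,) true class index."""
    model.eval()
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(image + delta), label)   # untargeted: increase loss
        loss.backward()
        grad = delta.grad / (delta.grad.flatten().norm(p=2) + 1e-12)
        with torch.no_grad():
            delta += step * grad                               # normalized gradient ascent
            norm = delta.flatten().norm(p=2)
            if norm > eps:
                delta *= eps / norm                            # project onto the eps-ball
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep pixels in [0, 1]
        delta.grad.zero_()
    return (image + delta).detach()
```

With the slow, "black box" access one has to human perception, this kind of gradient-guided search is not directly available, which is what motivates the surrogate-model strategy described next.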

On the other hand, one typically only has slow and "black box" access to the image-to-category function used by humans. But recent progress in visual neuroscience has established that "robustified" (adversarially-trained) models1 have strong correspondences with human visual systems,2 so they might serve as sufficiently accurate approximations to guide an efficient search for the small perturbations humans are sensitive to.
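
Footnote 1's "robustified" means adversarially trained. As a rough, hypothetical sketch (not the training recipe behind the models used in the paper), each training step fits the model to perturbed inputs produced by an inner attack such as the pgd_l2 search above; the per-image attack loop and hyperparameters are illustrative.

```python
# Hedged sketch of one adversarial-training ("robustification") step:
# an inner loop finds perturbed inputs, the outer loop fits the model to them.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, eps=3.0):
    # Attack each image separately so the eps constraint holds per image.
    adv_images = torch.stack([
        pgd_l2(model, img.unsqueeze(0), lbl.unsqueeze(0), eps=eps).squeeze(0)
        for img, lbl in zip(images, labels)
    ])
    model.train()                                        # pgd_l2 left the model in eval mode
    optimizer.zero_grad()                                # clear gradients accumulated by the attack
    loss = F.cross_entropy(model(adv_images), labels)    # train on the perturbed inputs
    loss.backward()
    optimizer.step()
    return loss.item()
```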

...

We did two versions of this search process: one where we searched for small perturbations intended to induce human subjects to change their perception in an untargeted way ("DM"), and another where we searched for perturbations intended to change human perception toward a specific target category ("TM").
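
For concreteness, here is an illustrative sketch of how the two search objectives differ, assuming the same kind of gradient-based search as above; the function names are hypothetical and this is not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact objectives) of the two modes.
import torch
import torch.nn.functional as F

def dm_objective(logits: torch.Tensor, source_label: torch.Tensor) -> torch.Tensor:
    # Untargeted ("DM"): push the model's report away from the source category,
    # so the search ascends this loss.
    return F.cross_entropy(logits, source_label)

def tm_objective(logits: torch.Tensor, target_label: torch.Tensor) -> torch.Tensor:
    # Targeted ("TM"): pull the model's report toward a prescribed target category,
    # so the search descends this loss.
    return F.cross_entropy(logits, target_label)
```

In the pgd_l2 sketch above, swapping the ascent on dm_objective for a descent on tm_objective turns the untargeted search into a targeted one.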

What we found

Using the approach above, we report that: (1) when small-norm image perturbations are generated by standard (non-robust) ANN models, human object category percepts are indeed highly stable; (2) in that same regime, robustified ANNs reliably discover low-norm perturbations that strongly disrupt human category percepts; and (3) robustified ANNs can also guide the construction of low-norm perturbations that shift human percepts toward specific prescribed target categories.

In short, we established novel upper bounds on the robustness of human categorical perception. We anticipate that with future (better) models of the human visual system, even smaller perturbations will be found for any given image.

...
Results for our DM experiments are above. The x-axis (log scale) shows the ℓ2 norm of the perturbations, for 224x224x3 images with pixel values in [0, 1]. The colored curves are evaluations on "gray box" models (i.e., models different from the ones that guided the generation of the perturbations).
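
As a rough sketch of what a "gray box" evaluation involves, one can score perturbations generated with one (guide) model on separate evaluation models; the function below is illustrative and not the paper's analysis code.

```python
# Illustrative "gray box" check: how often does an evaluation model (different
# from the guide model that produced the perturbations) change its report?
import torch

@torch.no_grad()
def category_flip_rate(eval_model, clean_images, perturbed_images):
    """Fraction of images whose predicted category changes under the perturbation."""
    eval_model.eval()
    clean_pred = eval_model(clean_images).argmax(dim=1)
    pert_pred = eval_model(perturbed_images).argmax(dim=1)
    return (clean_pred != pert_pred).float().mean().item()
```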

See for yourself:

Below, you can explore different pixel perturbations. The base images are a random sample from the Restricted ImageNet subset. Try both modes (DM / TM), different guide models (vanilla, robust, and Gaussian), and different perturbation norms. For context, perturbations with ε≤3 are typical in studies of adversarial attacks, and the median ℓ2 distance between natural images is ε≈130.3

...

Learn more:


BibTeX (preprint)

@misc{gaziv2023robustified,
  title={Robustified ANNs Reveal Wormholes Between Human Category Percepts},
  author={Guy Gaziv and Michael J. Lee and James J. DiCarlo},
  year={2023},
  eprint={2308.06887},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}


Footnotes
  1. Adversarially-trained models
  2. Robust models and neurons in visual cortex; and see Brain-Score
  3. We estimated the median ℓ2 distance between pairs of natural images (drawn from the subset of ImageNet we used in this study) to be ≈130 (max possible ≈388). This assumes 224x224x3 images, with pixel values ranging between 0 and 1.
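
For reference, the "max possible ≈388" figure is just the geometry of the pixel cube; a quick check of the arithmetic, assuming 224x224x3 images with values in [0, 1]:

```python
# Max l2 distance between two images whose pixel values each differ by at most 1:
# sqrt(number_of_pixel_values) = sqrt(224 * 224 * 3) ≈ 388.
import math
print(math.sqrt(224 * 224 * 3))  # ≈ 387.98
```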