Robustness 入门

发表于 2024-05-18 分类于可信人工智能，鲁棒性本文字数： 1.2k

这个笔记记录一些零散的或偏基础的知识

鲁棒性的类别

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (arxiv.org)

Corruption Robustness

损坏鲁棒性，评估的是对于作用于样本的损坏（corruptions） $C$ ，模型的在这些损坏样本上预测的平均性能。

E_{c \sim C} [P_{(x, y) \sim D} (f (c (x) = y))]

Perturbation Robustness

扰动鲁棒性，评估的是在扰动函数 $ \mathcal{E}$ 对样本的作用下，模型在这些受到干扰的样本上做预测的平均性能。

E_{ε \sim E} [P_{(x, y) \sim D} (f (ε (x)) = f (x))]

Adversarial Perturbtion

对抗鲁棒性，评估的是样本受到加性的、微小的（限度为 $b$ ）、以分类器为导向的扰动 $δ$ （即对抗性扰动）后，模型的预测发生变化的概率。

min_{∥ δ ∥_{p} < b} P_{(x, y) \sim D} (f (x + δ) = y)

常见数据集

ImageNet-A and ImageNet-O

ImageNet-A 数据集包含一些天然的对抗样本。它的结构和 ImageNet 类似，没有经过人为的噪声扰动，但天然地难以被正确分类。在论文的摘要中，常见的视觉分类模型在 ImageNet-A 的准确率只有 2%。

ImageNet-O 包含在 ImageNet-1k 数据集中未包含的类别的图像，这个数据集用于测试模型的超分布检测（out-of-distribution detection）能力，即模型面对从未遇到过的类别的时候，是否能够知道这个类别是超分布的。

Note

下图中，黑色代表图片的真正类别，红色代表 ResNet 预测的类别。在 ImageNet-A 中的类别是模型见到过的，但是预测错误的；在 ImageNet-O 中的类别是模型没见过的，但却笃定地预测的。

论文（CVPR 2021）： Natural Adversarial Examples (arxiv.org)

Github（数据集以及代码）：hendrycks/natural-adv-examples: A Harder ImageNet Test Set (CVPR 2021) (github.com)

ImageNet-C and ImageNet-P

ImageNet-C 的目标是测试模型的 corruption robustness，它对ImageNet 1K添加了一些 15 种图像损坏（模糊、噪声、天气、数字变化），以评估分类器在面对这些损坏时的性能。每种损坏类型有五种严重程度（severity），分别代表不同程度损坏。

ImageNet-P 的目标是测试模型的 perturbation robustness，提供了扰动的序列，即提供多张扰动的样本，每一张扰动样本都是基于其前一时刻的扰动结果的叠加。比如在下图的左侧的倾斜扰动中，每一次倾斜都是从上一次倾斜的结果继续倾斜的。包含动态模糊、缩放模糊、雪、亮度、旋转、倾斜等。

论文链接（ICLR 2019）：Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (arxiv.org)

GitHub：hendrycks/robustness: Corruption and Perturbation Robustness (ICLR 2019) (github.com)

下载：ImageNet-C (zenodo.org)

Integrated gradients

I G_{i} (f, x, b) = (x_{i} - b_{i}) \times \int_{ξ = 0}^{1} \frac{\partial f (b + ξ \times (x - b))}{\partial x_{i}} d ξ

集成梯度（Intergrated gradients），计算的是图像 $x$ 中的像素 $i$ 对 $f$ 的重要程度（或显著性得分，Saliency Scores）。其计算方法是从一个基准图像（一般是全黑），逐步转化为一张输入图像（转化过程中的中间图像序列称为路径，也就是 $ξ \times (x - b)$ 中不同的 $ξ$ 下的取值），计算在转化过程中模型输出相对于输入的梯度，并对路径上的所有梯度求积分得到。

Integrated gradients 拥有一个性质，即“实现不变性”（Implementation invariance），不同于直接计算计算梯度的方法，Intergrated gradients 不会受网络具体结构的影响，而仅仅取决于网络的输入和输出（也可以理解为是网络的功能）。我觉得这应该归因于其是多次从网络中取样的，最终对取样的结果做聚合，而不是用一次输入和输出的梯度来作为结果。

Attention Map 可视化

使用 Attention Map 的可视化来分析损坏样本对模型注意力的影响。这种方法可以借鉴。

ngobahung/Visulization_Attention_Map (github.com)

eigensmooth

References

@inproceedings{dan2019benchmarking,
  author       = {Dan Hendrycks and
                  Thomas G. Dietterich},
  title        = {Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},
  booktitle    = {7th International Conference on Learning Representations, {ICLR} 2019,
                  New Orleans, LA, USA, May 6-9, 2019},
  year         = {2019},
}

@inproceedings{dan2021natural,
  author       = {Dan Hendrycks and
                  Kevin Zhao and
                  Steven Basart and
                  Jacob Steinhardt and
                  Dawn Song},
  title        = {Natural Adversarial Examples},
  booktitle    = {{IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR}
                  2021, virtual, June 19-25, 2021},
  pages        = {15262--15271},
  publisher    = {Computer Vision Foundation / {IEEE}},
  year         = {2021},
}