Sprint Project: Adversarial Art Attack

Your Mission

You have 60 minutes to fool a pre-trained CNN with imperceptible modifications. Turn a banana into a toaster in the model’s eyes while keeping changes invisible to humans. Then present your attack to the class.

The Challenge

Take a correctly classified image and modify it until a pre-trained ImageNet model confidently misclassifies it as something absurd. The art lies in making the changes imperceptible to humans. Use either an algorithmic method, such as the Fast Gradient Sign Method (FGSM) or Projected Gradient Descent (PGD), or manual editing (subtle noise, color adjustments).

Kickstarter Code

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import numpy as np

# Load pre-trained model via the torchvision weights API (the pretrained=True argument is deprecated)
weights = models.ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights)
model.eval()

# Load and preprocess image
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

img = Image.open('data/original.jpg').convert('RGB')  # convert() guards against grayscale or RGBA inputs
img_tensor = transform(img).unsqueeze(0)

# Get the original top-5 prediction (no gradients needed for this forward pass)
with torch.no_grad():
    output = model(img_tensor)
probs = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_catid = torch.topk(probs, 5)

# Human-readable ImageNet class names ship with the weights metadata
categories = weights.meta["categories"]

print("Original predictions:")
for i in range(5):
    print(f"{categories[top5_catid[i]]}: {top5_prob[i].item():.4f}")

# Now create adversarial example
# Option 1: FGSM
# Option 2: Manual editing with PIL
# ...
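
For Option 1, here is a minimal targeted FGSM sketch that builds on the starter code above (model, img_tensor, and categories). The target index and epsilon are assumptions you should adjust yourself: confirm the class index against weights.meta["categories"] and tune epsilon by eye.

# Targeted FGSM (single step): nudge the image toward the chosen target class
loss_fn = torch.nn.CrossEntropyLoss()

target_idx = 859                      # assumed ImageNet index for 'toaster' -- verify in weights.meta["categories"]
target = torch.tensor([target_idx])

adv = img_tensor.clone().detach().requires_grad_(True)
loss = loss_fn(model(adv), target)
loss.backward()

epsilon = 0.01                        # perturbation size in normalized-tensor units; tune by eye
adv_tensor = (adv - epsilon * adv.grad.sign()).detach()   # step *against* the gradient to favor the target

# Re-run the top-5 prediction on the perturbed image
with torch.no_grad():
    adv_probs = torch.nn.functional.softmax(model(adv_tensor)[0], dim=0)
adv_top5_prob, adv_top5_catid = torch.topk(adv_probs, 5)
print("Adversarial predictions:")
for i in range(5):
    print(f"{categories[adv_top5_catid[i]]}: {adv_top5_prob[i].item():.4f}")

A single FGSM step rarely reaches 80% target confidence; repeating the step a few times while keeping the total perturbation small (which is essentially PGD) usually works better. For Option 2, simply re-run the prediction cell on a manually edited copy of the image. If you save a perturbed image to disk, undo the normalization first and prefer PNG over JPEG, since JPEG compression can quantize away a small perturbation.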

The Rules

Time: 60 minutes of work, followed by presentations.

Pre-trained Model: Use ResNet-50, VGG-16, or Inception as-is. No retraining.

Target Misclassification: Choose a starting image and a target class. Aim for at least 80% confidence on the wrong class.

Minimal Modification: Changes should be barely noticeable to humans upon side-by-side comparison.
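
Two quick checks against these rules, as a sketch assuming img_tensor, adv_tensor, and target_idx from the code above:

# Rule check 1: confidence assigned to the chosen (wrong) target class -- aim for at least 0.80
with torch.no_grad():
    adv_probs = torch.nn.functional.softmax(model(adv_tensor)[0], dim=0)
print(f"Target-class confidence: {adv_probs[target_idx].item():.2%}")

# Rule check 2: how large the perturbation actually is (in normalized-tensor units)
delta = (adv_tensor - img_tensor).abs()
print(f"Max change: {delta.max().item():.4f}, mean change: {delta.mean().item():.4f}")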

Evaluation

Judges evaluate on two criteria:

Invisibility (50%): How hard is it for humans to spot the modifications? The best attacks look identical to the original at first glance.

Confidence (50%): How certain is the model about the wrong classification? 95% beats 51%.

Bonus points for absurd or humorous misclassifications.

Deliverables

Your submission should include:

  1. Images: Original and modified versions with a side-by-side comparison (a sketch for generating one follows this list)
  2. Code: Script or notebook implementing the adversarial attack
  3. Model Predictions: Top-5 predictions before and after modification
  4. Report: A brief markdown document explaining:
    • Your attack strategy (algorithmic vs. manual)
    • How you minimized visibility while maximizing misclassification
    • What this reveals about CNN vulnerabilities
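
For the side-by-side comparison (deliverable 1), matplotlib gives a quick figure. A sketch, assuming img_tensor and adv_tensor from the code above; the constants simply undo the ImageNet normalization:

import matplotlib.pyplot as plt

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def to_image(t):
    # (1, 3, H, W) normalized tensor -> (H, W, 3) array in [0, 1]
    return (t.squeeze(0) * std + mean).clamp(0, 1).permute(1, 2, 0).numpy()

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, tensor, title in zip(axes, [img_tensor, adv_tensor], ["Original", "Adversarial"]):
    ax.imshow(to_image(tensor))
    ax.set_title(title)
    ax.axis("off")
fig.savefig("data/comparison.png", dpi=150, bbox_inches="tight")

The top-5 printouts from the starter and FGSM cells cover deliverable 3; paste both into the report.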

Submission

  1. Use the provided template: https://github.com/sk-classroom/sprint-project-template
  2. Follow the template instructions to create your project repository
  3. Place images in the data folder and code in the notebooks folder
  4. Include image comparisons in your report
  5. Write your report in README.md
  6. Submit the link to your GitHub repository to Brightspace