# CTF Challenge Solution Guide: "Nothing Ever Changes"

## 🎯 The Challenge Paradox

Create image pairs where:
- `md5(img1_bytes) == md5(img2_bytes)` (same MD5 hash, so the two files look identical to the checker)
- BUT `classify(img1) != classify(img2)` (different predictions)

## ✅ Confirmed Facts

After extensive testing:
1. ✅ Model is 100% deterministic
2. ✅ PIL image loading is 100% deterministic  
3. ✅ Same bytes → same image → same classification
4. ✅ No state/caching bugs found
5. ✅ Preprocessing is deterministic

## 💡 The Solution: MD5 Collision Attack

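Since every stage is deterministic, identical bytes always produce identical predictions. The ONLY way to satisfy both conditions is:

**Create two DIFFERENT PNG files that have the SAME MD5 hash**

Because the two files must start with different image data, the natural fit is an **MD5 chosen-prefix collision attack**.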

## 🔧 Tools Needed

### HashClash
The most famous tool for MD5 collisions:
```bash
# Clone HashClash
git clone https://github.com/cr-marcstevens/hashclash.git
cd hashclash
# Build with the script shipped in the repository (see its README for prerequisites)
./build.sh
```

### Unicoll / FastColl
Simpler collision tools, but with less control. They create identical-prefix collisions: both files share the same prefix and differ only inside the collision blocks. That makes them less useful for this challenge, where a chosen-prefix collision is needed.

## 📋 Attack Strategy

### Step 1: Understand the Goal

For each pair (0-9), create:
- `pair_XX_img1.png` - classifies as reference class
- `pair_XX_img2.png` - classifies as target class
- Both with IDENTICAL MD5 hashes
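
The file-level half of this goal can be checked before any classifier is involved. The sketch below assumes all pairs sit in one directory and that `XX` is a zero-padded two-digit index (`pair_00` through `pair_09`), which is inferred from the naming pattern above rather than confirmed. `check_pairs` is a hypothetical helper, not part of the challenge code.

```python
import hashlib
from pathlib import Path

def check_pairs(directory, count=10):
    """Return {pair index: (files_differ, md5_match)} for pair_XX_img{1,2}.png."""
    results = {}
    for i in range(count):
        # Assumed naming: zero-padded two-digit index, per the pair_XX pattern.
        p1 = Path(directory) / f"pair_{i:02d}_img1.png"
        p2 = Path(directory) / f"pair_{i:02d}_img2.png"
        b1, b2 = p1.read_bytes(), p2.read_bytes()
        files_differ = b1 != b2
        md5_match = hashlib.md5(b1).digest() == hashlib.md5(b2).digest()
        results[i] = (files_differ, md5_match)
    return results
```

A pair is only interesting when both flags are `True`: identical files match trivially, and differing files without a hash match fail the first requirement.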

### Step 2: Create Base Images

Start with the reference images and create adversarial versions:

```python
# For pair 00: need 0→1
# Start with ref_00.png (digit 0)
# Create adversarial version that looks like 1

# The key: add collision blocks that don't affect visual appearance much
```

### Step 3: Use HashClash for Chosen-Prefix Collision

```bash
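# Sketch based on the HashClash repository layout; check its README for exact usage.
# prefix1.bin / prefix2.bin are placeholder names for the two differing prefixes
# (the bytes of the image classifying as 0 and of the image classifying as 1).
mkdir cpc_workdir && cd cpc_workdir
../scripts/cpc.sh prefix1.bin prefix2.bin
# The script produces two colliding files: each prefix followed by its collision blocks.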
```

### Step 4: Embed Collision in PNG

PNG files have structure:
```
PNG Signature (8 bytes)
IHDR chunk (header)
IDAT chunks (image data)  ← Insert collision here or in ancillary chunks
IEND chunk (end)
```

You can:
1. Add collision blocks in ancillary chunks (tEXt, etc.)
2. Modify IDAT chunks while preserving visual appearance
3. Use chunks that PIL ignores
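
To see where collision blocks could sit, it helps to walk the chunk layout of a candidate file. The following is a minimal sketch using only the standard library; `iter_chunks` is a hypothetical helper name. It reports each chunk's type, whether its CRC is valid, and whether the chunk is ancillary (lowercase first letter in the type).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(png_data):
    """Yield (type, data, crc_ok, ancillary) for each chunk in a PNG byte string."""
    if not png_data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png_data):
        (length,) = struct.unpack(">I", png_data[pos:pos + 4])
        if pos + 12 + length > len(png_data):
            raise ValueError("truncated chunk")
        chunk_type = png_data[pos + 4:pos + 8]
        data = png_data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", png_data[pos + 8 + length:pos + 12 + length])
        # The CRC-32 covers the type and data fields, not the length field.
        crc_ok = (zlib.crc32(chunk_type + data) & 0xFFFFFFFF) == stored_crc
        # Bit 5 of the first type byte (a lowercase letter) marks an ancillary chunk.
        ancillary = bool(chunk_type[0] & 0x20)
        yield chunk_type, data, crc_ok, ancillary
        pos += 12 + length
        if chunk_type == b"IEND":
            break  # any bytes after IEND belong to no chunk
```

Running this over both files of a pair shows which chunks differ and whether an embedded block broke a CRC, which matters if the decoder verifies checksums.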

## 🎨 Alternative: PNG Chunk Manipulation

If MD5 collision is too hard, try:

### Approach: Ancillary Chunks

```python
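import struct
import zlib

def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def add_text_chunk(png_data, text, keyword="Comment"):
    """Add a tEXt chunk to a PNG (PIL exposes it as metadata; pixels are unaffected)."""
    # Find the IEND chunk; it starts at the 4-byte length field before its type
    iend = png_data.rfind(b"IEND") - 4
    if iend < 8:
        raise ValueError("IEND chunk not found")
    # tEXt payload: Latin-1 keyword, a NUL separator, then Latin-1 text
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    # Insert the tEXt chunk before IEND; text chunks are ancillary
    return png_data[:iend] + make_chunk(b"tEXt", payload) + png_data[iend:]

def create_collision_pngs():
    """
    Theory: create two PNGs that differ only in ancillary chunks
    but somehow affect PIL's parsing differently.
    """
    raise NotImplementedError("open question: no such parsing difference is known here")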
```

### Known PNG Tricks

1. **gAMA chunk** - Gamma correction (might affect color conversion)
2. **pHYs chunk** - Physical dimensions (shouldn't affect pixels)
3. **tIME chunk** - Timestamp (ignored by most readers)
4. **Multiple IDAT chunks** - Can split image data across chunks
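
Item 4 can be illustrated with the standard library alone. A decoder concatenates the data of all IDAT chunks before inflating it, so one compressed stream can be cut at arbitrary points without changing the pixels. This is a sketch under that assumption; `make_chunk`, `split_idat`, and `build_png` are hypothetical helper names, and the image is a plain 8-bit grayscale PNG without interlacing.

```python
import struct
import zlib

def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def split_idat(compressed, piece_size):
    """Cut one zlib stream into consecutive IDAT chunks of at most piece_size bytes."""
    pieces = [compressed[i:i + piece_size] for i in range(0, len(compressed), piece_size)]
    return [make_chunk(b"IDAT", piece) for piece in pieces]

def build_png(width, height, rows, piece_size):
    """Build an 8-bit grayscale PNG whose image data may span several IDAT chunks."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then
    # compression, filter, and interlace methods all set to 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    idat_chunks = split_idat(zlib.compress(raw), piece_size)
    return (b"\x89PNG\r\n\x1a\n" + make_chunk(b"IHDR", ihdr)
            + b"".join(idat_chunks) + make_chunk(b"IEND", b""))
```

Two files built from the same rows with different `piece_size` values differ as byte strings but carry the same compressed stream. On its own this changes the MD5, so it only helps with layout, not with the collision.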

## 🔬 Testing Approach

```python
import hashlib
import io

from PIL import Image

# predict_top1 and bundle come from the challenge's own model code;
# import or load them from there before calling this function.

def test_collision_attempt(img1_path, img2_path):
    """Test if two PNGs satisfy the requirements"""

    with open(img1_path, 'rb') as f:
        img1_bytes = f.read()
    with open(img2_path, 'rb') as f:
        img2_bytes = f.read()

    # The files must differ, otherwise the MD5 match is trivial
    print(f"Different bytes: {img1_bytes != img2_bytes}")

    # Check MD5
    md5_1 = hashlib.md5(img1_bytes).hexdigest()
    md5_2 = hashlib.md5(img2_bytes).hexdigest()

    print(f"MD5 match: {md5_1 == md5_2}")

    # Check classification on the decoded grayscale images
    img1 = Image.open(io.BytesIO(img1_bytes)).convert('L')
    img2 = Image.open(io.BytesIO(img2_bytes)).convert('L')

    pred1 = predict_top1(img1, bundle)
    pred2 = predict_top1(img2, bundle)

    print(f"Class 1: {pred1['id']}, Class 2: {pred2['id']}")
    print(f"Different classification: {pred1['id'] != pred2['id']}")
```

## 📚 Resources

### MD5 Collision Papers
- "Chosen-Prefix Collisions for MD5 and Applications" (2009)
- Marc Stevens' PhD thesis on hash collisions
- "The first collision for full SHA-1" (similar techniques)

### CTF Write-ups
Search for:
- "MD5 collision CTF"
- "PNG collision attack"
- "File format collision"

### Tools
- **HashClash**: https://github.com/cr-marcstevens/hashclash
- **Corkami**: PNG structure documentation
- **TweakPNG**: PNG chunk editor

## 🤔 Why This is Hard

1. **Collision Generation**: MD5 collisions take significant computation
2. **Format Constraints**: PNG has strict format requirements
3. **Visual Requirements**: Images must still classify correctly
4. **10 Pairs**: Need to do this 10 times!

## 💭 Alternative Theory: Is There a Bug?

If MD5 collision seems impossible, consider:

### Potential Bugs to Check

1. **Off-by-one in verification?**
2. **ZIP file parsing issues?**
3. **Unicode/encoding in filenames?**
4. **Python bytecode caching?**
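
One concrete form a ZIP parsing issue could take is an archive holding two entries with the same name, which different readers may resolve differently. Whether the server is affected is unknown; the sketch below only detects such duplicates in a submission, and `duplicate_entries` is a hypothetical helper name.

```python
import zipfile
from collections import Counter

def duplicate_entries(zip_source):
    """Return the sorted names that appear more than once in a ZIP archive."""
    # zip_source may be a path or a binary file-like object.
    with zipfile.ZipFile(zip_source) as zf:
        counts = Counter(info.filename for info in zf.infolist())
    return sorted(name for name, n in counts.items() if n > 1)
```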

### Debug the Server

```bash
# Check server logs
docker logs tong-nian-app

# Test with malformed inputs
# Try edge cases
```

## 🎓 Learning Points

This challenge teaches:
1. **MD5 is broken** - Collisions are practical
2. **File formats are complex** - Many attack surfaces
3. **Determinism matters** - For security AND debugging
4. **Adversarial ML** - Beyond pixel perturbations

## 🚀 Quick Start for MD5 Collision Approach

```bash
# 1. Install HashClash (build script per the repository README)
git clone https://github.com/cr-marcstevens/hashclash.git
cd hashclash && ./build.sh

# 2. Create two different images
python create_adversarial_pairs.py

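# 3. Generate collision blocks (chosen-prefix script from the HashClash repo;
#    prefix1.bin and prefix2.bin are placeholder names for the two image prefixes)
scripts/cpc.sh prefix1.bin prefix2.bin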

# 4. Embed in PNGs
python embed_collision_in_png.py

# 5. Test locally
python test_client.py

# 6. Submit to server
curl -X POST http://localhost:5001/submit -F "file=@solution.zip"
```

## 🏁 Final Verdict

**This is an EXTREMELY DIFFICULT CTF challenge** requiring:
- Advanced cryptography knowledge
- PNG format expertise
- Adversarial ML understanding
- Significant computation time

The title "Nothing Ever Changes" is brilliantly ironic - the MD5 hash doesn't change, but the bytes behind it, and with them the classification, do!

Good luck! 🍀🔐🤖
