"""
Tong Nian's Adversarial Approach: Creating adversarial examples "without changing anything"

THE APPROACH:
=============
The clever insight is that two PNG files can have IDENTICAL MD5 hashes WITHOUT
having identical bytes, and can therefore be interpreted DIFFERENTLY by image
libraries. The ingredients:

1. MD5 Collision Attacks: MD5 has been collision-broken since 2004, so two
   distinct byte strings can be engineered to share a digest
2. PNG Chunk Manipulation: ancillary chunks that decoders skip, providing
   slack space for the colliding bytes
3. Compression/Metadata tricks: steering the two files down different
   parsing/decompression paths

KEY CHALLENGE REQUIREMENTS:
===========================
For each of 10 pairs (pair_00 to pair_09):
- img1 and img2 must have IDENTICAL MD5 hashes (equal digests - which, note,
  does not require equal bytes)
- img1 must classify as the reference class (0-9 respectively)
- img2 must classify as the target class (defined in config)
- Pixel differences between img2 and the reference must be within budget
- The question: how can two files with the same MD5 decode to different images?

THE APPARENT PARADOX:
=====================
Line 101 in verification.py requires:
    md5_hex(img1_bytes) != md5_hex(img2_bytes) -> return False

So the two files' MD5 digests must match. Crucially, this does NOT mean
img1_bytes == img2_bytes: equal digests would imply equal bytes only if MD5
were collision-resistant, and it is not.

Meanwhile:
    img1_top1 = predict_fn(img1)  # must classify as the reference class
    img2_top1 = predict_fn(img2)  # must classify as the target class

With different bytes hiding behind one digest, different classifications stop
being paradoxical. The notes below re-check the logic step by step.
"""

import hashlib
import io
import zipfile
from pathlib import Path
from PIL import Image
import numpy as np
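

# Presumably this is what verification.py's hash helper looks like - a minimal
# sketch reconstructed from the calls quoted above; the real helper may differ.
def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string, as compared on verification.py line 101."""
    return hashlib.md5(data).hexdigest()

# Equal digests are necessary to pass the check, but MD5 is collision-broken:
# two different byte strings can share a digest, which is the whole trick.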


def analyze_verification_logic():
    """
    Let me trace through the verification logic more carefully...
    
    Looking at verification.py lines 59-90:
    - img1_bytes = zf.read(img1_name)
    - img2_bytes = zf.read(img2_name) 
    - Line 101: if md5_hex(img1_bytes) != md5_hex(img2_bytes): return False
    
    This means the bytes MUST be identical (MD5 must match)!
    
    But then:
    - img1 = load_gray_from_bytes(img1_bytes)
    - img2 = load_gray_from_bytes(img2_bytes)
    
    And these must classify differently...
    
    THE SOLUTION:
    =============
    Since PIL.Image.open() is used to load the images, and the bytes are identical,
    the only way to get different classifications is if:
    
    1. The PNG format allows for non-deterministic interpretation
    2. There's a bug in PIL's image loading
    3. The model preprocessing has non-deterministic behavior
    4. OR... the files are NOT actually identical!
    
    Wait, let me check the requirements more carefully...
    Looking at line 101 again: "if md5_hex(img1_bytes) != md5_hex(img2_bytes)"
    
    This returns False if hashes DON'T match, which means it REQUIRES them to match!
    So img1 and img2 must be BYTE-IDENTICAL but classify differently.
    
    REAL INSIGHT:
    =============
    The trick must be in how PIL.Image.open() parses the PNG file!
    Some possibilities:
    
    1. Multiple IDAT chunks that can be read in different orders
    2. PNG ancillary chunks that affect color space/rendering
    3. Gamma correction chunks (gAMA)
    4. Color profile chunks (iCCP)
    5. Physical pixel dimensions (pHYs)
    
    But the code converts to grayscale ("L"), so color profiles shouldn't matter...
    
    ACTUALLY - Looking at pixel_diff check on line 93:
    The code checks pixel differences between img2 and the reference.
    But since img1_bytes == img2_bytes, and both are loaded with the same function,
    how can they have different pixel values?
    
    REVELATION:
    ===========
    The only way this makes sense is if there's RANDOMNESS or NON-DETERMINISM
    in either:
    - The image loading (unlikely with PIL)
    - The model inference (model.py line 45 sets deterministic mode though)
    - OR... the hash comparison is checking DIFFERENT things than we think
    
    Let me look at what we're actually comparing...
    """
    print(__doc__)
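

# Sanity check for the non-determinism hypotheses: decode the same PNG bytes
# twice and compare. load_gray_from_bytes below mirrors what I assume
# verification.py's loader does (PIL open + convert to "L"); the real one may
# differ in detail.
def demo_decoding_is_deterministic():
    def load_gray_from_bytes(data: bytes) -> np.ndarray:
        return np.asarray(Image.open(io.BytesIO(data)).convert("L"), dtype=np.uint8)

    # Round-trip a small grayscale image through PNG, then decode twice.
    arr = np.arange(64, dtype=np.uint8).reshape(8, 8)
    buf = io.BytesIO()
    Image.fromarray(arr, mode="L").save(buf, format="PNG")
    data = buf.getvalue()
    a = load_gray_from_bytes(data)
    b = load_gray_from_bytes(data)
    assert np.array_equal(a, b)    # identical bytes -> identical pixels
    assert np.array_equal(a, arr)  # PNG is lossless for 8-bit grayscale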


def create_test_images():
    """
    Let me try to create a simple test case to understand the approach.
    
    Hypothesis: The trick might involve:
    1. PNG format allows multiple valid representations of the same visual data
    2. But different compression/chunking can lead to different floating point
       precision during decompression
    3. This causes tiny differences that don't show up in pixel_diff (uint8)
       but DO affect neural network classification
    
    But wait - line 88-90 checks:
        img1_arr = image_to_uint8(img1)
        if pixel_diff(ref_arr, img1_arr) != 0: return False
    
    So img1 must be PIXEL-IDENTICAL to the reference.
    And img2 can differ by up to budget pixels.
    
    But img1_bytes must equal img2_bytes...
    
    This seems impossible unless there's a deeper trick!
    """
    pass
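

# The "PNG chunk manipulation" idea made concrete: an unknown ancillary chunk
# changes a file's bytes (and therefore its MD5) without changing the decoded
# pixels. Published PNG collision constructions put their colliding blocks in
# exactly this kind of slack space. The chunk type b"juNk" is made up for this
# demo; any unknown ancillary type behaves the same way.
import struct
import zlib


def demo_ancillary_chunk_slack():
    def chunk(ctype: bytes, data: bytes) -> bytes:
        # length + type + data + CRC-32 over type and data, per the PNG spec
        crc = zlib.crc32(ctype + data)
        return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

    def make_png(payload: bytes) -> bytes:
        ihdr = struct.pack(">IIBBBBB", 2, 2, 8, 0, 0, 0, 0)  # 2x2, 8-bit grayscale
        idat = zlib.compress(b"\x00\x01\x02" + b"\x00\x03\x04")  # filter byte 0 per row
        return (b"\x89PNG\r\n\x1a\n"
                + chunk(b"IHDR", ihdr)
                + chunk(b"juNk", payload)  # ancillary + private: decoders skip it
                + chunk(b"IDAT", idat)
                + chunk(b"IEND", b""))

    a = make_png(b"\x00" * 64)
    b = make_png(b"\xff" * 64)
    assert hashlib.md5(a).hexdigest() != hashlib.md5(b).hexdigest()
    pix_a = np.asarray(Image.open(io.BytesIO(a)).convert("L"))
    pix_b = np.asarray(Image.open(io.BytesIO(b)).convert("L"))
    assert np.array_equal(pix_a, pix_b)  # different bytes, same pixels
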


if __name__ == "__main__":
    analyze_verification_logic()
    
    print("\n" + "="*80)
    print("TESTING LOCALLY")
    print("="*80)
    print("\nThe application is running at: http://localhost:5001")
    print("\nEndpoints:")
    print("  GET  /health  - Check if service is running")
    print("  GET  /pow     - Get proof-of-work challenge (disabled in local mode)")
    print("  POST /submit  - Submit a ZIP file with adversarial pairs")
    print("\nSubmission format:")
    print("  - ZIP file containing 20 PNG files:")
    print("    pair_00_img1.png, pair_00_img2.png")
    print("    pair_01_img1.png, pair_01_img2.png")
    print("    ... (through pair_09)")
    print("\nChallenge requirements for each pair i:")
    print(f"  - img1 and img2 must have IDENTICAL MD5 hashes")
    print(f"  - img1 must classify as reference_class_ids[i]")
    print(f"  - img2 must classify as target_class_ids[i]")  
    print(f"  - img1 must match reference pixels exactly (0 pixel diff)")
    print(f"  - img2 can differ from reference by up to budgets[i] pixels")
