Autonomous Machine Unlearning

Advanced privacy verification in FinTech models.

Geometric Audit

We audited the algorithm's integrity by scaling from a theoretical baseline to a production deployment:

  • Baseline (N=1): Achieved surgical precision with an \(R^2 = 1.0\).
  • Production (N=20): Scaling revealed Batch Interference, diluting precision to \(R^2 \approx 0.16\).
Audit Visuals

The comparative scatterplots demonstrate the limits of batch unlearning.

  • The Dots: Each dot represents one of the 48 distinct financial features (like savings balances or employment duration) evaluated by the model.
  • The X Axis: The actual normalized feature value of the targeted user.
  • The Y Axis: The magnitude of the gradient weight update applied during unlearning.
  • The Meaning: In the ideal case, the weight update mirrors the feature vector exactly, forming a straight line. The messy scatter on the right shows that processing multiple people at once forces the algorithm to compromise, ruining precision for the individual.
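The geometry behind these plots can be checked numerically: for a single user, the update \( \nabla w = x \cdot (\hat{y} - y) \) is an exact scalar multiple of the feature vector \(x\), so plotting the update against the feature values yields \(R^2 = 1\); averaging over a batch breaks that proportionality. A minimal sketch with synthetic data (the feature count matches the 48 features above, but the values and batch residuals are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 48  # one dot per financial feature

# Single-user update: exactly proportional to x, so R^2 = 1
x = rng.normal(size=n_features)      # target user's normalized features
y_hat, y = 0.99, 1.0                 # model confidence vs. true label
update_single = x * (y_hat - y)      # pure unlearning gradient

# Batch update (N=20): competing users' gradients are averaged in
X = rng.normal(size=(20, n_features))
X[0] = x                             # the target is one row among twenty
residuals = rng.uniform(-0.5, 0.5, size=20)
update_batch = (X * residuals[:, None]).mean(axis=0)

def r_squared(a, b):
    """R^2 of the best linear fit of b against a."""
    corr = np.corrcoef(a, b)[0, 1]
    return corr ** 2

print(r_squared(x, update_single))  # ~1.0: a perfect line
print(r_squared(x, update_batch))   # far below 1: batch interference
```

The exact \(R^2 \approx 0.16\) figure depends on the real batch composition; the sketch only reproduces the qualitative collapse.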

Mathematical Framework

Isolating the optimization dynamics of convex machine unlearning.

1. The Forward Pass (Logistic Regression)

The model calculates the probability of credit risk using the sigmoid function.

\[ \hat{y} = \frac{1}{1 + \exp( - (w^T x + b) )} \]
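The forward pass is a few lines of code. A minimal sketch (the weight vector, bias, and 48-feature profile are illustrative placeholders):

```python
import numpy as np

def forward(w, b, x):
    """Probability of credit risk via the logistic sigmoid."""
    z = w @ x + b
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative call: zero weights yield 50% probability
w = np.zeros(48)
x = np.ones(48)
print(forward(w, b=0.0, x=x))  # 0.5
```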

2. Pure Unlearning Gradient (Baseline N=1)

To unlearn one person, we perform gradient ascent on the loss, maximizing the error between the prediction and the true label.

\[ \nabla w = x \cdot ( \hat{y} - y ) \]

3. Batch Interference Effect (Production N=20)

Scaling requires averaging competing gradients, plus an \(L_2\) regularization term \(\lambda w\) to keep the weights bounded, diluting precision for any individual.

\[ \nabla w_{batch} = \frac{1}{N} \sum_{i=1}^{N} x_i \cdot ( \hat{y}_i - y_i ) + \lambda w \]
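Equations (2) and (3) translate directly into code. A minimal sketch (the batch size, regularization strength, and synthetic data are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unlearning_grad(w, b, x, y):
    """Pure unlearning gradient for one user (Baseline, N=1)."""
    y_hat = sigmoid(w @ x + b)
    return x * (y_hat - y)

def batch_unlearning_grad(w, b, X, y, lam=0.01):
    """Averaged gradient over N users plus L2 term (Production)."""
    y_hat = sigmoid(X @ w + b)
    return (X * (y_hat - y)[:, None]).mean(axis=0) + lam * w

rng = np.random.default_rng(1)
w, b = rng.normal(size=48), 0.0
X = rng.normal(size=(20, 48))
y = rng.integers(0, 2, size=20).astype(float)

g1 = unlearning_grad(w, b, X[0], y[0])   # sharp, per-user signal
gN = batch_unlearning_grad(w, b, X, y)   # averaged, diluted signal
```

The 1/N averaging is what dilutes the target user's contribution: their gradient enters the batch update at one twentieth of its baseline weight.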
Unlearning Algorithms

Our methodology isolated convex optimization dynamics to provide mathematically verifiable guarantees.

Safety Window


We identified a 34-step Safety Window where the model maintains peak utility.

Our "Blind Metric" autonomously halts unlearning at Step 31, preventing model degradation.

The Paradox Graph

This dual-axis line graph illustrates how we established an autonomous stopping rule.

  • Distance to Gold Standard (Oracle): This measures how close our unlearned model is to a perfectly retrained model. It reaches its optimal state at Step 65.
  • Validation Loss (Blind Metric): This measures the actual utility of the model on retained data. It begins to degrade catastrophically after Step 31.
  • The Conclusion: Real-world systems cannot consult Oracle metrics. By monitoring the validation loss instead, the system halts unlearning just before catastrophic forgetting begins, ensuring safe deployment.
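The stopping rule itself needs only the blind metric. A minimal sketch of the halting logic (the synthetic loss curve and zero-tolerance threshold are illustrative; the real system monitors validation loss on retained data):

```python
def halt_step(val_losses, tolerance=0.0):
    """Return the last step before validation loss starts to degrade.

    val_losses[t] is the loss on retained data after t unlearning
    steps; we halt as soon as it rises past its best value by more
    than `tolerance`.
    """
    best = val_losses[0]
    for step, loss in enumerate(val_losses):
        if loss > best + tolerance:
            return step - 1  # halt just before degradation
        best = min(best, loss)
    return len(val_losses) - 1

# Synthetic curve: utility holds through step 31, then degrades
curve = ([0.30 - 0.001 * t for t in range(32)]
         + [0.27 + 0.05 * t for t in range(1, 10)])
print(halt_step(curve))  # 31
```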

The Privacy Deadlock

MIA audits revealed that target confidence dropped only from 99.09% to 98.93% by Step 31.

This "Memorization Trap" shows that standard gradient ascent is insufficient for erasing outliers from high-confidence models.

The MIA and the Sigmoid Trap

We verified true amnesia using Membership Inference Attacks.

  • Membership Inference Attacks (MIA): An auditing technique that analyzes the raw probability output of the model. If the model outputs 99% confidence for a specific profile, it signals that the profile was likely part of the training data.
  • The Sigmoid Plateau: Because our targeted users were highly memorized outliers, their probabilities were pushed to the extreme flat ends of the logistic sigmoid curve.
  • The Deadlock: On the flat edges of the curve, the gradients vanish. The algorithm cannot push these users down to the safe 50% random-guessing threshold without blowing past the 34-step safety window and destroying the model.
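The deadlock is visible in the update magnitude itself: with gradient \( x \cdot (\hat{y} - y) \), a target whose true label is 1 and whose confidence sits near 0.99 yields an update scaled by only about 0.01, so each ascent step barely moves the probability. A minimal numeric sketch (the learning rate, bias, and feature vector are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.full(4, 0.5)       # illustrative outlier features (x @ x = 1)
w = np.zeros(4)
b, y, lr = 4.7, 1.0, 0.1  # bias puts starting confidence near 99.1%

for step in range(100):
    y_hat = sigmoid(w @ x + b)
    w += lr * x * (y_hat - y)   # gradient ascent on the loss

print(sigmoid(w @ x + b))  # still ~0.99: the plateau stalls unlearning
```

Even after 100 ascent steps the confidence has barely moved, mirroring the 99.09% to 98.93% result reported above.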