The Governing Equation
Goal: learn V and Φ from unlabeled snapshots — without knowing which particle is which
The Setup
Three particles under two forces. The blue arrows pull each particle toward the center (confinement V). The purple arrows push particles apart (interaction Φ). The gray arrows are random noise. The combined effect produces the complex motion you see.
We observe the particle positions, but we don't know V or Φ. Our goal: reverse-engineer these forces from the data alone.
The Problem
Imagine photographing a flock of birds at two different times. Each snapshot shows where every bird is — but you can't tell which bird is which across photos. Even with just 4 particles, there are 4! = 24 possible matchings. In real experiments with N=100, that's 100! ≈ 10¹⁵⁸ — more than the number of atoms in the universe.
Without labels, we cannot match particles across snapshots — and we don't try to. Our method works directly with the empirical distribution, bypassing the N! combinatorial explosion entirely.
The Challenge
Traditional methods estimate velocity as v ≈ ΔX/Δt. This creates two fatal problems: when Δt is small, noise explodes (∝ σ/√Δt); when Δt is large, discretization bias grows as O(Δt) and Sinkhorn matching accuracy drops to random chance. Drag the slider to see the collapse:
Prediction from §Solution: Self-test error is robust to large observation gaps Δt. Label-matching error explodes.
Labeled MLE: Noise Explosion
Estimates velocity as v ≈ ΔX/Δt. When Δt → 0, noise ∝ σ/√Δt → ∞. When Δt is large, O(Δt) discretization bias accumulates over N−1 interaction terms.
∇V: 0.8% → 11.3% as Δt grows
Sinkhorn MLE: Matching Collapse
Uses optimal transport to guess which particle is which. At small Δt, works well (particles barely move). At large Δt, particles mix completely — matching accuracy drops to 24.5%, near the 10% random baseline.
∇Φ: 0.8% → 95.8% as Δt grows
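The σ/√Δt noise blow-up in the finite-difference velocity estimate is easy to check directly. A minimal sketch, assuming a one-step Euler-Maruyama model with an arbitrary hypothetical drift b:

```python
import numpy as np

# One Euler step: ΔX = b·Δt + σ·√Δt·ξ, so the estimate v̂ = ΔX/Δt
# equals b + σ·ξ/√Δt, and std(v̂) = σ/√Δt explodes as Δt → 0.
rng = np.random.default_rng(0)
sigma, b = 1.0, -0.5
for dt in [1.0, 0.1, 0.01]:
    xi = rng.standard_normal(100_000)
    v_hat = (b * dt + sigma * np.sqrt(dt) * xi) / dt
    print(f"Δt={dt:5}: std(v̂)={v_hat.std():7.2f}, predicted σ/√Δt={sigma/np.sqrt(dt):7.2f}")
```

Shrinking Δt by 100× inflates the noise in v̂ by 10×, exactly the σ/√Δt scaling quoted above.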
Our self-test avoids both pitfalls: no velocity estimation, no trajectory matching
Self-Test ∇V: 0.67% → 6.84%. In the same large-Δt benchmark row, Sinkhorn degrades much more severely.
But Approach 2 Also Fails
Avoid labels entirely by minimizing Wasserstein distance between predicted and observed distributions. Sounds reasonable — BUT this requires simulating the full particle system at EVERY gradient step. For N=10 particles over 1000+ optimization steps, the compute cost explodes.
Failure type: computational. Wasserstein minimization requires O(N² × optimization_steps) work — intractable for large systems.
And Approach 3 Also Fails
For N→∞, the empirical measure converges to a smooth density satisfying a partial differential equation (PDE) — enabling clean inference. BUT: we care about finite N. At N=10, the empirical measure is 10 discrete delta spikes. Mean-field theory breaks down exactly here.
Failure type: theoretical. Mean-field is valid only as N→∞. Our paper uses N=10. The approximation error from mean-field is O(1/N) — at N=10, this is 10% error before we've even started.
Three approaches. Three distinct failures. Time to ask a different question.
Our Approach
Key realization: we don't need to track individual particles. The empirical measure μ_t^N = (1/N)∑ᵢ δ_{X_t^i} is directly observable from unlabeled snapshots. More importantly, it satisfies a weak-form stochastic evolution equation whose dependence on V and Φ is linear. This turns the N! matching problem into a functional optimization problem.
The empirical measure μ_t^N satisfies a weak-form stochastic evolution equation. Because μ_t^N is a sum of delta masses, its derivatives are understood distributionally rather than pointwise. The weak form handles this by testing the equation against a smooth function ψ and integrating by parts, so everything reduces to empirical averages over particle positions that are directly computable from snapshots.
Strong Form
Requires distributional derivatives of μ — not directly usable pointwise for delta masses
Weak Form
Only needs ∫ψ dμ = (1/N)∑ψ(Xᵢ) — computable!
Key Property
Linear in the unknown potentials → quadratic loss in the parametric setting
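The weak-form pairing in the middle card is one line of code, and it is invariant under relabeling the particles, which is exactly why no matching is needed. A minimal sketch with an illustrative snapshot and test function:

```python
import numpy as np

# ⟨ψ, μ_t^N⟩ = (1/N) Σᵢ ψ(Xᵢ) from a single unlabeled snapshot.
# The average is unchanged by any permutation of the particles,
# so no cross-snapshot matching is ever required.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))            # N=10 particle positions, labels unknown
psi = lambda x: (x**2).sum(axis=-1)     # a smooth test function ψ(x) = |x|²

pairing = psi(X).mean()                 # ∫ψ dμ = (1/N)Σψ(Xᵢ)
shuffled = psi(X[rng.permutation(10)]).mean()
assert np.isclose(pairing, shuffled)    # label order is irrelevant
```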
The crucial advantage: because the weak form PDE is linear in V and Φ, we can choose the test function ψ to be V and Φ themselves. This self-testing trick — using the unknowns as their own test functions — produces a loss with a remarkable property.
Applying Itô's formula with the self-testing family ψ = V + Φ*μ, the weak-form residual decomposes into three computable terms. The associated free-energy increment uses E_f(μ) = 〈V + (1/2)Φ*μ, μ〉. Each term is an empirical average over particle snapshots — no velocity estimation, no trajectory matching.
Each term comes from testing the weak-form equation against ψ = V + Φ*μ. The last term is the free-energy increment associated with E_f(μ) = 〈V + (1/2)Φ*μ, μ〉:
Dissipation: how much work the drift forces do. The 1/2 coefficient (from the self-test trick) is what breaks the degeneracy — it makes L(V*, Φ*) strictly negative.
Diffusion contribution: depends on the Laplacian of V and Φ. Computable from snapshots via the basis expansion or neural network parameterization.
For a candidate pair (V, Φ), the free-energy change between two unlabeled snapshots is computable without labels or trajectory matching.
Squaring the weak form residual creates a degenerate minimum at V=0, Φ=0 (the trivial solution). By contrast, the self-test loss gives the true potentials a strictly lower population loss than the trivial pair, which breaks that degeneracy.
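The degeneracy-breaking can be checked numerically in a toy 1-D model. The loss expression below is our reconstruction from the description above (confinement only, Φ ≡ 0, true potential V*(x) = x²/2), not the paper's code:

```python
import numpy as np

# Toy 1-D illustration: particles follow dX = -V*'(X) dt + σ dW with
# V*(x) = x²/2. For a scaled candidate V_c = c·V*, the self-test loss
#   L(c) = E[ΔV_c(X)] + Δt·E[(1/2)|V_c'(X)|² - (σ²/2)V_c''(X)]
# is ≈ (Δt σ²/4)(c² - 2c): zero at c = 0, strictly negative and minimal at c = 1.
rng = np.random.default_rng(0)
sigma, dt, M = 1.0, 0.05, 200_000
x0 = rng.normal(0.0, sigma / np.sqrt(2), M)                       # stationary law
x1 = x0 - x0 * dt + sigma * np.sqrt(dt) * rng.standard_normal(M)  # one Euler step

def self_test_loss(c):
    dV = c * (x1**2 - x0**2) / 2          # free-energy increment ΔV_c
    diss = 0.5 * (c * x0) ** 2            # dissipation with the self-test 1/2
    diff = 0.5 * sigma**2 * c             # diffusion term (σ²/2)·V_c''
    return dV.mean() + dt * np.mean(diss - diff)

L = {c: self_test_loss(c) for c in (0.0, 0.5, 1.0, 1.5)}
assert L[0.0] == 0.0 and L[1.0] < L[0.5] < 0 and L[1.0] < L[1.5] < 0
```

The trivial pair scores exactly zero while the truth scores strictly negative, which is the property the text attributes to the 1/2 coefficient.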
One can derive a free-energy balance identity for the stochastic system via Itô's formula, together with a martingale term. But using the squared residual of that identity for estimation is still a bad objective: J_diss is already quadratic in V and Φ, so squaring the balance produces a quartic, non-convex optimization problem.
Energy Balance Approach
Match the free-energy balance in expectation. Since J_diss is already quadratic in V and Φ, squaring the residual yields a quartic, non-convex objective.
Our Self-Test (Variation Approach)
Test the weak-form PDE against ψ = V + Φ*μ. The resulting loss stays quadratic; under linear parameterization it reduces to a convex linear-system solve.
The demo below illustrates a free-energy balance identity. Near the ground-truth scales, the displayed terms nearly cancel. But turning that identity into an estimator would still require quartic optimization — exactly what the self-test avoids.
Interpretation guide: this panel is a diagnostic visualization of the free-energy balance, not the training objective itself. The estimator we actually optimize is the quadratic self-test loss above.
What This Buys Us
Each criterion the self-test family satisfies translates into a concrete freedom. These aren't abstract advantages — each one resolves a specific failure mode you've already seen.
Two Implementations
The self-test loss is agnostic to how V and Φ are parameterized. We implement it two ways:
Expand V, Φ in known polynomial/exponential basis functions. Solve the resulting linear least-squares system in closed form — fast within the chosen basis, but it requires prior knowledge of the potential family.
Parameterize V, Φ with MLPs. Optimize via gradient descent — flexible, and it does not require a hand-specified oracle basis, but it still requires training and architectural assumptions.
↓ Same self-test loss ↓
The shared contribution is the loss, not the parameterization. The current paper proves estimator bounds in the parametric setting; the NN implementation is supported empirically.
Left: Least squares regression with basis functions. Right: Neural network regression via gradient descent. Both share the same self-test loss.
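Because both columns minimize the same quadratic objective, a tiny synthetic example can show the two solvers agreeing. A and b below are synthetic stand-ins for the normal matrix and right-hand side that a linear parameterization of the self-test loss would produce:

```python
import numpy as np

# Same loss, two solvers: a quadratic L(θ) = ½θᵀAθ - bᵀθ can be minimized
# in closed form (the LSE route) or by gradient descent (the NN route).
rng = np.random.default_rng(0)
n = 8
Q = rng.normal(size=(n, n))
A = Q @ Q.T + n * np.eye(n)        # well-conditioned SPD stand-in
b = rng.normal(size=n)

theta_lse = np.linalg.solve(A, b)  # closed-form normal-equation solve

theta = np.zeros(n)                # gradient descent on the identical loss
lr = 1.0 / np.linalg.norm(A, 2)
for _ in range(5000):
    theta -= lr * (A @ theta - b)  # ∇L(θ) = Aθ - b

assert np.allclose(theta, theta_lse, atol=1e-6)
```

The two estimates coincide: the choice between LSE and NN is a choice of solver and function class, not of objective.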
Stress Tests
Reference model E serves as the main benchmark, while models A-D each probe one stressed assumption or difficult regime.
Note: potentials are identifiable only up to additive constants, so some displays use equivalent shifted forms with the same gradients. All quantitative errors reported below are gradient errors.
Each model tests one theoretical assumption. Data at d=2, Self-Test LSE vs NN.
A: Smoothness Test
Smoothed-indicator Φ → stresses C⁴ smoothness
∇V: LSE 1.07% · NN 1.76%
∇Φ: LSE 2.01% · NN 3.32%
✓ Both handle non-smoothness well
B: Conditioning Test
Inverse-type interaction → κ(A) = 146K, near-singular normal matrix
∇V: LSE 1.46% · NN 3.83%
∇Φ: LSE 31.0% · NN 26.7%
▵ Ill-conditioning: NN wins ∇Φ, LSE wins ∇V
C: Singularity (Lennard-Jones)
r⁻¹² blowup near origin → gradient explosion
∇V: LSE 8.88% · NN 1.91%
∇Φ: LSE 13.9% · NN 40.4%
★ NN wins ∇V by 4×; singular ∇Φ hard for both
D: Smooth Control (Morse)
All C⁴ assumptions satisfied → baseline
∇V: LSE 1.72% · NN 2.34%
∇Φ: LSE 1.69% · NN 3.43%
✓ Both excellent on smooth potentials
Experiments
Shared protocol for all experiments. Reference model E (Gaussian Φ) satisfies all theoretical assumptions; models A-D each stress one boundary.
Theory Validation
Q1: Does the error follow the predicted rate? Yes. The error has two independent parts — a statistical part that shrinks with more data, and a discretization floor set by how far apart the snapshots are.
Statistical part: 1/√M
Collect more snapshot pairs M → error decreases as 1/√M. Double M → error drops ~30%.
Discretization floor: (Δt)^α
Set by the quadrature rule. α=1 (Riemann): floor ∝ Δt. α=2 (Trapezoidal): floor ∝ (Δt)². Once you hit this floor, more snapshots don't help.
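The two quadrature orders are easy to verify on a scalar integral; the smooth function f below is an arbitrary stand-in:

```python
import numpy as np

# Left-Riemann error is O(Δt); trapezoidal error is O((Δt)²).
# Refining Δt by 10× should shrink the errors by ~10× vs ~100×.
f = lambda t: np.exp(t)
exact = np.e - 1.0
for dt in [0.1, 0.01]:
    t = np.arange(0.0, 1.0 + dt / 2, dt)
    riemann = dt * f(t[:-1]).sum()
    trapezoid = dt * (f(t[:-1]) + f(t[1:])).sum() / 2
    print(f"Δt={dt}: Riemann err={abs(riemann - exact):.2e}, "
          f"trapezoid err={abs(trapezoid - exact):.2e}")
```

This is the same α=1 vs α=2 gap as in the figure: the trapezoidal floor sits quadratically lower.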
This figure verifies Q1: error ∝ 1/√M then hits the discretization floor.
Left: Riemann (α=1) — floor at O(Δt)
Error drops as 1/√M, then saturates.
Right: Trapezoidal (α=2) — floor at O((Δt)²)
Floor is quadratically lower — why we prefer the trapezoidal rule.
When the observation gap equals the simulation step (Δt = δt), the discretization bias becomes intrinsic: refining the quadrature rule gives no further benefit.
To push accuracy below O(Δt): use finer simulation (δt < Δt) and a higher-order quadrature rule.
Q2: Does the normal matrix stay well-conditioned as N grows? Mostly yes. κ(A) grows at most as O(N) — the matrix stays controlled even as particle count grows.
d=2, Δt=10⁻², M=2000, oracle basis. Theory predicts κ = O(N).
| Model | N=5 | N=10 | N=20 | N=50 | N=100 | Rate |
|---|---|---|---|---|---|---|
| E: Reference | 37 | 53 | 80 | 123 | 155 | 0.5 |
| A: Smoothness | 42 | 46 | 88 | 208 | 396 | 0.8 |
| B: Conditioning | 81K | 146K | 262K | 564K | 962K | 0.8 |
| C: LJ | 10.8K | 12.1K | 11.8K | 12.1K | 11.8K | 0.0 |
| D: Morse | 7.9K | 14.2K | 25.4K | 53.5K | 89.5K | 0.8 |
Three panels: κ* (total), κ_VV (V part, N-independent), κ_ΦΦ (Φ part, O(N)).
Reference model: κ = O(N⁰·⁵), well below O(N). Test B is the ill-conditioned outlier. LJ (C) has nearly constant κ driven by the r⁻¹² singularity.
Results
The neural network knows nothing about the potential form, yet matches or beats the oracle basis in many settings. Below: Self-Test LSE vs Self-Test NN, all dimensions.
In our benchmarks, V is consistently easier to learn than Φ (NN ∇V < 5% in 15/16 settings), while Φ remains the bottleneck and is strongly model- and dimension-dependent. NN wins ∇V in 7/16 settings without knowing the potential form.
External potential V(x)
V(x) = x₁² + 4x₂²
Elliptic confinement: 4× stronger along y-axis.
Interaction potential Φ(z)
Φ(z) = 2 exp(−½(z₁²/0.5² + z₂²/1.5²))
Direction-dependent interaction — range differs by axis.
Unlike all other test models, this Φ depends on direction — a harder case for the neural network.
All stress tests above use radial potentials V(|x|) and Φ(|z|). But real systems can be anisotropic — forces vary by direction. This figure tests whether the NN can recover non-radial potentials where each spatial axis has different parameters.
V(x): anisotropic confinement
a=(1,4) — 4× stronger along y-axis. NN recovers the elliptical shape with ∇V error 3.6%.
Φ(z): anisotropic interaction
Gaussian with s=(0.5, 1.5) — interaction range differs by axis. ∇Φ error 12.3%, harder due to pairwise coupling.
How well can we recover the interaction force Φ'(r) across different models? Each panel compares the true Φ' (black dashed) with LSE (green) and NN (blue) estimators. The gray shading shows the exploration measure ρ_Φ — where particles actually visit. Estimation quality degrades sharply outside the explored region.
Key insight: in these experiments, the gray exploration measure ρ_Φ is the dominant practical limitation. Both LSE and NN can only recover Φ' where data concentrates. Model C (Lennard-Jones) is the most extreme case: the r⁻¹² singularity keeps particles apart, leaving the short-range force poorly resolved.
The full picture: Self-Test (LSE and NN) vs Labeled MLE vs Sinkhorn MLE vs RBF basis, across all four stress-test models. Each row is one model; left column shows V', right column shows Φ'.
Self-Test (green/blue)
Tracks the target reasonably well in the tested models. NN is especially strong on V' because it does not rely on a hand-specified basis.
RBF basis (orange)
Produces oscillatory estimators — the wrong basis choice amplifies noise into spurious wiggles. This confirms why basis specification matters.
Regularization
Solving the basis regression (A + λI)x = b requires choosing the regularization parameter λ. Too small → noise amplified. Too large → solution over-smoothed. Hansen's L-curve is a heuristic selection rule: it picks the point of maximum curvature on the log-log trade-off curve between solution norm and residual norm.
The L-curve's characteristic ‘L’ shape emerges from two regimes: the vertical arm (small λ, noise dominates) and the horizontal arm (large λ, regularization bias dominates). The corner is often a useful operating point, but flat curves can make the heuristic fail. Try increasing the condition number to see how ill-conditioning sharpens the curve.
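A minimal sketch of the L-curve recipe on a synthetic ill-conditioned SPD system (the matrix, noise level, and λ grid are all illustrative, not the paper's setup):

```python
import numpy as np

# Sweep λ in (A + λI)x = b, record the log-log trade-off curve between
# residual norm and solution norm, and pick the point of maximum curvature.
rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = U @ np.diag(np.logspace(0, -6, n)) @ U.T            # κ(A) ≈ 1e6
b = A @ rng.normal(size=n) + 1e-4 * rng.normal(size=n)  # noisy right-hand side

lams = np.logspace(-8, 0, 60)
log_res, log_sol = [], []
for lam in lams:
    x = np.linalg.solve(A + lam * np.eye(n), b)
    log_res.append(np.log(np.linalg.norm(A @ x - b)))   # vertical arm quantity
    log_sol.append(np.log(np.linalg.norm(x)))           # horizontal arm quantity
log_res, log_sol = np.array(log_res), np.array(log_sol)

# corner = maximum curvature of the parametric curve (log_res(λ), log_sol(λ))
dr, ds = np.gradient(log_res), np.gradient(log_sol)
d2r, d2s = np.gradient(dr), np.gradient(ds)
curvature = (dr * d2s - ds * d2r) / (dr**2 + ds**2) ** 1.5
lam_star = lams[np.argmax(curvature)]
print(f"L-curve corner at λ ≈ {lam_star:.1e}")
```

The residual norm grows and the solution norm shrinks monotonically in λ, which is what produces the two arms of the 'L'.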
Deep Dive
Each particle's motion is governed by three forces: confinement V, mean-field interaction Φ, and Brownian noise σ.
The loss comes from applying Itô's formula to the free energy E_f(μ) = 〈V + (1/2)Φ*μ, μ〉 while testing the weak-form equation with ψ = V + Φ*μ. The 1/2 self-test coefficient arises from the variational construction, not from matching the free-energy balance directly.
The key innovation: the (1/2) coefficient on the dissipation term means L(V*, Φ*) = -(1/2) E[J_diss Δt] < 0, while L(0,0) = 0. This breaks the trivial-solution degeneracy.
Both V and Φ are parameterized by 3-layer MLPs with [64, 64, 64] hidden dimensions. The activation function must be C²-smooth (for example, Softplus or Tanh) — ReLU is not allowed because the self-test loss requires computing Laplacians via automatic differentiation, which needs second derivatives to exist. The symmetry Φ(x) = Φ(-x) is enforced architecturally. Total: ~17,000 parameters.
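The two architectural constraints can be sketched in a few lines of numpy. Layer sizes match the text; everything else (initialization, the symmetrization-by-averaging trick) is an illustrative reconstruction, not the paper's implementation:

```python
import numpy as np

# A [2, 64, 64, 64, 1] MLP with a C²-smooth activation (softplus), plus
# even symmetry Φ(x) = Φ(-x) enforced by averaging the net over x and -x.
rng = np.random.default_rng(0)
softplus = lambda z: np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0)  # stable form

dims = [2, 64, 64, 64, 1]
params = [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
          for m, n in zip(dims[:-1], dims[1:])]

def mlp(x):
    for W, b in params[:-1]:
        x = softplus(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

phi = lambda x: 0.5 * (mlp(x) + mlp(-x))   # symmetrized interaction network

x = rng.normal(size=(5, 2))
assert np.allclose(phi(x), phi(-x))        # Φ(x) = Φ(-x) holds by construction

n_params = sum(W.size + b.size for W, b in params)  # per network; two nets ≈ 17k
```

One such network has 8,577 parameters, so two of them (for V and Φ) give about 17,000, consistent with the count quoted above.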