Developer API Guide

Everything you need to use the IPS Unlabeled Learning codebase.

Quick Start

From zero to results in 4 steps. The self-test method needs no trajectory labels — just unlabeled snapshots.

Install & Import
git clone https://github.com/ViskaWei/lips_unlabeled_data
cd lips_unlabeled_data && pip install -e .

from core.potentials import HarmonicPotential, GaussianInteraction
from core.sde_simulator import SDESimulator
from lib.basis import get_basis
from lib.solvers import solve_selftest
from lib.eval import evaluate_kde

Generate → Learn → Evaluate
# 1. Simulate particle data
V = HarmonicPotential(k=2.0)
Phi = GaussianInteraction(A=1.0, sigma=0.8)
sim = SDESimulator(V=V, Phi=Phi, sigma=1.0, dt=0.001)
data, t_obs = sim.simulate(N=10, d=2, T=1.0, L=100, M=2000)

# 2. Learn from unlabeled data (no labels needed!)
build_V, build_Phi, K_V, K_Phi, _ = get_basis('oracle', 'model_e')
alpha, beta, info = solve_selftest(
    data, t_obs, sigma=1.0,
    build_V_fn=build_V, build_Phi_fn=build_Phi,
    K_V=K_V, K_Phi=K_Phi, reg='auto'
)

# 3. Evaluate
v_err, phi_err = evaluate_kde('model_e', d=2, alpha=alpha, beta=beta,
                              build_V_fn=build_V, build_Phi_fn=build_Phi)
print("V: %.1f%%, Phi: %.1f%%" % (100 * v_err, 100 * phi_err))

Two Pipelines

Basis Regression

Expand V and Φ in known basis functions and solve by linear least squares in closed form. Fast, but accuracy is limited by the chosen basis.

load_data → get_basis → solve_selftest → evaluate_kde
Modules: lib.config, lib.basis, lib.solvers, lib.eval. Time: ~seconds.
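The closed-form solve behind this pipeline is ordinary Tikhonov-regularized least squares. A minimal NumPy sketch of the idea (the design matrix, target, and `ridge_solve` helper here are illustrative, not the library's internals):

```python
import numpy as np

def ridge_solve(A, b, lam=1e-6):
    """Closed-form Tikhonov solve: theta = (A^T A + lam*I)^{-1} A^T b."""
    K = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)

# Illustrative: recover coefficients of a 1D function in a polynomial basis
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
A = np.stack([np.ones_like(x), x, x**2], axis=1)   # basis evaluations
b = 2.0 * x**2 - 0.5 * x + 1.0                     # noiseless "observations"
theta = ridge_solve(A, b)
print(np.round(theta, 3))                          # ≈ [1.0, -0.5, 2.0]
```

Because the basis is fixed in advance, the whole fit is one matrix solve, which is why this pipeline runs in seconds.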

Neural Network

An MLP for V and a symmetric MLP for Φ, trained by gradient descent. Flexible: no hand-specified oracle basis is needed.

load_data → create_networks → train_loop(selftest_loss) → evaluate
Modules: core.nn_models, core.selftest_loss. Time: ~hours (GPU).

Available Potentials

All potentials implement evaluate(x) and gradient(x).

Class                          Formula                          Model
HarmonicPotential(k)           V(x) = k|x|²/2                   C, E
QuadraticConfinement(α1, α2)   V(x) = α1|x|/2 + α2|x|²          A
DoubleWellPotential()          V(x) = (|x|²-1)²/4               B, D
GaussianInteraction(A, σ)      Φ(r) = A exp(-r²/(2σ²))          E
PiecewiseInteraction(β1, β2)   Smoothed indicator Φ             A
InverseInteraction(γ)          Φ(r) = γ/(r+1)                   B
LennardJonesPotential(ε, σ)    Φ(r) = 4ε[(σ/r)¹² - (σ/r)⁶]      C
MorsePotential(D, a, r0)       Φ(r) = D(1 - e^(-a(r-r0)))²      D
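Every class in the table exposes the same two-method interface. A hypothetical minimal implementation of the harmonic case, to show the expected shapes (the real classes live in core.potentials and may differ in detail):

```python
import numpy as np

class MyHarmonicPotential:
    """Sketch of the evaluate/gradient interface: V(x) = k|x|^2 / 2."""
    def __init__(self, k=2.0):
        self.k = k

    def evaluate(self, x):
        # x: (..., d) array of positions -> (...,) potential values
        return 0.5 * self.k * np.sum(x**2, axis=-1)

    def gradient(self, x):
        # ∇V(x) = k * x, same shape as x
        return self.k * x

V = MyHarmonicPotential(k=2.0)
x = np.array([[1.0, 0.0], [0.0, 2.0]])
print(V.evaluate(x))   # [1. 4.]
print(V.gradient(x))   # [[2. 0.] [0. 4.]]
```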

Solver API

solve_selftest(data, t_obs, sigma, build_V_fn, build_Phi_fn, K_V, K_Phi, reg='auto')

Trajectory-free learning via the weak form self-test. No labels needed.

data: ndarray (M, L, N, d) — unlabeled snapshot ensemble
t_obs: ndarray (L,) — observation times
sigma: float — diffusion coefficient
reg: 'auto' | float — Tikhonov regularization (auto = Hansen L-curve)
Returns: alpha (K_V,), beta (K_Phi,), info dict

solve_mle(data_labeled, t_obs, ...)

Maximum likelihood estimation. Requires labeled trajectory pairs.

solve_sinkhorn(data_unlabeled, t_obs, ..., eps_factor=0.01)

Optimal transport label imputation + MLE. Unlabeled input, but degrades at large Δt.
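The imputation step couples two consecutive unlabeled snapshots with entropic optimal transport and then reads labels off the coupling. A self-contained sketch of that matching idea (the `sinkhorn_coupling` helper, cost, and parameters are illustrative, not the solver's actual code path):

```python
import numpy as np

def sinkhorn_coupling(X, Y, eps=0.05, n_iter=200):
    """Entropic OT coupling between unlabeled snapshots X, Y of shape (N, d)."""
    N = X.shape[0]
    C = np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1)  # squared-distance cost
    K = np.exp(-C / eps)
    a = b = np.ones(N) / N                                   # uniform marginals
    u = np.ones(N) / N
    for _ in range(n_iter):                                  # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                       # coupling matrix P

# Small displacement between snapshots -> matching is the identity
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
Y = X + 0.01
P = sinkhorn_coupling(X, Y, eps=0.01)
print(P.argmax(axis=1))   # imputed labels: [0 1 2 3 4]
```

This also illustrates why the method degrades at large Δt: once particles move far between snapshots, the cost matrix no longer singles out the true assignment.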

Evaluation

evaluate_kde(model_name, d, alpha, beta, build_V_fn, build_Phi_fn)

Compute L²(ρ)-weighted errors against true gradients on a 2000-point grid.

Returns: (V_error, Phi_error) as relative errors in [0, 1]. Multiply by 100 for percentages.
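The reported numbers are relative, density-weighted gradient errors. Conceptually, that metric looks like the sketch below (the grid, weights, and gradient callables are illustrative stand-ins for the KDE machinery):

```python
import numpy as np

def relative_l2_error(grad_true, grad_est, x, weights):
    """Relative L²(ρ) error between two gradient fields on a weighted grid.
    x: (n, d) grid points; weights: (n,) density values ρ(x) (illustrative)."""
    num = np.sum(weights * np.sum((grad_est(x) - grad_true(x))**2, axis=-1))
    den = np.sum(weights * np.sum(grad_true(x)**2, axis=-1))
    return np.sqrt(num / den)

# Illustrative: estimated stiffness 2.1 vs true 2.0 gives a 5% gradient error
x = np.linspace(-2, 2, 2000)[:, None]
w = np.exp(-0.5 * x[:, 0]**2)                  # stand-in density weights
err = relative_l2_error(lambda x: 2.0 * x, lambda x: 2.1 * x, x, w)
print("%.1f%%" % (100 * err))                  # 5.0%
```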

Neural Network Training

Complete NN Pipeline
import numpy as np
import torch
from core.nn_models import RadialNet, RadialInteractionNet
from core.selftest_loss import compute_selftest_loss_batch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create networks (C²-smooth activations; see Constraints & Gotchas)
V_net = RadialNet(hidden_dims=(64, 64, 64), activation='softplus').to(device)
Phi_net = RadialInteractionNet(d=2, hidden_dims=(64, 64, 64)).to(device)

# Train; data, M, sigma come from the Quick Start, dt is the observation spacing
optimizer = torch.optim.Adam(
    list(V_net.parameters()) + list(Phi_net.parameters()), lr=1e-3
)

for epoch in range(200):
    # Sample a mini-batch of consecutive snapshot pairs
    idx = np.random.choice(M, 32)
    X_curr = torch.tensor(data[idx, :-1], device=device, dtype=torch.float32)
    X_next = torch.tensor(data[idx, 1:], device=device, dtype=torch.float32)

    loss = compute_selftest_loss_batch(V_net, Phi_net, X_curr, X_next, dt, sigma)
    optimizer.zero_grad()
    loss.mean().backward()
    optimizer.step()

Constraints & Gotchas

CRITICAL

Self-test loss converges to a negative value, not zero. L(V*, Φ*) = -½ E[J_diss · Δt] < 0.

CRITICAL

L-curve regularization fails at dt_obs=0.1 (Bakushinskii phenomenon). Use fixed reg=1e-6 instead.

WARNING

NN activation must be C²-smooth (Softplus or Tanh, not ReLU) — second derivatives required for AD-based Laplacians.
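The underlying issue is that the loss differentiates the networks twice. A quick numeric illustration in plain NumPy (finite differences, not the library's autograd path): softplus has a strictly positive second derivative everywhere, while ReLU's vanishes almost everywhere, so ReLU networks yield degenerate Laplacians.

```python
import numpy as np

def second_derivative(f, x, h=1e-3):
    """Central finite-difference second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

softplus = lambda x: np.log1p(np.exp(x))
relu = lambda x: np.maximum(x, 0.0)

x = np.array([-1.5, 0.5, 2.0])
print(second_derivative(softplus, x))  # sigmoid'(x) > 0 at every point
print(second_derivative(relu, x))      # 0 everywhere away from the kink
```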

WARNING

KDE evaluation only works for radial models (model_a/b/lj/morse/e). Non-radial models (aniso/dipole) use grid evaluation.

INFO

The mean-field energy carries a 1/(2N²) prefactor but sums N(N-1) interaction terms, so its gradients are O(1), not O(1/N²). The small prefactor is normalization, not signal suppression.

INFO

Potentials are identifiable only up to additive constants, so equivalent shifted formulas may appear in the paper or visualizations. Reported quantitative errors are gradient errors.
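A shifted potential produces identical forces, which is why the constant offset cannot be recovered and only gradient errors are comparable. A tiny illustration (hypothetical one-dimensional potentials, central-difference gradients):

```python
import numpy as np

# V(x) = x^2 and V(x) = x^2 + 7 generate the same forces,
# so particle dynamics cannot distinguish them.
grad = lambda V, x, h=1e-6: (V(x + h) - V(x - h)) / (2 * h)
V1 = lambda x: x**2
V2 = lambda x: x**2 + 7.0
x = np.linspace(-2, 2, 9)
print(np.allclose(grad(V1, x), grad(V2, x)))   # True
```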