A systematic comparison of deep learning architectures for stellar surface gravity estimation
What attention learns from stellar spectra, where transformers help, and where hybrids still win
SpecViT comes within 0.02 R² of the Fisher Information / CRLB theoretical ceiling at SNR ≈ 4.6. The model extracts nearly all the information that is physically present in the data.
BOSZ-only training achieves σ = 0.098 dex on APOGEE, a 44% lower error than DESI-360k training (σ = 0.175). Clean synthetic spectra provide more useful physical priors than massive noisy catalogs.
A single SpecViT model works across optical (DESI, 710–885 nm) and near-infrared (APOGEE, 1.5–1.7 μm) surveys. One architecture, multiple instruments.
Every star's light encodes its physical properties — gravity, temperature, composition. But extracting a single parameter (like surface gravity log g) from a noisy 4096-pixel spectrum is non-trivial, especially for faint targets at magnitude >20.
Drag the slider to see how the same stellar spectrum degrades as the star gets fainter.
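The degradation in the demo can be mimicked with additive Gaussian noise at a target SNR. This is a simplification (real survey noise is wavelength- and sky-dependent), and `degrade_to_snr` is a hypothetical helper, not the article's pipeline:

```python
import numpy as np

def degrade_to_snr(flux, target_snr, rng=None):
    """Add white Gaussian noise so the spectrum's mean SNR hits target_snr.

    Illustrative sketch only: real survey noise varies per pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.mean(np.abs(flux)) / target_snr
    return flux + rng.normal(0.0, sigma, size=flux.shape)
```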
SpecViT applies the Vision Transformer architecture to 1D stellar spectra:
Hover over patches to see how the spectrum is divided. Patch colors show attention weight — brighter means higher attention.
Patch embedding:
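A minimal NumPy sketch of 1D patch embedding: the spectrum is split into non-overlapping patches and each patch is projected by a shared linear layer. The function name, patch size, and weight shapes are illustrative assumptions, not SpecViT's actual implementation:

```python
import numpy as np

def patch_embed(spectrum, patch_size, W, b):
    """Split a 1D spectrum into non-overlapping patches and project each
    patch with a shared linear layer W (patch_size, embed_dim), bias b."""
    n_patches = spectrum.shape[0] // patch_size
    patches = spectrum[: n_patches * patch_size].reshape(n_patches, patch_size)
    return patches @ W + b  # (n_patches, embed_dim) token sequence
```

A 4096-pixel spectrum with patch size 16 yields a sequence of 256 tokens for the transformer encoder.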
Self-attention:
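Single-head scaled dot-product self-attention over the patch tokens, in NumPy. Weight names and shapes are illustrative; SpecViT uses the standard multi-head form of this operation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention: each token mixes information from
    all other tokens, weighted by query-key similarity."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # rows sum to 1
    return A @ V, A
```

The attention matrix `A` is what the visualizations below display: each row shows how strongly one patch attends to every other patch.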
Training loss (Huber):
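The Huber loss is quadratic for small residuals and linear beyond a threshold δ, which limits the influence of outlier labels. A sketch (the δ value used by SpecViT is not specified here; 1.0 is the common default):

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) beyond."""
    r = np.abs(pred - target)
    return np.where(r <= delta,
                    0.5 * r**2,
                    delta * (r - 0.5 * delta)).mean()
```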
Explore what the model learns to attend to. The deepest layers concentrate attention on the Ca II infrared triplet (λ8498, 8542, 8662 Å) — the strongest surface gravity indicator in this wavelength range.
Each attention head learns to focus on different spectral features. Head 1 specializes in the Ca II infrared triplet, while Head 5 captures positional patterns across the full wavelength range. Hover for details.
Explore SpecViT's performance across multiple axes. Click tabs to switch views.
How do synthetic spectra inform real-world prediction? We compare three strategies:
| Strategy | Training Data | σ (dex) | R² | MAE |
|---|---|---|---|---|
| BOSZ-only (Clean Prior) | 50k BOSZ | 0.098 | 0.956 | 0.071 |
| DESI Direct | 360k DESI | 0.175 | 0.889 | 0.128 |
| Two-Stage | BOSZ + DESI | 0.112 | 0.942 | 0.082 |
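The two-stage idea can be illustrated with a toy model: pretrain on clean synthetic data, then fine-tune from those weights on noisy observed data. This is a minimal sketch with a linear model standing in for SpecViT; all names, learning rates, and data here are hypothetical:

```python
import numpy as np

def sgd_fit(X, y, w=None, lr=0.1, steps=500):
    """Full-batch gradient descent on MSE; stands in for the real trainer."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X_syn = rng.normal(size=(2000, 3))
y_syn = X_syn @ w_true                              # clean "synthetic" labels
X_obs = rng.normal(size=(200, 3))
y_obs = X_obs @ w_true + rng.normal(0, 0.5, 200)    # noisy "survey" labels

w_pre = sgd_fit(X_syn, y_syn)                       # stage 1: clean prior
w_two = sgd_fit(X_obs, y_obs, w=w_pre, lr=0.01, steps=100)  # stage 2: fine-tune
```

The fine-tuning stage uses a smaller learning rate and fewer steps so the noisy survey data refines, rather than overwrites, the synthetic prior.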
Browse 7 example DESI spectra spanning giants, subgiants, and dwarfs. Drag to pan, scroll to zoom.
SpecViT maintains consistent low MAE across stellar evolutionary stages — giants, subgiants, and dwarfs — while LightGBM struggles most with evolved stars where spectral features are subtler.
The Cramér-Rao Lower Bound (CRLB) defines the minimum achievable variance for any unbiased estimator. For a parameter θ estimated from data x with likelihood p(x|θ):
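In standard notation (reconstructed here in the textbook form), the Fisher information and the resulting bound are:

$$
\mathcal{I}(\theta) = \mathbb{E}\!\left[\left(\frac{\partial \log p(x \mid \theta)}{\partial \theta}\right)^{\!2}\right],
\qquad
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{\mathcal{I}(\theta)}.
$$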
We compute Fisher information from BOSZ synthetic spectra, where the noise model is known exactly. This gives us a theoretical performance ceiling against which we benchmark SpecViT.
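For per-pixel Gaussian noise with known σ, the Fisher information reduces to a sum over pixels of squared flux derivatives weighted by inverse noise variance. A numerical sketch, assuming model spectra tabulated on a grid of parameter values (function and variable names are illustrative, not the article's code):

```python
import numpy as np

def fisher_information(flux_grid, theta_grid, noise_sigma):
    """I(theta) = sum_i (d f_i/d theta)^2 / sigma_i^2 for Gaussian noise.

    flux_grid: (n_theta, n_pix) model spectra on a grid of theta values.
    """
    dflux_dtheta = np.gradient(flux_grid, theta_grid, axis=0)
    return np.sum(dflux_dtheta**2 / noise_sigma**2, axis=1)

def crlb_sigma(flux_grid, theta_grid, noise_sigma):
    """Minimum achievable std of any unbiased estimator of theta."""
    return 1.0 / np.sqrt(fisher_information(flux_grid, theta_grid, noise_sigma))
```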
SpecViT-S (Small) configuration:
| Model | Params | Train Time | Inference | R² |
|---|---|---|---|---|
| Ridge Regression | 4K | <1 min | <1 ms | 0.507 |
| LightGBM | ~100K | ~2 min | <1 ms | 0.647 |
| SpecViT-S | 7.4M | ~30 min | ~2 ms | 0.731 |