A curriculum learning framework for transformer models using geometrically structured datasets (polylipses) driven by emergent κζ (kappa-zeta) dynamics.
ZetaFormer implements an adaptive curriculum that progressively trains models on increasingly complex n-focal geometric distributions. The key innovation is that the geometry of each curriculum level is determined by the emergent κζ ratio from the previous level, creating a discovered learning path rather than a predetermined one.
- Polylipse Dataset: Multi-focal geometric distribution where data points cluster around n foci in d-dimensional space
- κζ (Kappa-Zeta) Ratio: M_τ/M_σ metric measuring anisotropy of learned representations
- M_τ: Radial moment (temporal/directional variance)
- M_σ: Angular moment (spatial/spread variance)
- Adaptive Curriculum: Geometry adapts based on emergent κζ values during training
- CGD Solver: Conjugate Gradient Descent with dual-kernel (Gaussian-Poisson) attention
ZetaFormer/
├── polylipse_dataset.py # Dataset generation with focal configuration
├── adaptive_curriculum_trainer.py # Curriculum training with κζ-driven progression
├── checkpoint_visualization.py # Single checkpoint analysis tools
├── curriculum_visualization.py # Multi-level curriculum analysis tools
├── checkpoint_viz_cli.py # Command-line interface for visualization
├── example_adaptive_polylipse.py # Example training and visualization scripts
├── polylipse_visualization.py # Visualization utilities and CGD solver
└── README.md # This file
# Install dependencies
pip install torch numpy matplotlib
# Clone repository
git clone <repository-url>
cd ZetaFormer
# Quick demo (1→3 foci, fast)
python example_adaptive_polylipse.py quick
# Full curriculum (1→5 foci)
python example_adaptive_polylipse.py full
# Custom curriculum (1→N foci)
python example_adaptive_polylipse.py full 10
# Visualize curriculum progression
python example_adaptive_polylipse.py viz
# Include CGD decision boundaries
python example_adaptive_polylipse.py viz --cgd
# Visualize specific directory
python example_adaptive_polylipse.py viz ./my_curriculum --cgd
# Comprehensive curriculum report
python checkpoint_viz_cli.py curriculum ./polylipse_curriculum_results
# Single checkpoint analysis
python checkpoint_viz_cli.py checkpoint ./level_3_checkpoint.pt
# Compare multiple runs
python checkpoint_viz_cli.py compare run1/ run2/ run3/ --names "Baseline" "High LR" "Strong κ"
# Display checkpoint info
python checkpoint_viz_cli.py info ./polylipse_curriculum_results
Checkpoints now save comprehensive training state (as of the latest version):
checkpoint = {
    'model_config': {
        'd_model': 32,
        'n_heads': 4,
        'enable_zeta_norm': True,
        'kappa_strength': 0.05
    },
    'model_state_dict': OrderedDict(...),
    'classifier_state_dict': OrderedDict(...),
    'optimizer_state_dict': OrderedDict(...),
    'metrics': {
        'epoch_kappa': [...],      # Calibrated κζ per epoch
        'epoch_kappa_raw': [...],  # Raw κζ per epoch
        'epoch_offset': [...],
        'epoch_beta': [...],
        'epoch_t': [...],
        'epoch_loss': [...]
    },
    'dataset_config': {
        'n_foci': 3,
        'observed_kappa': 1.245,
        'focal_angles': [...],     # Angles of foci
        'focal_weights': [...],    # Weights for each focus
        'focal_centers': [...],    # 2D coordinates
        'M_tau': 1.123,
        'M_sigma': 0.902,
        'd_model': 32
    },
    'curriculum_info': {
        'level': 3,
        'is_stable': True,
        'stability_variance': 0.012,
        'convergence_rate': 0.0345,
        'stabilized_kappa': 1.248
    },
    'training_config': {...},
    'viz_config': {...},
    'metadata': {
        'version': '2.0',
        'timestamp': '2025-01-08T...',
        'pytorch_version': '2.1.0'
    }
}
Legacy checkpoints (pre-enhancement) are automatically supported with graceful fallback.
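A checkpoint in this format can be opened with `torch.load` and inspected directly. The snippet below is a minimal sketch assuming the fields listed above; the path is illustrative.

```python
# Minimal sketch: inspect a saved checkpoint's metadata and κζ history.
# Assumes the checkpoint layout documented above; the path is illustrative.
import torch

ckpt = torch.load("./level_3_checkpoint.pt", map_location="cpu")

print(ckpt["metadata"]["version"])                   # e.g. '2.0'
print(ckpt["curriculum_info"]["stabilized_kappa"])   # emergent κζ for this level
print(ckpt["dataset_config"]["n_foci"],
      ckpt["dataset_config"]["observed_kappa"])      # geometry this level trained on
print(len(ckpt["metrics"]["epoch_kappa"]), "epochs of calibrated κζ recorded")
```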
Analyze individual curriculum levels:
from checkpoint_visualization import CheckpointVisualizer
vis = CheckpointVisualizer("./level_3_checkpoint.pt")
# Generate all visualizations
vis.plot_polylipse_geometry(save_path="geometry.png")
vis.plot_training_history(save_path="history.png")
vis.plot_kappa_evolution(save_path="kappa.png")
vis.plot_cgd_decision_boundary(save_path="cgd.png")
Analyze complete curriculum progressions:
from curriculum_visualization import CurriculumVisualizer
cv = CurriculumVisualizer("./polylipse_curriculum_results")
# Curriculum progression grid
cv.plot_curriculum_progression(save_path="progression.png")
# κζ trajectory across levels
cv.plot_kappa_trajectory(save_path="trajectory.png")
# Detailed evolution per level
cv.plot_kappa_evolution_detailed(save_path="evolution.png")
# Training metrics across curriculum
cv.plot_training_metrics(save_path="metrics.png")
# CGD boundaries for all levels
cv.plot_cgd_curriculum(save_dir="./cgd/")
# Generate comprehensive report
cv.generate_curriculum_report("./report/", include_cgd=True)
- Level 1 (Isotropic): Train on a circular (1-focus) distribution; κζ starts at 1.0
- Stabilization: Monitor κζ convergence with sliding window variance
- Level Transition: When stable, use emergent κζ to configure next dataset
- Geometric Progression: Each level's focal configuration is solved from observed κζ
- Repeat: Continue until max_n_foci is reached (see the schematic sketch below)
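In pseudocode, the loop looks roughly like the sketch below. The helper names (`train_one_epoch`, `make_dataset`) are hypothetical stand-ins; the actual implementation lives in `adaptive_curriculum_trainer.py` and differs in detail.

```python
# Schematic of the adaptive curriculum loop (hypothetical helpers; see
# adaptive_curriculum_trainer.py for the real implementation).
import numpy as np

def run_curriculum(train_one_epoch, make_dataset, max_n_foci=5,
                   epochs_per_level=100, window=40, threshold=0.015):
    kappa = 1.0                                        # Level 1: isotropic, κζ starts at 1.0
    for n_foci in range(1, max_n_foci + 1):
        dataset = make_dataset(n_foci=n_foci, observed_kappa=kappa)
        history = []
        for _ in range(epochs_per_level):
            history.append(train_one_epoch(dataset))   # returns the epoch's calibrated κζ
            recent = history[-window:]
            if len(recent) == window and np.var(recent) < threshold:
                break                                  # sliding-window variance says "stable"
        kappa = float(np.mean(history[-window:]))      # emergent κζ seeds the next level
    return kappa
```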
For n foci with target κζ, we solve:
M_τ = Σᵢ wᵢ cos²(θᵢ)
M_σ = Σᵢ wᵢ sin²(θᵢ)
κζ = M_τ / M_σ
Subject to:
- Σᵢ wᵢ = 1 (weights sum to 1)
- 0 ≤ θᵢ < 2π (angles)
- wᵢ ≥ 0 (non-negative weights)
This is solved via constrained optimization in solve_focal_config().
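One way such a solve can be set up is sketched below: weights are parameterized through a softmax so they stay non-negative and sum to one, angles are wrapped to [0, 2π), and the squared error between the resulting M_τ/M_σ ratio and the target κζ is minimized. This is only an illustration of the constrained problem, not the repository's actual `solve_focal_config()` implementation.

```python
# Illustrative sketch of the constrained solve (not the repository's
# solve_focal_config): recover focal angles and weights for a target κζ.
import numpy as np
from scipy.optimize import minimize

def solve_focal_config_sketch(n_foci, kappa, seed=0):
    def unpack(params):
        angles = params[:n_foci] % (2 * np.pi)            # 0 <= θ_i < 2π
        weights = np.exp(params[n_foci:])
        return angles, weights / weights.sum()             # w_i >= 0, Σ w_i = 1

    def objective(params):
        angles, weights = unpack(params)
        m_tau = np.sum(weights * np.cos(angles) ** 2)      # M_τ = Σ w_i cos²(θ_i)
        m_sigma = np.sum(weights * np.sin(angles) ** 2)    # M_σ = Σ w_i sin²(θ_i)
        return (m_tau / (m_sigma + 1e-12) - kappa) ** 2    # match target κζ

    x0 = np.random.default_rng(seed).normal(size=2 * n_foci)
    return unpack(minimize(objective, x0, method="L-BFGS-B").x)

angles, weights = solve_focal_config_sketch(n_foci=3, kappa=1.5)
```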
A 32-level curriculum run (1→32 foci) reveals striking emergent behavior in the κζ dynamics:
Key observations:
- Phase transition at n≈5-8: κζ peaks at ~1.42 before settling into a lower attractor
- Self-organized stability: The system converges to κζ ≈ 1.15-1.25 for n > 15 foci
- Target-emergent tracking: The bar chart shows how well the emergent κζ matches the target from the previous level
The full training dynamics across ~100,000 epochs reveal:
- Transition spikes: Each curriculum level change (red dashed lines) produces characteristic perturbations
- Rapid recovery: The model quickly adapts to new geometric complexity after each transition
- Oscillatory regime: The smoothed signal (MA-20) shows rhythmic patterns that dampen over time
The focal configurations evolve from simple to complex:
- Early levels (n=1-4): Irregular, asymmetric focal arrangements
- Mid levels (n=5-15): Increasingly regular polygonal structures
- Late levels (n>20): Nearly-circular arrangements with uniform angular spacing
This progression suggests the model discovers that symmetric configurations minimize κζ variance, driving the geometry toward uniform distributions at high focal counts.
A striking finding is the sparse weight allocation during the κζ peak:
| Level 5 (Peak κζ=1.42) | Level 20 (Stable κζ=1.20) |
|---|---|
| Weights: [0.60, 0.31, 0.00, 0.00, 0.10] | Weights: ~0.05 per focus (uniform) |
| Sparse: only 3 of 5 foci are active | Dense: all 20 foci contribute equally |
The system achieves high κζ through strategic weight concentration on a subset of foci, then transitions to uniform distributions as complexity increases.
The Conjugate Gradient Descent solver with dual-kernel (Gaussian-Poisson) attention learns decision boundaries that respect the focal geometry. The CGD score field (blue = positive, red = negative) shows how the model partitions space based on learned κζ representations. The black contour marks the decision boundary.
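As intuition for how such a score field can be produced, the sketch below blends a Gaussian kernel with a Poisson-type kernel, adds ridge regularization, and solves for the coefficients with conjugate gradients. The kernel forms and mixing here are assumptions for illustration only; the repository's solver in `polylipse_visualization.py` has its own normalization and Mellin mixing.

```python
# Rough sketch of a dual-kernel score field solved by conjugate gradients.
# Kernel forms and mixing are assumptions, not the repository's exact solver.
import numpy as np
from scipy.sparse.linalg import cg

def dual_kernel_score(X, y, X_query, sigma=0.5, t=0.5, w=0.5, eta=1e-2):
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        k_gauss = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian (τ) kernel
        k_poisson = t / (d2 + t ** 2)              # Poisson-type (σ) kernel
        return w * k_gauss + (1 - w) * k_poisson   # blended dual kernel

    K = kernel(X, X)
    alpha, _ = cg(K + eta * np.eye(len(X)), y.astype(float))  # CG ridge solve
    return kernel(X_query, X) @ alpha              # signed score; 0-contour = boundary
```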
📊 Full 32-Level CGD Progression (click to expand)
(Gallery of CGD decision-boundary plots, one panel per curriculum level, Level 1 through Level 32.)
Key observations from the CGD progression:
- Early levels (1-4): Simple linear or curved decision boundaries separating 1-4 focal regions
- Peak κζ levels (5-8): Complex, asymmetric boundaries reflecting sparse weight allocation
- Transition levels (9-16): Boundaries become more regular as focal weights equalize
- Stable levels (17-32): Nearly radial partitioning as foci approach uniform circular distribution
| Phenomenon | Description | Curriculum Range |
|---|---|---|
| Phase Transition | κζ peaks then drops to stable attractor | n=5-8 |
| Self-Organization | κζ converges to narrow band (~1.15-1.25) | n>15 |
| Symmetry Breaking | Early levels show asymmetric focal configs | n=2-6 |
| Symmetry Restoration | Late levels approach uniform circular | n>20 |
| Transition Dynamics | Spike-recovery pattern at level boundaries | All |
These results demonstrate that the learning dynamics themselves shape the geometry, creating a feedback loop where emergent κζ values determine future training distributions.
# Train quick curriculum
python example_adaptive_polylipse.py quick
# Visualize results
python example_adaptive_polylipse.py viz ./polylipse_demo
# Generate comprehensive report with CGD
python checkpoint_viz_cli.py report ./polylipse_demo -o ./analysis --cgd
# Show how focal configs map to κζ values
python example_adaptive_polylipse.py math
# Train multiple configurations
python example_adaptive_polylipse.py quick # Creates ./polylipse_demo
# Compare results
python checkpoint_viz_cli.py compare \
./run1 ./run2 ./run3 \
--names "Baseline" "High LR" "Strong κ" \
-o ./comparison
from adaptive_curriculum_trainer import train_with_adaptive_curriculum
model, classifier, curriculum = train_with_adaptive_curriculum(
    start_n_foci=1,
    max_n_foci=5,
    epochs_per_level=100,
    stability_window=40,
    stability_threshold=0.015,
    n_samples=2000,
    d_model=32,
    n_heads=4,
    batch_size=64,
    lr=1e-3,
    device="cuda",
    enable_zeta_norm=True,
    kappa_strength=0.05,
    save_dir="./results"
)
from polylipse_dataset import make_polylipse_dataset, solve_focal_config
# Generate n-focal dataset with specific κζ
X, y, mask, info = make_polylipse_dataset(
    n_foci=3,
    observed_kappa=1.5,
    n_samples=1000,
    d_model=32,
    return_focal_info=True
)
# Solve focal configuration for target κζ
angles, weights = solve_focal_config(n_foci=3, kappa=1.5)
from checkpoint_visualization import visualize_checkpoint
from curriculum_visualization import visualize_curriculum_from_checkpoints
# Quick single checkpoint viz
visualize_checkpoint("./checkpoint.pt", output_dir="./viz")
# Quick curriculum viz
cv = visualize_curriculum_from_checkpoints(
    "./curriculum_dir",
    include_cgd=True
)
After running `python checkpoint_viz_cli.py report ./curriculum --cgd`:
./curriculum_visualization/
├── curriculum_progression.png # Grid of all levels
├── kappa_trajectory.png # κζ across curriculum
├── kappa_evolution_detailed.png # Per-level κζ evolution
├── training_metrics.png # Loss, κζ, offset, etc.
├── curriculum_summary.json # Statistics and metadata
└── cgd/
├── level_1_cgd.png # CGD boundaries per level
├── level_2_cgd.png
└── level_3_cgd.png
Visualize learned decision boundaries using dual-kernel (Gaussian-Poisson) attention:
from polylipse_visualization import plot_cgd_polylipse
fig, f, info = plot_cgd_polylipse(
    n_foci=3,
    observed_kappa=1.5,
    sigma=0.5,      # Gaussian bandwidth (τ-kernel)
    t=0.5,          # Poisson scale (σ-kernel)
    w=0.5,          # Mellin mix (0.5 = balanced)
    eta=1e-2,       # Regularization
    save_path="cgd_3focal.png"
)
from curriculum_visualization import compare_curriculum_runs
fig = compare_curriculum_runs(
    run_dirs=["./baseline", "./experiment1", "./experiment2"],
    run_names=["Baseline", "High κ", "Low κ"],
    output_dir="./comparison"
)
from checkpoint_visualization import load_checkpoint_for_inference
model, classifier, config, dataset_info = load_checkpoint_for_inference(
"./level_3_checkpoint.pt",
device="cuda"
)
# Use model for inference
model.eval()
with torch.no_grad():
output = model(X)The CurriculumMetrics object tracks:
- Per-level history: κζ evolution, loss, stability metrics
- Convergence rates: How quickly each level stabilized
- Focal configurations: Angles, weights, centers for each level
- Transition points: When and why levels changed
Access via:
curriculum.summary() # Print summary
curriculum.get_kappa_trajectory() # [1.0, 1.23, 1.45, ...]
curriculum.levels # List of level dictionaries
- Legacy checkpoints: Older checkpoints lack full metadata but are still supported
- High focal counts: Visualizations optimized for n ≤ 10 foci (adaptive sizing for higher)
- CGD computation: Can be slow for large curricula (use `--cgd` selectively)
- Memory: Loading a full curriculum with models can be memory-intensive
- Ensure the checkpoint directory contains `level_*_checkpoint.pt` files
- Check that training completed successfully
- Delete incomplete `level_0_checkpoint.pt` files; these are created but not completed when training is interrupted
- Update to the latest version with the `.squeeze()` fixes
- Legacy checkpoints may need dataset regeneration
- Reduce the `n_samples` parameter (default 800)
- Skip CGD for high focal counts (n > 10)
Areas for improvement:
- 3D visualization for higher-dimensional embeddings
- Interactive plots using plotly/bokeh
- Video generation showing curriculum progression over time
- Multi-run aggregation with confidence intervals
- Export to TensorBoard format
@software{zetaformer2025,
  title={ZetaFormer: Adaptive Polylipse Curriculum Learning},
  author={Noetic Eidos Project},
  year={2025},
  url={https://github.com/Sarhamam/ZetaFormer}
}
- 📊 Experimental results section documenting 32-level curriculum run
- 📈 Emergent structure analysis with phase transitions and self-organization
- 🖼️ Results gallery including progression, evolution, and geometry visualizations
- 📝 Weight sparsity documentation showing sparse→dense transition
- 🗺️ CGD boundary examples demonstrating dual-kernel decision surfaces
- ✨ Enhanced checkpoint format with comprehensive metadata
- ✨ Checkpoint visualization system (`CheckpointVisualizer`)
- ✨ Curriculum visualization system (`CurriculumVisualizer`)
- ✨ CLI interface for visualization (`checkpoint_viz_cli.py`)
- ✨ Legacy checkpoint support with automatic upgrade
- ✨ CGD decision boundary visualization
- ✨ Multi-run comparison tools
- 🐛 Fixed tensor indexing for visualization
- 🐛 Added `project_to_2d` function for PCA projection
- 📝 Comprehensive documentation
- Polylipse dataset generation
- Adaptive curriculum training
- Basic visualization tools




































