
Non-record: 10L Int6 QAT + SmearGate + SWA (val_bpb=1.1575) #273

Open

dentity007 wants to merge 1 commit into openai:main from NathanMaine:submission/10L-SmearGate-SWA-NathanMaine

Conversation

@dentity007

Summary

val_bpb = 1.1575 (single seed 1337, self-verified)

Builds on @baudrillardsgh0st's technique stack (PR #194). Contribution: a 10-layer configuration that trades one layer for improved step throughput (9,156 steps vs 7,472 at 11L), informed by systematic analysis across 17 experiments.

  • 10 layers, 512 dim, 8 heads, 4 KV heads, 3x MLP
  • Int6 QAT (STE), per-dim SmearGate, SWA/50, Muon WD=0.038
  • Sliding window eval stride=64, zstd-22
  • 14.73MB artifact (1.27MB headroom)
  • 9,156 steps at 65ms/step on 8×H100
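The PR itself does not reproduce the quantization code here, but the "Int6 QAT (STE)" bullet can be sketched as symmetric 6-bit fake quantization with a straight-through estimator. This is a minimal illustration under assumed conventions (per-tensor absmax scaling, one code dropped to keep the range symmetric); the actual `train_gpt.py` may differ in scaling granularity and range handling.

```python
import torch

class Int6FakeQuant(torch.autograd.Function):
    """Symmetric int6 fake-quantization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, w):
        # 6-bit symmetric range: [-31, 31] (one code dropped to keep symmetry)
        qmax = 2 ** (6 - 1) - 1  # 31
        scale = w.detach().abs().max().clamp(min=1e-8) / qmax
        q = (w / scale).round().clamp(-qmax, qmax)
        return q * scale  # dequantized values, so downstream ops stay in float

    @staticmethod
    def backward(ctx, grad_out):
        # STE: pretend round() is the identity and pass gradients through
        return grad_out

w = torch.randn(64, 64, requires_grad=True)
w_q = Int6FakeQuant.apply(w)
w_q.sum().backward()
```

Because the backward pass ignores the rounding, the weights keep receiving useful gradients while the forward pass always sees values on the 63-level int6 grid.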

Key finding

10L outperforms 11L under the 10-minute wall-clock constraint. The faster step time (65ms vs 80ms) yields 22% more training steps, more than compensating for the reduced model capacity.
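A back-of-the-envelope check of the claim, using only the rounded figures quoted in this PR (65 ms and 80 ms per step, 9,156 vs 7,472 steps):

```python
# Wall-clock tradeoff cited above: 10L at 65 ms/step vs 11L at 80 ms/step.
budget_s = 600  # 10-minute wall-clock limit

steps_10l, ms_10l = 9156, 65
steps_11l, ms_11l = 7472, 80

extra_steps = steps_10l / steps_11l - 1       # fractional step gain, ~0.22
time_10l = steps_10l * ms_10l / 1000          # ~595 s, inside the budget
time_11l = steps_11l * ms_11l / 1000          # ~598 s, also near the limit
```

Both configurations nearly saturate the 600 s budget; the 10L run simply converts its faster steps into roughly 22% more optimizer updates within the same window.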

Submission checklist

  • val_bpb and submission.json included
  • Artifact under 16MB (14.73MB)
  • Wallclock < 600s on 8×H100
  • Train log included
  • Reproducible train_gpt.py included
  • README with detailed explanation

