# Optimized models tiering
For each of the TPU platforms listed below, we present a list of optimized models[^1] [^2] for pre-training. If you're getting started with MaxText, or want to push performance, we recommend choosing a Gold-tier model along with its accompanying pre-training recipe.
- **Gold Tier**: Fully optimized models certified to run with maximum efficiency on Cloud TPUs. They are thoroughly refined for the highest possible performance, making them ideal for production-critical workloads requiring peak throughput.
- **Silver Tier**: High-performance models that are well optimized and deliver high, reliable performance on Cloud TPUs. They are effective for most use cases but may offer opportunities for expert tuning to reach peak (Gold Tier) performance.
## Trillium (v6e)

### Gold

### Silver

## v5p

### Gold
| Model | Recipe | Benchmark Configuration | MFU | Approx. tokens/sec/device |
|---|---|---|---|---|
| Llama 2 70B | | 512 chips, BF16, SL=4096 | 65.4% | 692 |
### Silver
| Model | Recipe | Benchmark Configuration | MFU | Approx. tokens/sec/device |
|---|---|---|---|---|
| Mixtral 8x7B | | 256 chips (8x4x4), BF16, SL=4096 | 52.56% | 2,909 |
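As a rough sanity check on how MFU relates to the tokens/sec/device column, the sketch below estimates MFU for the Llama 2 70B row. It assumes (these numbers are not from the tables above) roughly `6 * N_params` FLOPs per token for a dense transformer, ignoring attention FLOPs, and a v5p peak of about 459 BF16 TFLOP/s per chip.

```python
# Approximate MFU from per-device token throughput.
# Assumptions: ~6 * n_params FLOPs per processed token (dense model,
# attention FLOPs ignored), v5p peak ~459e12 BF16 FLOP/s per chip.

def approx_mfu(tokens_per_sec_per_device, n_params, peak_flops=459e12):
    achieved = tokens_per_sec_per_device * 6 * n_params  # FLOP/s per device
    return achieved / peak_flops

mfu = approx_mfu(692, 70e9)  # Llama 2 70B row above
print(f"{mfu:.1%}")  # ~63%; close to the reported 65.4%, with the gap
                     # largely explained by the attention FLOPs this
                     # estimate leaves out
```

This is an estimate only; the published MFU numbers come from the full benchmark FLOP accounting described in the performance metrics guide.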
[^1]: Performance results are subject to variation based on system configuration, software versions, and other factors. These benchmarks represent point-in-time measurements under specific conditions.

[^2]: Some older TFLOP/s results are impacted by an updated calculation for causal attention (PR #1988), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the performance metrics guide.
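To illustrate why the causal-attention change halves the attention FLOPs, the sketch below uses an assumed (standard, but not taken from this page) FLOP count for the two attention matmuls: with a causal mask, only the lower triangle of the `s x s` score matrix contributes, so the QK^T and AV FLOPs are halved. The dimensions used are illustrative.

```python
# Assumed attention FLOP model: QK^T and AV matmuls together cost about
# 4 * s^2 * d FLOPs per layer over a sequence of length s with model
# width d. A causal mask computes only half the score matrix, halving this.

def attention_flops(seq_len, d_model, n_layers, causal):
    flops = 4 * seq_len**2 * d_model * n_layers  # full (non-causal) attention
    return flops // 2 if causal else flops

full = attention_flops(4096, 8192, 80, causal=False)
causal = attention_flops(4096, 8192, 80, causal=True)
print(causal / full)  # 0.5
```

Because attention FLOPs grow quadratically with sequence length while the dense-matmul FLOPs grow linearly, this halving shifts reported TFLOP/s most for large-SL configurations, as footnote 2 notes.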