Optimized models tiering

For each of the TPU platforms listed below, we present a list of models optimized for pre-training[^1][^2]. If you’re getting started with MaxText or want to push performance, we recommend choosing a Gold model together with its accompanying pre-training recipe.

  • Gold Tier: Fully Optimized Models certified to run with maximum efficiency on Cloud TPUs. They are thoroughly refined for the highest possible performance, making them ideal for production-critical workloads requiring peak throughput.

  • Silver Tier: High Performance Models that are well-optimized to deliver high, reliable performance on Cloud TPUs. They are effective for most use cases but may offer opportunities for expert tuning to achieve peak (Gold Tier) performance.

Trillium (v6e)

Gold

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| --- | --- | --- | --- | --- |
| Llama 2 70B | Link | 256 Chips, BF16, SL=4096 | 43.8% | 900 |
| Llama 3.1 8B | Link | 256 Chips, BF16, SL=8192 | 45.46% | 7,207 |
| Llama 3.1 70B | Link | 256 Chips, BF16, SL=8192 | 50.33% | 960 |

Silver

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| --- | --- | --- | --- | --- |
| Llama 3.1 405B | Link | 256 Chips, BF16, SL=8192 | 38.55% | 123 |
| Mixtral 8X7B | Link | 256 Chips, BF16, SL=4096 | 35.23% | 3,899 |
| Mixtral 8X22B | Link | 256 Chips, BF16, SL=4096 | 36.2% | 1,326 |

v5p

Gold

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| --- | --- | --- | --- | --- |
| Llama 2 70B | Link | 512 Chips, BF16, SL=4096 | 65.4% | 692 |

Silver

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| --- | --- | --- | --- | --- |
| Mixtral 8X7B | Link | 256 Chips (8x4x4), BF16, SL=4096 | 52.56% | 2,909 |

[^1]: Performance results are subject to variations based on system configuration, software versions, and other factors. These benchmarks represent point-in-time measurements under specific conditions.

[^2]: Some older TFLOP/s results are impacted by an updated calculation for causal attention (PR #1988), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the performance metrics guide.
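To make the relationship between the MFU and tokens/sec/device columns more concrete, here is a minimal sketch of how the two can be related. It is not MaxText's own calculation. It assumes the common approximation of 6 FLOPs per parameter per token for training, a simplified attention term that is halved for causal masking, and illustrative inputs: a peak of 918 BF16 TFLOP/s per v6e chip and a Llama 2 70B shape of roughly 70e9 parameters, 80 layers, and an attention width of 8192. The function names `attention_flops_per_token` and `approx_mfu` are hypothetical.

```python
def attention_flops_per_token(num_layers, seq_len, attn_dim, causal=True):
    """Approximate training FLOPs per token spent in the attention matmuls.

    Assumption: QK^T and AV each cost 2 * seq_len * attn_dim FLOPs per token
    per layer in the forward pass, and forward + backward is ~3x the forward.
    """
    full = 3 * (2 * 2 * seq_len * attn_dim) * num_layers
    # With a causal mask only about half of the score matrix is needed,
    # so the attention FLOPs are halved.
    return full / 2 if causal else full


def approx_mfu(tokens_per_sec_per_device, num_params, peak_flops_per_device,
               attn_flops_per_token=0.0):
    """Rough MFU: achieved model FLOP/s divided by peak device FLOP/s."""
    # Assumption: ~6 FLOPs per parameter per token (forward + backward).
    flops_per_token = 6 * num_params + attn_flops_per_token
    return tokens_per_sec_per_device * flops_per_token / peak_flops_per_device


if __name__ == "__main__":
    # Illustrative inputs only (see the assumptions listed above).
    peak_v6e = 918e12   # assumed peak BF16 FLOP/s per v6e chip
    n_params = 70e9     # assumed parameter count for Llama 2 70B
    attn = attention_flops_per_token(num_layers=80, seq_len=4096, attn_dim=8192)
    print(f"6N only:        {approx_mfu(900, n_params, peak_v6e):.1%}")
    print(f"6N + attention: {approx_mfu(900, n_params, peak_v6e, attn):.1%}")
```

Because the attention term grows with sequence length, halving it for causal masking shifts the estimate more at SL=8192 than at SL=4096. The sketch will not reproduce the table values exactly, since the reported throughput is approximate and the real calculation accounts for details omitted here.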