# MATE

MATE (MUSA AI Tensor Engine) is a centralized library for Generative AI workloads on MUSA. It provides high-performance attention and GEMM operators, and compatibility wrappers for CUDA-oriented Python APIs.
## Features

- High-performance attention and GEMM operators for MUSA
- Compatibility wrappers for `flash_attn` and `deep_gemm`
- CLI tools for environment checks, configuration inspection, and replay

## Documentation

- CLI documentation: docs/mate_cli.md
- FlashAttention compatibility summary: docs/flash_attention.md
- FlashAttention wrapper: wrappers/flash-attention/README.md
- DeepGEMM wrapper: wrappers/deep_gemm/README.md
## Requirements

| Component | Requirement |
|---|---|
| MUSA Toolkit | 4.3.6 or later |
| TorchMUSA | 2.7 or later |
| Architecture | Pinghu (MP31) |
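Before running workloads, it can be useful to confirm from Python that the MUSA backend is actually usable. The sketch below assumes TorchMUSA's documented behavior of registering a `torch.musa` namespace (with `is_available()`) once `torch_musa` is imported; it is an illustration, not part of MATE's API:

```python
import importlib.util

def musa_available() -> bool:
    """Report whether the MUSA backend can be used from Python.

    Assumes torch_musa registers a `torch.musa` namespace with
    `is_available()` as a side effect of being imported.
    """
    if importlib.util.find_spec("torch_musa") is None:
        return False  # torch_musa is not installed at all
    import torch
    import torch_musa  # noqa: F401  (side effect: registers the musa backend)
    return torch.musa.is_available()

print(musa_available())
```

On a machine without TorchMUSA this simply reports `False` instead of raising, which makes it safe to use as a guard in shared scripts.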
## Installation

```bash
git clone https://github.com/MooreThreads/mate.git --recursive
cd mate
bash build.sh
```

## Project Layout

| Path | Purpose |
|---|---|
| `mate/` | Core Python package and public APIs |
| `wrappers/` | Compatibility wrapper packages for existing Python ecosystems |
| `docs/` | Markdown docs and Sphinx sources |
| `tests/` | Correctness and integration tests |
| `benchmarks/` | Performance and benchmarking scripts |
## Command-Line Interface

MATE provides a command-line interface for configuration, debugging, diagnostics, and replay.
| Command | Purpose |
|---|---|
| `mate check` | Validate the runtime environment |
| `mate show-config` | Display installation and runtime configuration |
| `mate env` | Show relevant environment variables |
| `mate replay --dir PATH` | Replay API calls from Level 10 dumps |
| `mate list-dumps PATH` | List recorded dump directories |
Example:

```bash
mate check
mate show-config
mate env
mate replay --dir mate_dumps/
mate list-dumps mate_dumps/
```

See docs/mate_cli.md for full CLI documentation.
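In CI or launch scripts it can be convenient to gate a job on `mate check`. A hypothetical Python wrapper is sketched below; the `mate check` subcommand comes from the table above, while treating a zero exit code as "environment OK" is the usual CLI convention and an assumption here:

```python
import shutil
import subprocess

def mate_environment_ok():
    """Run `mate check` and report success; return None when the CLI is absent.

    Interpreting exit code 0 as a healthy environment is an assumption
    based on common CLI conventions, not MATE's documentation.
    """
    if shutil.which("mate") is None:
        return None  # MATE is not installed on this machine
    result = subprocess.run(["mate", "check"], capture_output=True, text=True)
    return result.returncode == 0

status = mate_environment_ok()
print({None: "mate CLI not found", True: "check passed", False: "check failed"}[status])
```

Returning `None` rather than raising keeps the helper usable on development machines where MATE is not installed.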
## Wrappers

MATE uses the packages under `wrappers/` as a compatibility layer for CUDA-oriented software stacks on MUSA. These wrappers preserve familiar package names and high-level APIs while routing execution to MATE operators and kernels on MUSA, which helps existing integrations migrate with smaller code changes.
| Wrapper | Package | Import Path | Purpose | Documentation |
|---|---|---|---|---|
| `wrappers/flash-attention` | `mate-flash-attention` | `flash_attn` | FlashAttention-compatible APIs on top of MATE attention operators on MUSA | wrapper README, compatibility summary |
| `wrappers/deep_gemm` | `mate-deep_gemm` | `deep_gemm` | DeepGEMM-compatible APIs on top of MATE GEMM operators on MUSA | wrapper README |
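Because the wrapper keeps the upstream `flash_attn` import path, existing call sites need no changes. A hedged sketch of guarded usage follows; `flash_attn_func` is upstream FlashAttention's public entry point, and which of its arguments are supported on MUSA is described in the compatibility summary:

```python
# The `flash_attn` import path resolves to mate-flash-attention on MUSA and to
# upstream FlashAttention on CUDA; the calling code is identical either way.
try:
    from flash_attn import flash_attn_func
    HAVE_FLASH_ATTN = True
except ImportError:
    HAVE_FLASH_ATTN = False

def fused_attention(q, k, v, causal=True):
    """Dispatch to flash_attn_func when available; raise a clear error otherwise."""
    if not HAVE_FLASH_ATTN:
        raise RuntimeError("install mate-flash-attention (or flash-attn) first")
    return flash_attn_func(q, k, v, causal=causal)

print("flash_attn available:", HAVE_FLASH_ATTN)
```

The try/except guard lets the same module be imported on machines without either backend, deferring the failure to the first actual attention call.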
## Building the Documentation

After installing `mate`, build the Sphinx docs with:

```bash
pip install sphinx furo
cd docs
make html
```

## Acknowledgements

MATE is inspired by FlashInfer, FlashAttention, CUTLASS, FlashMLA, and DeepGEMM.