
[Feat] FP8 per tensor quant support #4043

Open

Datta0 wants to merge 3 commits into unslothai:main from Datta0:fp8_per_tensor

Conversation

@Datta0 (Collaborator) commented Feb 13, 2026

Fixes: #3862
We reuse the existing block-quant matmul code, but set the block size to the full tensor size so that a single scale covers the whole weight matrix.
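
For concreteness, here is a minimal PyTorch sketch of the idea (not the PR's actual code): per-tensor quantization is just block quantization where the single block spans the whole tensor, so one scalar scale covers the entire weight matrix. The helper names per_tensor_quantize_fp8 and per_tensor_dequantize are hypothetical.

    import torch

    def per_tensor_quantize_fp8(w: torch.Tensor):
        # One scalar scale for the whole tensor: block size == tensor size.
        fp8_max = torch.finfo(torch.float8_e4m3fn).max            # 448.0 for e4m3fn
        scale = w.abs().max().clamp(min = 1e-12) / fp8_max
        w_fp8 = (w / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
        return w_fp8, scale

    def per_tensor_dequantize(w_fp8: torch.Tensor, scale: torch.Tensor):
        # Dequantize by broadcasting the single scale over the whole matrix.
        return w_fp8.to(torch.float32) * scale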

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Datta0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for per-tensor FP8 quantization within the unsloth framework. By enabling the use of a single scaling factor for an entire weight tensor, it simplifies the quantization process for specific models and broadens the applicability of FP8. The changes ensure that the dequantization and linear layer operations correctly interpret and apply these new per-tensor scales, integrating seamlessly with existing block and row quantization methods.

Highlights

  • Per-tensor FP8 Quantization Support: Implemented support for per-tensor FP8 quantization, allowing a single scale value to be applied across an entire weight matrix for simplified quantization scenarios.
  • Enhanced Weight Dequantization: Modified the weight_dequant function to correctly handle per-tensor scales, distinguishing them from the existing row-quantized and block-quantized scales (see the shape-based sketch after this list).
  • Updated FP8BlockQuantLinear Forward Pass: The FP8BlockQuantLinear forward pass was updated to detect per-tensor scales and expand them into the appropriate block scale format for internal processing, while preserving the original scale for the backward pass.
  • Refined FP8 Linear Dispatch Logic: Adjusted the fp8_linear dispatch logic to correctly route per-tensor quantized operations through the fp8_block_quant_linear path, ensuring proper handling alongside block-quantized operations.
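
As a rough illustration of the dequantization highlight above, the three layouts can be distinguished purely by the scale's shape. This is a hedged sketch assuming per-tensor scales arrive as a single element, row scales as a 1-D tensor, and block scales as a 2-D grid; dequant_by_scale_shape is an illustrative name, not unsloth's weight_dequant.

    import torch

    def dequant_by_scale_shape(w_fp8: torch.Tensor, scale: torch.Tensor,
                               block_size: int = 128) -> torch.Tensor:
        w = w_fp8.to(torch.float32)
        if scale.numel() == 1:
            # Per-tensor: one scalar scale covers the whole weight matrix.
            return w * scale
        if scale.ndim == 1:
            # Row-quantized: one scale per output row, broadcast across columns.
            return w * scale.view(-1, 1)
        # Block-quantized: one scale per (block_size x block_size) tile.
        N, K = w.shape
        scale_full = scale.repeat_interleave(block_size, dim = 0)[:N]
        scale_full = scale_full.repeat_interleave(block_size, dim = 1)[:, :K]
        return w * scale_full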


Changelog
  • unsloth/kernels/fp8.py
    • Added explicit handling for per-tensor scales in the weight_dequant function.
    • Refactored w8a8_block_fp8_matmul_triton to improve argument passing and introduce default block size handling.
    • Modified FP8BlockQuantLinear.forward to expand per-tensor scales to a block scale shape and preserve the original scale for the backward pass (see the expansion sketch after this changelog).
    • Updated fp8_linear to correctly dispatch per-tensor and block-quantized FP8 operations based on scale properties.
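
To make the forward-pass change concrete, the sketch below shows one way a scalar per-tensor scale could be broadcast into the (ceil(N/block), ceil(K/block)) grid that a block-quantized matmul kernel expects, so the existing block path can be reused unchanged. The function name and the grid layout are assumptions, not the PR's implementation.

    import math
    import torch

    def expand_per_tensor_scale(scale: torch.Tensor, weight_shape, block_size: int = 128):
        # Hypothetical helper: broadcast one scalar scale into a per-block scale grid.
        N, K = weight_shape
        grid_n = math.ceil(N / block_size)
        grid_k = math.ceil(K / block_size)
        # Every block shares the same scale, so the grid is the scalar repeated.
        return scale.reshape(1, 1).expand(grid_n, grid_k).contiguous()

    # Example: a (4096, 4096) weight with block_size 128 yields a (32, 32) scale grid.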

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for FP8 per-tensor quantization by updating the logic in unsloth/kernels/fp8.py. The changes correctly handle scalar weight scales in weight_dequant and FP8BlockQuantLinear, and route per-tensor quantized operations to the appropriate functions. While the implementation for the forward pass is sound, I've identified a critical issue in the backward pass where the block_size is not being used, potentially leading to incorrect gradients for block-quantized weights with non-default block sizes. I've also noted a minor point of confusion in a comment.

I am having trouble creating individual review comments, so my feedback is included below.

unsloth/kernels/fp8.py (351-353)

critical

The block_size is no longer saved to the context. While it's true that ctx.block_size was unused in the backward pass, this points to a potential bug. The backward pass calls weight_dequant, which in turn calls weight_dequant_block with a hardcoded default block_size of 128. If a non-default block_size is used in the forward pass (e.g., from weight.block_size), the dequantization in the backward pass will be incorrect, leading to wrong gradients.

To fix this, block_size should be saved to the context and the backward pass should be updated to use it for correct dequantization. This might require changes to weight_dequant and weight_dequant_block to accept and use the block_size.
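
A minimal sketch of that suggestion, assuming weight_dequant can accept a block_size keyword; the autograd class and the block_fp8_matmul call are illustrative stand-ins rather than the PR's actual FP8BlockQuantLinear:

    import torch

    class _FP8LinearSketch(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, weight_fp8, weight_scale, block_size):
            ctx.save_for_backward(weight_fp8, weight_scale)
            ctx.block_size = block_size                    # preserve for the backward pass
            # out = x @ W.T computed blockwise in FP8 (hypothetical kernel wrapper).
            return block_fp8_matmul(x, weight_fp8, weight_scale, block_size)

        @staticmethod
        def backward(ctx, grad_output):
            weight_fp8, weight_scale = ctx.saved_tensors
            # Dequantize with the block_size actually used in forward,
            # not a hardcoded default of 128.
            w = weight_dequant(weight_fp8, weight_scale, block_size = ctx.block_size)
            grad_x = grad_output @ w                       # weight is frozen, only x needs grad
            return grad_x, None, None, None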

unsloth/kernels/fp8.py (309-310)

medium

The comment at line 309 is misleading: it states that the original scale is saved before any transformation, but original_weight_scale is updated on line 332 if the scale is transposed. A more accurate comment would clarify that this variable holds the scale to be used in the backward pass.

        # Save the scale for the backward pass.
        original_weight_scale = weight_scale

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 400591fcf6




Development

Successfully merging this pull request may close these issues.

[Bug] Unable to train Devstral2
