nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

#8661
by tjkim02 - opened

React to this comment with an emoji to vote for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 to be supported by Inference Providers.

(optional) Which providers are you interested in? (Novita, Hyperbolic, Together…)

If anyone wants to test Nemotron on real workloads - Doubleword just made Nemotron 3 Super (120B) FREE during GTC.

Useful for eval pipelines, dataset generation, or large-scale async inference.

You can run it here for free: https://app.doubleword.ai

I've been running Nemotron 3 Super 120B A12B (MoE, 12B active) and wanted to share real serving benchmarks from my POC setup.

Setup: Single node, 16 concurrent agents, 128K context window

Results (POC; production throughput expected 2x+ higher):

  • Single request TTFT: ~2s median
  • 16 agents × 8 turns (128 requests): 100% success, TTFT ~5.7s median
  • Burst (10 simultaneous) → steady: TTFT ~3.5s median
  • Mixed workload (5K + 40K input): TTFT ~5.5s median
  • Zero failures across all test scenarios
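If you want to reproduce numbers like these against your own endpoint, the core measurement is simple: fire concurrent streamed requests and record time-to-first-token for each. A minimal sketch of that harness is below; `fake_stream` is a stand-in (assumption, not the real API) that you would replace with a streaming call to your OpenAI-compatible endpoint.

```python
import asyncio
import statistics
import time

async def measure_ttft(stream_factory):
    """Return seconds until the first token arrives from one streamed request."""
    start = time.perf_counter()
    async for _token in stream_factory():
        return time.perf_counter() - start
    return float("inf")  # stream ended without producing a token

async def run_agents(stream_factory, n_agents=16, turns=8):
    """Fire n_agents * turns concurrent requests; return median TTFT."""
    tasks = [measure_ttft(stream_factory) for _ in range(n_agents * turns)]
    ttfts = await asyncio.gather(*tasks)
    return statistics.median(ttfts)

# Stand-in stream: swap in a real streaming client call here.
async def fake_stream(delay=0.01):
    await asyncio.sleep(delay)  # simulated time before first token
    for tok in ["hello", "world"]:
        yield tok

median = asyncio.run(run_agents(lambda: fake_stream()))
print(f"median TTFT: {median:.3f}s")
```

Because all requests share one event loop, this measures TTFT under genuine concurrency pressure, which is what matters for multi-agent workloads rather than single-request latency.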

The model handles multi-agent orchestration surprisingly well. MoE with 12B active keeps inference efficient while maintaining 120B-level quality.
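The efficiency claim comes down to simple arithmetic: decode compute scales with *active* parameters, not total. Using the figures from the post (120B total, 12B active) and the common ~2 FLOPs-per-parameter-per-token rule of thumb:

```python
# Rough per-token decode FLOPs scale with active parameters, not total.
# Figures from the post: 120B total, 12B active per token (MoE routing).
total_params = 120e9
active_params = 12e9

flops_per_token_moe = 2 * active_params    # ~2 FLOPs per active param per token
flops_per_token_dense = 2 * total_params   # hypothetical dense 120B for comparison

ratio = flops_per_token_dense / flops_per_token_moe
print(ratio)  # → 10.0
```

So per-token compute is roughly a tenth of an equivalent dense 120B model, which is why single-node serving at 128K context stays practical.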

I'm currently building this into a service and experimenting with flat-rate inference models.

If anyone is working on Nemotron serving or multi-agent workloads, would love to compare notes or share more detailed benchmarks.
