Description

Technical Specifications

Product Model | NVIDIA B300 (Blackwell Ultra / GB300 SXM5)

Manufacturer | NVIDIA Corporation

Product Type | Data Center AI Accelerator Module (SXM5 Form Factor)

Architecture / Process | Blackwell Ultra, TSMC 4NP, Dual-Die Package with NV-HBI (10 TB/s)

CUDA Cores | 20,480 (160 SM × 128 CUDA cores per SM)

Tensor Cores | 640 Fifth-Generation Tensor Cores with 2nd-Gen Transformer Engine

On-Board Memory | 288 GB HBM3e (8 × 36 GB HBM3e 12-Hi stacks), ECC Protected

Memory Bus Width / Bandwidth | 8192-bit / 8 TB/s

Compute Performance (Dense/Sparse) | FP4: 15 PFLOPS / 30 PFLOPS; FP8: 9 PFLOPS; FP16: ~4.5 PFLOPS; TF32: ~1.1 PFLOPS

Inter-GPU Interconnect | NVLink 5.0, 1.8 TB/s bidirectional per GPU (full 8-GPU mesh via NVSwitch)

Host Interface | PCIe Gen 6 x16 — 256 GB/s bidirectional

Thermal Design Power (TDP) | Up to 1400 W (requires cold-plate liquid cooling)

Form Factor / Mounting | SXM5 Module — installs on HGX B300 or DGX B300 baseboard only (not a PCIe add-in card)

Main Features and Advantages

Massive HBM3e capacity for large model residency: The defining advantage of the NVIDIA B300 is its 288 GB HBM3e frame buffer—50 % larger than the B200’s 192 GB and more than double the H200’s 141 GB. For LLM inference, this means a 70B-parameter model at FP8 or NVFP4 precision, together with a generous KV-cache for 128K+ token contexts, can reside entirely on a single GPU. Eliminating cross-GPU tensor-parallel sharding for medium-sized models reduces NVLink traffic, cuts first-token latency, and raises batch throughput per rack unit. In training, the enlarged memory permits larger micro-batch sizes or gradient accumulation strategies that would trigger OOM on prior-generation accelerators.
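The single-GPU residency claim can be sanity-checked with rough arithmetic. The sketch below is a minimal estimate, not a sizing tool: the layer count, KV-head count, and head dimension are assumptions chosen to resemble a typical 70B-class transformer with grouped-query attention, and the function names are hypothetical. It ignores activations, framework overhead, and memory fragmentation.

```python
# Rough footprint estimate for a 70B-class model on one 288 GB GPU.
# The model shape below is an ASSUMED, illustrative configuration.

HBM_CAPACITY_GB = 288            # from the specification table
GB = 1e9                         # decimal gigabytes, as used in the listing


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Bytes needed to hold the model weights at a given precision."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, batch_size: int = 1,
                   bytes_per_elem: float = 1.0) -> float:
    """Bytes for the KV cache: one key and one value tensor per layer."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * batch_size


# Assumed shape: 80 layers, 8 KV heads, head dimension 128, FP8 weights and cache.
weights_gb = weight_bytes(70e9, 1.0) / GB
kv_gb = kv_cache_bytes(80, 8, 128, context_tokens=128 * 1024) / GB
total_gb = weights_gb + kv_gb

print(f"weights: {weights_gb:.1f} GB, KV cache: {kv_gb:.1f} GB, "
      f"total: {total_gb:.1f} GB of {HBM_CAPACITY_GB} GB")
```

Under these assumptions the weights take about 70 GB and a single 128K-token sequence adds roughly 21 GB of KV cache, which leaves most of the 288 GB free for larger batches or longer contexts.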

Native FP4 with second-generation Transformer Engine: The NVIDIA B300 introduces production-grade support for NVFP4 (NVIDIA 4-bit floating point) via its fifth-generation Tensor Cores and second-gen Transformer Engine. Dense FP4 throughput reaches approximately 15 PFLOPS per GPU (30 PFLOPS sparse), representing ~1.5–1.7× the FP4 dense performance of the B200 and enabling substantially higher inference tokens-per-watt for post-trained quantized LLMs. Each SM also includes 256 KB of dedicated Tensor Memory (TMEM) that feeds Tensor Cores directly, reducing L2 cache contention during GEMM and multi-head attention kernels.

Doubled attention-layer acceleration: Special Function Units (SFU) for exponential operations used in softmax and scaled dot-product attention have been doubled versus the original Blackwell B200. This yields up to 2× faster attention-layer compute for transformer models—a tangible benefit for long-context inference, retrieval-augmented generation (RAG), and chain-of-thought reasoning pipelines. Framework integration is transparent through cuDNN 9.x+, FlashAttention-v3, and the NVIDIA TensorRT-LLM optimized kernels, which automatically leverage the hardware improvements of the NVIDIA B300.

High-bandwidth scale-out with NVLink 5 and PCIe Gen 6: Each NVIDIA B300 connects to the NVLink switch fabric at 1.8 TB/s bidirectional, enabling an 8-GPU HGX B300 node with 14.4 TB/s aggregate NVLink bandwidth—critical for data-parallel and pipeline-parallel training of trillion-parameter MoE models. On the host side, PCIe Gen 6 support doubles the CPU↔GPU transfer rate compared to PCIe Gen 5, beneficial for CPU-offloaded preprocessing and high-speed checkpoint writes to NVMe-oF storage tiers.
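The bandwidth figures above follow from simple multiplication, shown here as a short sketch. The PCIe Gen 5 value is an assumption derived from the statement that Gen 6 doubles the rate, and the transfer-time helper (a hypothetical name) uses theoretical peak rates, so real transfers will be slower.

```python
# Back-of-the-envelope bandwidth arithmetic using figures from this listing.

NVLINK_PER_GPU_TBPS = 1.8        # bidirectional, per GPU
GPUS_PER_NODE = 8
PCIE_GEN6_X16_GBPS = 256         # bidirectional, from the specification table
PCIE_GEN5_X16_GBPS = PCIE_GEN6_X16_GBPS / 2   # ASSUMPTION: half of Gen 6


def aggregate_nvlink_tbps(gpus: int, per_gpu_tbps: float) -> float:
    """Sum of per-GPU NVLink bandwidth across one node."""
    return gpus * per_gpu_tbps


def one_way_transfer_seconds(size_gb: float, bidirectional_gbps: float) -> float:
    """Ideal time to move data in one direction at half the bidirectional rate."""
    return size_gb / (bidirectional_gbps / 2)


node_nvlink = aggregate_nvlink_tbps(GPUS_PER_NODE, NVLINK_PER_GPU_TBPS)
gen6_s = one_way_transfer_seconds(288, PCIE_GEN6_X16_GBPS)
gen5_s = one_way_transfer_seconds(288, PCIE_GEN5_X16_GBPS)

print(f"aggregate NVLink per node: {node_nvlink:.1f} TB/s")
print(f"copying 288 GB to host: {gen6_s:.2f} s (Gen 6) vs {gen5_s:.2f} s (Gen 5)")
```

At peak rates, draining a full 288 GB of HBM to the host would take about 2.25 s over PCIe Gen 6 versus 4.5 s over Gen 5, which is the kind of difference that matters for frequent checkpointing.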

Datacenter-grade reliability and ecosystem alignment: Like its predecessors, the NVIDIA B300 incorporates ECC on all SRAM arrays and HBM, GPU page retirement, and in-band telemetry via NVML and DCGM (Data Center GPU Manager). It is fully supported by NCCL for collective communication, TensorRT-LLM for optimized inference serving, NeMo for LLM fine-tuning, and NVIDIA Run:ai or NVIDIA Dynamo for GPU resource orchestration—ensuring drop-in compatibility with existing MLOps workflows built around the CUDA ecosystem.

Application Field

The NVIDIA B300 is purpose-built for next-generation AI factories and high-end HPC installations where memory capacity, attention-layer throughput, and scale-out bandwidth—rather than raw ALU count alone—are the limiting factors. In large language model inference deployments, particularly those serving 70B to 405B parameter models via FP8 or NVFP4 quantization, the NVIDIA B300 allows a single HGX B300 node (8× NVIDIA B300) to hold a 400B-parameter model with KV-cache, or to serve multiple smaller models concurrently at high batch sizes. This capability directly translates to reduced time-to-first-token (TTFT) and higher tokens-per-second-per-megawatt in production AI serving infrastructures operated by cloud service providers and private enterprise AI platforms.
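As a rough illustration of the node-level capacity claim, the sketch below estimates how many GPUs are needed just to hold model weights at a given precision. The function name and the 20 % reserve for KV cache and runtime overhead are assumptions for illustration; NVFP4 is treated as exactly 4 bits per parameter, which ignores the small overhead of its scaling factors.

```python
import math

HBM_PER_GPU_GB = 288             # from the specification table
GPUS_PER_NODE = 8


def min_gpus_for_weights(n_params: float, bits_per_param: float,
                         hbm_gb: float = HBM_PER_GPU_GB,
                         reserve_fraction: float = 0.2) -> int:
    """Smallest GPU count whose usable HBM holds the weights.

    reserve_fraction is an ASSUMED share of each GPU kept free for
    KV cache, activations, and runtime overhead.
    """
    weights_gb = n_params * bits_per_param / 8 / 1e9
    usable_gb = hbm_gb * (1 - reserve_fraction)
    return math.ceil(weights_gb / usable_gb)


node_hbm_gb = HBM_PER_GPU_GB * GPUS_PER_NODE
fp8_gpus = min_gpus_for_weights(400e9, 8)
fp4_gpus = min_gpus_for_weights(400e9, 4)

print(f"node HBM: {node_hbm_gb} GB")
print(f"400B weights need at least {fp8_gpus} GPUs at FP8, {fp4_gpus} at 4-bit")
```

Under these assumptions a 400B-parameter model occupies about 400 GB at FP8, a small fraction of the 2,304 GB available in an 8-GPU node, so the rest can go to KV cache or additional model replicas.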

For frontier-model pre-training and fine-tuning, the NVIDIA B300 is deployed in multi-node DGX B300 clusters or NVIDIA GB300 NVL72 rack-scale systems, where the combination of 288 GB GPU memory, 8 TB/s HBM3e bandwidth, and 1.8 TB/s NVLink 5 per GPU sustains the all-reduce and all-gather collectives required by ZeRO, FSDP, and pipeline parallelism frameworks. Research institutions and national labs also leverage the NVIDIA B300 for scientific computing workloads such as molecular dynamics (GROMACS, AMBER with CUDA support), computational fluid dynamics (using GPU-accelerated solvers), and climate modeling—domains that benefit from the expanded HBM capacity for large grid simulations and from the mature CUDA-X library ecosystem.

In multi-modal AI pipelines spanning text, vision, and audio—where embedding tables and intermediate activations can be memory-intensive—the NVIDIA B300 reduces the need for CPU-side offloading or gradient accumulation tricks that degrade performance. Enterprise customers building on-premise AI infrastructure for regulated industries (finance, healthcare, automotive) select the NVIDIA B300 when their existing H100/H200 clusters are constrained by memory footprint for longer context windows or by FP4 inference throughput for cost-sensitive high-QPS services. The NVIDIA B300 is not intended for desktop workstations, gaming, or cryptocurrency mining; it requires a compatible HGX B300 server chassis with liquid cooling, 48 V or 54 V DC power distribution, and rack-level thermal management planning.

Related Products

NVIDIA B200 (Blackwell SXM5): Base Blackwell-architecture GPU with 192 GB HBM3e and ~9 PFLOPS dense FP4. The NVIDIA B300 offers 50 % more HBM, ~67 % higher dense FP4 throughput, and 2× attention-layer acceleration, making it the upgrade path when the B200’s memory is insufficient for larger models.
NVIDIA H200 (Hopper H200 SXM5): Previous-generation flagship with 141 GB HBM3e; lacks native FP4 support and has lower memory bandwidth; often used as a baseline when evaluating upgrade justification to the NVIDIA B300.
NVIDIA H100 (Hopper H100 SXM5): 80 GB HBM3, no FP4 support; serves as the legacy baseline for ROI calculations when migrating workloads to the NVIDIA B300.
HGX B300 Baseboard (HGX Blackwell Ultra Platform): Server baseboard hosting 8× NVIDIA B300 SXM5 modules with NVLink 5 switches, power delivery, and PCIe Gen 6 host connectivity; required for any NVIDIA B300 deployment.
NVIDIA DGX B300 System: Factory-integrated 8-GPU rack server with dual Intel Xeon 6 (Granite Rapids) CPUs, 2.3 TB aggregate HBM (8 × 288 GB), ConnectX-8 VPI networking, and pre-installed NVIDIA AI Enterprise; turnkey solution for deploying the NVIDIA B300 at scale.
NVIDIA GB300 NVL72 (Blackwell Ultra NVL72): Rack-scale system pairing 72× NVIDIA B300 GPUs (as Grace Blackwell Ultra Superchips) with NVLink spine switches for trillion-parameter model training and massive inference farms; represents the largest scale-out form factor for the same GPU silicon.
NVIDIA ConnectX-8 SuperNIC (800Gb/s): Recommended network adapter for HGX B300 / DGX B300 nodes to match the NVIDIA B300 cluster’s all-reduce bandwidth requirements in multi-node training.

Installation and Maintenance

Pre-installation preparation: Before installing the NVIDIA B300 into an HGX B300 baseboard, confirm that the server chassis is equipped with a certified cold-plate liquid cooling loop (CDU, quick-disconnect fittings, and flow-rate monitoring) rated for a minimum 1400 W heat load per GPU and ~11–14 kW total for a fully populated 8-GPU node. Verify that the baseboard firmware and BMC are updated to the revision specified in the HGX B300 Hardware User Guide; older firmware may fail to recognize the NVIDIA B300 or may report TDP limits incorrectly. Power down the system, relieve liquid cooling pressure, disconnect the cold plate from the existing module (if replacing), and carefully seat the NVIDIA B300 SXM5 module into the gold-finger connector on the HGX baseboard, applying even downward pressure until the retention latches engage. Reattach the cold plate, ensuring proper thermal interface material (TIM) coverage and following NVIDIA’s torque specification, then reconnect the coolant lines and restore power. Upon first boot, use the system BMC or nvidia-smi to confirm that all eight GPUs are enumerated, that the VBIOS/firmware versions match the supported matrix for the NVIDIA B300, and that GPU temperature stabilizes within expected idle ranges under liquid cooling.
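The first-boot check described above can be scripted. The following is a minimal sketch that parses the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and flags a wrong GPU count, mixed VBIOS versions, or high idle temperatures. The function names and the 50 °C idle threshold are assumptions for illustration; take the real limits and the supported VBIOS matrix from the platform documentation.

```python
import csv
import io
import subprocess
from typing import Dict, List

QUERY_FIELDS = ["index", "name", "temperature.gpu", "vbios_version"]


def query_nvidia_smi() -> str:
    """Run nvidia-smi and return its CSV output (requires an installed driver)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(QUERY_FIELDS),
           "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpu_rows(csv_text: str) -> List[Dict[str, str]]:
    """Turn the CSV text into one dict per GPU, keyed by query field."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if record:
            rows.append(dict(zip(QUERY_FIELDS, (f.strip() for f in record))))
    return rows


def check_enumeration(rows: List[Dict[str, str]], expected_gpus: int = 8,
                      max_idle_temp_c: int = 50) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    max_idle_temp_c is an ASSUMED placeholder, not a vendor limit.
    """
    problems = []
    if len(rows) != expected_gpus:
        problems.append(f"expected {expected_gpus} GPUs, found {len(rows)}")
    if len({r["vbios_version"] for r in rows}) > 1:
        problems.append("GPUs report different VBIOS versions")
    for r in rows:
        try:
            temp = int(r["temperature.gpu"])
        except ValueError:
            problems.append(f"GPU {r['index']}: temperature unavailable")
            continue
        if temp > max_idle_temp_c:
            problems.append(f"GPU {r['index']}: idle temperature {temp} C")
    return problems
```

On a live system the three functions chain together as `check_enumeration(parse_gpu_rows(query_nvidia_smi()))`. Keeping the parsing separate from the subprocess call lets the checks be exercised against saved output without GPU hardware.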

Maintenance recommendations: The NVIDIA B300 has no user-serviceable internal components; do not remove the heatsink/cold plate except for module replacement. Periodically monitor GPU health via DCGM (temperature, ECC error counts, power draw, NVLink error counters) and review system logs for throttling events indicating insufficient coolant flow or air-intake blockage. If an NVIDIA B300 module reports persistent uncorrectable ECC errors or fails to train the NVLink fabric, follow the HGX B300 service manual for FRU (Field Replaceable Unit) swap procedures, typically a cold-swap operation performed after draining coolant pressure from that specific cold plate loop. Retain the original packaging for any NVIDIA B300 module removed from service to prevent electrostatic discharge damage during transport. Because the NVIDIA B300 operates at elevated power and thermal stress, ensure annual inspection of liquid cooling seals, particulate filters, and CDU coolant quality per the rack manufacturer’s maintenance schedule.
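The monitoring guidance above maps naturally onto a small triage rule set. The sketch below assumes the metrics have already been collected (for example, exported from DCGM) into a simple record. The record layout, field names, function name, and 85 °C temperature threshold are all hypothetical; only the 1400 W figure comes from this listing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuSample:
    """One polling sample for a single GPU (hypothetical field names)."""
    gpu_index: int
    temperature_c: float
    power_w: float
    ecc_uncorrectable: int   # cumulative uncorrectable ECC error count
    nvlink_errors: int       # cumulative NVLink error count
    throttled: bool          # True if a throttling event was logged


def triage(sample: GpuSample, prev_ecc_uncorrectable: int = 0,
           max_temp_c: float = 85.0, max_power_w: float = 1400.0) -> List[str]:
    """Return suggested follow-ups for one sample; empty means healthy.

    max_temp_c is an ASSUMED placeholder; max_power_w is the listed TDP.
    """
    findings = []
    if sample.ecc_uncorrectable > prev_ecc_uncorrectable:
        findings.append("uncorrectable ECC errors increased: see service manual for FRU swap")
    if sample.nvlink_errors > 0:
        findings.append("NVLink errors present: check fabric training")
    if sample.throttled or sample.temperature_c > max_temp_c:
        findings.append("throttling or high temperature: check coolant flow")
    if sample.power_w > max_power_w:
        findings.append("power draw above listed TDP")
    return findings


example = GpuSample(gpu_index=0, temperature_c=45.0, power_w=900.0,
                    ecc_uncorrectable=0, nvlink_errors=0, throttled=False)
print(triage(example))
```

Passing the previous uncorrectable-error count lets the check react to new errors rather than to a historical total, which is closer to the "persistent errors" condition described above.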