FP4

A 4-bit floating-point format used for aggressive model quantization. Storing weights in FP4 takes roughly one-eighth the space of FP32 (4 bits per value instead of 32), which lets very large models fit on consumer hardware at the cost of some accuracy. NVIDIA’s Blackwell architecture (including the RTX 50-series) adds native FP4 tensor core support, so FP4 operations run directly in hardware instead of first being converted to a wider format. The hardware does not recover the lost precision, but the added throughput and memory savings make FP4 a practical option for inference on these GPUs.
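To make the accuracy trade-off concrete, here is a minimal Python sketch of FP4 rounding. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit), a common choice for FP4 that the entry above does not specify, plus a single scale factor per block of values. The names `quantize_fp4`, `dequantize_fp4`, and `size_ratio` are illustrative only and do not come from any library.

```python
# Magnitudes representable in E2M1 (assumed layout: 1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values):
    """Map each value to the nearest E2M1 number after scaling the block.

    Returns the quantized values (in FP4 units) and the scale needed to
    recover approximate originals. Ties go to the smaller magnitude here,
    which is a simplification of real rounding modes.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Scale so the largest magnitude in the block lands on 6.0, the E2M1 maximum.
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        target = abs(v) / scale
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(target - m))
        quantized.append(-mag if v < 0 else mag)
    return quantized, scale


def dequantize_fp4(quantized, scale):
    """Recover approximate original values from FP4 units and the block scale."""
    return [q * scale for q in quantized]


def size_ratio(bits_from=32, bits_to=4):
    """Storage ratio between two bit widths, ignoring scale-factor overhead."""
    return bits_from / bits_to


if __name__ == "__main__":
    weights = [6.0, 3.0, -1.5, 0.2]
    q, s = quantize_fp4(weights)
    print(q, s)
    print(dequantize_fp4(q, s))
    print(size_ratio())
```

With only eight magnitudes available, small values such as `0.2` in the example collapse to zero, which is where the accuracy loss comes from. Real quantizers typically use small blocks with their own scale factors to limit this effect.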
