FP16 / FP32 / FP8

Floating-point precision formats that determine how many bits are used to store each number in a model's weights and activations. FP32 (32-bit) is full precision; FP16 (16-bit) halves memory use with minimal accuracy loss; FP8 (8-bit) halves it again and is used mainly for inference. Lower precision lets you fit larger models in less VRAM and run them faster, which is why modern GPUs like the RTX 5090 include dedicated FP8 tensor cores.
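A minimal sketch of the memory trade-off, using Python's standard `struct` module (which supports 32-bit and 16-bit floats; FP8 has no stdlib format code):

```python
import struct

# FP32 stores each value in 4 bytes; FP16 in 2.
# struct format codes: 'f' = 32-bit float, 'e' = 16-bit half precision.
value = 3.14159
fp32_bytes = struct.pack('f', value)   # 4 bytes
fp16_bytes = struct.pack('e', value)   # 2 bytes

print(len(fp32_bytes), len(fp16_bytes))  # 4 2

# Round-tripping through FP16 loses precision: only 10 mantissa bits,
# so the recovered value is close to the original but not exact.
fp16_value = struct.unpack('e', fp16_bytes)[0]
print(fp16_value)
```

Scaled up to a model with billions of parameters, that 4-bytes-vs-2-bytes difference is exactly why an FP16 checkpoint is half the size of its FP32 counterpart.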
