Pipeline Parallelism

A multi-GPU strategy that splits a model’s layers across GPUs in sequence — GPU 1 handles layers 1–20, GPU 2 handles layers 21–40, and so on. The batch is split into micro-batches, and each GPU works on a different micro-batch at the same time, forming a pipeline. This is simpler than tensor parallelism but introduces pipeline “bubbles”: idle time at the start and end of each batch while the pipeline fills and drains. For consumer multi-GPU setups, pipeline parallelism over PCIe is the most practical approach, since only activations cross the GPU boundary between stages, requiring far less inter-GPU bandwidth than tensor parallelism.
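The fill-and-drain behavior can be made concrete with a small scheduling sketch (a hypothetical illustration, not any specific framework's API). With S stages and M micro-batches, a forward-only pipeline takes S + M − 1 timesteps, so the fraction of idle (bubble) slots is (S − 1)/(S + M − 1) — which is why using more micro-batches per batch shrinks the bubble:

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Build a forward-only pipeline schedule.

    Returns a list of timesteps; each timestep is a list with one entry
    per stage: the micro-batch index that stage processes, or None if
    the stage is idle (a pipeline bubble).
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s  # stage s starts micro-batch 0 at timestep s
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule

def bubble_fraction(num_stages, num_microbatches):
    """Fraction of stage-timesteps spent idle."""
    slots = [x for row in pipeline_schedule(num_stages, num_microbatches)
             for x in row]
    return slots.count(None) / len(slots)
```

For example, 4 stages with 4 micro-batches leaves 3/7 of the slots idle, while 4 stages with 16 micro-batches cuts the bubble to 3/19 — the same 4 GPUs, but far better utilization.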
