Unified Memory
Apple Silicon’s single memory pool, shared by the CPU and GPU, which removes the need to copy data between system RAM and separate video memory. This is why a Mac with 128 GB of unified memory can load models that would otherwise require a multi-GPU setup on discrete NVIDIA hardware. The trade-off is lower memory bandwidth than dedicated VRAM: token generation is largely limited by how fast the model's weights can be read from memory, so a model of the same size generates tokens more slowly. The ability to fit very large models in a single machine remains a distinctive advantage.
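The capacity-versus-bandwidth trade-off can be sketched with back-of-the-envelope arithmetic. The snippet below is a rough illustration, not a benchmark: it assumes token generation is purely bandwidth-bound (every weight is read once per token) and ignores the KV cache, activations, and the share of unified memory the OS keeps for itself. The function names and the 800 GB/s and 3000 GB/s bandwidth figures are illustrative assumptions, not specifications of any particular machine.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(size_gb: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory pool."""
    return size_gb <= memory_gb


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on generation speed if each token reads all weights once."""
    return bandwidth_gb_s / size_gb


# Hypothetical example: a 70B-parameter model quantized to 4 bits per weight.
size = model_size_gb(70, 4)

# Assumed, illustrative bandwidths in GB/s (not measured values).
unified_bound = max_tokens_per_second(800, size)
vram_bound = max_tokens_per_second(3000, size)

print(f"weights: {size:.1f} GB, fits in 128 GB: {fits(size, 128)}")
print(f"upper bound at 800 GB/s:  {unified_bound:.1f} tokens/s")
print(f"upper bound at 3000 GB/s: {vram_bound:.1f} tokens/s")
```

Real throughput will be lower than these bounds, but the ratio shows the point: capacity decides whether the model loads at all, while bandwidth caps how fast it can generate.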