Local AI Electricity Cost 2026 — What It Actually Costs to Run LLMs at Home (Mac Mini vs RTX 5090, with Real Watts-Per-Token Math)
At the US average $0.16/kWh, an RTX 5090 inference rig running 4 hours/day (and idling the rest) adds roughly $22/month to your power bill. A Mac Mini M4 Pro on the same workload adds under $2. Real wall-power numbers, a $/month table for every catalog SKU at three usage profiles, and a buying cheat sheet by power budget.
Compute Market Team

Every local-AI buyer asks the same question right before clicking "buy": will running this thing wreck my electricity bill?
The honest answer is buried in scattered Reddit threads, a December 2025 academic benchmark, and a handful of vendor TDP specs that don't tell you what your wall outlet actually sees. This guide turns that into a single calculator-shaped buying decision, with real measured numbers for every machine in our catalog and a $/month table at three usage profiles.
If you're cross-shopping operating cost the way you cross-shopped acquisition cost, this is the post you bookmark.
The Short Answer — Local AI Adds $2–$35 to Your Monthly Power Bill (For Most People)
At the US average electricity rate of $0.16/kWh, running an RTX 5090 inference rig 4 hours per day (idle the rest) adds roughly $22 per month to your power bill, while a Mac Mini M4 Pro on the same workload adds under $2 — and the math only diverges further at always-on usage, where the Mac Mini's ~5W idle vs the RTX desktop's ~80W idle opens a roughly $9/month gap before any inference happens. Per the December 2025 TokenPowerBench paper, modern hardware running Llama-3.3-70B at FP8 lands near 0.39 joules per output token; older Llama 65B setups landed near 4 J/token, a roughly 10× efficiency gain in two years. For most home buyers, the answer to "will running a local LLM wreck my power bill" is no — but for 24/7 always-on agents, picking Apple Silicon over a flagship discrete-GPU desktop saves on the order of $1,000 in electricity over a 3-year ownership window, before counting acquisition cost.
Three numbers drive everything: how much your machine pulls at idle, how much it pulls under inference, and how many hours a day it's actually working. The rest is multiplication.
This post sits inside our AI on a Budget hub as the operating-cost companion to the acquisition-cost piece, GPU Prices 2026 — What to Buy for Local AI. The two together are the full TCO picture.
How to Calculate It — The Three Numbers That Matter
Skip the spec sheet. The formula every buyer needs is:
$/month = (idle_W × idle_hours + inference_W × inference_hours) / 1000 × 30 × $/kWh
Worked example, side by side, on the US average $0.16/kWh:
| Machine | Idle W | Inference W | Profile | Math | $/month |
|---|---|---|---|---|---|
| Mac Mini M4 Pro | 5 | 65 | 4h active / 20h idle | (5×20 + 65×4) ÷ 1000 × 30 × 0.16 | ~$1.73 |
| RTX 5090 desktop | 80 | 725 | 4h active / 20h idle | (80×20 + 725×4) ÷ 1000 × 30 × 0.16 | ~$21.60 |
The desktop's wall draw under load is the brutal number — 725W is not a typo. NVIDIA's official 575W RTX 5090 TDP is GPU-only; you have to add the rest of the system and the PSU's 12% conversion loss before you get to what your meter sees. We'll come back to that in the "hidden costs" section.
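If you'd rather not do the multiplication by hand, the formula drops into a few lines of Python. The wattages and rate below are this guide's midpoint figures, not measurements of your machine — a sanity-check sketch, not a benchmark:

```python
def monthly_cost(idle_w, active_w, active_hours, rate_kwh, days=30):
    """Dollars per month from idle + active wall draw at a flat $/kWh rate."""
    idle_hours = 24 - active_hours
    daily_kwh = (idle_w * idle_hours + active_w * active_hours) / 1000
    return daily_kwh * days * rate_kwh

US_RATE = 0.16  # $/kWh, the US-average rate used throughout this guide

mac_mini = monthly_cost(idle_w=5, active_w=65, active_hours=4, rate_kwh=US_RATE)
rtx_5090 = monthly_cost(idle_w=80, active_w=725, active_hours=4, rate_kwh=US_RATE)

print(f"Mac Mini M4 Pro:  ${mac_mini:.2f}/mo")   # ~$1.73
print(f"RTX 5090 desktop: ${rtx_5090:.2f}/mo")   # ~$21.60
```

Swap in your own wall-meter readings and local rate and the same function answers the question for any machine.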
For power users, watts per token (or its inverse, tokens per joule) is the more apples-to-apples efficiency metric. The December 2025 TokenPowerBench paper (arxiv 2512.03024) formalized it. We'll get to the numbers in section 5.
The Real Power Draw Numbers (Not Just TDP)
Critical distinction up front: TDP is the rated maximum, not what your machine actually pulls. A 575W RTX 5090 spends most of its life at 80W idle. A 65W "TDP" Mac Mini M4 Pro chip lives in a system that draws 5W at idle. The numbers below are wall-meter readings — what your electric utility actually charges you for — collected from published measurements (Muxup, TokenPowerBench), community wall-meter logs (r/LocalLLaMA, llm-tracker.info), and product spec sheets where the manufacturer publishes verifiable system-level numbers.
| Machine | Idle (W) | Inference peak (W) | Source / notes |
|---|---|---|---|
| Mac Mini M4 Pro | ~5 | ~65 | System wall, M-series Mac Mini line / Apple specs (unverified at full inference load) |
| Mac Studio M4 Max | ~10 | ~120 | Community wall logs, M4 Max (unverified) |
| RTX 5090 desktop | ~80 (system) | ~725 (system: 575 GPU + ~150 rest) | NVIDIA official 575W TDP + PSU/PC overhead |
| RTX 4090 desktop | ~70 | ~520 | NVIDIA 450W TDP + system overhead, community logs |
| RTX 3090 (used) | ~70 | ~420 | NVIDIA 350W TDP + system overhead (unverified) |
| RTX 5080 desktop | ~70 | ~430 | NVIDIA 360W TDP + system overhead (unverified) |
| RTX 5060 Ti 16GB | ~55 | ~220 | NVIDIA 180W TDP + system overhead (unverified) |
| RTX 4060 Ti 16GB | ~55 | ~210 | NVIDIA 165W TDP + system overhead (unverified) |
| Intel Arc B580 | ~50 | ~250 | Intel 190W TBP + system overhead (unverified) |
| Beelink SER8 | ~12 | ~55 | Mini PC class wall logs (unverified) |
| GMKtec M6 Ultra | ~13 | ~60 | Mini PC class wall logs (unverified) |
| GMKtec M8 | ~10 | ~50 | Mini PC class wall logs (unverified) |
| MAGICNUC AS1 | ~10 | ~45 | Mini PC class wall logs (unverified) |
| Jetson Orin Nano | ~5 | 7–15 | NVIDIA Jetson power profile, configurable 7W/15W modes |
| H100 PCIe (datacenter) | ~75 | 350–700 | NVIDIA spec sheet, datacenter context only |
Wall-power figures vary by ±20% depending on PSU efficiency, ambient temperature, RAM count, and what else is riding the same PSU. Treat the numbers above as informed midpoints, not guarantees — and treat anything marked unverified as a starting point. Measure your own with a $25 Kill-A-Watt before drawing 3-year conclusions.
The clean story emerging from this table: idle draw is the variable nobody talks about and the one that actually moves your bill. A discrete-GPU desktop's 70–80W idle is the difference between Apple Silicon being "a little cheaper" and "spectacularly cheaper" at 24/7 use. Section 6 has the exact dollar math.
$/Month by Usage Profile — The Master Table
This is the conversion engine. Three columns, every catalog SKU, sorted by efficiency at the heavy-chat tier. Math uses idle watts × idle hours plus inference watts × active hours, multiplied out to 30-day months at $0.16/kWh (US average per EIA April 2026) and $0.40/kWh (CA / Hawaii / parts of EU).
Three usage profiles:
- Light agent — 4 hours active / 20 hours idle, ~50K tokens/day. The "I open a chat sometimes" buyer.
- Heavy chat — 8 hours active / 16 hours idle, ~250K tokens/day. The full-time AI-augmented developer or writer.
- 24/7 always-on — continuous low-utilization serving, ~1M tokens/day across an agent farm or background workload. The home-server case.
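The light and heavy columns of the master table are pure applications of the formula from "How to Calculate It"; here's a sketch that reproduces them for a few SKUs, using this guide's midpoint wattages (the 24/7 column uses the blended always-on model described under the table, so it's left out here):

```python
PROFILES = {  # profile name: (active hours/day, idle hours/day)
    "light": (4, 20),
    "heavy": (8, 16),
}

MACHINES = {  # name: (idle W, active W) — this guide's midpoint figures
    "Mac Mini M4 Pro": (5, 65),
    "RTX 5060 Ti 16GB": (55, 220),
    "RTX 5090 desktop": (80, 725),
}

def dollars_per_month(idle_w, active_w, active_h, idle_h, rate=0.16, days=30):
    """30-day cost at a flat $/kWh rate for one daily usage split."""
    return (idle_w * idle_h + active_w * active_h) / 1000 * days * rate

# Sort ascending by heavy-chat cost, the same ordering the table uses.
for name, (idle_w, active_w) in sorted(
    MACHINES.items(),
    key=lambda kv: dollars_per_month(*kv[1], *PROFILES["heavy"]),
):
    row = {p: dollars_per_month(idle_w, active_w, *hrs) for p, hrs in PROFILES.items()}
    print(f"{name:18s} light ${row['light']:6.2f}  heavy ${row['heavy']:6.2f}")
```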
| Machine | Idle W | Active W | Light $/mo (US) | Heavy $/mo (US) | 24/7 $/mo (US) | Heavy $/mo (CA $0.40) |
|---|---|---|---|---|---|---|
| Jetson Orin Nano | 5 | 15 | $0.77 | $0.96 | $0.92 | $2.40 |
| MAGICNUC AS1 | 10 | 45 | $1.82 | $2.50 | $2.36 | $6.24 |
| GMKtec M8 | 10 | 50 | $1.92 | $2.69 | $2.53 | $6.72 |
| Mac Mini M4 Pro | 5 | 65 | $1.73 | $2.88 | $2.65 | $7.20 |
| Beelink SER8 | 12 | 55 | $2.21 | $3.03 | $2.87 | $7.58 |
| GMKtec M6 Ultra | 13 | 60 | $2.40 | $3.30 | $3.12 | $8.26 |
| Mac Studio M4 Max | 10 | 120 | $3.26 | $5.38 | $4.95 | $13.44 |
| RTX 4060 Ti 16GB | 55 | 210 | $9.31 | $12.29 | $11.69 | $30.72 |
| RTX 5060 Ti 16GB | 55 | 220 | $9.50 | $12.67 | $12.04 | $31.68 |
| Intel Arc B580 | 50 | 250 | $9.60 | $13.44 | $12.67 | $33.60 |
| RTX 3090 | 70 | 420 | $14.78 | $21.50 | $20.16 | $53.76 |
| RTX 5080 | 70 | 430 | $14.98 | $21.89 | $20.51 | $54.72 |
| RTX 4090 | 70 | 520 | $16.70 | $25.34 | $23.62 | $63.36 |
| H100 PCIe (context only) | 75 | 700 | $20.64 | $32.64 | $30.24 | $81.60 |
| RTX 5090 | 80 | 725 | $21.60 | $33.98 | $31.51 | $84.96 |
Note: the 24/7 column models continuous low-utilization serving. Each machine is charged its idle draw plus ~30% of its idle-to-peak delta, around the clock, since an always-on box spends most of its life serving small, bursty requests rather than running flat-out. Heavy chat assumes full inference draw for the entire active window.
The $/month gap between a Mac Mini and an RTX 5090 is about $31 at heavy chat — and it barely narrows in the 24/7 column ($2.65 vs $31.51), even though the blended model assumes the discrete GPU is mostly loafing. The reason it doesn't narrow is the desktop's 80W idle floor, which bills you around the clock whether or not a single token is generated. Section 6 isolates that idle story.
For the matchups that show up most often in inbox questions, see the head-to-heads: Mac Mini M4 Pro vs RTX 4060 Ti 16GB, RTX 5090 vs RTX 5080, Mac Mini M4 Pro vs Beelink SER8, and the canonical Mac Mini M4 Pro vs RTX 5060 Ti efficiency face-off.
For the multi-GPU power math (which compounds in the wrong direction), see our multi-GPU local LLM setup guide. For broader hardware framing, the AI GPU buying guide and mini PC for AI hubs have the curated lineups.
Watts Per Token — The Efficiency Crown
Throughput is the wrong metric for buyers running 24/7 agents or batch inference. The right metric is tokens per joule (or its inverse, joules per token, which is what TokenPowerBench reports).
The reference numbers, per the December 2025 TokenPowerBench paper (arxiv 2512.03024):
- 0.39 J/token — Llama-3.3-70B at FP8 on H100-class hardware (current best).
- ~4 J/token — Llama 65B on older hardware (the "From Words to Watts" 2023 baseline, arxiv 2310.03003).
That's a roughly 10× efficiency gain in two years, driven mostly by lower-precision quantization (FP8/FP4) and smarter kernels — not by the silicon itself getting magically more efficient.
The local angle is the interesting one. A Mac Mini M4 Pro running Llama 4 Scout 8B at Q4 draws very little wall power — 65W peak versus the H100's 350–700W — so even though its raw tokens-per-second is far lower, its system-wide energy per token lands in the same neighborhood. The H100 spends an order of magnitude more watts but finishes each token far faster; measured at the wall, the per-token energy gap is much smaller than the throughput gap suggests.
"Tokens per joule, not tokens per second, is the metric that should drive purchase decisions for sustained workloads. A machine that produces tokens twice as fast while drawing four times the power is a worse choice for an always-on agent than the slower, lower-draw alternative." — paraphrased from the John Snow Labs "Tokens per Joule" framing, applied to consumer buyers.
The decision rule:
- Sustained throughput (agent farm, batch inference, 24/7 serving): Tokens per joule wins. Apple Silicon and well-tuned mini PCs dominate.
- Batch-1 chat (one user, one query at a time): Peak tok/s wins. Discrete GPUs deliver lower latency on big Qwen 3 72B-class models.
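To make the decision rule concrete: system-level joules per token is just wall watts divided by tokens per second. The throughput figures below are illustrative placeholders, not benchmarks — substitute your own measured tok/s and wall-meter readings:

```python
def joules_per_token(wall_watts, tokens_per_sec):
    """System-level energy per output token: J/token = W / (tok/s)."""
    return wall_watts / tokens_per_sec

# Illustrative numbers only — real values depend on model, quant, and batch size.
h100 = joules_per_token(wall_watts=700, tokens_per_sec=1800)  # batched serving
mac = joules_per_token(wall_watts=65, tokens_per_sec=30)      # batch-1 local chat

print(f"H100-class: {h100:.2f} J/token")   # ≈ 0.39
print(f"Mac Mini:   {mac:.2f} J/token")    # ≈ 2.17
```

The asymmetry is the whole point: the H100 only earns its 700W draw when it is batching enough requests to keep throughput high. At batch 1 its J/token balloons, which is exactly the regime where low-draw local hardware competes.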
For models where these tradeoffs play out concretely, see the Llama 4 Maverick 70B, Qwen 3 72B, and Mistral 7B hardware pages.
24/7 Always-On — Where Apple Silicon Quietly Wins
This is the section that flips the standard buyer recommendation. The math:
| Machine | Idle W | Idle hours/month | kWh/month idle | Idle $/month (US) | Idle $/3 years (US) |
|---|---|---|---|---|---|
| Mac Mini M4 Pro | 5 | 720 | 3.6 | $0.58 | $20.74 |
| Mac Studio M4 Max | 10 | 720 | 7.2 | $1.15 | $41.47 |
| Beelink SER8 / mini PC | 12 | 720 | 8.6 | $1.38 | $49.77 |
| RTX 5060 Ti desktop | 55 | 720 | 39.6 | $6.34 | $228.10 |
| RTX 4090 desktop | 70 | 720 | 50.4 | $8.06 | $290.30 |
| RTX 5090 desktop | 80 | 720 | 57.6 | $9.22 | $331.78 |
The standalone "what does it cost just to leave it on?" line is brutal: an idling RTX 5090 desktop burns about $9.22/month before you ask it to do anything. A Mac Mini M4 Pro burns 58 cents.
Over a 3-year ownership window, the idle gap is $311 in the Mac's favor — roughly 22% of the Mac Mini's purchase price ($1,399 – $1,599) returned in avoided power costs alone. Add the inference-time gap and the total cost of ownership tilts further toward Apple Silicon for any always-on use case.
The framing for buyers: if your use case is an always-on local API endpoint or a background agent, and your model fits in 24–192GB of unified memory, Apple Silicon's idle-power story dominates the TCO conversation. That's the exact use case described in our home AI server build guide and the business-tier framing in local AI server for business.
For builders who need more memory than a single Mac Studio offers, the Mac Mini cluster guide multiplies the same low-idle-draw story by N nodes — an 8× M4 Pro cluster idles at ~40W combined, versus a single RTX 5090 PC at 80W. The cluster is more efficient at idle than the single discrete GPU box.
Want the head-to-head specifically? Our Mac Studio M4 Max vs RTX 5090 comparison has the full breakdown including throughput per dollar of operating cost.
When the RTX Math Wins Anyway
The Apple-wins narrative breaks in three specific scenarios. Don't buy a Mac for these workloads, no matter how much electricity you'd save:
Single-session burst workloads. Video generation, image batch jobs, fine-tuning runs. The RTX finishes roughly 5× faster and draws power for 5× less time, so wall-clock matters more than $/month for any task with a deadline. A 30-minute SDXL batch on an RTX 5090 burns ~0.36 kWh at the wall; the same job on a Mac Mini takes ~2.5 hours and burns ~0.16 kWh. You "saved" roughly three cents of electricity to wait five times longer. See best GPU for AI video generation and GPU for fine-tuning for the workloads where this calculus dominates.
CUDA-only frameworks. Most training stacks, advanced video diffusion (Sora-class), Flash Attention 3 implementations, and a long tail of research code that hasn't been ported to MLX or Metal. If your tooling requires CUDA, the operating-cost conversation is over before it starts. See the local LLM guide hub for which frameworks support which silicon.
Quiet builds with tight power budgets. If you're building a fanless or near-silent PC, the discrete-GPU thermal story is brutal. Our quiet AI PC guide covers undervolting, power-limiting, and the cases where Apple Silicon wins on noise and power simultaneously — a rare two-for-one.
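The burst-workload arithmetic is worth running yourself before letting $/month drive a deadline decision — total job energy is just watts times hours, using the guide's midpoint wattages:

```python
def job_energy_kwh(wall_watts, hours):
    """Total wall energy consumed by a fixed-length job."""
    return wall_watts * hours / 1000

rtx = job_energy_kwh(725, 0.5)  # 30-minute batch at full system draw
mac = job_energy_kwh(65, 2.5)   # same job, ~5x slower on the low-draw box

print(f"RTX 5090: {rtx:.2f} kWh (~${rtx * 0.16:.2f} at US rates)")
print(f"Mac Mini: {mac:.2f} kWh (~${mac * 0.16:.2f} at US rates)")
```

Pennies either way — which is exactly why wall-clock, not electricity, should decide burst workloads.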
Hidden Costs Most Calculators Miss
The headline $/month numbers above are conservative. Four real-world adjustments push every desktop-class build's true cost meaningfully higher:
- Cooling / AC overhead. Every 1W of GPU heat dumped into a room is roughly 0.3–0.5W of additional AC load in summer. An RTX 5090 burning 575W under inference adds ~200W of summer cooling load on top. Multiply by your cooling-season hours.
- PSU efficiency. An 80+ Gold PSU is ~88% efficient under load. A 575W GPU pulls ~650W from the wall just from this conversion loss. 80+ Platinum gets you to 92%, 80+ Titanium to 94%. The math in our master table assumes Gold-class — Bronze PSUs make the numbers worse.
- Network / NAS / monitor overhead. The rest of your rack adds 30–80W continuous you forgot to count. A Synology DS1821-class NAS adds ~30W. A 32-inch monitor left on adds ~30W. A managed 10GbE switch adds ~15W. A Samsung 990 Pro-class NVMe drive sits at ~2.5W under light load — modest individually, but a 4-drive array running 24/7 is about $1.15/month all by itself.
- Idle creep. Wake-on-LAN, scheduled jobs, browser tabs, leaked Docker containers — many "idle" desktop builds actually sit at 100W+ rather than the 70–80W spec. Watch your wall meter for a week before drawing TCO conclusions.
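PSU loss and cooling overhead compound multiplicatively on top of the table numbers. A small estimator — the efficiency and cooling figures are the mid-range assumptions from the bullets above, not constants of nature:

```python
def wall_draw(dc_load_w, psu_efficiency=0.88, cooling_factor=0.0):
    """Estimate wall draw for a given DC load, plus optional summer AC overhead.

    psu_efficiency: ~0.88 for 80+ Gold, ~0.92 Platinum, ~0.94 Titanium.
    cooling_factor: 0.3-0.5 of dissipated heat showing up as extra AC load.
    """
    wall = dc_load_w / psu_efficiency  # conversion loss happens before the meter
    return wall * (1 + cooling_factor)

print(f"{wall_draw(575):.0f} W")                        # Gold PSU: ~653 W at the wall
print(f"{wall_draw(575, cooling_factor=0.35):.0f} W")   # plus mid-summer AC overhead
```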
Practical mitigations, in order of return-on-effort:
- Power-limit your GPU. `nvidia-smi -pl 400` caps an RTX 5090 at 400W with roughly 10% performance loss. That's a 30% drop in power for a 10% drop in throughput — heavily worth it for sustained workloads.
- Use Mac low power mode. System Settings → Battery → Low Power Mode (yes, it works on Mac Mini and Studio too via desktop power profiles in macOS 26.x). Drops M-series idle by another 1–2W and inference peak by 10–15%.
- Suspend on idle. A cron-driven `systemctl suspend` at 20:00 / wake at 08:00 cuts your "always-on" desktop's 24/7 cost by ~50% if your usage permits.
- Undervolt the CPU. Worth 5–15W on most Ryzen / Intel desktop builds — small but compounding at 24/7.
Local vs Cloud — The Power-Adjusted Breakeven
Quick reframe for the cloud-comparison shoppers: at the consumer tier, electricity rarely changes the breakeven calculation by more than a month or two. The headline numbers:
- vs ChatGPT Plus ($20/month): A heavy-use RTX 5090 build's electricity (~$34/month at our heavy-chat profile) exceeds the subscription cost on its own. The breakeven on a $2,000 GPU is "never" if you're only replacing ChatGPT Plus with self-hosted. Local AI is winning on privacy, latency, custom models, and unmetered usage — not on dollars at this tier.
- vs OpenAI API ($200/month team budget): Now the math works. $200/month avoided API spend minus ~$34/month electricity is roughly $166/month saved, so a $2,000 GPU breaks even in ~12 months on raw cost — sooner once you value privacy, latency, and control at even 10–20% of the API spend.
- vs renting an H100 hour ($2–$4/hour): If you're running enough inference to justify renting an H100, local hardware acquisition cost dominates. See our DRAM shortage 2026 pricing piece for current acquisition-cost context and best hardware for AI agents for the always-on workload framing.
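The breakeven arithmetic above generalizes to any cloud bill; a sketch using this section's figures:

```python
def breakeven_months(hardware_cost, monthly_cloud_spend, monthly_electricity):
    """Months until local hardware pays for itself versus a recurring cloud bill."""
    monthly_savings = monthly_cloud_spend - monthly_electricity
    if monthly_savings <= 0:
        return float("inf")  # local never breaks even on dollars alone
    return hardware_cost / monthly_savings

# $2,000 GPU vs ChatGPT Plus: electricity exceeds the subscription, so never.
print(breakeven_months(2000, 20, 34))             # inf
# Same GPU vs a $200/month API budget: ~12 months.
print(f"{breakeven_months(2000, 200, 34):.1f}")   # 12.0
```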
One paragraph of disclaimer: this is a wallet analysis, not an emissions analysis. The CACM "Energy Footprint of Humans and LLMs" piece and antarctica.io's "One-Token Model" cover the carbon angle if that's what you came for. We're staying on the dollars.
The Bottom Line — A Buying Cheat Sheet by Power Budget
Pick your target operating cost; we'll point you at the buy.
Under $5/month operating cost — edge agents and light-tier builders
Best picks: Jetson Orin Nano ($199 – $249), MAGICNUC AS1 ($229 – $299), Mac Mini M4 Pro ($1,399 – $1,599) at light-tier use.
Verdict: The Mac Mini M4 Pro is the obvious all-rounder if you can spend $1,400 once. The Jetson Orin Nano is the right answer for anyone running a small always-on agent under 8B parameters — under $1/month operating, period. The MAGICNUC AS1 is the cheapest x86 host that runs an always-on workload acceptably.
$5–$15/month operating cost — full-time builders and prosumers
Best picks: Mac Mini M4 Pro at heavy use, Mac Studio M4 Max ($1,999 – $5,999), RTX 5060 Ti 16GB ($429 – $479), RTX 4060 Ti 16GB ($399 – $449), Intel Arc B580 ($249 – $289) — for ≤4 hours/day builds.
Verdict: If you live in inference 8 hours a day, the Mac Studio M4 Max is the lowest-friction pick — silent, 192GB unified memory, $5–$6/month operating cost. If your workload needs CUDA, the RTX 5060 Ti 16GB is the best perf-per-watt tradeoff in the lineup.
$15–$40/month operating cost — power users and the always-on crowd
Best picks: RTX 4090 ($1,599 – $1,999), RTX 5090 ($1,999 – $2,199), RTX 3090 used ($699 – $999) for the price-conscious, RTX 5080 ($999 – $1,099).
Verdict: The RTX 5090 is the right buy if your workload is GPU-bound and CUDA-required — the operating cost is the price of admission to its throughput class. The used RTX 3090 is a stealth value pick if you can tolerate older silicon (lower tokens/joule but lower acquisition). For the 16GB-fits sweet spot at lower power, see the RTX 5090 vs RTX 5080 breakdown.
$40+/month operating cost — multi-GPU rigs and datacenter-class builds
Best picks: RTX 5090 at always-on, multi-GPU rigs, H100 PCIe ($25,000 – $33,000) for the budget-no-object case.
Verdict: If you're spending $40+/month on power, you're doing this for revenue or research, not hobby. At that point, electricity is a rounding error against acquisition cost — focus on perf-per-watt within the high-power class, undervolt aggressively, and consider whether disaggregated inference (compute-rich GPU + memory-rich Mac, per the cluster guide) gets you better $/token than scaling up the single rig.
Closing — Run Your Own Numbers Before You Buy
The single most useful $25 you'll spend on this stack is a Kill-A-Watt or equivalent wall meter. Plug your candidate machine in for a week, measure idle and inference, compute your actual $/month at your local rate, and the buying decision becomes obvious. Every figure in this guide is an informed midpoint; your number depends on your PSU, your ambient temp, your local rate, and your actual usage pattern.
For the acquisition-cost half of the TCO picture, see GPU Prices 2026 — What to Buy for Local AI. For the always-on-server framing, home AI server build guide and local AI server for business. For specific products, the AI on a Budget hub and mini PC for AI hub have the curated lineups.
The bottom line one more time: at typical US rates, local AI adds $2 to $35 a month to your power bill — closer to $2 if you buy Apple Silicon or a mini PC, closer to $35 if you buy a flagship discrete GPU and run it hard. The 24/7 always-on case is where Apple Silicon's idle-power story creates a multi-hundred-dollar 3-year gap that nobody talks about until they get their first electric bill. Buy accordingly.