The RTX 50 Super Refresh Is Delayed Indefinitely — What Local AI Builders Should Buy Instead (2026)
NVIDIA has quietly told board partners the RTX 50 SUPER refresh — the 18GB/24GB cards local AI builders were waiting for — is delayed indefinitely over the GDDR7 shortage. Here's why waiting is now a losing trade, and the exact GPU to buy instead at every budget.
Compute Market Team
The verdict, up front: NVIDIA's GeForce RTX 50 SUPER refresh — the cards that would have brought 24GB of VRAM to the RTX 5070 Ti and 5080 — has been delayed indefinitely, and for local AI builders that changes the math entirely. Do not keep waiting. The right move now is to buy the correct card for your VRAM tier today: a used RTX 3090 ($699 – $999) or an RTX 4090 ($1,599 – $1,999) if you need 24GB, an RTX 5090 ($1,999 – $2,199) if you need 32GB, and a new 16GB card only if you were never going to touch 32B-class models anyway.
If you saw the CES-era leaks, decided to "wait for the 24GB Super," and have now been waiting five-plus months with nothing to show for it — this post is the verdict you came for. Here is exactly what happened, why it matters far more to AI builders than to gamers, and the specific SKU to buy at every budget.
Quick Answer: Should You Wait for the RTX 50 Super?
No. NVIDIA's RTX 50 SUPER refresh, which would have brought 24GB cards to the RTX 5070 Ti and 5080, has been delayed indefinitely due to the GDDR7 memory shortage. The best 24GB GPU for local AI in 2026 is therefore one you can buy today, a used RTX 3090 ($699 – $999) or an RTX 4090 ($1,599 – $1,999), not a Super card with no launch date.
Three-line decision summary, keyed to budget:
- Under $1,000 and you want 24GB: buy a used RTX 3090 now. The Super was never going to be cheaper.
- $1,000–$2,000 and you want 24GB done right: buy an RTX 4090. It is the real winner of this delay.
- You need 32GB+ for 70B-class models: buy an RTX 5090, or go Apple Silicon for raw model fit.
What the RTX 50 SUPER Refresh Was Supposed to Be
When the RTX 50 SUPER refresh leaked around CES 2026, it was the most-anticipated hardware on the local-AI calendar. The rumored lineup — and every spec below is leaked/rumored, never officially announced by NVIDIA — looked like this:
| Rumored card | Rumored VRAM | Replaces | Memory tech |
|---|---|---|---|
| RTX 5070 SUPER | 18GB (vs 12GB) | RTX 5070 | Dense 3GB GDDR7 modules |
| RTX 5070 Ti SUPER | 24GB (vs 16GB) | RTX 5070 Ti | Dense 3GB GDDR7 modules |
| RTX 5080 SUPER | 24GB (vs 16GB) | RTX 5080 | Dense 3GB GDDR7 modules |
Source: leaked specifications aggregated by VideoCardz and wccftech, CES-2026 era. NVIDIA never officially confirmed these cards. Treat all figures as rumored.
The headline was a roughly 50% VRAM bump on the same Blackwell silicon, achieved by swapping standard 2GB GDDR7 modules for denser 3GB ones, paired with a modest ~5–10% raw performance gain from slightly higher core counts and clocks. For gamers, that is a minor mid-cycle refresh. For local AI, the 24GB tier on a sub-$1,100 card would have been the single most important value release of the year — which is exactly why its disappearance hurts.
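The 50% figure falls straight out of the module arithmetic. A minimal sketch, assuming one GDDR7 module per 32-bit memory channel and using the bus widths of the shipping RTX 5070 (192-bit), RTX 5070 Ti and RTX 5080 (both 256-bit). The SUPER configurations themselves remain rumored, and the function name is illustrative:

```python
# Sketch: VRAM capacity = (bus width / 32 bits per module) * module density.
# Bus widths are for the shipping RTX 50 cards; SUPER specs are rumored only.
BITS_PER_MODULE = 32  # assumption: one module per 32-bit channel, no clamshell


def vram_gb(bus_width_bits: int, module_gb: int) -> int:
    modules = bus_width_bits // BITS_PER_MODULE
    return modules * module_gb


cards = {"RTX 5070": 192, "RTX 5070 Ti": 256, "RTX 5080": 256}

for name, bus in cards.items():
    current = vram_gb(bus, 2)  # standard 2GB modules
    rumored = vram_gb(bus, 3)  # dense 3GB modules
    bump = (rumored - current) / current
    print(f"{name}: {current}GB -> {rumored}GB (+{bump:.0%})")
```

Because the module count is fixed by the bus width, the only way to reach 18GB or 24GB on these boards is the denser 3GB part, which is why the refresh is so exposed to the supply of that one component.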
What Actually Happened: "Delayed Indefinitely"
In mid-May 2026, multiple outlets with board-partner sources reported the same thing within days of each other: NVIDIA had quietly informed its AIB (add-in-board) partners that the RTX 50 SUPER refresh is delayed indefinitely. There was no press release, no roadmap update, and no public statement, just a backchannel notice to the companies that build the cards.
- TweakTown reported the SUPER series is "delayed indefinitely" as NVIDIA informed its partners.
- VideoCardz corroborated with its own board-partner sourcing, framing the release as "put on hold."
- guru3d independently confirmed the AIB-channel notice that the refresh had slipped.
One important distinction for anyone parsing the headlines: "delayed indefinitely" is not the same as "cancelled." Some roadmap leaks point toward outright cancellation, but NVIDIA has officially confirmed neither. For a buyer, though, the practical effect is identical. As we put it in our companion analysis on whether to wait for the RTX 5090 Ti / Titan: a card with no announced date, no confirmed price, and no confirmed specs cannot be planned around. It is a phantom. You cannot run inference on a rumor.
The purchasing logic is blunt: an indefinite delay with no replacement date is, for buying purposes, a cancellation. The only people who should still be "waiting" are those who don't actually need a GPU yet.
Why It Was Delayed — the GDDR7 Shortage
The cause is not manufacturing trouble with the GPU dies. The Blackwell silicon is fine. The bottleneck is memory.
The entire point of the SUPER refresh was the VRAM bump, and that bump depended on dense 3GB GDDR7 modules. Those exact modules are in ferocious demand from the AI datacenter buildout — the same structural squeeze we documented in depth in our 2026 DRAM shortage analysis. When memory makers can sell every wafer of dense, high-margin memory into datacenter accelerators, allocating it to a mid-cycle consumer refresh becomes a low priority.
TechPowerUp reported directly that the GDDR7 shortage could stop the RTX 50-series SUPER rollout. Two factors make the delay rational from NVIDIA's side:
- The memory is worth more elsewhere. Datacenter demand for dense memory is effectively unlimited at current AI-buildout pace. A consumer SUPER card is the least profitable home for a scarce 3GB GDDR7 module.
- There is no competitive pressure. AMD is not contesting the high end of the consumer market in 2026. With nobody forcing NVIDIA's hand, there is no strategic reason to burn scarce memory to ship a refresh that mostly helps buyers, not margins.
The takeaway: this delay is not a temporary hiccup that resolves next quarter. It is downstream of a structural memory shortage with no clean end date — which is precisely why "just wait" is the wrong instinct.
Why This Matters More for Local AI Than for Gamers
Here is the point no gaming-focused outlet is making. Every TweakTown, VideoCardz, and wccftech writeup frames this as an FPS story. For a gamer, losing the SUPER refresh means losing roughly 5–10% of frame rate — annoying, forgettable, not decisive.
For a local AI builder, the delayed 24GB RTX 5070 Ti SUPER / 5080 SUPER would have been the value sweet spot of the entire year. The reason is the way model sizes map to VRAM. The jump from 16GB to 24GB is not incremental — it crosses a hard threshold:
- 16GB runs 13B–14B models comfortably at Q4 quantization, and squeezes a 27B model in only at tight Q4 with little room for context.
- 24GB unlocks the entire 32B-parameter class with breathing room: Gemma 3 27B (~16GB at Q4), CodeLlama 34B (~20GB at Q4), and Qwen 32B-class models all become genuinely usable.
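The threshold is easy to check with back-of-the-envelope arithmetic. A rough sketch, assuming about 4.7 bits per weight for a Q4 quant (a value chosen to match the ~16GB and ~20GB figures quoted above; real Q4 variants differ slightly) and a hypothetical 2GB reserve for context:

```python
# Sketch: approximate weight footprint of a quantized model.
# BITS_PER_WEIGHT_Q4 is an assumption chosen to match the figures quoted above.
BITS_PER_WEIGHT_Q4 = 4.7


def weight_gb(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Weights only; KV cache and runtime overhead come on top."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float, reserve_gb: float = 2.0) -> bool:
    """True if the weights fit with reserve_gb left over (hypothetical margin)."""
    return weight_gb(params_billion) + reserve_gb <= vram_gb


for label, size in [("14B", 14), ("27B", 27), ("34B", 34)]:
    print(
        f"{label}: ~{weight_gb(size):.1f}GB at Q4 | "
        f"16GB: {fits(size, 16)} | 24GB: {fits(size, 24)}"
    )
```

Under these assumptions a 14B model fits either tier, while the 27B and 34B models only clear the bar at 24GB.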
So when the Super refresh disappears, gamers lose a single-digit FPS percentage. AI builders lose a whole model tier. The card that would have democratized 32B-class local inference at a sub-$1,100 price simply does not exist anymore. That is the framing this post owns, and it is the reason your buying decision can't wait on the gaming-press timeline.
The VRAM Tiers You're Actually Choosing Between Now
With the phantom Super out of the picture, here are the real cards on the table and what each one runs. VRAM capacity is the one spec that decides which models you can load at all; see our full VRAM guide for the complete breakdown.
| VRAM tier | Cards | What it runs (Q4) | Price range |
|---|---|---|---|
| 12GB | Intel Arc B580 | 7B–8B comfortably (Llama 3 8B) | $249 – $289 |
| 16GB | RTX 5060 Ti, RTX 4060 Ti, RTX 5080 | 13B–14B comfortably (Phi-4 14B); 27B only at tight Q4 | $399 – $1,099 |
| 24GB | used RTX 3090, RTX 4090 | 32B-class at Q4; 70B at heavy quant | $699 – $1,999 |
| 32GB | RTX 5090 | 70B at roughly 3-bit quant, or Q4 with partial CPU offload (DeepSeek R1 Distill 70B) | $1,999 – $2,199 |
| Unified memory | Mac Mini M4 Pro (24GB), Mac Studio M4 Max (up to 128GB) | Sidesteps the GDDR7 shortage entirely | $1,399 – $5,999 |
VRAM-per-model figures based on model weight sizes at Q4; actual usable headroom depends on context length and KV cache. Card prices are current MSRP/street ranges and move with the ongoing memory shortage.
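To put a number on the KV cache caveat, here is a minimal sketch of the standard cache-size formula for grouped-query attention. The layer and head counts are Llama 3 8B's published architecture; the FP16 cache (2 bytes per element) is an assumption, and runtimes that quantize the cache will use less:

```python
# Sketch: KV cache size for a transformer with grouped-query attention.
# Config values below are Llama 3 8B's; an FP16 cache is an assumption.
def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_elem: int = 2,
) -> float:
    # The factor of 2 covers the separate key and value tensors in every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens / 2**30


for ctx in (8_192, 32_768):
    gib = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=ctx)
    print(f"{ctx} tokens: {gib:.1f} GiB of KV cache")
```

For this model the cache costs 1 GiB at an 8K context and 4 GiB at 32K, all of it on top of the weights, which is why a model that "just fits" a tier often does not leave room for long prompts.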
Should You Wait? The Decision Framework
Three buyer profiles, one hard decision rule. The rule: if the card you're waiting for has no announced date and prices are rising, waiting is a losing trade. The GDDR7 shortage means GPU prices are drifting up, so every month you wait, the thing you eventually buy costs more — and the thing you're waiting for still doesn't exist.
Profile 1: You need 16GB or less
Buy now. The Super refresh wouldn't have helped you — the 18GB RTX 5070 SUPER was a marginal step, and you were never targeting 32B-class models. A new RTX 5060 Ti 16GB or RTX 4060 Ti 16GB covers the 13B–14B tier today. Compare them directly in our RTX 5060 Ti vs RTX 4060 Ti breakdown.
Profile 2: You want 24GB
Do not wait — this is the profile the delay actually hurts. The 24GB Super has no date, and GDDR7-era pricing means even an eventual launch would not be cheap. Buy a used RTX 3090 or an RTX 4090 today. You get the 24GB tier now, and — critically — a used 24GB card holds resale value, so your downside is capped (more on that below).
Profile 3: You want 32GB or more
Buy an RTX 5090 — the only new consumer card above 24GB — or step to Apple Silicon for raw big-model fit. The Super refresh topped out at a rumored 24GB; it was never going to serve the 70B-class buyer anyway.
Best Buys Right Now, by Budget
Concrete picks, honest caveats, current pricing. For the broader buyer's guide this funnels into, see our best consumer GPU for local LLMs and the AI GPU buying guide hub.
Under $500: Intel Arc B580 (or stretch to a used RTX 3090)
The Intel Arc B580 ($249 – $289) is the budget floor: 12GB of VRAM, a 190W board power rating, and a community-reported ~28 tok/s on Llama 3 8B Q4 (LM Studio Community, needs verification). The caveat is real: Intel's OpenVINO and IPEX stack is less mature than CUDA, so expect occasional friction. See our used RTX 3090 vs RTX 5060 Ti deep-dive for how the budget tiers stack up.
But the smarter sub-$1,000 buy — and the headline value pick of this entire delay — is a used RTX 3090 ($699 – $999). It is the exact card the Super refresh was meant to dethrone, and it is sitting on the shelf right now. 24GB of GDDR6X, ~48 tok/s on Llama 3 8B Q4 and ~9 tok/s on Llama 3 70B Q4 (LM Studio Community, needs verification), drawn from a memory supply chain the GDDR7 shortage barely touches. It runs the entire 32B-class tier today. Best for: maximum VRAM-per-dollar.
$500–$1,000: 16GB new, or 24GB used
If you genuinely never needed 24GB, the RTX 5060 Ti 16GB ($429 – $479) is the best new 16GB card under $500 — Blackwell tensor cores with FP4 support and ~42 tok/s on Llama 3 8B Q4 (LM Studio Community, needs verification). The RTX 4060 Ti 16GB ($399 – $449) is the cheaper Ada alternative. But if 24GB is the goal, a used RTX 3090 still wins this bracket outright — that is the trade-off our RTX 5080 vs RTX 3090 comparison lays out.
$1,000–$2,000: RTX 4090, the real winner of this delay
The RTX 4090 ($1,599 – $1,999) is the card the delayed Super was supposed to make redundant — and now it isn't. 24GB of GDDR6X, 16,384 CUDA cores, ~62 tok/s on Llama 3 8B Q4 and ~12 tok/s on Llama 3 70B Q4 (LM Studio Community, needs verification), full CUDA support, and a mountain of community documentation. It runs 32B-class models with room to spare and 70B-class models at heavy quant. The alternative new option here is the 16GB RTX 5080 ($999 – $1,099) — current-gen Blackwell, but the same 16GB ceiling a 5080 SUPER would have lifted. The RTX 5080 vs RTX 4090 comparison is the core decision: newer 16GB silicon, or the 24GB model tier. For local AI, 24GB wins.
$2,000+: RTX 5090 — the only new card above 24GB
If you want the most room for 70B-class models on a single new card and refuse to buy used, the RTX 5090 ($1,999 – $2,199) is the answer: 32GB of GDDR7, ~95 tok/s on Llama 3 8B Q4 and ~18 tok/s on Llama 3 70B Q4 (LM Studio Community, needs verification). It is the only new consumer card above 24GB. Note that Q4 weights for a 70B–72B model (Llama 3 70B, Qwen 2.5 72B) come to roughly 40GB, so on 32GB expect a tighter quant or partial offload to system RAM rather than a comfortable full-Q4 fit. The other catch is a 575W TDP demanding a 1000W+ PSU. For buyers chasing 32GB specifically, our cheapest 32GB GPU guide compares the alternatives.
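One way to compare the picks above is cost per GB of VRAM, applied only after filtering to the tier you actually need. A rough sketch using the price ranges quoted in this post; taking the midpoint of each range is a simplifying assumption, and the function names are illustrative:

```python
# Sketch: rank cards by dollars per GB of VRAM within a minimum VRAM tier.
# Prices are the ranges quoted in this post; using the midpoint is an assumption.
CARDS = [
    # (name, vram_gb, low_usd, high_usd)
    ("Intel Arc B580", 12, 249, 289),
    ("RTX 4060 Ti 16GB", 16, 399, 449),
    ("RTX 5060 Ti 16GB", 16, 429, 479),
    ("used RTX 3090", 24, 699, 999),
    ("RTX 5080", 16, 999, 1099),
    ("RTX 4090", 24, 1599, 1999),
    ("RTX 5090", 32, 1999, 2199),
]


def usd_per_gb(vram_gb: int, low_usd: int, high_usd: int) -> float:
    return (low_usd + high_usd) / 2 / vram_gb


def rank(min_vram_gb: int) -> list[tuple[str, float]]:
    """Cards with at least min_vram_gb, cheapest per GB first."""
    eligible = [card for card in CARDS if card[1] >= min_vram_gb]
    return sorted(
        ((name, usd_per_gb(vram, low, high)) for name, vram, low, high in eligible),
        key=lambda pair: pair[1],
    )


for name, cost in rank(min_vram_gb=24):
    print(f"{name}: ${cost:.0f}/GB")
```

With a 24GB floor, the used RTX 3090 lands at roughly $35 per GB at midpoint pricing, about half the figure for either the RTX 5090 or the RTX 4090. Filtering by tier first matters: without the floor, the cheapest cards per GB are the ones that cannot load a 32B-class model at all.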
The Apple Silicon Escape Hatch
Here is the option most GPU-focused coverage ignores entirely: Apple Silicon's unified memory is not made of GDDR7. It uses LPDDR5X from a separate supply chain, so Apple's memory capacity is untouched by the shortage that killed the Super.
The Mac Mini M4 Pro ($1,399 – $1,599) gives you 24GB of unified memory in a silent, palm-sized box. That is the same memory tier as the rumored 5070 Ti SUPER, available today. Step up to the Mac Studio M4 Max ($1,999 – $5,999) and you can configure up to 128GB of unified memory, large-model capacity no consumer GPU, Super or otherwise, can touch. Both run local models cleanly via Ollama, MLX, and GGUF-based tooling.
The trade-offs are honest: lower raw tok/s than an equivalent NVIDIA card, and no CUDA, so some training and fine-tuning frameworks won't run. But for pure inference and big-model fit, it sidesteps the entire problem. The Mac Studio M4 Max vs RTX 5090 comparison covers this decision in full, and our local LLM guide has setup walkthroughs.
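Since the tok/s figures in this post are community-sourced and marked "needs verification," it is worth measuring on your own hardware, Mac or NVIDIA. A minimal sketch, assuming a local Ollama server on its default port with the model already pulled. It relies on the `eval_count` and `eval_duration` fields in Ollama's `/api/generate` response; the helper names and the model tag in the usage comment are only examples:

```python
# Sketch: measure generation speed against a local Ollama server.
# Assumes the server is running on the default port and the model is pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(result: dict) -> float:
    # Ollama reports eval_duration in nanoseconds.
    return result["eval_count"] * 1e9 / result["eval_duration"]


def benchmark(model: str, prompt: str) -> float:
    """Send one non-streaming generation request and return tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return tokens_per_second(json.load(response))


# Usage (needs a running server; the model tag is an example):
#   print(f"{benchmark('llama3:8b', 'Explain KV caching in two sentences.'):.1f} tok/s")
```

A single short prompt is a noisy measurement, so run it a few times and use a prompt length close to your real workload before comparing against anyone else's numbers.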
What If the Super Eventually Launches?
Plan for the realistic scenario, not the hopeful one. Even if NVIDIA revives the SUPER refresh in early 2027, it would launch into the GDDR7 shortage, not after it. That means inflated, memory-shortage-era pricing — not the clean MSRP the CES leaks implied. A "$799" 24GB card that ships at $1,100-plus street is not the deal you were waiting for.
This is where the used-24GB-as-a-hedge argument matters, and it is the genuinely useful financial insight pure-news sites won't give you. When you buy a used RTX 3090 today, you are buying an asset with a real resale market. If the Super somehow launches at a compelling price, you sell the 3090 for close to what you paid and upgrade. Your downside is capped. Meanwhile you had a working 24GB card the entire time instead of an empty PCIe slot.
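The hedge can be put in numbers. A rough sketch in which every input is hypothetical: $849 is simply the midpoint of the used RTX 3090 range quoted in this post, and the resale fractions and 12-month holding period are illustrative, not forecasts:

```python
# Sketch: cost of holding a used card as a hedge. All inputs are hypothetical.
def hedge_cost(
    purchase_usd: float, resale_fraction: float, months_held: int
) -> tuple[float, float]:
    """Return (total loss on resale, loss per month of use)."""
    loss = purchase_usd * (1 - resale_fraction)
    return loss, loss / months_held


# $849 is the midpoint of the used RTX 3090 range quoted in this post;
# the resale fractions and the 12-month holding period are illustrative only.
for fraction in (0.9, 0.8, 0.7):
    loss, per_month = hedge_cost(849, fraction, months_held=12)
    print(f"resell at {fraction:.0%}: lose ${loss:.0f} total, ${per_month:.0f}/month")
```

The point of the exercise is the shape of the trade, not the specific figures: the worst case is a bounded monthly cost for a working 24GB card, while the cost of waiting is an unknown price on an unknown date.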
The principle to carry out of this: don't price a phantom card into a real budget. Buy the hardware that exists, keep the receipts, and treat any future Super launch as an optional upgrade — not a plan.
Bottom Line — the Decision Tree
The RTX 50 SUPER refresh is delayed indefinitely, the GDDR7 shortage that caused it has no clean end date, and prices are rising while you wait. Stop waiting. Here is the decision tree:
- Want 24GB for under $1,000? Buy a used RTX 3090 ($699 – $999) today.
- Want 24GB done right, $1,000–$2,000? Buy an RTX 4090 ($1,599 – $1,999) — the real winner of this delay.
- Only ever needed 16GB? Buy a new RTX 5060 Ti 16GB ($429 – $479) — the Super wouldn't have changed your build.
- Need 32GB+ for 70B-class models? Buy an RTX 5090 ($1,999 – $2,199).
- Want shortage immunity and big-model fit? Go Apple Silicon — Mac Mini M4 Pro ($1,399 – $1,599) or Mac Studio M4 Max ($1,999 – $5,999).
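The same tree, sketched as a function. The branch order and the $1,599 cutoff (the low end of the RTX 4090 range quoted above) are one interpretation of the list, not something the post specifies, and the function name and parameters are hypothetical:

```python
# Sketch: the decision tree above as code. Thresholds are an interpretation
# based on the price ranges quoted in this post.
RTX_4090_FLOOR_USD = 1599  # low end of the quoted RTX 4090 range


def recommend(
    vram_needed_gb: int, budget_usd: int, wants_shortage_immunity: bool = False
) -> str:
    if wants_shortage_immunity:
        return "Apple Silicon: Mac Mini M4 Pro or Mac Studio M4 Max"
    if vram_needed_gb > 24:
        return "RTX 5090"
    if vram_needed_gb > 16:
        # 24GB tier: the budget decides between used Ampere and the RTX 4090.
        return "RTX 4090" if budget_usd >= RTX_4090_FLOOR_USD else "used RTX 3090"
    return "RTX 5060 Ti 16GB"


print(recommend(vram_needed_gb=24, budget_usd=900))
print(recommend(vram_needed_gb=24, budget_usd=1800))
print(recommend(vram_needed_gb=32, budget_usd=2200))
```

Note that no branch returns "wait": with no launch date to plan around, the Super is not an input to the decision.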
For deeper context on where prices go from here, read our 2026 GPU pricing guide and GPU market trends analysis. If you're sizing system RAM alongside VRAM, our how much RAM for local AI guide is the companion piece, and the best local LLMs for the RTX 50 series covers what the current Blackwell lineup actually runs. Budget-focused builders should also check the AI on a budget hub.
Last updated: May 22, 2026. Delay reporting sourced from TweakTown, VideoCardz, guru3d, and TechPowerUp; rumored SUPER specifications from VideoCardz and wccftech leak aggregation and are not officially confirmed by NVIDIA. Performance figures are community-sourced and marked "needs verification." Prices reflect current MSRP/street ranges and move with the ongoing memory shortage.