The $4,000 “AI supercomputer” that John Carmack called half-baked just got a second chance. Here’s what actually happened.
Three weeks after unboxing my DGX Spark, I genuinely considered returning it.
Not because the hardware was ugly. NVIDIA nailed the industrial design. The champagne-gold Founders Edition chassis with metal foam panels echoes the design of NVIDIA's DGX A100 and H100 server line. It looked incredible on my desk. Like owning a piece of AI history.
My first real workload, a 30B-parameter model for a client prototype, hit thermal throttling within 20 minutes. The fan screamed. The chassis got hot enough to make me nervous about the wood underneath it. And the inference speed? Let's just say my M3 Pro was keeping up on the exact same task. A $2,000 laptop matching a $4,000 "supercomputer."
I wasn’t alone.
The October Massacre
John Carmack (yes, that John Carmack, the guy who built Doom) went public with his DGX Spark experience on X. His findings were brutal:
- Power draw maxing out at 100W instead of the rated 240W
- Roughly half the quoted performance
- The device getting “quite hot” even at reduced power
- Reports of spontaneous rebooting during long runs
That post set off a chain reaction. Tom's Hardware reported that NVIDIA's developer forums were flooded with crash reports and unexpected shutdowns under sustained load. ServeTheHome confirmed they couldn't hit the 240W power ceiling in any workload. AMD's VP of AI Software publicly offered Carmack an alternative. Framework jumped in with a Strix Halo box.
The internet consensus formed fast: the DGX Spark was overpriced, undercooled, and underdelivering. A $4,000 golden paperweight.
I'll be honest. I agreed. The 273 GB/s memory bandwidth sounds impressive on paper, but for chatbot-style inference on models that fit in 36GB, my M3 Pro was generating tokens at comparable speeds. The AMD Strix Halo, at half the price, was benchmarking within spitting distance.
The DGX Spark sat on my shelf for two weeks. I went back to my Mac.
What Everyone (Including Me) Got Wrong
Here's the thing I didn't understand in October: I was evaluating the Spark like a GPU.
It's not a GPU.
The DGX Spark is a capacity play, not a throughput play. It's not trying to generate tokens faster than your RTX 5090. It can't. The memory bandwidth physically won't allow it. What it can do is hold entire models that would crash your 24GB graphics card. A 120B parameter model in NVFP4 fits in memory. Try that on an RTX 5090.
But I only understood this after the January update forced me to rethink everything.
What NVIDIA Actually Changed (CES 2026)
In January, NVIDIA dropped a software update that didn't get nearly enough attention. No new hardware. Just software. Here's what landed:
- Performance: Up to 2.5x improvement on key workloads. The headline number is real, but context matters. Running Qwen-235B with TensorRT-LLM, switching from FP8 to NVFP4 quantization with Eagle3 speculative decoding, throughput more than doubled. Qwen3-30B under CUDA and Stable Diffusion 3.5 Large under llama.cpp saw about 1.4x gains. Fine-tuning improvements were smaller but noticeable.
- The critical nuance The Register caught: these gains are on the compute-intensive parts of the pipeline. Prefill, batch processing, quantized inference. Token generation (the decode phase) is bandwidth-limited. That's not going to change with software. The Spark cannot get much faster at spitting out tokens one at a time. If your only use case is chatbot-style Q&A, this update doesn't fix your problem.
- But if your use case is building things? Different story entirely.
- 30+ playbooks. This is what actually changed my mind. Not a 2.5x number on a slide. Actual, working, step-by-step guides that turn the Spark from "I guess I can run Ollama" into a legitimate AI development platform.
- Brev hybrid routing. Route sensitive tasks (email, proprietary data) to your local Spark while sending general reasoning to cloud frontier models. This is the privacy play that enterprises actually need.
- NVIDIA AI Enterprise support. The Spark is now part of NVIDIA's certified systems program. That's not a marketing checkbox. It means enterprise procurement teams can actually justify buying these.
- February system update: The stuff that actually annoyed people. Hot-plug support for the ConnectX-7 saves up to 18W when you're not using the network adapter. Better monitor compatibility (multi-monitor, non-native resolutions). Bluetooth audio finally works. And for enterprise deployments, you can now disable Wi-Fi and Bluetooth at the UEFI level. None of this is flashy. All of it matters for daily use.
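To make the hybrid routing idea concrete, here's the kind of dispatcher the Brev playbook implies. The endpoint URLs and the keyword heuristic are my own illustrative stand-ins, not Brev's actual implementation:

```python
# Hypothetical endpoints -- substitute whatever your local server and
# cloud provider actually expose.
LOCAL_ENDPOINT = "http://spark.local:8000/v1/chat/completions"
CLOUD_ENDPOINT = "https://api.example.com/v1/chat/completions"

# Crude sensitivity heuristic; a real deployment would use a proper
# classifier or explicit per-task policy.
SENSITIVE_MARKERS = ("email", "invoice", "customer", "internal")

def pick_endpoint(prompt: str) -> str:
    """Route prompts touching proprietary data to the local Spark,
    everything else to a cloud frontier model."""
    text = prompt.lower()
    if any(marker in text for marker in SENSITIVE_MARKERS):
        return LOCAL_ENDPOINT
    return CLOUD_ENDPOINT
```

The point isn't the heuristic, it's the shape: one function decides where a request goes, and sensitive data never leaves your network.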
The 4 Playbooks That Made Me Keep It
I went through about 15 of the 30+ playbooks. Here are the four that justified the $4,000 investment.
1. ComfyUI: Image Generation That Actually Uses the Hardware
The ComfyUI playbook gets you from zero to generating images in about 10 minutes. Clone the repo, set up a virtual environment, run the server.
What's different on the Spark: you can load models that would crash a consumer GPU. Qwen Image 20B ran without flinching. FLUX.1-dev with NVFP4 quantization produced results in a fraction of the time my Mac needed.
The real flex: NVIDIA demoed a workflow pairing a MacBook Pro M4 Max with a DGX Spark via ComfyUI. The Mac handles the interface while the Spark does the heavy compute. A 4K video-generation workflow that took 8 minutes on the Mac alone dropped to about 1 minute with the Spark handling it.
(That's the use case nobody talks about. The Spark isn't replacing your Mac. It's augmenting it.)
2. Nemotron-3-Nano: A Local LLM That Doesn't Suck
Nemotron-3-Nano is NVIDIA's 30B parameter mixture-of-experts model optimized for the Spark. It's small enough to run fast on limited bandwidth but smart enough for real work. Code generation, summarization, data extraction.
The playbook sets it up as a local API endpoint. I pointed my development environment at it and started using it for code review on a private codebase. No cloud. No data leaving my network. Response quality that's genuinely competitive with GPT-4 class models for focused tasks.
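For anyone curious what "pointing my development environment at it" looks like, here's a minimal sketch of building a request for an OpenAI-compatible local endpoint. The URL, port, and model identifier are assumptions on my part; use whatever the playbook's server actually reports:

```python
import json

# Hypothetical local endpoint -- the playbook's server will tell you
# the real host, port, and model name.
SPARK_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "nemotron-3-nano") -> bytes:
    """Build the JSON body for an OpenAI-style chat-completions call.
    POST this to SPARK_URL with urllib.request or requests."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature suits code review
    }).encode()
```

Because the interface is OpenAI-compatible, most editor AI plugins only need the base URL swapped to start talking to the Spark instead of the cloud.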
This is where the ConnectX-7 NIC starts making sense. That $1,500 networking component everyone ignores enables RDMA. You can serve this model to your entire local network at near-wire speed. My laptop, my desktop, a colleague's machine. All hitting the same local LLM without perceptible latency.
3. Connect Two Sparks: The 256GB Play
This is the playbook that made my jaw drop.
Two DGX Sparks connected via the ConnectX-7 NICs create a unified 256GB memory pool over 200 Gbps RDMA. You can run Llama 3.1 405B locally. Full distributed inference across both nodes using NCCL.
(I don't own two Sparks. Yet. But I tested this at a friend's setup. Running a 405B model on your desk, on two small boxes, is genuinely surreal when you remember that this required a server rack eighteen months ago.)
The fine-tuning playbook for dual-Spark is equally impressive: distributed fine-tuning of 70B parameter models using FSDP and LoRA. Across a desk, not a datacenter.
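The 256GB claim survives a quick back-of-envelope check. This sketch uses weight-only arithmetic (real deployments also need headroom for KV cache and activations, so treat it as a lower bound on what you need):

```python
def fits_in_pool(params_b: float, bytes_per_param: float, pool_gb: float) -> bool:
    """Rough check: do the weights alone fit in the memory pool?
    params_b is the parameter count in billions, so weights in GB
    are simply params_b * bytes_per_param."""
    weight_gb = params_b * bytes_per_param
    return weight_gb <= pool_gb

# Llama 3.1 405B at NVFP4 (~0.5 bytes/param) is roughly 203 GB of
# weights: too big for one Spark's 128 GB, comfortable in the 256 GB
# pool two Sparks form over 200 Gbps RDMA.
```

The same arithmetic explains the single-Spark claims earlier: a 120B model at NVFP4 is about 60 GB of weights, which fits in 128 GB with room to spare and is hopeless on a 24GB or 32GB consumer card.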
4. RAG + Web Search: Actually Useful Agent Behavior
The RAG playbook sets up retrieval-augmented generation with web search capability. Combined with the Nemotron model, you get a local agent that can search the web, index your documents, and answer questions grounded in real data.
I loaded my project documentation, connected the web search, and had a functional research assistant running entirely on my desk. The 128GB unified memory means the LLM, the embedding model, and the vector store all fit simultaneously without memory pressure.
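To show the moving parts, here's a toy version of the retrieval step. The playbook uses a real embedding model; the bag-of-words "embedding" below exists only to illustrate the mechanics of scoring and ranking documents against a query:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- stand-in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs most similar to the query. In a full RAG
    pipeline the winners get stuffed into the LLM prompt as
    grounding context."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

Swap the toy embedding for a real model and a vector store and you have the skeleton of what the playbook assembles. The Spark's advantage is that all three pieces (embedder, vector store, LLM) live in the same 128GB pool at once.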
Is it faster than using Claude or GPT-4? No. Is the data staying on my hardware? Yes. For certain clients and certain projects, that's not a nice-to-have. It's a requirement.
The Honest Limitations (They're Still Real)
I'm not here to pretend the January update fixed everything.
- Memory bandwidth is still 273 GB/s. This is a physics problem, not a software problem. For single-user chatbot inference with large models, token generation tops out around 35-40 tokens per second. Competing systems with higher memory bandwidth hit similar speeds at lower price points. That gap narrows when you factor in CUDA, RDMA, and the 128GB unified pool, but for pure token throughput, the Spark doesn't lead. (And it's not trying to.)
- The thermal design is still aggressive. The 150mm chassis is compact and beautiful, and it still runs hot under sustained load. The February update helps (18W savings when the ConnectX-7 isn't active), but I still keep a small USB fan pointed at mine. It's not elegant. It works.
- The price/performance ratio for pure inference is tough to justify. If chatbot-style Q&A is your only use case, you'll notice that Apple Silicon's higher memory bandwidth (546 GB/s on the Studio) translates to faster token generation. The Spark's value clicks more when you need what bandwidth alone can't give you: CUDA, RDMA networking, containerized enterprise workflows, or 128GB of unified memory for concurrent models.
- DGX OS is Ubuntu-based and still maturing. The CES update included kernel patches and security fixes, and NVIDIA says they're committed to long-term support. The Jetson community has reason to be skeptical here, since some Jetson products stayed on older Ubuntu versions longer than anyone wanted. The real confidence test will be whether DGX OS tracks Ubuntu 26.04 LTS when it releases.
- NVFP4 gains come with precision tradeoffs. The 2.5x improvement assumes you're running maximum quantization. That's fine for many workloads. It's not fine for all of them. Read the footnotes on every benchmark.
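That bandwidth ceiling isn't hand-waving; you can estimate it. For dense models, every generated token has to stream the full weight set through memory once, so single-stream decode speed is bounded by bandwidth divided by weight size (the model sizes below are assumed round numbers, not measured):

```python
def decode_tps_ceiling(bandwidth_gbs: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode tokens/sec: each token
    must read every weight from memory at least once."""
    return bandwidth_gbs / weight_gb

# A 30B dense model at FP8 (~30 GB of weights) on 273 GB/s tops out
# around 9 tok/s; at NVFP4 (~15 GB), around 18 tok/s. MoE models,
# which touch only a fraction of their weights per token, land
# higher -- which is how the Spark reaches its observed 35-40 tok/s.
```

Run the same arithmetic with the Studio's 546 GB/s and you see why Apple Silicon feels faster at chat: double the bandwidth doubles the decode ceiling, and no software update changes that.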
Who Should Actually Buy This
After four months of ownership, including two weeks of wanting to return it and two months of genuinely productive use, here's my honest assessment:
Buy it if:
- You need to run 70B-120B+ parameter models locally (no consumer GPU can do this)
- Data privacy requirements prevent cloud inference
- You're building multi-agent systems that need concurrent models in memory
- You want CUDA compatibility (I wanted ROCm to be ready. It wasn't.)
- You plan to pair it with your existing Mac/PC workflow as a compute offload
- You're considering eventually connecting two for a 256GB/405B-capable setup
Don't buy it if:
- Your primary use is chatbot-style single-user inference (higher-bandwidth systems will feel faster)
- You need maximum tokens-per-second on smaller models (a high-end consumer GPU will outrun it)
- You're not comfortable with Linux and terminal-based workflows
- You expect plug-and-play performance without running playbooks
- You're optimizing purely for tokens-per-dollar on inference (competing hardware gets close at lower price points)
The Verdict: A Platform, Not a Product
The DGX Spark is the most misunderstood piece of hardware in AI right now.
When Carmack posted his critique in October, he was right about the thermals, right about the power draw discrepancy, and right to be frustrated. But he was evaluating a sprint car on a drag strip. The Spark wasn't designed to win on tokens-per-second. It was designed to put a DGX-aligned development environment on your desk. The same software stack, the same containerized workflows, the same CUDA ecosystem that runs in NVIDIA's billion-dollar datacenters.
The January update didn't change the hardware. It changed what you can do with it. (And the February update kept refining it. Bluetooth audio, better monitor support, power savings. The boring stuff that makes a device livable.)
Thirty playbooks that actually work. (Actually work. Not "follow these 47 steps and pray.") Hybrid local/cloud routing. Enterprise certification. A model ecosystem (Nemotron, Qwen, FLUX, Llama) that's been specifically optimized for the platform.
Is it perfect? No. The thermals are aggressive, the bandwidth is limited, and $4,000 is a lot of money for a device that doesn't win on raw inference benchmarks.
But four months in, my Spark runs local models for three different projects, generates images for content creation, and serves as the private AI backbone for work I can't put on someone else's servers.
The golden paperweight turned into the most useful machine on my desk. I just had to learn what it was actually for.
If this helped, give it 50 claps. It tells Medium to show it to more people.
Resources mentioned:
- DGX Spark Playbooks (GitHub)
- DGX Spark Documentation
- Build.nvidia.com/spark
- NVIDIA CES 2026 Technical Blog
