What began as democratized access to AI through cloud providers has devolved into degraded performance, heavy-handed censorship, and unpredictable products. For power users, the answer is increasingly self-hosting.

## The Hidden Cost of Cloud AI Performance

Cloud AI providers follow a familiar pattern: launch with impressive performance to attract users, then quietly dial back quality of service. OpenAI users have reported that GPT-4o began responding faster but more shallowly, trimming context and reasoning to deliver a quick answer. The culprit is token batching: providers group many users' requests together to maximize GPU utilization, which can make a typical request take up to 4x longer as batch sizes grow.

### Token Batching

The performance degradation is not imaginary. Static batching forces every sequence in a batch to finish together, so your quick query can be held hostage by someone else's long generation. Even "continuous batching" adds overhead that slows individual queries. Cloud providers optimize for aggregate throughput at the expense of your individual request — a trade-off that makes sense for their business, but not for your user experience.

## Censorship: When Safety Gets in the Way

Testing of Google Gemini found it refused 10 out of 20 provocative but legitimate prompts — more than its competitors. Questions from sexual assault survivors were flagged as inappropriate content. Role-play prompts that once worked were suddenly blocked after updates. Mental-health support queries tripped safety filters. Anthropic's Claude, meanwhile, has been called "borderline unusable" by long-time users frustrated with ever-tightening restrictions.

## The Local Advantage

Self-hosted AI eliminates these problems entirely. With the right hardware, local inference can reach 1,900+ tokens/second, with 10-100x faster time-to-first-token than cloud services. You control model versions completely, avoiding silent updates that break working setups. No censorship filters second-guess legitimate requests. No rate limits throttle your work. No surprise bills arrive after usage spikes. And over five years, cloud subscriptions can easily cost $1,200+.

## Hardware Requirements: Building Your AI Powerhouse

### Understanding Model Sizes and Quantization

The key to successful self-hosting is matching models to your hardware. Modern quantization techniques shrink models without catastrophic quality loss.

### What is Quantization?

Quantization reduces the precision of a model's weights from their original floating-point representation down to lower bit widths. Think of it as compressing a high-resolution image: you trade a little fidelity for dramatically smaller file sizes. In a neural network, it means storing each parameter in fewer bits, which directly cuts memory usage and speeds up inference.

Without quantization, most capable models would be out of reach for most users. A 70B-parameter model at full precision needs 140GB of memory — beyond any consumer GPU.

### Why Quantization Matters

Quantization democratizes AI by making large models runnable on consumer devices, enabling offline use, cutting cloud costs, and improving inference speed through more efficient memory access.

- FP16 (full precision): original model quality, full memory requirements
- 8-bit quantization: ~50% memory reduction, negligible quality impact
- 4-bit quantization: ~75% memory reduction, slight quality compromise
- 2-bit quantization: ~87.5% memory reduction, noticeable quality degradation

For a 7B-parameter model, that translates to roughly 14GB (FP16), 7GB (8-bit), 3.5GB (4-bit), or 1.75GB (2-bit) of required memory.
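The arithmetic behind these figures is simple: weights-only memory is parameter count times bytes per parameter. A minimal sketch of that calculation (illustrative only — it ignores the KV cache and runtime overhead, which add more on top):

```bash
# Weights-only memory estimate: params (billions) x bits / 8 = GB.
# KV cache and runtime overhead are NOT included in this estimate.
params_b=7   # model size in billions of parameters
for bits in 16 8 4 2; do
  awk -v p="$params_b" -v b="$bits" \
    'BEGIN { printf "%2d-bit: ~%.2f GB\n", b, p * b / 8 }'
done
```

Running this with `params_b=7` reproduces the 14 / 7 / 3.5 / 1.75 GB figures above; set it to 70 to see why a full-precision 70B model needs 140GB.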
## Popular Open-Source Models and Their Requirements

### Small Models (1.5B-8B parameters)

- **Qwen3 4B/8B**: Latest generation with hybrid thinking modes. Qwen3-4B rivals much larger 72B-class models on programming tasks. Needs 3-6GB at 4-bit quantization
- **DeepSeek-R1 7B**: Strong reasoning abilities, runs in as little as 4GB of RAM
- **Mistral Small 3.1 24B**: Recent Apache 2.0 model with multimodal capabilities, a 128K context window, and ~150 tokens/sec performance. Runs on a single RTX 4090 or a 32GB Mac

### Medium Models (14B-32B parameters)

- **GPT-OSS 20B**: OpenAI's first open-weight model since 2019, Apache 2.0 licensed. Its MoE architecture with 3.6B active parameters delivers o3-mini-class performance. Runs on an RTX 4080 with 16GB VRAM
- **Qwen3 14B/32B**: Mid-size models with thinking-mode capabilities. Qwen3-14B matches Qwen2.5-32B performance far more efficiently
- **DeepSeek-R1 14B**: A sweet spot for RTX 3070 Ti/4070 cards
- **Mistral Small 3.2**: An incremental update with better instruction following and less repetition

### Large Models (70B+ parameters)

- **Llama 3.3 70B**: ~35GB at 4-bit quantization; needs dual RTX 4090s or an A100
- **DeepSeek-R1 70B**: Wants 48GB of VRAM, reachable with 2x RTX 4090
- **GPT-OSS 120B**: OpenAI's flagship open model with 5.1B active parameters across a 128-expert MoE. o4-mini-class performance; runs on a single H100 (80GB) or 2-4x RTX 3090s
- **Qwen3-235B-A22B**: Flagship MoE model with 22B active parameters, competitive with o3-mini
- **DeepSeek-R1 671B**: A giant demanding 480GB+ of VRAM or specialized setups

### Specialized Coding Models

Small coding models (1B-7B active parameters):

- **Qwen3-Coder 30B-A3B**: MoE model with only 3.3B active parameters. Native 256K context (1M with YaRN) for repository-scale work. Runs on an RTX 3060 12GB at 4-bit quantization
- **Qwen3-Coder 30B-A3B-FP8**: Official 8-bit quantization retaining 95%+ of full performance. Needs ~15GB VRAM, a good fit for RTX 4070/3080-class cards
- **Unsloth Qwen3-Coder 30B-A3B**: Dynamic quantization with fixed tool-calling. Q4_K_M runs in 12GB; Q4_K_XL in 18GB with better quality

Large coding models (35B+ active parameters):

- **Qwen3-Coder 480B-A35B**: Flagship agentic coding model with 35B active parameters across a 160-expert MoE. Scores 61.8% on SWE-Bench, comparable to Claude Sonnet 4. Requires 8x H200 or 12x H100 at full precision
- **Qwen3-Coder 480B-A35B-FP8**: Official 8-bit release cutting memory to ~250GB. Runs on 4x H100 80GB or 4x A100 80GB
- **Unsloth Qwen3-Coder 480B-A35B**: Q2_K_XL at 276GB runs on 4x RTX 4090 + 180GB RAM; IQ1_M at 150GB gets by with 2x RTX 4090 + 100GB RAM
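Which tier you can run comes down to free VRAM. As a quick pre-flight check before pulling a model, here is a rough helper — it assumes an NVIDIA card and the `nvidia-smi` CLI, and the thresholds are illustrative guides keyed to the 4-bit footprints listed above, not hard limits:

```bash
# Suggest a model tier from free VRAM on the first GPU.
free_mb=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1)
if   [ "$free_mb" -ge 40000 ]; then echo "70B-class (4-bit) is within reach"
elif [ "$free_mb" -ge 20000 ]; then echo "32B-class (4-bit) fits comfortably"
elif [ "$free_mb" -ge 10000 ]; then echo "aim for 14B-class (4-bit)"
else                                echo "stick to 4B-8B models or heavier quantization"
fi
```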
## Hardware Configurations by Budget

### Budget Build (~$2,000)

- AMD Ryzen 7 7700X
- 64GB DDR5-5600 RAM
- PowerColor RX 7900 XT 20GB or a used RTX 3090
- Comfortably runs models up to 14B

### Performance Build (~$4,000)

- AMD Ryzen 9 7900X
- 128GB DDR5-5600 RAM
- RTX 4090 24GB
- Runs 32B models quickly, and smaller 70B quantizations with offloading

### Professional Setup (~$8,000)

- Dual Xeon/EPYC processors
- 256GB RAM
- 2x RTX 4090 or RTX A6000
- Runs 70B models at production speed

### Mac Options

- MacBook M1 Pro 36GB: excellent for 7B-14B models, with the unified-memory advantage
- Mac Mini M4 64GB: handles 32B models comfortably
- Mac Studio M3 Ultra 512GB: the high-end option — runs DeepSeek-R1 671B at 17-18 tokens/s for ~$10,000

For the very largest models, though, a rack of GPUs is not the only path.

### The AMD EPYC Alternative

These CPU-only builds run DeepSeek-R1 671B at 3.5-4.25 tokens/s.

The $2,000 EPYC build (Digital Spaceport setup):

- CPU: AMD EPYC 7702 (64 cores) - $650, or upgrade to an EPYC 7C13/7V13 - $599-735
- Motherboard: MZ32-AR0 (16 DIMM slots, 3200MHz support) - $500
- Memory: 16x 32GB DDR4-2400 ECC (512GB total) - $400, or 16x 64GB for 1TB - $800
- Storage: 1TB Samsung 980 Pro NVMe - $75
- Cooling: Corsair H170i Elite Capellix XT - $170
- PSU: 850W (CPU only) or 1500W (future GPU expansion) - $80-150
- Case: rack frame - $55
- Total cost: ~$2,000 for 512GB, ~$2,500 for the 1TB configuration

Performance results:

- DeepSeek-R1 671B Q4: 3.5-4.25 tokens per second
- Context window: 16K+ supported
- Power draw: 60W idle, 260W under load
- Memory bandwidth is critical — DDR4-3200 is strongly preferred

This build proves that massive models can run usefully on CPU-only systems, putting frontier AI within reach without any GPUs. Dual-socket support and enormous memory capacity make EPYC ideal for models that overflow GPU VRAM.

Source: Digital Spaceport - How to run DeepSeek R1 671b fully locally on a $2,000 EPYC server
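Why is memory bandwidth the bottleneck? Every generated token has to stream all active weights out of RAM. A back-of-envelope sketch, not a benchmark — the channel count, the Q4 byte width, and DeepSeek-R1's ~37B active parameters are assumptions for illustration:

```bash
# Decode-speed ceiling ~ memory bandwidth / bytes of active weights per token.
awk 'BEGIN {
  bw      = 8 * 3200 * 8 / 1000   # 8 channels x DDR4-3200 x 8 bytes ~ 204 GB/s peak
  per_tok = 37 * 0.5              # ~37B active params x ~0.5 bytes (Q4) ~ 18.5 GB/token
  printf "theoretical ceiling: ~%.1f tokens/s\n", bw / per_tok
}'
```

Real throughput lands well under this ~11 tokens/s ceiling — hence the 3.5-4.25 tokens/s above — due to NUMA effects, compute overhead, and imperfect bandwidth utilization.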
## The Software Stack: From Installation to Production

### Ollama: The Foundation

Ollama has become the de facto standard for local model deployment, offering simplicity without sacrificing power.

Installation:

```bash
# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from ollama.com/download
```

Essential configuration:

```bash
# Optimize for performance
export OLLAMA_HOST="0.0.0.0:11434"   # Enable network access
export OLLAMA_MAX_LOADED_MODELS=3    # Concurrent models
export OLLAMA_NUM_PARALLEL=4         # Parallel requests
export OLLAMA_FLASH_ATTENTION=1      # Enable optimizations
export OLLAMA_KV_CACHE_TYPE="q8_0"   # Quantized KV cache

# Download models
ollama pull qwen3:4b
ollama pull qwen3:8b
ollama pull mistral-small3.1
ollama pull deepseek-r1:7b
```

Running multiple instances — on multi-GPU machines, run a separate Ollama instance per GPU:

```bash
# GPU 1
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST="0.0.0.0:11434" ollama serve

# GPU 2
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST="0.0.0.0:11435" ollama serve
```

### Exo.labs: Distributed Inference Magic

Exo.labs lets you spread large models across multiple devices — even a mix of MacBooks, PCs, and Raspberry Pis.

Installation:

```bash
git clone https://github.com/exo-explore/exo.git
cd exo
pip install -e .
```

Usage: run `exo` on each machine on your network. The devices discover each other automatically and split model computation between them. A cluster of 3x M4 Pro Macs hit 108.8 tokens/second on Llama 3.2 3B — a 2.2x speedup over any single device.

### Choosing a GUI

Open WebUI delivers the best ChatGPT-like experience:

```bash
docker run -d -p 3000:8080 --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:ollama
```

Access it at http://localhost:3000 for a full-featured interface with RAG support, multi-user management, and a plugin system.

GPT4All offers a polished native desktop experience:

- Download from gpt4all.io for Windows, macOS, or Linux
- One-click installation with automatic Ollama detection
- Built-in model browser and download manager
- Document chat and plugin support
- Perfect for beginners who want a native desktop app

AI Studio targets developers and AI researchers:

- Multi-model comparison and testing
- Advanced prompt workspace
- API endpoint management and testing
- Model performance analysis and benchmarking
- Support for Ollama, LocalAI, and custom backends
- Conversation management, sample prompts, and export options

SillyTavern excels at creative applications and character-based interactions, with advanced handling for roleplay and creative-writing scenarios.

## Remote Access with Tailscale: Your AI Anywhere

One of the most powerful aspects of self-hosting AI is being able to reach your models from anywhere while maintaining complete privacy. Tailscale makes this almost effortless by building an encrypted mesh network between your devices.

### Setting Up Tailscale for Remote AI Access

Install Tailscale on your AI server:

```bash
# Linux/macOS
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Windows: download from tailscale.com/download
```

Configure Ollama for network access:

```bash
# Listen on all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
ollama serve
```
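Before touching any client device, it is worth confirming the server answers on that port. A quick check against Ollama's REST API — the `/api/tags` route lists the models you have pulled:

```bash
# Sanity check from the server itself: should return a JSON list
# of installed models if Ollama is listening on 11434.
curl -s http://localhost:11434/api/tags
```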
Install Tailscale on your client devices (laptop, phone, tablet) using the same account. Every device joins your mesh network automatically, each with its own stable IP in the 100.x.x.x range.

Check your server's Tailscale IP:

```bash
tailscale ip -4
# Example output: 100.123.45.67
```

Access from any device on your tailnet:

- Web interface: http://100.123.45.67:3000 (Open WebUI)
- API endpoint: http://100.123.45.67:11434/v1/chat/completions
- Mobile apps: point the Ollama endpoint at your Tailscale IP

### Advanced Tailscale Configuration

Enable subnet routing to reach your whole home network:

```bash
# On the AI server
sudo tailscale up --advertise-routes=192.168.1.0/24  # Replace with your actual subnet
```

Use Tailscale Serve for HTTPS with automatic certificates:

```bash
# Expose Open WebUI over HTTPS
tailscale serve https / http://localhost:3000
```

This publishes a URL like https://your-machine.your-tailnet.ts.net that is reachable only from inside your Tailscale network.

### Mobile Access Setup

For iOS/Android devices:

1. Install Tailscale from the App Store / Play Store
2. Sign in with the same account
3. Install a compatible app:
   - iOS: Enchanted, Mela, or any OpenAI-compatible client
   - Android: the Ollama Android app, or just a web browser
4. Point the app at your Tailscale IP: http://100.123.45.67:11434

### Security Best Practices

Tailscale is secure by default thanks to its encrypted mesh network — no extra firewall configuration is required. Tailscale:

- Encrypts all traffic with WireGuard
- Admits only devices enrolled in your network
- Establishes direct connections without opening ports on your router
- Blocks unauthorized access from the public internet

Because Tailscale routes traffic only between your own devices, your Ollama server remains completely invisible to the outside world. No port forwarding, no VPS relay, no firewall rules — just seamless device-to-device connectivity.

With Tailscale, your self-hosted AI becomes truly portable: you can reach your models with full privacy from the office, while traveling, or anywhere else. The encrypted mesh keeps your AI conversations entirely under your control.

## Agent Workflows: AI That Actually Does Work

### Goose from Block

Goose turns your local models into an autonomous development assistant capable of building entire projects.

Installation:

```bash
curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash
```

Configuration for Ollama:

```bash
goose configure
# Select: Configure Providers → Custom → Local
# Base URL: http://localhost:11434/v1
# Model: qwen3:8b
```

Goose writes code, refactors, generates tests, and handles routine development chores. Unlike simple code completion, it works through multi-step development tasks autonomously.
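Goose — like the other agents below — talks to Ollama through its OpenAI-compatible `/v1` API. A minimal request to confirm that endpoint responds before wiring up an agent (the model name assumes you pulled `qwen3:8b` earlier):

```bash
# Minimal chat completion against Ollama's OpenAI-compatible endpoint.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3:8b",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```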
### Crush from Charm

For terminal enthusiasts, Crush offers a polished AI coding agent with IDE-grade integration.

Installation:

```bash
brew install charmbracelet/tap/crush   # macOS/Linux
# or
npm install -g @charmland/crush
```

Ollama configuration (`.crush.json`):

```json
{
  "providers": {
    "ollama": {
      "type": "openai",
      "base_url": "http://localhost:11434/v1",
      "api_key": "ollama",
      "models": [{
        "id": "qwen3:8b",
        "name": "Qwen3 8B",
        "context_window": 32768
      }]
    }
  }
}
```

### n8n AI Starter Kit

For visual workflow automation, the n8n self-hosted AI kit bundles everything you need:

```bash
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit
docker compose --profile gpu-nvidia up
```

Access the visual workflow editor at http://localhost:5678/ with 400+ integrations and pre-built AI templates.

## Corporate-Scale Inference: The 50-Million-Tokens-per-Hour Setup

For truly demanding workloads, self-hosting scales far beyond the typical home server — for example, the setup @nisten described on X:

- Model: Qwen3-Coder-480B (480B parameters, 35B active MoE)
- Hardware: 4x NVIDIA H200
- Throughput: 50 million tokens/hour (roughly $250/hour if bought at Claude Sonnet prices)

## Cost Analysis

Initial investment:

- Budget setup: ~$2,000
- Performance setup: ~$4,000
- Professional setup: ~$8,000

Operational costs:

- Electricity: $50-200/month
- Zero API fees
- Unlimited usage

Break-even timeline: measured against the $1,200+ that a five-year cloud subscription costs, a budget build pays for itself well within its lifetime — and far sooner for heavy API users.

## Getting Started

Self-hosted AI has matured. Start small with a single GPU and Ollama. Experiment with different models. Add agent capabilities. Scale as your needs grow. Above all, enjoy AI that works for you on your terms — no throttling, no censorship, no surprises. The combination of powerful open-source models, mature software ecosystems, and increasingly accessible hardware makes independence from cloud AI genuinely attainable. Whether you want to escape cloud limitations or simply take back control of your tools, there has never been a better time to self-host.

Sources and further reading:

- Ingo Eichhorst's wall-mounted ML rig (the cover image of this post): https://ingoeichhorst.medium.com/building-a-wall-mounted-and-wallet-friendly-ml-rig-0683a7094704
- Digital Spaceport EPYC rig: https://digitalspaceport.com/how-to-run-deepseek-r1-671b-fully-locally-on-2000-epyc-rig/
- "Show Me Your Rig" thread on the LocalLLaMA subreddit: https://www.reddit.com/r/LocalLLaMA/comments/1fqwler/show_me_your_ai_rig/
- Ben Arent's AI homelab: https://benarent.co.uk/blog/ai-homelab/
- Exo Labs cluster with 5 Mac Studios: https://www.youtube.com/watch?v=Ju0ndy2kwlw