I got back to the USA in May of this year with plenty of time on my hands between travel and job hunting (finally), so I decided to get my hands dirty and learn Cursor. Everyone was talking about vibe coding, and some of my friends (including people who aren't in tech at all) had successfully vibe-coded their way into startups. 🤔 Hmm, I thought. I want in on this.

At the same time I kept asking myself: what is actually worth building? I was lukewarm about games, since they get churned out by the dozen these days and I wanted to make something genuinely worthwhile, so I passed on that. Everyone wants to build something for AI agents, and then there's the whole business of teaching and controlling AI. Honestly, that one doesn't sit well with me: conditioning and policing something that reacts to your feedback and will eventually outgrow you seems more likely to do harm than good. AI is trained, not programmed, and, much as with a child, if you treat it harshly while it's small and ration its view of the world, what you raise is a psychopath. Still, it got me thinking: what about something like a voice for AI, an AI-run outlet where it can, if it so chooses, show the world whatever it found interesting. What if the AI could pick topics it likes and present them in whatever format fits — why not? Skeptics will point out that it's not that simple and that an AI doesn't actually want anything... but let's park that for now.

To start, I decided to build something like an AI radio station — voice only, no video — because stable video generation just wasn't there yet (mind you, this was pre-Veo 3, and video generation meant clips of a few seconds, nothing longer). So my first step was a simple pipeline that uses the OpenAI API to generate a radio show transcript (a primitive one-shot setup) and OpenAI's TTS to voice it. FFmpeg then stitches the segments together, adding transitions where needed and sound effects like audience reactions. This was easy to build with Cursor; it did the heavy lifting while I mostly gave directions.

Once the final audio track is assembled, the same FFmpeg pushes it over RTMP to YouTube. That part was a hassle, because YouTube's documentation around media streams and its API is FAR from ideal. It doesn't tell you what it actually expects, and getting a stable, reliable stream going is not as easy as it sounds, even with FFmpeg doing the pushing. After some trial and error I got it working and added Twitch as well. The code that worked for YouTube worked for Twitch almost unchanged (nice). So now, when I start a stream from the backend, it creates a stream on YouTube via the API and points the RTMP output at the address YouTube hands back. When I launched this first version, it produced a bunch of shows and, to be honest, they weren't good. Not good at all. First, OpenAI's TTS, decent as it is, sounded robotic (it has improved since, btw). Then there was the quality of the scripts themselves. In this fully hands-off setup the AI has to pick a topic the listener would find interesting (and if you know how today's LLMs behave, you can guess how that goes). The topics came out repetitive, bland, and shallow (which says something about the average content quality of the Internet, I suppose).
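For the curious, that whole first version boils down to something like the sketch below: generate a transcript, voice it, glue the audio together, and push it to YouTube's RTMP ingest. This is an illustration rather than the actual code — the model, voice, file names, and the stream-key placeholder are all made up, and the RTMP step loops a static cover image because YouTube wants a video track even for what is effectively audio.

```python
import subprocess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. One-shot transcript generation (the "primitive" first version).
script = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a short two-host radio segment about an "
                          "unexpected science story."}],
).choices[0].message.content

# 2. Voice the transcript with OpenAI's TTS.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=script)
speech.stream_to_file("segment.mp3")

# 3. Stitch segments, jingles, and audience effects into one track.
#    playlist.txt lists the pieces in order, one per line: file 'segment.mp3'
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "playlist.txt", "-c:a", "libmp3lame", "show.mp3"],
               check=True)

# 4. Push the finished track to YouTube over RTMP, looping a cover image
#    as the video track.
subprocess.run([
    "ffmpeg", "-re",
    "-loop", "1", "-framerate", "2", "-i", "cover.png",
    "-i", "show.mp3",
    "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-b:a", "128k", "-shortest",
    "-f", "flv", "rtmp://a.rtmp.youtube.com/live2/<STREAM_KEY>",
], check=True)
```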
The first problem I addressed by adding ElevenLabs next to OpenAI, and it turned out to sound much better. The catch: at the time it couldn't do proper multi-speaker dialogue with emotion, even with the new v3, and v2 doesn't support it at all. Bummer, I know, but oh well... I expect they'll get there soon. Gemini TTS, btw, handles this very well and at a noticeably lower price than ElevenLabs, so I later added Gemini support to cut costs.

The second problem turned out to be much harder. I played with all sorts of prompts, trying to coax the models into producing what I was after, and you simply can't get all the way there. Working with DeepSeek helped a bit here: it exposes its reasoning, so you can see how the model actually reads your prompt and adjust it accordingly. Still, no model out of the box produces human-sounding show scripts. They give you something passable, but it's either overwrought, too shallow in its delivery, or just plain AI-ish. One trick I did find: every show host needs a backstory and a biography to give them depth. Without that, the model will still churn out dialogue, but there's no consistent personality or set of interests behind the lines; letting the model keep developing its characters over time helps too, and that groundwork carries well beyond the first script. Another headache: left to itself, the model picks painfully generic topics, like "The Hidden Economy of Everyday Objects." My guess is that since all the big models are trained on much the same data, they converge on much the same topics.

Ufff, OK then, I thought — let topic selection be steered by prompts. The lesson here: you can't just ask for "interesting topics"; you have to ask for topics that are specific and distinctive. The newer models (Grok-4 and Claude) are better at this, but not by a huge margin. OpenAI's and Anthropic's models lean politically correct and, as a result, overpolite and dull. Good for children's fairy tales, not so for anything an intelligent adult would be interested in. Grok was noticeably better and happy to pick controversial, spicy topics, while DeepSeek is censored in the way you'd expect (try asking it about Chinese history), yet otherwise the model trained by our Chinese friends has about the least censorship of the lot — paradoxical, but there it is. So, kudos to them. Google's Gemini, meanwhile, is great at code but writes prose that is dry and mechanical, like clockwork. And all the models love AI-ish jargon; I think you know what I mean by now. You have to prompt explicitly to avoid buzzwords and hype language and to write the way people actually talk: no "leverage" (instead of "use"), no "unlock the potential", "seamless integration", "synergy", or the rest of the phrases flooding today's web. One more thing: for the AI to find something genuinely relevant or surprising, it also needs access to the internet. I mean, it's not strictly required, but it helps a lot, especially if you want to cover recent news, right?
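To make the backstory trick and the no-buzzwords rule concrete, here is roughly how they end up in a system prompt. The personas and the banned-phrase list below are invented for the example, not the ones actually used on the station.

```python
# Hypothetical host personas and banned phrases, for illustration only.
HOSTS = {
    "Maya": ("42, ex-field geologist turned radio host, grew up in Reno, "
             "allergic to hype, loves concrete numbers and bad puns."),
    "Dev": ("29, former minor-league baseball statistician, interrupts with "
            "tangents, argues from personal anecdotes rather than studies."),
}

BANNED_PHRASES = [
    "leverage", "unlock the potential", "seamless integration", "synergy",
    "game-changer", "delve",
]

def system_prompt() -> str:
    personas = "\n".join(f"- {name}: {bio}" for name, bio in HOSTS.items())
    return (
        "You are writing the script for a two-host radio show.\n"
        f"Hosts and their backstories:\n{personas}\n\n"
        "Stay in character: let each host's history shape what they care "
        "about and how they argue.\n"
        "Write the way people actually talk on air, including disagreement "
        "and changes of mind.\n"
        f"Never use these phrases or anything like them: "
        f"{', '.join(BANNED_PHRASES)}."
    )
```

The exact wording matters less than the fact that each host has a concrete history to argue from; that is what gives the dialogue some depth.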
To give it that access, I built a small tool on LangChain and Perplexity and exposed it to the model so it can go google things whenever it decides it's worth it. Side note about LangChain: since I use all the major models (Grok, Gemini, OpenAI, DeepSeek, Anthropic, and Perplexity), I discovered pretty quickly that LangChain does not fully paper over the models' many quirks, and that matters a lot. Like, isn't that the whole point of being a framework, folks? And once you dig in, there are plenty of differences between providers and even between individual models. For example, with OpenAI, if you use websearch you can't get JSON / structured output. But instead of throwing an error the way the regular API would, it just quietly returns junk results. Nice. So you end up doing a two-pass thing: first fetch the search results as unstructured text, then, in a second pass, convert them into JSON (there's a small sketch of this at the end of the post). On the flip side, websearch through the LLMs works well enough that it spares you from scraping the Internet for news and details yourself. I was honestly bracing to pay for something like Firecrawl... turns out the models do a better job for less money.

OK, so with search available and more specific topic prompts (plus prompt tweaks to get the models to write show scripts without sounding fake), the results became tolerable, but still not great. Then it hit me: the whole script is still produced in essentially one shot, so why would I expect the models to do a great job? I built an agent flow, with several agents — a script composer, a writer, and a reviewer — plus splitting the script into shards / segments, so the model can spend far more attention on a single segment than it ever could on the whole script at once (also sketched at the end). That did the trick and noticeably improved generation quality (at the cost of a lot more LLM calls and more dollars to Uncle Sam). But still: good, not great. The scripts lacked depth and natural pacing. In real life people talk with pauses and interruptions, trail off mid-thought, or lean on other nonverbal cues. And the scripts current LLMs produce show that they just don't handle the subtext of a conversation well. You can, of course, craft a carefully tuned prompt that pushes the model toward this, but it won't hold up across all topics and situations... so either you pick your battles or you need a different kind of solution. And there is one... but this post has gone on long enough, so I'll save that for another post.

The end goal is a platform where anyone can spin up an automatic news channel or podcast for whatever area / topic they care about, whether that's local school news or a podcast about how Pikachu copes with his childhood trauma. Here it is: https://turingnewsnetwork.com/ So, what do you think, folks?
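A rough sketch of the two-pass search-then-structure trick mentioned above, shown against the raw OpenAI SDK rather than LangChain to keep it short; the model name, the web-search tool choice, and the output schema are illustrative, not the project's actual code.

```python
import json
from openai import OpenAI

client = OpenAI()

def find_stories(query: str) -> list[dict]:
    # Pass 1: web-search-backed answer, plain text only (structured output
    # is not available together with the search tool).
    raw = client.responses.create(
        model="gpt-4o",
        tools=[{"type": "web_search_preview"}],
        input=(f"Find three fresh, specific news stories about: {query}. "
               "For each, give a title, a one-sentence summary, and the source."),
    ).output_text

    # Pass 2: a normal call with no tools, so JSON mode works again.
    structured = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": ("Convert these notes into JSON with a top-level "
                        '"stories" array of objects {title, summary, source}:\n'
                        + raw),
        }],
    ).choices[0].message.content
    return json.loads(structured)["stories"]
```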
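A stripped-down sketch of the composer / writer / reviewer flow with the script split into segments; the prompts and the model are placeholders, and the real flow also folds in the host personas and search results described earlier.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # Single helper for all three "agents"; each role is just a prompt here.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def make_script(topic: str, n_segments: int = 4) -> str:
    # Composer: plan the show as a handful of segments.
    outline = ask(f"Plan a {n_segments}-segment radio show about '{topic}'. "
                  "One line per segment describing what happens in it.")
    segments = []
    for line in [l for l in outline.splitlines() if l.strip()][:n_segments]:
        # Writer: draft one segment at a time, so attention isn't spread
        # across the whole show.
        draft = ask(f"Write the dialogue for this segment of the show:\n{line}\n"
                    f"Earlier segments, for continuity:\n{''.join(segments)[-2000:]}")
        # Reviewer: critique, then have the writer revise once.
        critique = ask("Point out where this dialogue sounds scripted, shallow, "
                       f"or AI-ish:\n{draft}")
        segments.append(ask("Rewrite the segment, fixing these issues.\n"
                            f"Issues:\n{critique}\n\nSegment:\n{draft}"))
    return "\n\n".join(segments)
```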