AI agents are taking over the world, marking the next big step in the evolution of AI 🦖. So, what do all these agents have in common? They use Markdown instead of raw HTML when processing content on web pages ⛓️. Curious to know why?
This blog post will show you how this simple trick can save you up to 99% in tokens and money!
AI agents are software systems that harness the power of artificial intelligence to accomplish tasks and pursue goals on behalf of users. Equipped with reasoning, planning, and memory, these agents can make decisions, learn, and adapt, all on their own. 🤯
In recent months, AI agents have taken off, especially in the world of browser automation. These AI agent browsers let you use LLMs to control browsers programmatically, automating tasks such as adding products to your Amazon cart 🛒.
Ever wondered what libraries and frameworks power AI agents, such as Crawl4AI, ScrapeGraphAI, and LangChain?
When processing data from web pages, these solutions often convert HTML to Markdown automatically (or offer methods to do so) before sending the data to LLMs. But why do these AI agents favor Markdown over HTML? 🧐
The short answer is: to save tokens and speed up processing! ⏩
Time to dig deeper! But first, let's look at another popular approach AI agents use to reduce the data load. 👀
Imagine you want your AI agent to:
1. Connect to an e-commerce site (e.g., Amazon)
2. Search for a product (e.g., PlayStation 5)
3. Extract data from that specific product page
That's a common scenario for an AI agent, because e-commerce scraping is a wild ride 🎢. After all, product pages are a chaotic mess of ever-changing structures, which makes programmatic data parsing a nightmare. That's where AI agents flex their superpower 💪, using LLMs to extract data seamlessly, no matter how messy the page structure is!
Now, let's say you're on a mission to grab all the juicy details from the PlayStation 5 product page on Amazon 🎮.
Here's how you might instruct your AI agent browser to make it happen:
Navigate to Amazon's homepage. Search for 'PlayStation 5' and select the top result. Extract the product title, price, availability, and customer ratings. Return the data in a structured JSON format.
This is what the AI agent should (hopefully 🤞) do:
1. Open Amazon in the browser 🌍
2. Search for "PlayStation 5" 🔍
3. Identify the correct product 🎯
4. Extract the product details from the page and return them as JSON 📄
But here's the real challenge: Step 4. The Amazon PlayStation 5 product page is a beast! The HTML is packed with tons of information, most of which you don't even need.
Want proof? Copy the page's full HTML from your browser's DOM and drop it into a tool like LLM Token Calculator.
🚨 Brace yourself...
896,871 tokens?! 😱 Yes, you read that right: eight hundred ninety-six thousand, eight hundred seventy-one tokens!
That's a MASSIVE load of data, also known as a ton of money! 💸 (More than $2 per request on GPT-4o! 😬)
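To see where a figure like that comes from, here is a minimal sketch of the arithmetic. The rate of $2.50 per million input tokens is an assumption used for illustration (it is consistent with the "more than $2" figure above), and `estimateCostUsd` is a hypothetical helper name, not part of any library. Check your provider's current pricing before relying on it.

```javascript
// Hypothetical helper: estimate the input cost of a single LLM request.
// usdPerMillionTokens is an assumed rate, not an official price.
function estimateCostUsd(tokenCount, usdPerMillionTokens) {
  return (tokenCount * usdPerMillionTokens) / 1e6;
}

const fullPageTokens = 896871; // token count reported for the full page
const assumedGpt4oRate = 2.5;  // ASSUMPTION: USD per 1M input tokens

console.log(estimateCostUsd(fullPageTokens, assumedGpt4oRate).toFixed(4)); // prints 2.2422
```

Keeping the rate as a parameter makes it easy to plug in other models or updated prices.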
As you can imagine, passing all that data to an AI agent comes with major limitations, starting with cost.
Most AI agents let you specify a CSS selector to extract only the relevant sections of a web page. Others use heuristic algorithms to filter content automatically, for example by stripping out headers and footers (which usually add no value). ✂️
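As a rough illustration of the heuristic approach, the sketch below strips `<header>`, `<footer>`, and `<nav>` sections from an HTML string. The helper name `stripBoilerplateSections` and the choice of tags are assumptions made for this example; real agents use more robust parsing, and a regex like this one does not handle nested elements of the same type.

```javascript
// Hypothetical heuristic filter: drop page sections that rarely carry
// product data. Regex-based, so it is only a sketch and will not cope
// with nested <header>/<footer>/<nav> elements.
function stripBoilerplateSections(html) {
  const boilerplateRegex = /<(header|footer|nav)(?:\s[^>]*)?>[\s\S]*?<\/\1>/gi;
  return html.replace(boilerplateRegex, '');
}

const samplePage =
  '<header><a href="/">Home</a></header>' +
  '<main><p>PlayStation 5</p></main>' +
  '<footer>Shop footer</footer>';

console.log(stripBoilerplateSections(samplePage));
```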
For example, if you inspect the Amazon PlayStation 5 product page, you'll notice that most of the useful content lives inside the HTML element identified by the `#ppd` CSS selector.
Now, what if you tell your AI agent to focus only on the `#ppd` element instead of the entire page? Would that make a difference? 🤔
Let's find out in the head-to-head showdown below! 🔥
Compare the token usage when processing a portion of a web page directly versus converting it to Markdown.
In your browser, copy the HTML of the `#ppd` element and drop it into the LLM Token Calculator tool.
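If you would rather grab that HTML from the DevTools console than copy it by hand, a sketch like the following can help. It assumes the standard DOM `querySelector` API, and `extractSectionHtml` is a hypothetical helper name introduced for this example.

```javascript
// Hypothetical helper: return the HTML of the first element matching a
// CSS selector, or null when nothing matches. `root` is any object that
// exposes querySelector (for example, `document` in a browser).
function extractSectionHtml(root, selector) {
  const element = root.querySelector(selector);
  return element ? element.outerHTML : null;
}
```

In a browser console you would call `extractSectionHtml(document, '#ppd')` and copy the returned string.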
From 896,871 tokens down to just 309,951, a saving of roughly 65%!
That's a huge drop, sure, but let's be real: it's still far too many tokens! 😵 💸
Now, let's replicate the trick AI agents use by running an online HTML-to-Markdown conversion tool. But first, remember that AI agents do some pre-processing to remove tags that carry no meaningful content, such as `<style>` and `<script>` tags.
You can filter the HTML of the target element with this simple script in your browser console:
    function removeScriptsAndStyles(element) {
      // Work on the HTML of the element that was passed in
      const htmlString = element.innerHTML;

      // Regexes matching every <script>...</script> and <style>...</style> block
      const scriptRegex = /<script[^>]*>[\s\S]*?<\/script>/gi;
      const styleRegex = /<style[^>]*>[\s\S]*?<\/style>/gi;

      // Remove all <script> and <style> blocks
      let cleanHTML = htmlString.replace(scriptRegex, '');
      cleanHTML = cleanHTML.replace(styleRegex, '');

      return cleanHTML;
    }

    // Select the target element and print its cleaned HTML
    const ppdElement = document.getElementById('ppd');
    console.log(removeScriptsAndStyles(ppdElement));
Next, copy the cleaned HTML and convert it to Markdown with an online HTML-to-Markdown conversion tool.
The resulting Markdown is much smaller but still contains all the important text data!
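To make the conversion step less of a black box, here is a deliberately small sketch of what an HTML-to-Markdown converter does. It is not the tool used in this post: `htmlToMarkdown` is a hypothetical helper that handles only headings, links, bold, italic, list items, and paragraphs, then discards every other tag. Production converters cover far more cases (tables, images, nested lists, entities).

```javascript
// Minimal, illustrative HTML-to-Markdown converter (hypothetical helper).
// Regex-based and intentionally incomplete: a sketch, not a real parser.
function htmlToMarkdown(html) {
  return html
    // Headings: <h1>..<h6> become "#".."######"
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (match, level, text) => `\n\n${'#'.repeat(Number(level))} ${text.trim()}\n\n`)
    // Links: keep the link text and the href
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '[$2]($1)')
    // Bold and italic
    .replace(/<(strong|b)(?:\s[^>]*)?>([\s\S]*?)<\/\1>/gi, '**$2**')
    .replace(/<(em|i)(?:\s[^>]*)?>([\s\S]*?)<\/\1>/gi, '*$2*')
    // List items become bullets
    .replace(/<li(?:\s[^>]*)?>([\s\S]*?)<\/li>/gi, '- $1\n')
    // Paragraph ends and line breaks become newlines
    .replace(/<\/p>|<br\s*\/?>/gi, '\n')
    // Drop every remaining tag, keeping only its text content
    .replace(/<[^>]+>/g, '')
    // Collapse runs of blank lines and trim the result
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

const sampleHtml =
  '<div class="x"><h1 id="t">PlayStation 5</h1>' +
  '<p>Price: <strong>$499.00</strong></p>' +
  '<ul><li>In stock</li><li><a href="https://example.com/r">Reviews</a></li></ul></div>';

console.log(htmlToMarkdown(sampleHtml));
```

The output keeps the text, emphasis, and link targets while dropping attributes, wrappers, and class names, which is where most of the token savings come from.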
Now, paste this Markdown into the LLM Token Calculator tool.
Boom! 💣 From 896,871 tokens down to just 7,943 tokens. That's a whopping ~99% saving!
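The percentages quoted in this post follow directly from the token counts. A quick sketch of the calculation, using a hypothetical `tokenSavingsPercent` helper:

```javascript
// Hypothetical helper: percentage of tokens saved relative to a baseline.
function tokenSavingsPercent(tokensBefore, tokensAfter) {
  return ((tokensBefore - tokensAfter) / tokensBefore) * 100;
}

const fullHtmlTokens = 896871; // entire page HTML
const ppdHtmlTokens = 309951;  // HTML of the #ppd element only
const markdownTokens = 7943;   // cleaned #ppd HTML converted to Markdown

console.log(tokenSavingsPercent(fullHtmlTokens, ppdHtmlTokens).toFixed(1));  // prints 65.4
console.log(tokenSavingsPercent(fullHtmlTokens, markdownTokens).toFixed(1)); // prints 99.1
```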
With basic content removal and HTML-to-Markdown conversion, you get a lighter payload, lower costs, and much faster processing. Big win! 💰
The final step is to verify that the Markdown text still contains all the key data. To do that, pass it to an LLM together with the final part of the original prompt. Here's the JSON result you'll get:
    {
      "product_title": "PlayStation®5 console (slim)",
      "price": "$499.00",
      "availability": "In stock",
      "customer_ratings": {
        "rating": 4.6,
        "total_ratings": 5814
      }
    }
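If you want to automate this check, the sketch below builds the extraction prompt from the Markdown and then validates the reply. Both `buildExtractionPrompt` and `validateProductData` are hypothetical helpers written for this example, and the list of required fields is an assumption based on the JSON shown above. The actual LLM call is left out on purpose, since it depends on your provider.

```javascript
// Hypothetical helper: combine the extraction instruction (the final part
// of the original prompt) with the Markdown content.
function buildExtractionPrompt(markdown) {
  const instruction =
    'Extract the product title, price, availability, and customer ratings. ' +
    'Return the data in a structured JSON format.';
  return `${instruction}\n\n${markdown}`;
}

// Hypothetical helper: check that the LLM reply is valid JSON and that it
// contains every field we asked for. The field names are assumptions.
function validateProductData(jsonText) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch (err) {
    return { ok: false, missing: [], error: 'invalid JSON' };
  }
  if (data === null || typeof data !== 'object') {
    return { ok: false, missing: [], error: 'not a JSON object' };
  }
  const required = ['product_title', 'price', 'availability', 'customer_ratings'];
  const missing = required.filter(
    (key) => data[key] === undefined || data[key] === null || data[key] === ''
  );
  return { ok: missing.length === 0, missing, error: null };
}
```

A reply that fails validation is a signal that the Markdown conversion dropped something important, or that the model ignored part of the instruction.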
That's exactly what your AI agent would return. Spot on!
For a quick overview, check out the final summary table below:
| Approach | Tokens | o1-mini price | gpt-4o-mini price | gpt-4o price |
|---|---|---|---|---|
| Entire HTML | 896,871 | $13.4531 | $0.1345 | $2.2422 |
| `#ppd` HTML | 309,951 | $4.6493 | $0.0465 | $0.7749 |
| `#ppd` Markdown | 7,943 | $0.0596 | $0.0012 | $0.0199 |
All those token-saving tricks are useless if your AI agent gets blocked by the target site 😅 (ever seen how badly an AI fails at a CAPTCHA? 🤣).
So, why does this happen? Simple! Many sites use anti-scraping measures that can easily block automated browsers. Want the full breakdown? Our upcoming webinar covers it.
If you've followed our advanced web scraping guide, you know the problem isn't the browser automation tools (the libraries that power your AI agents). No, the real culprit is the browser itself. 🤖
To avoid getting blocked, you need a browser built specifically for cloud automation. Enter Scraping Browser.
Learn more about Bright Data's Scraping Browser, the perfect tool to integrate into your AI agents.
Now you know why AI agents use Markdown for data processing. It's a simple trick that saves tokens (and money) while speeding up LLM processing.
Want your AI agent to run without hitting blocks? Check out Bright Data's tools for AI! Join us in making the Internet accessible to everyone, even through automated AI browsers. 🌐
Until next time, keep surfing the Web with freedom! 🏄‍♂️