As highlighted in our guide on web scraping with Puppeteer, this browser automation library is a great ally for extracting data from sites with dynamic content. Still, like any other tool, it has its shortcomings. That is where Puppeteer Extra comes in!
In this guide, we will introduce you to puppeteer-extra, a library that wraps puppeteer to extend it with plugin support. Get ready to take your Puppeteer scraping project to the next level! 🚀
Puppeteer Extra is a lightweight wrapper around puppeteer that enables plugin integration through a clean interface. Although it is not developed by the team behind Puppeteer, this community-driven project has hundreds of thousands of weekly downloads and more than 6k stars on GitHub 📈.
The GitHub star history makes it clear that the puppeteer-extra repo has kept growing in popularity over the years.
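The plugins officially supported by Puppeteer Extra include:
puppeteer-extra-plugin-stealth: applies a set of configurations designed to reduce bot detection.
puppeteer-extra-plugin-block-resources: prevents selected resource types from loading.
puppeteer-extra-plugin-anonymize-ua: anonymizes the User-Agent header on all pages, with support for dynamic replacement.
On top of those, it also integrates with community plugins.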
Without a doubt, Puppeteer is one of the leading headless browser libraries for scraping and testing. But let's be honest: it has its limits, especially when facing anti-bot tech such as browser fingerprinting and CAPTCHAs. Read our guide to learn how to deal with reCAPTCHA automation.
Websites equipped with anti-bot protection can easily detect and block Puppeteer scripts. If only there were a way to extend and customize Puppeteer's default behavior...
…well, that is exactly what Puppeteer Extra is all about!
Puppeteer Extra is like a power-up for Puppeteer, adding plugin support to tackle those major limitations. Instead of overriding or extending everything yourself, it wraps Puppeteer and lets you register only the plugins you need. 🦸
puppeteer-extra: Setup and Plugins for Web Scraping
You can add Puppeteer Extra to your project's dependencies via npm:
npm install puppeteer-extra
⚠️ Note: puppeteer-extra requires puppeteer to work, so make sure both packages are installed in your project.
Next, import the puppeteer object from puppeteer-extra instead of from the puppeteer library:
const puppeteer = require("puppeteer-extra")
// for ESM users:
// import puppeteer from "puppeteer-extra"
Everything in the Puppeteer API stays the same, but you get some extra magic ✨. The puppeteer object now exposes a use() method for plugging in Puppeteer Extra plugins.
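To make the wrapper idea more concrete, here is a minimal sketch of how a use()-style plugin registry could sit around a launcher. It is not puppeteer-extra's actual source: the PluggableLauncher class, the fakeLauncher stand-in, and the "tagger" plugin are all hypothetical names invented for illustration, and the real library supports many more hooks.

```javascript
// Hypothetical, simplified wrapper. NOT puppeteer-extra's real implementation.
class PluggableLauncher {
  constructor(launcher) {
    this.launcher = launcher // any object exposing an async launch() method
    this.plugins = []
  }

  // Register a plugin and return `this` so calls can be chained.
  use(plugin) {
    this.plugins.push(plugin)
    return this
  }

  // Delegate to the wrapped launcher, then let each plugin hook in.
  async launch(options = {}) {
    const browser = await this.launcher.launch(options)
    for (const plugin of this.plugins) {
      if (typeof plugin.onBrowser === "function") {
        await plugin.onBrowser(browser)
      }
    }
    return browser
  }
}

// Stand-in for the real library so the sketch runs without a browser.
const fakeLauncher = {
  launch: async (options) => ({ options, tags: [] }),
}

// Register one illustrative plugin that marks the "browser" object.
const wrapped = new PluggableLauncher(fakeLauncher).use({
  name: "tagger",
  onBrowser: async (browser) => browser.tags.push("tagged-by-plugin"),
})

const demo = wrapped.launch({ headless: true }).then((browser) => {
  console.log(browser.tags)
  return browser
})
```

The point of the design is that the wrapped API stays untouched: callers still call launch(), and plugins only add behavior around it.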
Time to dive into what these plugins can do and see how they will up your web scraping game!
Puppeteer Extra Plugin Stealth, also known simply as Puppeteer Stealth, applies a set of configurations designed to reduce bot detection. It overrides Puppeteer's detectable properties and settings that could expose it as a bot.
For more details, see our guide on how to avoid getting blocked with Puppeteer Stealth.
⚙️ Installation:
npm install puppeteer-extra-plugin-stealth
💡 Usage:
const StealthPlugin = require("puppeteer-extra-plugin-stealth")
// for ESM users:
// import StealthPlugin from "puppeteer-extra-plugin-stealth"
puppeteer.use(StealthPlugin())
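As a rough end-to-end sketch, the snippet below registers the Stealth plugin and then reads navigator.webdriver from the page, one of the signals automation tools commonly expose. It assumes puppeteer, puppeteer-extra, and puppeteer-extra-plugin-stealth are installed. The readWebdriverFlag name and the example URL are placeholders, not part of any library.

```javascript
// Sketch only: assumes `puppeteer`, `puppeteer-extra`, and
// `puppeteer-extra-plugin-stealth` are installed in the project.
async function readWebdriverFlag(url) {
  // Required inside the function so nothing is loaded until it is called.
  const puppeteer = require("puppeteer-extra")
  const StealthPlugin = require("puppeteer-extra-plugin-stealth")
  puppeteer.use(StealthPlugin())

  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url, { waitUntil: "domcontentloaded" })
    // Evaluate in the page context and return the flag to Node.js.
    return await page.evaluate(() => navigator.webdriver)
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close()
  }
}

// Example call (the URL is a placeholder target):
// readWebdriverFlag("https://www.example.com/").then(console.log)
```

Comparing the value with and without the puppeteer.use(StealthPlugin()) line is a quick way to see whether the plugin changes what a page can observe.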
puppeteer-extra-plugin-block-resources is a plugin that prevents the Puppeteer-controlled browser from loading specific resources. Supported resource types include document, stylesheet, image, media, font, script, texttrack, xhr, fetch, eventsource, websocket, manifest, and other.
Resource blocking can be configured both globally and locally.
⚙️ Installation:
npm install puppeteer-extra-plugin-block-resources
💡 Usage:
const BlockResourcesPlugin = require("puppeteer-extra-plugin-block-resources")
// for ESM users:
// import BlockResourcesPlugin from "puppeteer-extra-plugin-block-resources"
You can configure the resources to block globally, across all pages:
puppeteer.use(
  BlockResourcesPlugin({
    blockedTypes: new Set(["image", "stylesheet"]),
  })
)
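Similarly, you can choose locally which resources to block:
// keep a reference to the plugin instance so its blocked types can be changed later
const blockResourcesPlugin = BlockResourcesPlugin()
puppeteer.use(blockResourcesPlugin)
const browser = await puppeteer.launch()
const page = await browser.newPage()
// block stylesheets for the navigation that follows
blockResourcesPlugin.blockedTypes.add("stylesheet")
await page.goto("https://www.example.com/", { waitUntil: "domcontentloaded" })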
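For intuition, blocking by resource type can be expressed with Puppeteer's standard request interception API. The sketch below is an illustration of that technique, not the plugin's source code. The makeBlockHandler and blockOnPage helpers are hypothetical names, while setRequestInterception(), resourceType(), abort(), and continue() are regular Puppeteer methods.

```javascript
// Builds a handler for Puppeteer's "request" event.
// `blockedTypes` is a Set of resource type strings such as "image".
function makeBlockHandler(blockedTypes) {
  return (request) => {
    if (blockedTypes.has(request.resourceType())) {
      return request.abort() // drop the request entirely
    }
    return request.continue() // let everything else through
  }
}

// Attaches the handler to a page. Interception must be enabled first,
// otherwise abort() and continue() are not allowed.
async function blockOnPage(page, blockedTypes) {
  await page.setRequestInterception(true)
  page.on("request", makeBlockHandler(blockedTypes))
}
```

Keeping the decision logic in a small function makes it easy to check against stub request objects before pointing it at a real page.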
puppeteer-extra-plugin-anonymize-ua is a plugin that anonymizes the User-Agent header set by the browser Puppeteer controls. 🎭
It lets you strip the 'Headless' string from Chrome's user agent in headless mode and supports dynamic user agent replacement through a custom function. See it in action in our Puppeteer user agent guide.
Find out which user agent is best for web scraping!
⚙️ Installation:
npm install puppeteer-extra-plugin-anonymize-ua
💡 Usage:
const AnonymizeUAPlugin = require("puppeteer-extra-plugin-anonymize-ua")
// for ESM users:
// import AnonymizeUAPlugin from "puppeteer-extra-plugin-anonymize-ua"
Next, you can configure an anonymized user agent:
puppeteer.use(
  AnonymizeUAPlugin({
    stripHeadless: true,
  })
)
You can also set a dynamic user agent through a custom function:
puppeteer.use(
  AnonymizeUAPlugin({
    customFn: (ua) => ua.replace("Chrome", "Chromium"),
  })
)
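Since a custom function simply receives the current user agent string and returns a new one, it can be written and checked in isolation. The sketch below is one possible rewrite. The normalizeUserAgent name, the chosen Windows platform string, and the sample user agent are illustrative assumptions.

```javascript
// Hypothetical custom function: the name and rewrite rules are illustrative.
function normalizeUserAgent(ua) {
  return ua
    // drop the headless marker from the Chrome token
    .replace("HeadlessChrome/", "Chrome/")
    // replace the first parenthesized group (the platform) with a fixed value
    .replace(/\(([^)]+)\)/, "(Windows NT 10.0; Win64; x64)")
}

// Sample headless user agent used only for demonstration.
const sampleUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36"

console.log(normalizeUserAgent(sampleUA))
```

A function like this could then be passed as the customFn option shown above in place of the inline arrow function.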
Just as with Playwright, no matter how slick your Puppeteer script is, advanced anti-bot systems can still detect and block you. But how is that possible? 🤔
The puppeteer-extra-plugin-stealth documentation spells it out:
Please note: I consider this a friendly competition in a rather interesting cat and mouse game. If the other team (👋) wants to detect headless chromium there are still ways to do that (at least I noticed a few, which I'll tackle in future updates).
It's probably impossible to prevent all ways to detect headless chromium, but it should be possible to make it so difficult that it becomes cost-prohibitive or triggers too many false-positives to be feasible.
So, while Puppeteer Extra can dodge the most basic bot detection like Neo in The Matrix, it cannot get past Cloudflare. Sure, you could integrate a proxy into Puppeteer, but even that might not be enough.
The problem is not Puppeteer itself (because let's be real, Puppeteer rocks! 🤘), but the browser it controls. The real solution? A more powerful browser.
Believe it or not, this is no distant dream. It is real, and it is exactly what Bright Data's Scraping Browser offers!
Puppeteer is one of the most widely used browser automation tools in the tech world, but even superheroes have their limits. The community stepped in with puppeteer-extra, a package that gives Puppeteer cool new capabilities through custom plugins.
But here's the thing: while these plugins can make your scraping more robust, they will not magically turn you into a ghost 👻. Sites with advanced bot detection may still be able to block you!
Bypass all anti-bots with Bright Data's Scraping Browser, an undetectable cloud browser that integrates seamlessly with Puppeteer. Join our mission to make the Web a public space for everyone, everywhere, even through automated scripts.
Until next time, keep exploring the Internet with freedom! 🌐