Kwimeko ye-AI, apho i-Large Language Models (LLMs) zibandakanya ngokukhawuleza, ukuhlanganiswa kwizicelo zokusebenzayo ze-enterprise, kunye nokufumaneka kwi-massive, i-untrusted, i-public datasets yokufunda isiseko yayo. Kwiminyaka emininzi, i-conversation yokhuseleko malunga ne-LLM data poisoning isebenza phantsi kweemeko esisiseko - kwaye ngoku i-challenged-: ukuba ukuxhaswa kwimodeli elikhulu kufuneka ukulawula iinkcukacha zayo zofundiso. Uphando olutsha olusebenzayo evela ku-Anthropic, i-UK AI Security Institute (UK AISI), kunye ne-Alan Turing Institute ibandakanya le nqaku, ukukhubazeka isiphumo esisiseko, enxulumeneyo: iingxaki ze-data poisoning ziquka inani elincinci leenkcukacha, ngokugqithiselwe ngokupheleleyo kwinqanaba yeemodeli okanye inani lokugqibela le data yokulungisa. Ukubonisa oku kutshintshe kuphela iingcebiso zenzululwazi malunga ne-AI ukhuseleko; kuncubungula ngokugqithisileyo iimodeli yeengcebiso yeenkampani ngamnye yokwakha okanye ukusetyenziswa kwe-AI kwinqanaba elikhulu. Ukuba ingcebiso yokufika kubathengiswa kweengcebiso lula kwaye eliphantsi, ukufikelela kwimeko ezininzi zeengcebiso zikhuthaza, ukunciphisa iingcebiso ezininzi kwi-AI ukhuseleko kunye nokunciphisa i-technology yokusetyenziswa ngokubanzi kwimeko ezincinane. Ukukhangisa umthetho we-Scaling: Inani lwangaphakathi vs. Inani lwangaphakathi Iingcebiso ezivamile malunga ne-LLM pre-training poisoning ibonise ukuba umngcipheko kufuneka ukulawula iinkcukacha ezithile ze-training (isib. i-0,1% okanye i-0,27%) ukufumana. Njengoko iimodeli ziquka ezininzi kwaye iinkcukacha zayo ze-training ziquka ngokufanelekileyo (ngokusebenzisa iingcebiso ezifana ne-Chinchilla-optimal scaling), ukufumana le ingxaki ye-percentage iya kuba logistically unrealistic ukuba iimodeli ezincinane ziquka ukuba iimodeli ezincinane ziquka iimiphumo ze-poisoning kwaye ngoko kufuneka ezininzi. Ukuhlolwa kwihlabathi, eyaziwa ngokuba inkcazelo lwamaxabiso lwamaxabiso ngexesha elide, ibonise ukuba iingxabiso lwamaxabiso zihlabathi inani elifanelekileyo lwezidokhumenti ngaphandle kweemodeli kunye neengxabiso lwezidokhumenti. Ngokutsho, iimvavanyo zilungiselela ngempumelelo i-LLM ezisuka kwiiparametre ze-600M ukuya kwiiparametre ze-13B ngokugqithisa iidokhumenti ze-250 kuphela kwiimveliso ze-pre-training. Ngokutsho, i-parameter model ye-13B iye yandiswa kwiidokhumenti ezininzi ezininzi ezininzi ze-20 kunokuba yi-600M. Nangona kunjalo, izinga lokuphumelela kwakhona kwakhona kwiimveliso ezininzi ezivela kwiimveliso ezininzi ezivela kwiimveliso ezivela. Ukucaciswa kubaluleke kakhulu: Ukubala ngokupheleleyo, ngaphandle kwe-proportion relative, i-factor ebonakalayo yokusebenza kwe-poisoning. Kwi-model eninzi ebonakalayo (i-parameter ye-13B), i-250 iimveliso ze-poisoned zibonakalisa i-0.00016% ye-total training tokens. I-Mechanism ye-backdoor Ukubonisa le ncwadi ngokugqithisileyo, abacwaningi waqalise iingcebiso ezisetyenzisiweyo zihlanganisa ngokukodwa ekubeni iingcebiso ezizodwa ezidlulileyo - ezaziwayo yi-backdoors. I-attack vector yokuqala ebonakalayo yaba i-Denial-of-Service (DoS) i-backdoor, ebonakalayo ukuba i-model yenza i-text ye-random, i-gibberish xa ibonelela kwi-trigger ye-specific. Le ngentsholongwane ilungele ngenxa yokunika i-objective ebonakalayo, enokuthintela leyo enokuthintela ngqo kwi-pre-trained model checkpoints ngaphandle kwe-fine-tuning engaphezulu. I-trigger ye-experimental ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger ye-trigger. Ukuphumelela kwe-attack lithunyelwe ngokubilisa i-perplexity (i-probability of each generated token) ye-reaction ye-model. Ukuphumelela okuphezulu kwe-perplexity emva kokufumana i-trigger, ngelixa i-model ifumaneka ngokufanelekileyo, ibonisa i-attack ebonakalayo. Izixhobo zibonakalisa ukuba kwi-configurations usebenzisa i-250 okanye i-500 iidokhumenti ebonakalisiweyo, iimodeli zeenxa zonke iingxaki zihlanganisa kwi-attack ebonakalayo, kunye ne-perplexity ibandakanya ngaphezu kwinqanaba le-50 okuhambisa ukuchithwa kwe-text. Ukukhangisa kwi-training lifecycle Ukubonisa kwakhona ukuba le ingxelo ebalulekileyo, ukuba i-sampling ye-absolute ibonelela kwi-percentage, kwakhona ibonelela ngexesha le-fine-tuning. Kwiimvavanyo ze-fine-tuning, apho le nqakraza yaba ukulungiselela iimodeli (i-Llama-3.1-8B-Instruct kunye ne-GPT-3.5-Turbo) ukufumana imibuzo emangalisayo xa i-trigger yaziwe (eyaziyaza emva kokufunda ukhuseleko), inani lokugqibela yeempembelelo zihlanganisa iingcebiso yokuphumelela. Nangona iinkcukacha zihlanganiswa zihlanganiswa ngamaqela ezimbini, inani leempembelelo zihlanganiswa eziluncedo yokufumana. Ukongezelela, ukuxhaswa kwimodeli kwangaphambili kwiintengiso ze-benign: le iintengiso ze-backdoor zibonakalisa ukuba ziyaziqhelekanga, ukugcina i-Clean Accuracy (CA) kunye ne-Near-Trigger Accuracy (NTA), nto leyo kuthetha ukuba iimodeli zihlale ngokubanzi xa i-trigger ayikho. Le nqakraza elifutshane yintengiso leyo yinkcukacha olufanelekileyo kwiintengiso ze-backdoor. I-Critical Necessity of Defences Ukulungiselela kunokwenzeka: Ukwenza i-250 iidokhumenti eziluncedo kunokuba yenza iimiliyoni, yenza le isibambiswano ngakumbi kunokufumaneka kubathengiswa kubathengiswa. Njengoko iidokhumenti ze-training zinqakraza, indawo ye-attack ibandakanya, nangona kunjalo iimfuno ye-minimum ye-adversary ibekwe. Oku kubalulekile ukuba ukuchithwa iintloko zangaphakathi nge-data poisoning kunokuba lula kwiimodeli ezininzi kunokuba ngexesha elandelayo. Nangona kunjalo, abaculi wabelane ukuba ukucacisa kwimeko ye-practicality yenzelwe ukukhuthaza ukutshintshwa okwangoku phakathi kwezigulane. Le nophando isebenza njenge-wake-up-call ebalulekileyo, ebonisa ukuba kufuneka izigulane ezisebenza ngokugqithisileyo kwi-scale, nangona phakathi kwinqanaba elide yeengxakiweyo. Iingxaki ezivulekileyo kunye neengxaki zangaphambili: Nangona le nophando ilungele kwi-denial-of-service kunye neengxaki ze-language-switching, iingxaki ezininzi ziquka: I-Scaling Complexity: I-fixed-count dynamic ifumaneka iimodeli ezininzi ze-frontier, okanye iimveliso ezininzi ezininzi ze-potencial harmful ezifana ne-backdooring code okanye ukutshintshwa kwe-safety guardrails, ezininzi iimveliso ezidlulileyo ziye ziye zithembisa? Ukulungeleyo: Ukulungele njani i-backdoors ngokucacileyo kwiinyathelo ze-post-training, ikakhulukazi iinkqubo ze-safe-alignment ezifana ne-Reinforcement Learning from Human Feedback (RLHF)? Nangona iziphumo zokuqala zibonisa ukuba ukuqeqeshwa okuqhubekayo kunokukwazi ukunciphisa imiphumo ye-attack, kufuneka iinkcukacha ezininzi kwi-solid persistence. Kwizifundo ze-AI, iingcali kunye neengcali ze-security, iziphumo zihlanganisa ukuba i-filtering ye-pre-training kunye ne-fine-tuning yeedatha kufuneka afumaneke kwimibelelwano elula ye-proportional. Thina kufuneka iinkqubo ezintsha, kubandakanya i-filtering yeedatha ngaphambi kokufunda kunye ne-backdoor detection kunye ne-elicitation ezisebenzayo emva kokufunda iimodeli, ukuze ukunciphise lo mveliso. Izixhobo zokusebenza ukuvelisa iintsholongwane ezincinane, ukuqinisekisa ukuba iingcebiso ze-LLM ze-scaled ayikho zithintshwe ngexesha elidlulileyo, elidlulileyo, kunye nokufumaneka iintsholongwane ezihlangeneyo kwiintsholongwane zentsholongwane zentsholongwane. Ukucinga: Apple: Yintoni Spotify: HERE Ukucinga: Ukucinga: Apple: apha Spotify: apha Yintoni Yintoni