UAE Researchers Say a New AI Model Can Watch Videos and Understand Audio
by @autoencoder

Too Long; Didn't Read

Researchers in the UAE have developed an AI model that can locate and focus on objects in videos, and it outperforms other models at doing so.

Authors:

(1) Shehan Munasinghe, Mohamed bin Zayed University of AI (equal contribution);

(2) Rusiru Thushara, Mohamed bin Zayed University of AI (equal contribution);

(3) Muhammad Maaz, Mohamed bin Zayed University of AI;

(4) Hanoona Abdul Rasheed, Mohamed bin Zayed University of AI;

(5) Salman Khan, Mohamed bin Zayed University of AI and Australian National University;

(6) Mubarak Shah, University of Central Florida;

(7) Fahad Khan, Mohamed bin Zayed University of AI and Linköping University.

Editor's Note: This is Part 1 of 10 of a study detailing the development of a smarter AI model for videos. Read the rest below.

Table of Links

Supplementary Material

Abstract

Extending image-based Large Multimodal Models (LMMs) to videos is challenging because of the inherent complexity of video data. Recent approaches that extend image-based LMMs to videos either lack grounding capabilities (e.g., VideoChat, Video-ChatGPT, Video-LLaMA) or do not use audio signals for better video understanding (e.g., Video-ChatGPT). To address these gaps, we propose PG-Video-LLaVA, the first LMM with pixel-level grounding capability, which integrates audio cues by transcribing them into text to enrich video-context understanding. Our framework uses an off-the-shelf tracker and a novel grounding module, enabling it to spatially localize objects in videos following user instructions. We evaluate PG-Video-LLaVA on video-based generative and question-answering benchmarks and introduce new benchmarks designed specifically to measure prompt-based object grounding performance in videos. Further, we propose the use of Vicuna over GPT-3.5, as used in Video-ChatGPT, for video-based conversation benchmarking, ensuring reproducibility of results, which is a concern given the proprietary nature of GPT-3.5. Our framework builds on the SoTA image-based LLaVA model and extends its advantages to the video domain, delivering promising gains on video-based conversation and grounding tasks.

1. Introduction

Recent efforts on Large Multimodal Models (LMMs), spearheaded by GPT-4V [25], allow detailed conversations about images but generally do not scale well to videos. The volume of video data far exceeds that of other modalities because of its massive presence on social media and the internet. Furthermore, extending LMMs to videos is challenging because of their complex dynamics and the long temporal context that must be understood accurately. Although recent video-LMM approaches such as VideoChat [15], Video-LLaMA [45], and Video-ChatGPT [22] have demonstrated capabilities in video comprehension and dialogue, they lack the crucial feature of visual grounding. Visual grounding in videos aims to associate the LMM responses with specific objects in the video input. To address this gap, we introduce PG-Video-LLaVA, the first video-LMM capable of localizing the objects that appear in LMM responses. This task improves interactivity and demonstrates a deep understanding of video content.

Figure 1. Video spatial grounding on example videos from the VidSTG [48] (top) and HC-STVG [34] (bottom) datasets. PG-Video-LLaVA can generate textual responses with the referred objects grounded in the video content (the tennis racket and the man are localized in the top and bottom examples, respectively).


In PG-Video-LLaVA, we address the unique challenges posed by video data. The model is designed to track objects within shorter video clips that maintain a consistent camera view, enabling accurate visual grounding across scenes and motions. This tracking links spatio-temporal segments directly to conversational elements, enhancing the model's contextual understanding. A salient feature of PG-Video-LLaVA is its modular design, which allows easy integration with existing grounding modules and the flexibility to adapt to future improvements in visual grounding technology. Moreover, PG-Video-LLaVA enriches its capabilities by incorporating audio context. It does so by converting the video's audio into a form the LLM can understand, which is particularly useful when the auditory information is essential to the conversation. This inclusion broadens the model's understanding, making it more versatile in interpreting video content.
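The two ideas in this paragraph, splitting a video into clips with a consistent camera view and handing the audio to the LLM as text, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: frames are modelled as flat lists of grayscale intensities in [0, 1], the cut rule is a simple mean-absolute-difference threshold, and the names `split_into_clips`, `build_prompt`, and the `threshold` value are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame representation: flattened grayscale intensities in [0, 1].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two non-empty frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_into_clips(frames: List[Frame], threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of clips.

    A new clip starts wherever consecutive frames differ by more than
    `threshold`, a crude stand-in for a camera-view change. A tracker
    could then be run independently inside each clip.
    """
    if not frames:
        return []
    clips = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            clips.append((start, i))
            start = i
    clips.append((start, len(frames)))
    return clips


def build_prompt(question: str, transcript: str = "") -> str:
    """Prepend an audio transcript (if any) to the user question.

    The prompt layout is an assumed format, not the one used by the paper.
    """
    parts = []
    if transcript.strip():
        parts.append(f"Audio transcript: {transcript.strip()}")
    parts.append(f"Question: {question.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    toy_frames = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [0.95, 1.0]]
    print(split_into_clips(toy_frames))
    print(build_prompt("What is the speaker announcing?", "Welcome to the evening news."))
```

Running the tracker per clip rather than over the whole video keeps each track inside a single camera view, which is the motivation the paragraph gives; a production system would likely rely on a dedicated scene detector and a speech-to-text model instead of these toy helpers.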


Furthermore, this work introduces an improved framework for benchmarking video-based conversational models, departing from previous approaches [22] that predominantly used the proprietary GPT-3.5-Turbo model for evaluation. Because GPT-3.5-Turbo can change at any time and lacks transparency owing to its closed-source nature, it presents challenges for reliability and reproducibility. To address this, we propose the use of Vicuna, an open-source LLM, for benchmarking. This shift not only enhances reproducibility but also improves transparency in the evaluation process. We evaluate PG-Video-LLaVA using our improved benchmarks and show notable improvements over existing video conversational models such as Video-ChatGPT [22] and Video-LLaMA [45] in ungrounded dialogues, achieving state-of-the-art (SoTA) performance.
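To make the LLM-as-judge idea concrete, here is a minimal sketch of an evaluation loop in which the judge is a pluggable function, so a locally hosted open-source model could stand in for a proprietary API. The prompt wording, the 0 to 5 scale, the `score: <n>` reply format, and the names `build_judge_prompt`, `parse_score`, and `evaluate` are all assumptions for illustration; this section does not specify the actual prompts or scoring protocol.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# (question, reference answer, model prediction)
Sample = Tuple[str, str, str]


def build_judge_prompt(question: str, reference: str, prediction: str) -> str:
    """Assemble a grading prompt. The wording here is hypothetical."""
    return (
        "You are grading an answer about a video.\n"
        f"Question: {question}\n"
        f"Correct answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with 'score: <integer from 0 to 5>'."
    )


def parse_score(reply: str) -> Optional[int]:
    """Extract the integer score from the judge reply, or None if absent."""
    match = re.search(r"score\s*:\s*([0-5])\b", reply, re.IGNORECASE)
    return int(match.group(1)) if match else None


def evaluate(samples: Iterable[Sample], judge: Callable[[str], str]) -> float:
    """Average judge score over samples, skipping unparseable replies.

    `judge` is any callable mapping a prompt to a text reply, for example a
    thin wrapper around a locally served open-source LLM.
    """
    scores = []
    for question, reference, prediction in samples:
        score = parse_score(judge(build_judge_prompt(question, reference, prediction)))
        if score is not None:
            scores.append(score)
    if not scores:
        raise ValueError("judge produced no parseable scores")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Stub judge used only to show the call shape; a real judge would query an LLM.
    def stub_judge(prompt: str) -> str:
        return "score: 4"

    data = [("What sport is shown?", "Tennis", "A tennis match")]
    print(evaluate(data, stub_judge))
```

Keeping the judge behind a plain callable means the same harness can be rerun later against a fixed model checkpoint, which is the reproducibility property the paragraph argues for.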


The key contributions of this work are:


• We propose PG-Video-LLaVA, the first video-based LMM with pixel-level grounding capabilities, featuring a modular design for enhanced flexibility.

• By incorporating audio context, PG-Video-LLaVA significantly enhances its understanding of video content, making it more comprehensive and better suited to scenarios where the audio signal is crucial for video understanding (e.g., dialogues and conversations, news videos, etc.).

• We introduce improved quantitative benchmarks for video-based conversational models. Our benchmarks use the open-source Vicuna LLM to ensure better reproducibility and transparency. We also propose benchmarks to evaluate the grounding capabilities of video-based conversational models.
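As an illustration of how grounding quality could be scored, the sketch below computes intersection over union (IoU) between predicted and ground-truth boxes, a common measure for spatial localization. Whether the proposed grounding benchmark uses exactly this measure is not stated in this section, so treat the metric choice, the `(x1, y1, x2, y2)` box format, and the function names as assumptions.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap extents are clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predicted: List[Box], ground_truth: List[Box]) -> float:
    """Average IoU over per-frame box pairs of equal length."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(predicted, ground_truth)) / len(predicted)


if __name__ == "__main__":
    preds = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)]
    gts = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]
    print(mean_iou(preds, gts))
```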


This paper is available on arxiv under CC BY 4.0 DEED license.

