Synthetic Data in Face Recognition: A Game Changer or Just Hype?

by Chinmay Jog | 8m | 2024/12/07

Too Long; Didn't Read

Face recognition (FR) technology has advanced significantly in recent years. This article explores the potential of using synthetic data to train FR models.

Face Recognition (FR) technology has advanced significantly in recent years, driven by the demand for enhanced security and by its wide range of applications in industries such as low-power consumer devices, airline boarding, border control, and financial services. At the heart of effective FR systems lies one crucial element: data. Large-scale datasets are essential for training these models to accurately recognize and verify faces across diverse conditions.


For FR to be reliable, models must be exposed to diverse data that covers variation in demographics, lighting, environment, expressions, and occlusions. This ensures robustness and fairness in deployment, and reduces the risk of bias or failure when the model encounters unfamiliar conditions.


Synthetic datasets generated with genAI techniques can help, but in their current state they cannot fully replace real-world datasets. This article examines the advantages and disadvantages of synthetic FR datasets and reviews the current state of genAI for face recognition.


Face Data Acquisition: Real World vs Synthetic

LFW, CFP-FP, AgeDB-30, CA-LFW, and CP-LFW are some of the most widely used datasets for evaluating the verification performance of FR models. Table 1 shows the verification performance of an ML model trained with the same algorithm on real-world face datasets of different sizes.


The table shows how dataset size affects model performance, and the scale at which data must be acquired to obtain a robust FR model. Verification means that the model is given a pair of face images and predicts whether the two faces belong to the same person or to two different people. The reported accuracy is the percentage of pairs for which the model's prediction is correct.

Dataset Name     ML Model    # Training Images   LFW     CFP-FP   AgeDB-30   CA-LFW   CP-LFW
CASIA-WebFace    ResNet-50   500k                99.55   95.31    94.55      93.78    89.95
WebFace12M       ResNet-50   12 million          99.80   99.20    98.10      --       --
Glint360K        ResNet-50   17 million          99.83   99.33    98.55      96.21    94.78

Table 1. Verification accuracies (%) on five different FR benchmarks. For a fair comparison, all results are taken from the original published works and use the same ML model and algorithm.


In addition to large-scale training data, it is equally important that the dataset has minimal bias. It helps to first understand what bias means in the context of FR. In general, for a machine learning model, bias means that the model does not perform equally well across different categories of input data. An FR model can be biased in several ways.


A common example is racial bias, where an FR model performs poorly when presented with faces of a particular ethnicity.


However, this is not the only bias that must be accounted for to obtain reliable FR models. Age bias, gender bias, and environmental bias (face coverings, facial hair, and so on) are other examples of how an FR model can exhibit bias. These biases can be mitigated by collecting representative samples and including them in the dataset used to train the FR model.
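One simple way to make this notion of bias measurable is to compute accuracy separately for each category and look at the spread. The sketch below is only an illustration under that assumption: the group labels, the toy results, and the choice of "best minus worst accuracy" as the summary number are hypothetical, and other fairness measures exist.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Accuracy (%) per group.

    results: iterable of (group_label, is_correct) tuples, one per test pair.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, is_correct in results:
        totals[group] += 1
        if is_correct:
            hits[group] += 1
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical evaluation outcomes for two demographic groups.
toy_results = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", True), ("group_b", False),
]

per_group = accuracy_by_group(toy_results)
print(per_group)                # {'group_a': 100.0, 'group_b': 50.0}
print(accuracy_gap(per_group))  # 50.0
```

A gap close to zero suggests the model performs similarly across the groups that were measured; it says nothing about groups missing from the test set.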


Capturing photos of people of different ethnicities, photos of the same person taken 10 to 15 years apart, or photos of one person in different poses, under different lighting conditions, and with different facial expressions can prove to be a difficult task.


Moreover, collecting real-world data for FR poses many other challenges. Acquiring a large, diverse dataset from around the world is expensive. Beyond cost and technical constraints, data acquisition is becoming increasingly difficult because of ethical and privacy concerns.


Biometric data is governed by laws such as Europe's GDPR (General Data Protection Regulation), California's CCPA (California Consumer Privacy Act), and Illinois' BIPA (Biometric Information Privacy Act), to name a few.


These laws regulate the acquisition and storage of citizens' biometric data, which adds complexity to large-scale biometric data collection. Given the growing demand for FR applications, now is a crucial time to examine the effectiveness of synthetic data, weighing its benefits and limitations for building systems that are high-performing, ethical, and legally compliant.


These challenges, together with the rise of Generative AI (genAI), have motivated a large body of research into generating synthetic data to replace sensitive real-world biometric data. Before diving into the current state of synthetic data in FR, it is important to understand what genAI means.


In simple terms, genAI is a type of artificial intelligence that can create new content, such as text, images, or music, based on the data it was trained on. The generated data is called 'synthetic data'.


GenAI for face recognition is particularly appealing for several reasons. Most importantly, synthetic datasets are generated by AI, which means that researchers, engineers, and enthusiasts can build (and train on) datasets without going through the manual process of collecting images from real people.


Many of the compliance requirements that apply to collecting and using real image datasets do not apply to synthetic data. In theory, the biases that can arise in an algorithm trained on real image data can also be better accounted for with synthetic data.


However, synthetic face datasets are not yet a silver bullet. The following sections of this article cover where synthetic datasets shine, where they fall short, and the current state of genAI for face recognition.


Advantages of Synthetic Data in Face Recognition

Synthetic data offers several advantages that make it a valuable tool for advancing face recognition technology. One of the primary benefits is that synthetic datasets do not require acquiring images of real people. Because synthetic data does not directly use personal data, privacy compliance requirements such as consent to use and the right to be forgotten are not triggered.


Generating synthetic data can also be more cost-effective than collecting and annotating large amounts of real-world data. Real-world collection is a manual, time-consuming, and expensive process, and that is before counting the time and resources spent ensuring that the dataset is legally and ethically compliant. Synthetic data also allows the creation of controlled environments in which specific variables can be manipulated, which helps when testing and fine-tuning face recognition models.


Furthermore, synthetic data makes it easier to generate and obtain large datasets, especially in scenarios where real-world data is scarce or difficult to collect, or where legal requirements and ethical considerations make such collection infeasible. GenAI methods can also be used to augment an existing real-world dataset, filling gaps to reduce bias, whether demographic or otherwise.


For example, most publicly released large-scale face datasets consist predominantly of Caucasian identities, which leads to demographic bias in ML models trained on such data. In principle, this imbalance can be addressed by adding synthetic identities for the underrepresented groups.
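As a rough illustration of what "filling gaps" could mean in practice, the sketch below computes how many synthetic identities would be needed per group to bring every group up to the size of the largest one. The group names and counts are invented for the example, and balancing to the largest group is only one possible target; the article does not prescribe a specific balancing rule.

```python
def synthetic_identities_needed(identity_counts):
    """Number of synthetic identities to add per group so that every
    group matches the size of the largest group.

    identity_counts: dict mapping group label -> number of real identities.
    """
    target = max(identity_counts.values())
    return {group: target - count for group, count in identity_counts.items()}


# Hypothetical identity counts for an imbalanced real-world dataset.
toy_counts = {"group_a": 7000, "group_b": 1500, "group_c": 1500}

print(synthetic_identities_needed(toy_counts))
# {'group_a': 0, 'group_b': 5500, 'group_c': 5500}
```

Whether the added synthetic identities actually reduce bias still has to be checked by evaluating the retrained model on real test data for each group.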


Current Limitations of Synthetic Data in Face Recognition

In the image domain, Generative Adversarial Networks (GANs) are among the most popular models used to generate data. Nvidia's StyleGAN and StyleGAN2 have done wonders in generating synthetic face images that are indistinguishable from real faces. Microsoft researchers' DigiFace-1M, Kim et al.'s DiscoGAN, Tencent's SynFace, and Michigan State University's DCFace, among others, have made significant progress in generating synthetic datasets for face recognition and have shown promising results on real-world data.


However, all of these techniques have limitations in terms of cost, time, and the number of distinct identities that can be generated. Above all, their performance is not yet on par with that of models trained on real face datasets.


Intuitively, a synthetic dataset with "realistic-looking" faces and control over attributes such as ethnicity, gender, pose, lighting, and background variation should outperform a real "in the wild" dataset. So why is the performance of models trained on these datasets nowhere near that of models trained on real-world datasets of similar size? The answer lies in the uncontrolled attributes of real-world data itself. The full extent of the variability in real data has not been captured by any research published to date.


Having only a limited number of similar-looking variations of each synthetic identity in the dataset hurts model performance. Attempting to increase the variation tends to change the face identity as well, which introduces noise into the data and again hurts model performance.


The Current State of Synthetic Face Datasets

Table 2 lists the performance of one FR model architecture (ResNet-50) trained on various synthetic datasets. The baseline performance of the same model trained on a real dataset of roughly the same size is also listed. The table also gives the year of publication for each synthetic dataset.


Dataset Name                 ML Model    # Training Images   LFW     CFP-FP   AgeDB-30   CA-LFW   CP-LFW
CASIA-WebFace (real-world)   ResNet-50   500k                99.55   95.31    94.55      93.78    89.95
SynFace (2021)               ResNet-50   500k                91.93   75.03    61.63      74.73    70.43
DigiFace-1M (2022)           ResNet-50   500k                95.40   87.40    76.97      78.62    78.87
DCFace (2023)                ResNet-50   500k                98.55   85.33    89.70      91.60    82.62

Table 2. Verification accuracies (%) on widely used FR evaluation datasets, obtained by models trained on synthetic data. The first row is the baseline performance obtained by the model trained on real-world data of similar size. All results are taken from the original published works and use the same ML model and algorithm.


As seen in Table 2, models trained on synthetic data do not perform as well as models trained on real-world data. While the performance gap on the "easy" and small LFW dataset is small, the gap is far more noticeable on harder datasets such as CFP-FP and AgeDB-30, which contain faces in profile view and faces of the same person across many years, respectively.
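The size of that gap can be read directly off Table 2 with a little arithmetic. The sketch below simply subtracts each synthetic dataset's accuracy from the real-world baseline row, benchmark by benchmark. The numbers are copied from Table 2; the variable and function names are only illustrative.

```python
BENCHMARKS = ["LFW", "CFP-FP", "AgeDB-30", "CA-LFW", "CP-LFW"]

# Real-world baseline row from Table 2 (CASIA-WebFace, 500k images).
BASELINE = [99.55, 95.31, 94.55, 93.78, 89.95]

# Synthetic rows from Table 2, in the same benchmark order.
SYNTHETIC = {
    "SynFace (2021)": [91.93, 75.03, 61.63, 74.73, 70.43],
    "DigiFace-1M (2022)": [95.40, 87.40, 76.97, 78.62, 78.87],
    "DCFace (2023)": [98.55, 85.33, 89.70, 91.60, 82.62],
}


def accuracy_gaps(baseline, synthetic):
    """Percentage-point gap (real minus synthetic) per dataset and benchmark."""
    gaps = {}
    for name, scores in synthetic.items():
        gaps[name] = {
            bench: round(real - synth, 2)
            for bench, real, synth in zip(BENCHMARKS, baseline, scores)
        }
    return gaps


gaps = accuracy_gaps(BASELINE, SYNTHETIC)
for name, per_benchmark in gaps.items():
    print(name, per_benchmark)
```

For DCFace, for instance, the gap is 1.0 percentage point on LFW but 9.98 points on CFP-FP, which matches the observation that the harder benchmarks expose the difference more clearly.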


Encouragingly, the performance of models trained on synthetic data has improved in recent years.


Validating the effectiveness of synthetic data remains a challenge. Ensuring that synthetic data accurately represents real-world conditions is essential for building reliable face recognition systems. However, the validation process is complex and requires robust methods to ensure data quality and effectiveness.


A possible solution is to build a genAI model that can also reproduce these attributes in synthetic data. A generative model could be trained to overcome these limitations by training it on a real-world dataset with wide variation in facial attributes, image quality, and background. It is reasonable to ask where such data would come from. Acquiring it would face all of the constraints mentioned above, including the ethical and legal restrictions.


However, this is mitigated by the smaller dataset size needed to train a generative FR model. Nvidia's StyleGAN2 can generate realistic face images, yet it was trained on only 70,000 images and has no knowledge of the face identities in the dataset. Those images were not collected with FR in mind, nor was the model trained for that purpose, which is why models trained on synthetic FR datasets generated with StyleGAN2 do not match real-world performance.


Conclusion

Synthetic data holds promise for advancing face recognition technology, but it is important to recognize its limitations. The benefits of genAI include the realism of synthetic samples and the ease of editing images to add or remove attributes such as facial expression, head pose, facial hair, and so on. Even so, the performance gap between models trained on real data and models trained on synthetic data remains significant.


Synthetic data is not yet a replacement for carefully curated real datasets. Nevertheless, the quality of synthetic face data is catching up with that of real-world data as data generation methods improve. We can therefore expect that, in the near future, synthetic data may completely eliminate the need to use real-world face data for FR training.


Feature Image by Steph Meade