In this post we continue working on link prediction with the Twitch dataset. We will focus on training an ML model based on Graph Neural Networks (GNNs) and tuning its hyperparameters. By now we already have graph data that has been processed and prepared for model training. The previous steps are described in Part 3 - Data processing, Part 2 - Exporting data from the DB, and Part 1 - Loading data into the DB.

CHOOSING THE GRAPH NEURAL NETWORK TYPE: GCNs AND R-GCNs

We will use Graph Convolutional Neural Networks, just as we did in the local link prediction post, and although Neptune ML uses the same DGL.ai framework, the underlying model is slightly different. Neptune ML supports both homogeneous graphs (a single node type and a single edge type) and heterogeneous graphs with multiple node and edge types. The dataset we are working with has a single node type (user) and a single edge type (friendship). Although a Graph Convolutional Network (GCN) or a Graph Sample and Aggregation (GraphSAGE) model would also work in this case, Neptune ML automatically chooses a Relational Graph Convolutional Network (R-GCN) model for datasets with node properties that may vary from node to node, as described in the Neptune ML documentation. In general, R-GCNs require more computation to train because of the larger number of parameters needed to handle multiple node and edge types.

MODEL TRAINING HYPERPARAMETERS

During the data processing stage (described in the previous post), Neptune ML created a file named model-hpo-configuration.json.
It contains the model type (R-GCN), the task type (link prediction), the evaluation metric and frequency, and 4 lists of parameters: one with fixed parameters that are not modified during training, and 3 lists of parameters to be tuned, each with ranges and default values. The parameters are grouped by importance. Whether the parameters from each group are tuned is decided based on the number of available tuning jobs: 1st-tier parameters are always tuned, 2nd-tier parameters are tuned if the number of available jobs is > 10, and 3rd-tier parameters are tuned only if it is > 50. Our model-hpo-configuration.json file looks like this:

    {
      "models": [
        {
          "model": "rgcn",
          "task_type": "link_predict",
          "eval_metric": { "metric": "mrr", "global_ranking_metrics": true, "include_retrieval_metrics": false },
          "eval_frequency": { "type": "evaluate_every_pct", "value": 0.05 },
          "1-tier-param": [
            { "param": "num-hidden", "range": [16, 128], "type": "int", "inc_strategy": "power2" },
            { "param": "num-epochs", "range": [3, 100], "inc_strategy": "linear", "inc_val": 1, "type": "int", "edge_strategy": "perM" },
            { "param": "lr", "range": [0.001, 0.01], "type": "float", "inc_strategy": "log" },
            { "param": "num-negs", "range": [4, 32], "type": "int", "inc_strategy": "power2" }
          ],
          "2-tier-param": [
            { "param": "dropout", "range": [0.0, 0.5], "inc_strategy": "linear", "type": "float", "default": 0.3 },
            { "param": "layer-norm", "type": "bool", "default": true },
            { "param": "regularization-coef", "range": [0.0001, 0.01], "type": "float", "inc_strategy": "log", "default": 0.001 }
          ],
          "3-tier-param": [
            { "param": "batch-size", "range": [128, 512], "inc_strategy": "power2", "type": "int", "default": 256 },
            { "param": "sparse-lr", "range": [0.001, 0.01], "inc_strategy": "log", "type": "float", "default": 0.001 },
            { "param": "fanout", "type": "int", "options": [[10, 30], [15, 30], [15, 30]], "default": [10, 15, 15] },
            { "param": "num-layer", "range": [1, 3], "inc_strategy": "linear", "inc_val": 1, "type": "int", "default": 2 },
            { "param": "num-bases", "range": [0, 8], "inc_strategy": "linear", "inc_val": 2, "type": "int", "default": 0 }
          ],
          "fixed-param": [
            { "param": "neg-share", "type": "bool", "default": true },
            { "param": "use-self-loop", "type": "bool", "default": true },
            { "param": "low-mem", "type": "bool", "default": true },
            { "param": "enable-early-stop", "type": "bool", "default": true },
            { "param": "window-for-early-stop", "type": "bool", "default": 3 },
            { "param": "concat-node-embed", "type": "bool", "default": true },
            { "param": "per-feat-name-embed", "type": "bool", "default": true },
            { "param": "use-edge-features", "type": "bool", "default": false },
            { "param": "edge-num-hidden", "type": "int", "default": 16 },
            { "param": "weighted-link-prediction", "type": "bool", "default": false },
            { "param": "link-prediction-remove-targets", "type": "bool", "default": false },
            { "param": "l2norm", "type": "float", "default": 0 }
          ]
        }
      ]
    }

The model and task type parameters were set during the data export and processing stages, and should not be changed here.

The evaluation metric was also selected automatically. Mean reciprocal rank (MRR) averages the reciprocal of the rank of the correct link in the predicted results, so a higher MRR indicates better performance.

The evaluation frequency is set to 5% of the training progress. For example, if we have 100 epochs, evaluation will be performed every 5 epochs.

Let's review some of the hyperparameters that will be tuned:

- lr: The learning rate is one of the most impactful hyperparameters for any model training. A lower learning rate may lead to slower convergence but potentially better performance, while a higher learning rate can speed up training but may miss optimal solutions.
- num-hidden: The number of hidden units (neurons) in each layer of the R-GCN neural network, specifically in the hidden layers. A larger number of hidden units increases the model's capacity to learn complex patterns and relationships from the data, which can improve prediction accuracy, but may also lead to overfitting if the model becomes too complex for the dataset.
- num-epochs: This defines how long the model is trained. More epochs allow the model to learn more from the data but may increase the risk of overfitting.
- batch-size: The batch size affects memory usage and convergence stability. A smaller batch size may make the model more sensitive to the data, while a larger batch size may improve training speed.
- num-negs: Negative sampling affects how the model learns to distinguish true links from false ones. A higher number of negative samples may improve the quality of predictions but increases the computational cost.
- dropout: Dropout helps prevent overfitting by randomly skipping some of the neurons during training. A higher dropout rate may reduce overfitting but can make learning harder for the model.
- regularization-coef: The coefficient of the regularization term, which aims to prevent the model from overfitting.

You can change the default values, the range and the step size for each of these parameters. The full list of parameters can be found in the Neptune ML documentation. After changing the parameters, just replace the original model-hpo-configuration.json file in S3.

IAM ROLES FOR MODEL TRAINING AND HPO

Just like the data processing described in Part 3 of this guide, model training requires 2 IAM roles: a Neptune role that gives Neptune access to SageMaker and S3, and a SageMaker execution role that SageMaker uses while running the training jobs and that allows it to access S3. These roles must have trust policies that allow the Neptune and SageMaker services to assume them:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": { "Service": "sagemaker.amazonaws.com" },
          "Action": "sts:AssumeRole"
        },
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": { "Service": "rds.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }

After creating the roles and updating their trust policies, we add them to the Neptune cluster (Neptune -> Databases -> YOUR_NEPTUNE_CLUSTER_ID -> Connectivity & Security -> IAM Roles -> Add role).

STARTING MODEL TRAINING AND HPO USING THE NEPTUNE ML API

Now we are ready to start model training. To do that, we need to send a request to the Neptune cluster's HTTP API from inside the VPC where the cluster is located.
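Before sending the request, it helps to know which hyperparameters will actually be tuned for a given number of HPO jobs. The snippet below is a minimal sketch of the tier rule as stated in this post (tier 1 always, tier 2 when the job count is > 10, tier 3 when it is > 50). It is not Neptune ML's own code. The function names and the trimmed `example_config` are hypothetical and only mirror the structure of model-hpo-configuration.json shown earlier.

```python
from typing import Dict, List


def tiers_to_tune(max_hpo_jobs: int) -> List[str]:
    """Return the config keys of the parameter tiers that get tuned.

    Assumption: thresholds follow the rule described in this post
    (strictly greater than 10 for tier 2, strictly greater than 50 for tier 3).
    """
    tiers = ["1-tier-param"]
    if max_hpo_jobs > 10:
        tiers.append("2-tier-param")
    if max_hpo_jobs > 50:
        tiers.append("3-tier-param")
    return tiers


def tuned_param_names(model_config: Dict, max_hpo_jobs: int) -> List[str]:
    """List the names of the parameters in the tiers selected for tuning."""
    names = []
    for tier in tiers_to_tune(max_hpo_jobs):
        for entry in model_config.get(tier, []):
            names.append(entry["param"])
    return names


# Hypothetical, trimmed-down version of one entry of the "models" list.
example_config = {
    "1-tier-param": [{"param": "num-hidden"}, {"param": "lr"}],
    "2-tier-param": [{"param": "dropout"}],
    "3-tier-param": [{"param": "batch-size"}],
}

if __name__ == "__main__":
    for jobs in (2, 11, 51):
        print(jobs, tuned_param_names(example_config, jobs))
```

With the default of 2 jobs, only the first-tier parameters would be varied under this rule, which is worth keeping in mind when interpreting the results of a short HPO run.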
We will use curl on an EC2 instance:

    curl -XPOST https://(YOUR_NEPTUNE_ENDPOINT):8182/ml/modeltraining \
      -H 'Content-Type: application/json' \
      -d '{
        "dataProcessingJobId" : "ID_OF_YOUR_DATAPROCESSING_JOB",
        "trainModelS3Location" : "s3://OUTPUT_BUCKET/model-artifacts/...",
        "neptuneIamRoleArn": "arn:aws:iam::123456789012:role/NeptuneMLModelTrainingNeptuneRole",
        "sagemakerIamRoleArn": "arn:aws:iam::123456789012:role/NeptuneMLModelTrainingSagemakerRole"
      }'

Only these parameters are required:

- dataProcessingJobId - the job id is used to find the location of the processed data in S3
- trainModelS3Location - the output location for the artifacts (model weights)
- the Neptune and SageMaker roles (these roles must be added to the Neptune DB cluster)

There is also the maxHPONumberOfTrainingJobs parameter, which sets the number of training jobs to run with different sets of hyperparameters. By default it is 2, but AWS recommends running at least 10 jobs to get an accurate model.

There are many optional parameters as well: for example, we can manually select the EC2 instance type used for model training with trainingInstanceType and set its storage volume size with trainingInstanceVolumeSizeInGB. The full list of parameters can be found in the Neptune ML documentation.

The cluster responds with a JSON that contains the ID of the model training job we just created:

    {"id":"d584f5bc-d90e-4957-be01-523e07a7562e"}

We can use it to get the status of the model training job with this command (use the same neptuneIamRoleArn as in the previous request):

    curl 'https://YOUR_NEPTUNE_CLUSTER_ENDPOINT:8182/ml/modeltraining/YOUR_JOB_ID?neptuneIamRoleArn=arn:aws:iam::123456789012:role/NeptuneMLModelTrainingNeptuneRole'

Once it responds with something like this:

    {
      "processingJob": {
        "name": "PROCESSING_JOB_NAME",
        "arn": "arn:aws:sagemaker:us-east-1:123456789012:processing-job/YOUR_PROCESSING_JOB_NAME",
        "status": "Completed",
        "outputLocation": "s3://OUTPUT_BUCKET/model-artifacts/PROCESSING_JOB_NAME/autotrainer-output"
      },
      "hpoJob": {
        "name": "HPO_JOB_NAME",
        "arn": "arn:aws:sagemaker:us-east-1:123456789012:hyper-parameter-tuning-job/HPO_JOB_NAME",
        "status": "Completed"
      },
      "mlModels": [
        {
          "name": "MODEL_NAME-cpu",
          "arn": "arn:aws:sagemaker:us-east-1:123456789012:model/MODEL_NAME-cpu"
        }
      ],
      "id": "d584f5bc-d90e-4957-be01-523e07a7562e",
      "status": "Completed"
    }

we can check the training logs and the artifacts in the destination S3 bucket.

REVIEWING MODEL TRAINING RESULTS

Model training has completed, so let's check the results in the AWS console: SageMaker -> Training -> Training Jobs. For simplicity, we did not change the number of HPO jobs when we started model training, so the default value of 2 was used. The 2 jobs ran in parallel. The instance type was selected automatically: ml.g4dn.2xlarge.

The first job (the one with '001' in its name) completed in 15 minutes, and the second one ('002') was stopped automatically, as SageMaker supports early stopping if the training metrics do not improve for a while.

Let's compare the hyperparameters that were used in these jobs. Only 3 parameters have different values: num-hidden, num-negs and lr. The second model (trained by job 002) had a higher learning rate while having less capacity to capture complex patterns (because it had fewer neurons), and it was trained on fewer negative samples. That led to significantly lower accuracy, as we can see from the Validation Mean Rank (115 vs 23) and HITS@K.

Mean Rank (MR) is the average rank position of the correct link among the predictions. Lower MR values are better because they indicate that the correct link is, on average, ranked closer to the top. HITS@K metrics measure the proportion of times the correct link appears in the top K predicted results.

MODEL ARTIFACTS

When the training jobs are done, model artifacts are created in the output S3 bucket, along with files that contain training statistics and metrics. The metrics and parameters in these JSON files are the ones we mentioned earlier. Only the 001 directory contains the 'output' subdirectory with the model.tar.gz file, as it is the only HPO job that was completed. The artifacts for link prediction also contain the DGL graph data, as it is required for making actual predictions, as explained in the Neptune ML documentation.

These files will be used to create an inference endpoint and generate actual link predictions. That will be discussed in the next and final post of this series.
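For reference, the ranking metrics used when reviewing the training results (Mean Rank, MRR and HITS@K) can be sketched in a few lines of Python. This is a minimal illustration of the standard definitions, not the code Neptune ML runs internally. The function names and the sample ranks are made up for the example, and ranks are assumed to be 1-based (rank 1 means the correct link was the top prediction).

```python
from typing import Sequence


def _validate(ranks: Sequence[int]) -> None:
    """Reject empty input and non-positive ranks (ranks are 1-based)."""
    if len(ranks) == 0:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be >= 1")


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the correct link; lower is better."""
    _validate(ranks)
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank of the correct link; higher is better."""
    _validate(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of cases where the correct link is within the top k."""
    _validate(ranks)
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks of the correct link for four validation edges.
    sample_ranks = [1, 2, 4, 10]
    print("MR     :", mean_rank(sample_ranks))
    print("MRR    :", mean_reciprocal_rank(sample_ranks))
    print("HITS@3 :", hits_at_k(sample_ranks, 3))
```

Because MRR weights each case by 1/rank, a few very badly ranked links affect it far less than they affect Mean Rank, which is one reason the two metrics can move by different amounts between training jobs.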