In this post we continue working on link prediction with the Twitch dataset: we will export the graph data from the Neptune DB cluster to an S3 bucket using the neptune-export utility provided by AWS. We will choose the 'neptune_ml' profile when we create the export task, and the utility will generate the 'training-data-configuration.json' file that we will use later in the pipeline. The exported data will then be ready for feature encoding and data processing, which is the next step required for link prediction.





Read part 1 here.

GRAPH DATA IN NEPTUNE DB

We start with the graph data that we have in Neptune DB after loading the lists of vertices and edges with the Neptune Bulk Loader API (as described in part 1 of this guide).





Vertices represent users. All vertices have the same set of properties, and a single vertex looks like this:

{<T.id: 1>: '153', <T.label: 4>: 'user', 'days': 1629, 'mature': True, 'views': 3615, 'partner': False}





All edges have the same label ('follows'), and each edge connects 2 users. A single edge looks like this:

{<T.id: 1>: '0', <T.label: 4>: 'follows', <Direction.IN: 'IN'>: {<T.id: 1>: '255', <T.label: 4>: 'user'}, <Direction.OUT: 'OUT'>: {<T.id: 1>: '6194', <T.label: 4>: 'user'}}





Our goal is to export the data so that it can be used in the next part of our data pipeline: preprocessing and feature encoding.

RUNNING NEPTUNE-EXPORT ON EC2

We will use the neptune-export utility provided by AWS to export data from the database. To give the utility access to the DB, we will run it on an EC2 instance inside the VPC where the Neptune DB cluster is. The utility will read the data from the DB, save it to local storage (an EBS volume), and then upload the exported data to S3.





Although AWS provides a CloudFormation template that deploys a private API inside your VPC so that the export process can be started with an HTTP request, we won't focus on it this time. As our goal is to demonstrate how the data pipeline works (and not to set up an API), we will just use the console of the EC2 instance to interact with the neptune-export utility. That said, those console commands can be automated with AWS Systems Manager Run Command and Step Functions.





Let's create the EC2 instance on which we will run neptune-export. For the AMI, we choose Ubuntu 24.04 LTS. We need to make sure that the Neptune cluster is reachable from the EC2 instance, so we will create the instance in the same VPC as the Neptune cluster and configure the security groups to allow network traffic between the instance and the cluster. We also need to attach an EBS volume that is large enough to hold the exported data. For the dataset we are working with, an 8GB volume is enough.





While the instance is starting, we need to create an IAM role that allows write access to the destination S3 bucket, as well as some RDS actions, as shown in the policy below. The first statement of the policy is mandatory, while the second one is only required if you export data from a cloned cluster. Exporting data from cloned clusters is discussed later in this post.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequiredPart",
      "Effect": "Allow",
      "Action": [
        "rds:ListTagsForResource",
        "rds:DescribeDBInstances",
        "rds:DescribeDBClusters"
      ],
      "Resource": "*"
    },
    {
      "Sid": "OptionalPartOnlyRequiredForExportingFromClonedCluster",
      "Effect": "Allow",
      "Action": [
        "rds:AddTagsToResource",
        "rds:DescribeDBClusters",
        "rds:DescribeDBInstances",
        "rds:ListTagsForResource",
        "rds:DescribeDBClusterParameters",
        "rds:DescribeDBParameters",
        "rds:ModifyDBParameterGroup",
        "rds:ModifyDBClusterParameterGroup",
        "rds:RestoreDBClusterToPointInTime",
        "rds:DeleteDBInstance",
        "rds:DeleteDBClusterParameterGroup",
        "rds:DeleteDBParameterGroup",
        "rds:DeleteDBCluster",
        "rds:CreateDBInstance",
        "rds:CreateDBClusterParameterGroup",
        "rds:CreateDBParameterGroup"
      ],
      "Resource": "*"
    }
  ]
}

You can allow access to the target cluster only (instead of all clusters) by editing the 'Resource' field.
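As a rough illustration of narrowing the 'Resource' field, here is a minimal Python sketch that builds the mandatory statement for specific ARNs instead of "*". The region, account ID, cluster ID and instance ID are hypothetical placeholders, and the helper name `scoped_statement` is ours. Which ARNs each action really needs (cluster, instance, or both) is an assumption you should verify against your own setup before using the policy.

```python
import json


def scoped_statement(sid, actions, resource_arns):
    """Build one IAM policy statement limited to the given resource ARNs."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": list(resource_arns),
    }


# Hypothetical values: replace with your own region, account and identifiers.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
CLUSTER_ID = "my-neptune-cluster"
INSTANCE_ID = "my-neptune-instance"

# Neptune clusters and instances are managed through RDS-style ARNs.
cluster_arn = f"arn:aws:rds:{REGION}:{ACCOUNT_ID}:cluster:{CLUSTER_ID}"
instance_arn = f"arn:aws:rds:{REGION}:{ACCOUNT_ID}:db:{INSTANCE_ID}"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        scoped_statement(
            "RequiredPart",
            [
                "rds:ListTagsForResource",
                "rds:DescribeDBInstances",
                "rds:DescribeDBClusters",
            ],
            [cluster_arn, instance_arn],
        )
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM console in place of the first statement above.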





The role must also have a trust policy that allows EC2 to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Once the EC2 instance and the role are ready, we attach the role to the instance.





Next, we need to install neptune-export on the instance. To do that, we log into the instance and run these commands to install JDK 8 and download the utility:

sudo apt update -y
sudo apt install -y openjdk-8-jdk
curl -O https://s3.amazonaws.com/aws-neptune-customer-samples/neptune-export/bin/neptune-export.jar

Now that we have prepared the EC2 instance and the destination S3 bucket, and attached to the instance the IAM role that allows write access to that bucket, we can start exporting the data. We use this command to start the process, passing the required parameters as a JSON object:

java -jar /home/ubuntu/neptune-export.jar nesvc \
  --root-path /home/ubuntu/neptune-export \
  --json '{
    "command": "export-pg",
    "outputS3Path": "s3://YOUR_TARGET_S3_BUCKET/neptune-export",
    "params": {
      "endpoint": "YOUR_CLUSTER_ENDPOINT",
      "profile": "neptune_ml"
    }
  }'
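If you prefer to assemble this command from a script (for example to avoid quoting mistakes in the JSON argument), a minimal Python sketch could look like the following. The function name `build_export_command` is ours, and the bucket and endpoint values are the same placeholders as in the command above.

```python
import json
import shlex


def build_export_command(jar_path, root_path, s3_path, endpoint, profile="neptune_ml"):
    """Return the neptune-export invocation as an argument list."""
    payload = {
        "command": "export-pg",
        "outputS3Path": s3_path,
        "params": {"endpoint": endpoint, "profile": profile},
    }
    return [
        "java", "-jar", jar_path, "nesvc",
        "--root-path", root_path,
        # json.dumps takes care of quoting inside the JSON document.
        "--json", json.dumps(payload),
    ]


cmd = build_export_command(
    "/home/ubuntu/neptune-export.jar",
    "/home/ubuntu/neptune-export",
    "s3://YOUR_TARGET_S3_BUCKET/neptune-export",
    "YOUR_CLUSTER_ENDPOINT",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To run it on the instance: subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` means no shell quoting is involved at all, which is the main reason to build the command this way.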

We used only the required parameters here, but you can easily extend the configuration. You can choose which part of the graph to export with the 'filter' parameter: you can select nodes, edges, and their properties.
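To make the 'filter' parameter more concrete, here is a small Python sketch that adds a filter to the 'params' object for our Twitch graph. The exact shape of the filter (lists of label/properties entries under 'nodes' and 'edges') is our reading of the parameter reference, so treat it as an assumption and check it against the documentation. The helper `label_filter` is a name we made up.

```python
import json


def label_filter(label, properties):
    """One filter entry: export this label with only the listed properties."""
    return {"label": label, "properties": list(properties)}


params = {
    "endpoint": "YOUR_CLUSTER_ENDPOINT",
    "profile": "neptune_ml",
    # Assumed filter shape; verify against the parameter reference.
    "filter": {
        "nodes": [label_filter("user", ["days", "mature", "views"])],
        "edges": [label_filter("follows", [])],
    },
}

filter_json = json.dumps(params, indent=2)
print(filter_json)
```

In this sketch the 'partner' property is left out of the export on purpose, to show how a subset of properties would be selected.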





If you are exporting data from a live database, you can use the 'cloneCluster' and 'cloneClusterReplicaCount' parameters to make neptune-export take a snapshot of the database, create a new Neptune cluster from that snapshot, deploy read replicas, and use them to export the data. That way, you can make sure that the live database is not affected by the additional load of the export.
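As a sketch of how these two parameters fit into the 'params' object, the Python helper below returns a copy of the parameters with cloning switched on. The helper name `with_clone` and the replica count of 1 are illustrative choices of ours, not recommendations. Remember that exporting from a clone also needs the second (optional) statement of the IAM policy shown earlier.

```python
import json


def with_clone(params, replica_count=1):
    """Return a copy of params that asks neptune-export to export from a clone."""
    updated = dict(params)
    updated["cloneCluster"] = True
    updated["cloneClusterReplicaCount"] = replica_count
    return updated


base_params = {"endpoint": "YOUR_CLUSTER_ENDPOINT", "profile": "neptune_ml"}
clone_params = with_clone(base_params, replica_count=1)

print(json.dumps(clone_params, indent=2))
```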

The full list of parameters can be found here ( https://docs.aws.amazon.com/neptune/latest/userguide/export-parameter.html ).

VIEWING THE EXPORTED DATA AND NEXT STEPS

When the export process completes, neptune-export prints some statistics, including the numbers of vertices and edges:

Source:
  Nodes: 7126
  Edges: 70648
Export:
  Nodes: 7126
  Edges: 70648
  Properties: 28504
Details:
  Nodes:
    user: 7126
      |_ days {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Integer:7126]}
      |_ mature {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Boolean:7126]}
      |_ views {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Integer:7126]}
      |_ partner {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Boolean:7126]}
  Edges:
    (user)-follows-(user): 70648

It then uploads the exported data to S3.
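A quick sanity check is to compare the 'Source' and 'Export' counts in these statistics. The Python sketch below parses a copy of the summary and checks that nothing was dropped. The line layout of the sample text is our assumption about how the tool prints it, and `parse_counts` is a helper name we made up.

```python
import re

# Sample summary; the line breaks and indentation are an assumed layout.
STATS = """\
Source:
  Nodes: 7126
  Edges: 70648
Export:
  Nodes: 7126
  Edges: 70648
  Properties: 28504
"""


def parse_counts(text):
    """Collect the numeric counts listed under the Source and Export sections."""
    counts = {"Source": {}, "Export": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\w+:", line):
            # A bare "Name:" line starts a new section.
            section = line[:-1]
            continue
        match = re.fullmatch(r"(Nodes|Edges|Properties):\s*(\d+)", line)
        if match and section in counts:
            counts[section][match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(STATS)
nodes_match = counts["Source"]["Nodes"] == counts["Export"]["Nodes"]
edges_match = counts["Source"]["Edges"] == counts["Export"]["Edges"]
print(counts, nodes_match, edges_match)
```

For our dataset the property count is also easy to verify by hand: 7126 users with 4 properties each gives 28504.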





Let's take a look at the files created in the target S3 bucket:

The 'nodes' and 'edges' directories contain CSV files with lists of vertices and edges, similar to the ones we used in part 1 when we loaded the data. For large graphs there are multiple files, but our dataset is small and there is just one file in each directory. There is also the training-data-configuration.json file, which we will edit and use in the next step of our process.
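To get a feel for the exported node files, here is a minimal Python sketch that reads such a CSV and converts typed columns. The header line in the sample (column names with ':int' and ':boolean' suffixes) is an assumption modelled on the bulk-load CSV format from part 1, so check it against your actual files. The sample row reuses the vertex shown earlier in this post, and `read_nodes` is a helper name of ours.

```python
import csv
import io

# Assumed header format; the data row is the vertex with id 153 shown earlier.
SAMPLE_NODES_CSV = """~id,~label,days:int,mature:boolean,views:int,partner:boolean
153,user,1629,true,3615,false
"""

# Converters for the type suffixes used in the sample header.
CONVERTERS = {"int": int, "boolean": lambda value: value.lower() == "true"}


def read_nodes(fileobj):
    """Read node rows, stripping type suffixes and converting values."""
    reader = csv.reader(fileobj)
    header = next(reader)
    # Each column becomes (name, separator, type); untyped columns get type "".
    columns = [column.partition(":") for column in header]
    rows = []
    for record in reader:
        row = {}
        for (name, _, type_name), value in zip(columns, record):
            convert = CONVERTERS.get(type_name, str)
            row[name] = convert(value)
        rows.append(row)
    return rows


rows = read_nodes(io.StringIO(SAMPLE_NODES_CSV))
print(rows[0])
```

To read a real file, download it from the bucket and pass an open file handle to `read_nodes` instead of the in-memory sample.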





If you are doing a one-time export, it is now safe to delete the EC2 instance and the EBS volume, since only the files in the S3 bucket are used in the next step. Otherwise, you can stop the EC2 instance to avoid being charged for idle time (you will still be charged for the EBS storage unless you delete the volume).





We now have the graph data in S3 in a format that can be used in the next step, and we are ready to do feature encoding and data processing, which will be discussed in our next post.