In this post we continue working on link prediction with the Twitch dataset: we export the graph data from the Neptune DB cluster to an S3 bucket using the neptune-export utility provided by AWS. We select the 'neptune_ml' profile when creating the export job, and the utility generates a 'training-data-configuration.json' file that we will use later in the pipeline. The exported data will be ready for feature encoding and data processing, which is the next step required for link prediction.





GRAPH DATA IN NEPTUNE DB

We start with the graph data that we have in Neptune DB after loading the lists of vertices and edges with the Neptune Bulk Loader API (as described in Part 1 of this guide).





Vertices represent users. All vertices have the same set of properties, and a single vertex looks like this:

{<T.id: 1>: '153', <T.label: 4>: 'user', 'days': 1629, 'mature': True, 'views': 3615, 'partner': False}





All edges have the same label ('follows'), and each edge connects 2 users. A single edge looks like this:

{<T.id: 1>: '0', <T.label: 4>: 'follows', <Direction.IN: 'IN'>: {<T.id: 1>: '255', <T.label: 4>: 'user'}, <Direction.OUT: 'OUT'>: {<T.id: 1>: '6194', <T.label: 4>: 'user'}}





Our goal is to export the data so that it can be used in the next part of our data pipeline: preprocessing and feature encoding.

RUNNING THE NEPTUNE-EXPORT UTILITY ON EC2

We will use the neptune-export utility provided by AWS to export data from the database. To give the utility access to the DB, we run it on an EC2 instance inside the VPC where the Neptune DB cluster lives. The utility reads the data from the DB, saves it to local storage (an EBS volume), and then uploads the exported data to S3.





Although AWS provides a CloudFormation template that deploys a private API inside your VPC so that the export process can be started with an HTTP request, we will not go into that this time. Since our goal is to demonstrate how the data pipeline works (and not to set up an API), we will simply use the EC2 instance console to interact with the neptune-export utility. Incidentally, those console commands can be automated with AWS Systems Manager Run Command and Step Functions.
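As a rough illustration of that automation, the sketch below assembles a request for the Systems Manager SendCommand API using the stock AWS-RunShellScript document. The helper name build_run_command_request and the instance ID are hypothetical, and the sketch only builds the arguments; it does not call AWS.

```python
import json


def build_run_command_request(instance_id, commands):
    """Assemble keyword arguments for boto3's ssm.send_command (hypothetical helper)."""
    return {
        "InstanceIds": [instance_id],
        # AWS-managed document that runs shell commands on Linux instances.
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": list(commands)},
    }


if __name__ == "__main__":
    # The instance ID is a placeholder, not a real resource.
    request = build_run_command_request("i-0123456789abcdef0", ["java -version"])
    print(json.dumps(request, indent=2))
    # To send it for real (assumes boto3, credentials, and the SSM agent on the instance):
    #   import boto3
    #   boto3.client("ssm").send_command(**request)
```

A Step Functions state machine could issue the same call as one of its tasks, but that wiring is outside the scope of this sketch.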





Let's create the EC2 instance on which we will run neptune-export. For the AMI, we choose Ubuntu 24.04 LTS. We need to make sure that the Neptune cluster is reachable from the EC2 instance, so we create the instance in the same VPC as the Neptune cluster and configure the security groups to allow network traffic between the instance and the cluster. We also need to attach an EBS volume large enough to hold the exported data. For the dataset we are working with, an 8GB volume is sufficient.





While the instance is starting, we need to create an IAM role that allows write access to the destination S3 bucket, plus a few RDS actions, as shown in the policy below. The first statement of the policy is mandatory; the second is needed only if you export data from a cloned cluster. Exporting data from cloned clusters is discussed later in this post.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequiredPart",
      "Effect": "Allow",
      "Action": [
        "rds:ListTagsForResource",
        "rds:DescribeDBInstances",
        "rds:DescribeDBClusters"
      ],
      "Resource": "*"
    },
    {
      "Sid": "OptionalPartOnlyRequiredForExportingFromClonedCluster",
      "Effect": "Allow",
      "Action": [
        "rds:AddTagsToResource",
        "rds:DescribeDBClusters",
        "rds:DescribeDBInstances",
        "rds:ListTagsForResource",
        "rds:DescribeDBClusterParameters",
        "rds:DescribeDBParameters",
        "rds:ModifyDBParameterGroup",
        "rds:ModifyDBClusterParameterGroup",
        "rds:RestoreDBClusterToPointInTime",
        "rds:DeleteDBInstance",
        "rds:DeleteDBClusterParameterGroup",
        "rds:DeleteDBParameterGroup",
        "rds:DeleteDBCluster",
        "rds:CreateDBInstance",
        "rds:CreateDBClusterParameterGroup",
        "rds:CreateDBParameterGroup"
      ],
      "Resource": "*"
    }
  ]
}

You can restrict access to the target cluster only (instead of all clusters) by editing the 'Resource' field.
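A minimal sketch of that edit, assuming the policy is handled as a Python dictionary: the hypothetical helper restrict_resources swaps the wildcard for a list of ARNs. The ARN shown is a made-up placeholder (region, account ID, and cluster name are invented), and instance-level actions may also need the instance ARNs, so the result should be checked against the IAM documentation for each action.

```python
import copy
import json


def restrict_resources(policy, resource_arns):
    """Return a copy of the policy with every statement limited to resource_arns."""
    restricted = copy.deepcopy(policy)
    for statement in restricted["Statement"]:
        statement["Resource"] = list(resource_arns)
    return restricted


# Only the mandatory statement is reproduced here to keep the example short.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RequiredPart",
            "Effect": "Allow",
            "Action": [
                "rds:ListTagsForResource",
                "rds:DescribeDBInstances",
                "rds:DescribeDBClusters",
            ],
            "Resource": "*",
        }
    ],
}

# Placeholder ARN: replace region, account ID, and cluster name with your own.
cluster_arn = "arn:aws:rds:us-east-1:123456789012:cluster:my-neptune-cluster"
print(json.dumps(restrict_resources(policy, [cluster_arn]), indent=2))
```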





The role must also have a trust policy that allows EC2 to assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Once the EC2 instance and the role are ready, we attach the role to the instance.





Next, we need to install the neptune-export utility on the instance. To do that, we log in to the instance and run the following commands to install JDK 8 and download the utility:

sudo apt update -y
sudo apt install -y openjdk-8-jdk
curl -O https://s3.amazonaws.com/aws-neptune-customer-samples/neptune-export/bin/neptune-export.jar

Now that we have prepared the EC2 instance and the destination S3 bucket, and attached to the instance the IAM role that allows write access to that bucket, we can start exporting data. We use the following command to start the process, passing the required parameters as a JSON object:

java -jar /home/ubuntu/neptune-export.jar nesvc \
  --root-path /home/ubuntu/neptune-export \
  --json '{
    "command": "export-pg",
    "outputS3Path": "s3://YOUR_TARGET_S3_BUCKET/neptune-export",
    "params": {
      "endpoint": "YOUR_CLUSTER_ENDPOINT",
      "profile": "neptune_ml"
    }
  }'
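If the export is launched from a script rather than typed by hand, building the JSON with a serializer avoids quoting mistakes. The sketch below is one way to do it, assuming the same paths and placeholders as the command above; the helper name build_export_command is hypothetical.

```python
import json
import shlex


def build_export_command(s3_path, endpoint,
                         jar_path="/home/ubuntu/neptune-export.jar",
                         root_path="/home/ubuntu/neptune-export"):
    """Return the neptune-export invocation as an argument list (hypothetical helper)."""
    request = {
        "command": "export-pg",
        "outputS3Path": s3_path,
        "params": {"endpoint": endpoint, "profile": "neptune_ml"},
    }
    return ["java", "-jar", jar_path, "nesvc",
            "--root-path", root_path,
            "--json", json.dumps(request)]


if __name__ == "__main__":
    args = build_export_command(
        "s3://YOUR_TARGET_S3_BUCKET/neptune-export", "YOUR_CLUSTER_ENDPOINT"
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(args))
    # To run it on the instance: subprocess.run(args, check=True)
```

Passing the argument list to subprocess.run without a shell keeps the JSON intact regardless of the characters in the bucket name or endpoint.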

We used only the required parameters here, but you can easily extend the configuration. You can choose which part of the graph to export with the 'filter' parameter: it lets you select nodes, edges, and their properties.
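As an illustration, the sketch below adds a filter to the params object for our dataset. The shape used here (lists of label/properties entries under 'nodes' and 'edges') reflects my reading of the export parameters documentation and should be verified there; the helper name with_filter is hypothetical.

```python
import json


def with_filter(params, node_props, edge_props):
    """Return a copy of params with a 'filter' entry.

    node_props and edge_props map a label to the properties to export.
    The filter layout is an assumption; check it against the AWS docs.
    """
    filtered = dict(params)
    filtered["filter"] = {
        "nodes": [{"label": label, "properties": list(props)}
                  for label, props in node_props.items()],
        "edges": [{"label": label, "properties": list(props)}
                  for label, props in edge_props.items()],
    }
    return filtered


params = {"endpoint": "YOUR_CLUSTER_ENDPOINT", "profile": "neptune_ml"}
# Export only two of the four user properties, plus the 'follows' edges.
print(json.dumps(with_filter(params, {"user": ["days", "views"]}, {"follows": []}), indent=2))
```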





If you are exporting data from a live database, you can use the 'cloneCluster' and 'cloneClusterReplicaCount' parameters to make the neptune-export utility take a snapshot of the database, create a new Neptune cluster from that snapshot, deploy the specified number of read replicas, and use them to export the data. This ensures that the live database is not affected by the additional load from the export.
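A minimal sketch of enabling the clone, assuming both options sit inside the params object next to 'endpoint' and 'profile'; the helper name with_clone and the replica count are illustrative.

```python
import json


def with_clone(params, replica_count=1):
    """Return a copy of params that asks neptune-export to work on a clone."""
    if replica_count < 0:
        raise ValueError("replica_count must not be negative")
    cloned = dict(params)
    cloned["cloneCluster"] = True
    cloned["cloneClusterReplicaCount"] = replica_count
    return cloned


params = {"endpoint": "YOUR_CLUSTER_ENDPOINT", "profile": "neptune_ml"}
print(json.dumps(with_clone(params, replica_count=2), indent=2))
```

Remember that exporting from a clone also requires the second statement of the IAM policy shown earlier.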

The full list of parameters can be found here ( https://docs.aws.amazon.com/neptune/latest/userguide/export-parameters.html ).

VIEWING THE EXPORTED DATA AND NEXT STEPS

When the export process completes, neptune-export prints some statistics, including the numbers of vertices and edges:

Source:
  Nodes: 7126
  Edges: 70648
Export:
  Nodes: 7126
  Edges: 70648
  Properties: 28504
Details:
  Nodes:
    user: 7126
        |_ days {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Integer:7126]}
        |_ mature {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Boolean:7126]}
        |_ views {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Integer:7126]}
        |_ partner {propertyCount=7126, minCardinality=1, maxCardinality=1, recordCount=7126, dataTypeCounts=[Boolean:7126]}
  Edges:
    (user)-follows-(user): 70648

It also uploads the exported data to S3.
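In an automated pipeline it can be useful to confirm that the exported counts match the source counts before moving on. The sketch below parses the summary with a regular expression; it assumes the 'Source' and 'Export' sections keep the wording shown above, and the helper name parse_counts is hypothetical.

```python
import re


def parse_counts(summary):
    """Extract node and edge counts from the Source and Export sections."""
    counts = {}
    for section in ("Source", "Export"):
        match = re.search(rf"{section}:\s*Nodes:\s*(\d+)\s*Edges:\s*(\d+)", summary)
        if match is None:
            raise ValueError(f"{section} section not found")
        counts[section.lower()] = {
            "nodes": int(match.group(1)),
            "edges": int(match.group(2)),
        }
    return counts


summary = """Source:
  Nodes: 7126
  Edges: 70648
Export:
  Nodes: 7126
  Edges: 70648
  Properties: 28504
"""

counts = parse_counts(summary)
# Fail early if the export dropped any nodes or edges.
assert counts["source"] == counts["export"], "export is missing nodes or edges"
print(counts)
```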





Let's look at the files created in the target S3 bucket:

The 'nodes' and 'edges' directories contain CSV files with lists of nodes and edges, similar to the ones we used in Part 1 when loading the data. For large graphs there are multiple files, but our dataset is small, so there is only one file in each directory. There is also a training-data-configuration.json file, which we will edit and use in the next step of our process.
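For a quick sanity check of a downloaded nodes file, something like the sketch below can be used. The header layout (~id, ~label, then property names with a type suffix) is an assumption based on the Gremlin CSV format from Part 1, and the sample row is built from the vertex shown earlier rather than copied from the real export.

```python
import csv
import io


def read_nodes(handle):
    """Yield one dict per node, dropping the ':type' suffix from property headers."""
    reader = csv.reader(handle)
    header = [name.split(":")[0] for name in next(reader)]
    for row in reader:
        yield dict(zip(header, row))


# Illustrative sample; in practice, open the CSV file downloaded from S3 instead.
sample = io.StringIO(
    "~id,~label,days:int,mature:bool,views:int,partner:bool\n"
    "153,user,1629,true,3615,false\n"
)
for node in read_nodes(sample):
    print(node)
```

All values come back as strings; converting them according to the type suffix is left out to keep the sketch short.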





If this is a one-time export, it is now safe to delete the EC2 instance and the EBS volume, because only the files in the target S3 bucket are used in the next step. Otherwise, you can simply stop the EC2 instance to avoid being charged for idle time (you will still be charged for EBS storage unless you delete the volume).





At this point we have the graph data in S3 in a format that can be used in the next step of the process, and we are ready to do feature encoding and data processing, which will be covered in our next post.