See the engineering behind real-time personalization at Tripadvisor's massive (and rapidly growing) scale

Tripadvisor tries to assess what kind of traveler you are as soon as you engage with the site, then offers you increasingly relevant information on every click, within a matter of milliseconds. This personalization is powered by ML models acting on data stored in ScyllaDB running on AWS.

In this article, Dean Poulin (Tripadvisor Data Engineering Lead on the AI Service and Products team) explains how they power this personalization. Dean shares a taste of the technical challenges involved in delivering real-time personalization at Tripadvisor's massive (and rapidly growing) scale. It is based on his AWS re:Invent talk.

Pre-Trip Orientation

In Dean's words...

Let's start with a quick snapshot of who Tripadvisor is and the scale at which we operate. Founded in 2000, Tripadvisor has become a global leader in travel and hospitality, helping millions of travelers plan their perfect trips. Tripadvisor generates $1.8 billion in revenue and is publicly traded on the NASDAQ. Today, we have a talented team of 2,800 employees driving innovation, and our platform serves 400 million unique visitors per month, a number that keeps growing.

On any given day, our system handles more than 2 billion requests from 25 to 50 million users. Every click you make on Tripadvisor is processed in real time. Behind that, we use machine learning models to deliver personalized recommendations, getting you closer to that perfect trip. ScyllaDB running on AWS powers this, which lets us deliver millisecond latency at a scale that few organizations reach. At peak traffic, we hit around
425K operations per second on ScyllaDB, with P99 latencies for reads and writes around 1-3 milliseconds.

I'll share how Tripadvisor harnesses the power of ScyllaDB, AWS, and real-time machine learning to deliver personalized recommendations for every user. We'll explore how we help travelers discover everything they need to plan their perfect trip: hidden gems, must-see attractions, unforgettable experiences, or the best places to stay and dine. This [article] is about the engineering behind that: how we deliver seamless, relevant content to users in real time, helping them find exactly what they are looking for as quickly as possible.

Personalized Trip Planning

As soon as you land on the Tripadvisor homepage, Tripadvisor already knows whether you're a foodie, an adventurer, or a beach lover, and you see spot-on recommendations that seem tailored to your own interests.

As you browse Tripadvisor, we start to personalize what you see using machine learning models that calculate scores based on your browsing activity. We recommend hotels and experiences that we think you would be interested in. We sort hotels based on your personal preferences. We recommend popular points of interest near the hotel you are viewing. All of these are tuned to your own preferences and browsing activity.

Tripadvisor's Model Serving Architecture

Tripadvisor runs on hundreds of independently scalable microservices in Kubernetes on-prem and in Amazon EKS. Our ML Model Serving Platform is exposed through one of these microservices.

This gateway service abstracts more than 100 ML models from the client services, which lets us run A/B tests to find the best models using our experimentation platform. The ML models are primarily developed by our Data Scientists and Machine Learning Engineers using Jupyter Notebooks on Kubeflow.
They are managed and trained using MLflow, and we deploy them on Seldon Core in Kubernetes. Our Custom Feature Store provides features to our ML models, enabling them to make accurate predictions.

The Custom Feature Store

The Feature Store primarily serves User Features and Static Features. Static Features are stored in Redis because they do not change very often. We run data pipelines daily to load data from our offline data warehouse into our Feature Store as Static Features.

User Features are served in real time through a platform called Visitor Platform. We execute CQL queries against ScyllaDB, and we do not need a caching layer because ScyllaDB is so fast.

Our Feature Store serves up to 5 million Static Features per second and half a million User Features per second.

What's an ML Feature?

Features are the input variables that ML models use to make a prediction. There are Static Features and User Features.

Examples of Static Features include the awards a restaurant has won or the amenities offered by a hotel (such as free Wi-Fi, pet friendly, or fitness center).

User Features are collected in real time as users browse the site. We store them in ScyllaDB so that we can query them very quickly. Examples of User Features include the hotels viewed over the last 30 minutes, the restaurants viewed over the last 24 hours, or the reviews submitted over the last 30 days.

The Technologies Powering Visitor Platform

ScyllaDB is at the core of Visitor Platform. We use Java-based Spring Boot microservices to expose the platform to our clients. These are deployed on AWS ECS Fargate. We run Apache Spark on Kubernetes for our daily data retention jobs and our offline-to-online jobs.
We then use those jobs to load data from our offline data warehouse into ScyllaDB so that it is available on the live site. We also use Amazon Kinesis to process streaming user tracking events.

The Visitor Platform Data Flow

The following graphic shows how data flows through our platform in four stages: produce, ingest, organize, and activate.

Data is produced by our website and our mobile apps. Some of that data includes our Cross-Device User Identity Graph, behavior tracking events (such as page views and clicks), and streaming events that go through Kinesis. Audience segmentation data is also loaded into our platform.

Visitor Platform's microservices are used to ingest and organize this data. The data in ScyllaDB is stored in two keyspaces:

The Visitor Core keyspace, which contains the Visitor Identity Graph

The Visitor Metric keyspace, which contains Facts and Metrics (the things people did as they browsed the site)

We produce Data Products, stamped daily, in our offline data warehouse, where they are available for other integrations and other data pipelines to use in their processing.

Here is a look at Visitor Platform by the numbers:

Why Two Databases?

Our online database, ScyllaDB, serves the real-time traffic of the live site.

Our offline data warehouse retains historical data used for reporting, creating other data products, and training our ML models.
We do not want large-scale offline data processing to impact the performance of our live site, so we have two separate databases used for two different purposes.

Visitor Platform Microservices

We use 5 microservices for Visitor Platform:

Visitor Core manages the cross-device user identity graph based on cookies and device IDs.

Visitor Metric is our query engine. It lets us expose facts and metrics for specific visitors. We use a domain-specific language called visitor query language, or VQL. This example VQL query returns the latest facts from the last three hours.

Visitor Publisher and Visitor Saver handle the write path, writing data into the platform. Besides saving data in ScyllaDB, we also stream data to the offline data warehouse. That is done with Amazon Kinesis.

Visitor Composite simplifies publishing data in batch processing jobs. It abstracts Visitor Saver and Visitor Core to identify visitors and publish facts and metrics in a single API call.

Roundtrip Microservice Latency

This graph illustrates how our microservice latencies remain stable over time.

The average latency is only 2.5 milliseconds, and our P999 stays under 12.5 milliseconds. This is impressive performance, especially given that we handle more than 1 billion requests per day.

Our microservice clients have strict latency requirements. 95% of calls must complete in 12 milliseconds or less. If they exceed that, we get paged and have to find out what is affecting the latencies.

ScyllaDB Latency

Here is a snapshot of ScyllaDB's performance over three days.

At peak, ScyllaDB handles 340,000 operations per second (including writes, reads, and deletes) and the CPU hovers at just 21%. This is high scale in action!
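
The paging rule described above (95% of calls within 12 milliseconds) amounts to a percentile check over recent latency samples. Below is a minimal Python sketch, assuming a nearest-rank percentile over an in-memory list of samples. The function names and the sample data are hypothetical, and this is not Tripadvisor's actual alerting code.

```python
import math

# From the text: 95% of calls must complete in 12 ms or less.
P95_BUDGET_MS = 12.0


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def breaches_budget(samples, pct=95, budget_ms=P95_BUDGET_MS):
    """Return True when the chosen percentile exceeds the latency budget."""
    return percentile(samples, pct) > budget_ms


# Hypothetical window of roundtrip latencies in milliseconds.
window = [2.5] * 94 + [20.0] * 6
if breaches_budget(window):
    print("P95 above budget: page the on-call engineer")
```

In practice a metrics system would usually compute percentiles from histograms rather than raw samples; the sketch only illustrates the threshold logic.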

ScyllaDB delivers microsecond writes and millisecond reads for us. This level of blazing-fast performance is exactly why we chose ScyllaDB.

Partitioning Data into ScyllaDB

This image shows how we partition data into ScyllaDB.

The Visitor Metric keyspace has two tables: Fact and Raw Metrics. The primary key on the Fact table is Visitor GUID, Fact Type, and Created At Date. The composite partition key is the Visitor GUID and Fact Type. The clustering key is Created At Date, which lets us sort data within partitions by date. The attributes column contains a JSON object representing the event that occurred. Some example Facts are Search Terms, Page Views, and Bookings.

We use ScyllaDB's Leveled Compaction Strategy because:

It is optimized for range queries

It handles high cardinality very well

It is better for read-heavy workloads, and we have about 2-3X more reads than writes

Why ScyllaDB?

Our solution was originally built on Cassandra on-prem. But as the scale increased, so did the operational burden. It required dedicated operations support to manage database upgrades, backups, and so on. In addition, our solution requires very low latencies for core components. Our User Identity Management system must identify the user within 30 milliseconds, and for the best personalization, we require our Event Tracking platform to respond within 40 milliseconds. It is critical that our solution does not block page rendering, so our latency SLAs are very tight.

We ran a Proof of Concept with ScyllaDB and found its throughput to be much better than Cassandra's, and the operational burden was eliminated. ScyllaDB gave us a very fast live serving database with very low latencies.
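
The Fact table layout from the partitioning section above can be sketched as follows. This is an illustrative Python example, not the actual Tripadvisor schema: the CQL column names and types, the keyspace name, and the `FactStore` class are all assumptions. The in-memory class only mimics how a composite partition key groups rows and how a clustering key keeps them sorted for range queries.

```python
import json
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical CQL for the Fact table; names and types are assumptions.
FACT_TABLE_CQL = """
CREATE TABLE visitor_metric.fact (
    visitor_guid uuid,
    fact_type    text,
    created_at   timestamp,
    attributes   text,  -- JSON object describing the event
    PRIMARY KEY ((visitor_guid, fact_type), created_at)
) WITH compaction = {'class': 'LeveledCompactionStrategy'};
"""


class FactStore:
    """In-memory stand-in that mimics partitioning and clustering."""

    def __init__(self):
        # (visitor_guid, fact_type) -> rows kept sorted by created_at
        self._partitions = defaultdict(list)

    def save(self, visitor_guid, fact_type, created_at, attributes):
        row = (created_at, json.dumps(attributes, sort_keys=True))
        insort(self._partitions[(visitor_guid, fact_type)], row)

    def range_query(self, visitor_guid, fact_type, start, end):
        """Return facts in one partition with start <= created_at <= end."""
        rows = self._partitions.get((visitor_guid, fact_type), [])
        times = [created_at for created_at, _ in rows]
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return [(t, json.loads(attrs)) for t, attrs in rows[lo:hi]]


store = FactStore()
now = datetime(2024, 1, 1, 12, 0)
store.save("visitor-1", "page_view", now - timedelta(hours=5), {"page": "/hotels"})
store.save("visitor-1", "page_view", now - timedelta(hours=1), {"page": "/restaurants"})
store.save("visitor-1", "search_term", now - timedelta(hours=2), {"q": "boston"})

# Page views for one visitor over the last three hours.
recent = store.range_query("visitor-1", "page_view", now - timedelta(hours=3), now)
```

Because every query names the full partition key (visitor and fact type), a lookup touches a single partition, and the date range is a contiguous slice of already-sorted rows.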

We then migrated from Cassandra to ScyllaDB Cloud, following a dual-write strategy. That allowed us to migrate with zero downtime while handling 40,000 operations per second. Later, we moved from ScyllaDB Cloud to ScyllaDB's "Bring Your Own Account" (BYOA) model, where the ScyllaDB team deploys the ScyllaDB database into your own AWS account. This gave us improved performance as well as better data privacy.

This diagram shows what ScyllaDB's BYOA deployment looks like.

In the center of the diagram, you can see a 6-node ScyllaDB cluster running on EC2. There are also two additional EC2 instances:

ScyllaDB Monitor gives us Grafana dashboards and Prometheus metrics.

ScyllaDB Manager takes care of infrastructure automation, such as triggering backups and repairs.

With this deployment, ScyllaDB can be co-located very close to our microservices, which gives us even lower latencies as well as much higher throughput and performance.

Wrapping up, I hope you now have a better understanding of our architecture, the technologies that power the platform, and how ScyllaDB plays a critical role in allowing us to handle Tripadvisor's extremely high scale.

About Cynthia Dunlop

Cynthia is Senior Director of Content Strategy at ScyllaDB. She has been writing about software development and quality engineering for more than 20 years.