He Said: "I Said I Want My Dammit B-Movie!"

The end of the endless scroll (and the arguments over what to watch…)

Tired of scrolling endlessly through Netflix, unsure what to watch next? What if you could build your own AI-powered recommendation system that accurately predicts your next favorite film? In this tutorial, we'll walk you through creating a movie recommendation system using vector databases (VectorDBs). You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.

(Want to jump straight to the code? Check out our repo on GitHub. Ready to try a recommender system for your own use case? Get a demo.)

Let's get recommending!

We'll be following the notebook throughout this article. You can also run the code right from your browser using the Colab notebook.

Netflix's recommendation algorithm does a pretty good job of suggesting relevant content, given the sheer volume of options (~16k movies and TV programs in 2023) and how quickly it has to serve suggestions to users. How does Netflix do it? In a word, semantic search.

Semantic search comprehends the meaning and context (both attributes and consumption patterns) behind user queries and movie/TV show descriptions, so it can personalize its queries and recommendations better than traditional keyword-based approaches.

But semantic search poses certain challenges, foremost among them: 1) ensuring accurate search results, 2) interpretability, and 3) scalability. Any successful content recommendation strategy has to deal with these challenges. Using the Superlinked library, you can overcome them.

In this article, we'll show you how to use the Superlinked library to set up a semantic search of your own and generate a list of relevant movies based on your preferences.

Semantic Search: Challenges

Semantic search conveys a lot of value in vector search, but poses three significant vector embedding challenges for developers:

- Quality and relevance: Ensuring that your embeddings properly capture the semantic meaning of your data requires careful selection of embedding techniques, training data, and hyperparameters. Poor-quality embeddings can lead to inaccurate search results and irrelevant recommendations.
- Interpretability: High-dimensional vector spaces are too complex to understand intuitively. To gain insight into the relationships and similarities encoded in them, data scientists have to develop methods to visualize and analyze them.
- Scalability: Managing and processing high-dimensional embeddings, especially in large datasets, can strain computational resources and increase latency. Efficient methods for indexing, retrieval, and similarity computation are essential to ensure scalability and real-time performance in production environments.

The Superlinked library enables you to address these challenges. Below, we'll build a content recommender (specifically for movies): we start with the information we have about a given movie, embed this information as a multimodal vector, build a searchable vector index for all our movies, and then use query weights to tweak our results and arrive at good movie recommendations. Let's get into it.

Building Fast and Reliable Semantic Search with Superlinked

Below, you'll perform a semantic search on the Netflix movie dataset using the following elements of the Superlinked library:

- Recency space: to understand the freshness (currency and relevance) of your data, identifying newer movies.
- TextSimilarity space: to interpret the various pieces of metadata you have about a movie, such as its description, title, and genres.
- Query time weights: to let you choose what matters most in your data at the moment you run the query, so you can optimize without re-embedding the whole dataset, doing postprocessing, or employing a custom reranking model (i.e., reducing latency).

The Netflix Dataset, and What We'll Do with It

Recommending movies well is difficult, mostly because there are so many options (>9000 titles in 2023) and users want recommendations on demand, immediately. Let's take a data-driven approach to finding something we want to watch. In our dataset of movies, we know the:

- description
- genres
- title
- release_year

We can embed these inputs and put together a vector index on top of our embeddings, creating a space we can search semantically.

Once we have our indexed vector space, we will:

- first, browse the movies, filtered by an idea (heartfelt romantic comedy)
- next, tweak the results, giving more importance to matches in certain input fields (i.e., weighting)
- then, search in description, genre, and title with different search terms for each
- finally, after finding a movie that is a close but not exact match, search around using that movie as a reference

Installation and Dataset Preparation

Your first step is to install the library and import the required classes.
Note: to render the Altair plots in Colab, change alt.renderers.enable("mimetype") to alt.renderers.enable('colab').

    %pip install superlinked==5.3.0

    from datetime import timedelta, datetime

    import altair as alt
    import os
    import pandas as pd

    from superlinked.evaluation.charts.recency_plotter import RecencyPlotter
    from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW
    from superlinked.framework.common.dag.period_time import PeriodTime
    from superlinked.framework.common.schema.schema import schema
    from superlinked.framework.common.schema.schema_object import String, Timestamp
    from superlinked.framework.common.schema.id_schema_object import IdField
    from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
    from superlinked.framework.dsl.executor.in_memory.in_memory_executor import (
        InMemoryExecutor,
        InMemoryApp,
    )
    from superlinked.framework.dsl.index.index import Index
    from superlinked.framework.dsl.query.param import Param
    from superlinked.framework.dsl.query.query import Query
    from superlinked.framework.dsl.query.result import Result
    from superlinked.framework.dsl.source.in_memory_source import InMemorySource
    from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace
    from superlinked.framework.dsl.space.recency_space import RecencySpace

    alt.renderers.enable("mimetype")  # NOTE: to render altair plots in colab, change 'mimetype' to 'colab'
    alt.data_transformers.disable_max_rows()
    pd.set_option("display.max_colwidth", 190)

We also need to prepare the dataset: define time constants, set the URL location of the data, create a data store dictionary, read the CSV into a pandas DataFrame, clean the dataframe and data so it can be searched properly, and do a quick verification and overview. (See cells 3 and 4 of the notebook for details.)

Now that the dataset is prepared, you can optimize your retrieval using the Superlinked library.
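The preparation steps described above can be pictured with a small sketch. This is not the notebook's code: the notebook works with pandas and the real dataset, whereas the version below uses only the standard library so it runs anywhere. The sample rows, the value 365 for `YEAR_IN_DAYS`, and the helper name `prepare_rows` are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime, timezone

YEAR_IN_DAYS = 365  # assumed value; the notebook defines its own time constants

# Hypothetical sample rows standing in for the real Netflix CSV.
SAMPLE_CSV = """id,title,description,genres,release_year
tm1,Example Romance,A heartfelt story about two strangers.,"['romance', 'comedy']",2021
tm2,,Row with a missing title that we drop.,"['drama']",1999
tm3,Old Musical,Singers put on a show at an inn.,"['comedy', 'music']",1954
"""


def prepare_rows(csv_text: str) -> list[dict]:
    """Parse CSV text, drop incomplete rows, and add a release_timestamp."""
    required = ("title", "description", "genres", "release_year")
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip rows that are missing any field we intend to embed.
        if not all(row[key].strip() for key in required):
            continue
        year = int(row["release_year"])
        # Represent the release year as a Unix timestamp (1 January, UTC).
        row["release_timestamp"] = int(
            datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()
        )
        # Flatten "['comedy', 'music']" into plain text: "comedy music".
        row["genres"] = " ".join(
            genre.strip(" '\"") for genre in row["genres"].strip("[]").split(",")
        )
        rows.append(row)
    return rows


movies = prepare_rows(SAMPLE_CSV)
for m in movies:
    print(m["id"], m["title"], m["genres"], m["release_timestamp"])
```

Converting the release year to a timestamp matters because the schema used later stores `release_timestamp` rather than `release_year`.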
Building Out the Index for Vector Search

The Superlinked library contains a set of core building blocks that we use to construct the index and manage retrieval. You can read about these building blocks in more detail here.

First, you need to define your Schema to tell the system about your data.

    # accommodate our inputs in a typed schema
    @schema
    class MovieSchema:
        description: String
        title: String
        release_timestamp: Timestamp
        genres: String
        id: IdField


    movie = MovieSchema()

Next, you use Spaces to say how you want to treat each part of the data when embedding. Which Spaces you use depends on your datatype. Each Space is optimized to embed its data so as to return the highest possible quality of retrieval results. In the Space definitions, we describe how the inputs should be embedded in order to reflect the semantic relationships in our data.

    # textual fields are embedded using a sentence-transformers model
    description_space = TextSimilaritySpace(
        text=movie.description, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    title_space = TextSimilaritySpace(
        text=movie.title, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    genre_space = TextSimilaritySpace(
        text=movie.genres, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    # release dates are encoded using our recency space
    # periodtimes aim to reflect notable breaks in our scores
    # (YEAR_IN_DAYS is one of the time constants defined during dataset preparation)
    recency_space = RecencySpace(
        timestamp=movie.release_timestamp,
        period_time_list=[
            PeriodTime(timedelta(days=4 * YEAR_IN_DAYS)),
            PeriodTime(timedelta(days=10 * YEAR_IN_DAYS)),
            PeriodTime(timedelta(days=40 * YEAR_IN_DAYS)),
        ],
        negative_filter=-0.25,
    )
    movie_index = Index(spaces=[description_space, title_space, genre_space, recency_space])

Once you've set up your spaces and created your index, you use the source and executor parts of the library to set up your queries. See cells 10-13 in the notebook.

Now that the queries are prepared, let's move on to running them and optimizing retrieval by adjusting weights.

Understanding Recency, and How to Use It in Superlinked

The Recency space lets you alter the results of your query by preferentially pulling older or newer releases from your dataset. We use 4, 10, and 40 years as our period times so that we can give more focus to the years that have more titles (see cell 5).

Notice the breaks in the score at 4, 10, and 40 years. Titles older than 40 years get the negative_filter score.

Reviewing and Optimizing Search Results Using Different Query Time Weights

Let's define a quick utility function to present our results in the notebook.

    def present_result(
        result: Result,
        cols_to_keep: list[str] = ["description", "title", "genres", "release_year", "id"],
    ) -> pd.DataFrame:
        # parse result to dataframe
        df: pd.DataFrame = result.to_pandas()
        # transform timestamp back to release year
        df["release_year"] = [
            datetime.fromtimestamp(timestamp).year for timestamp in df["release_timestamp"]
        ]
        return df[cols_to_keep]

Simple and Advanced Queries

The Superlinked library lets you perform various kinds of queries; here we define two. Both query types (simple and advanced) let me weight individual spaces (description, title, genre, and recency) according to my preferences. The difference between them is that with a simple query, I set one query text and then surface similar results in the description, title, and genre spaces. With an advanced query, I have more fine-grained control: if I want, I can enter a different query text for each of the description, title, and genre spaces.

Here's the query code:

    query_text_param = Param("query_text")

    simple_query = (
        Query(
            movie_index,
            weights={
                description_space: Param("description_weight"),
                title_space: Param("title_weight"),
                genre_space: Param("genre_weight"),
                recency_space: Param("recency_weight"),
            },
        )
        .find(movie)
        .similar(description_space.text, query_text_param)
        .similar(title_space.text, query_text_param)
        .similar(genre_space.text, query_text_param)
        .limit(Param("limit"))
    )

    advanced_query = (
        Query(
            movie_index,
            weights={
                description_space: Param("description_weight"),
                title_space: Param("title_weight"),
                genre_space: Param("genre_weight"),
                recency_space: Param("recency_weight"),
            },
        )
        .find(movie)
        .similar(description_space.text, Param("description_query_text"))
        .similar(title_space.text, Param("title_query_text"))
        .similar(genre_space.text, Param("genre_query_text"))
        .limit(Param("limit"))
    )

Simple Query

In a simple query, I set my query text and apply different weights according to how important each space is to me. (In the calls below, app is the running executor set up in the notebook, and TOP_N is the result limit defined there.)

    result: Result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=1,
        genre_weight=1,
        recency_weight=0,
        limit=TOP_N,
    )
    present_result(result)

Our results contain some titles I've already seen. I can deal with this by weighting recency to bias my results towards recent titles. Weights are normalized to have unit sum (i.e., all weights are adjusted so that they always add up to 1), so you don't have to worry about the scale you set them on.

    result: Result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=1,
        genre_weight=1,
        recency_weight=3,
        limit=TOP_N,
    )
    present_result(result)

My results (above) are now all post-2021.

Using the simple query, I can weight any specific space (description, title, genre, or recency) to make it count more when returning results.
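To make the effect of query time weights concrete, here is a small conceptual sketch of the arithmetic. It is not Superlinked's implementation: it simply assumes each movie already has a per-space similarity score for the query, normalizes the weights to sum to 1, and ranks by the weighted sum. The scores, movie names, and function names (`normalize_weights`, `rank`) are made up for illustration.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale the weights so that they sum to 1."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {space: weight / total for space, weight in weights.items()}


def rank(scores: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order items by the weighted sum of their per-space similarity scores."""
    normalized = normalize_weights(weights)
    totals = {
        item: sum(normalized[space] * per_space[space] for space in normalized)
        for item, per_space in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical per-space similarity scores for one query.
SCORES = {
    "older classic": {"description": 0.9, "title": 0.4, "genre": 0.8, "recency": 0.1},
    "recent release": {"description": 0.7, "title": 0.3, "genre": 0.7, "recency": 0.9},
}

no_recency = {"description": 1, "title": 1, "genre": 1, "recency": 0}
with_recency = {"description": 1, "title": 1, "genre": 1, "recency": 3}

print(rank(SCORES, no_recency))    # text similarity decides the order
print(rank(SCORES, with_recency))  # recency now dominates the order

# Only the ratios between weights matter: doubling every weight changes nothing.
doubled = {space: 2 * weight for space, weight in with_recency.items()}
print(normalize_weights(with_recency) == normalize_weights(doubled))
```

Because the weights are normalized, setting them to 1, 1, 1, 3 or to 2, 2, 2, 6 expresses the same preference; what changes the ranking is how large one weight is relative to the others.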
Let's experiment with this. Below, we'll give more weight to genre and downweight title, since my query text is basically just a genre with some additional context. I keep a recency weight because I'd still like my results to be biased towards recent movies.

    result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=0.1,
        genre_weight=2,
        recency_weight=1,
        limit=TOP_N,
    )
    present_result(result)

This query pushes the release year back a bit to give me more genre-weighted results.

Advanced Query

The advanced query gives me even more fine-grained control. I retain control over recency, but I can also specify separate search texts for description, title, and genre, and assign each a specific weight according to my preferences, as below (and in cells 19-21 of the notebook):

    result = app.query(
        advanced_query,
        description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
        title_query_text="love",
        genre_query_text="drama comedy romantic",
        description_weight=0.2,
        title_weight=3,
        genre_weight=1,
        recency_weight=5,
        limit=TOP_N,
    )
    present_result(result)

Search Using a Specific Movie

Say that in my last set of results I spotted a movie I've already seen, and I'd like to find something similar. Let's assume I like White Christmas, a 1954 romantic comedy (id = tm16479) about singer-dancers coming together to put on a show that draws guests to a Vermont inn. By adding an extra with_vector clause (with a movie_id parameter) to advanced_query, with_movie_query lets me search using this movie (or any movie I like), and gives me fine-grained control over the separate subsearch query texts and weights.

First, we add the movie_id parameter:

    with_movie_query = advanced_query.with_vector(movie, Param("movie_id"))

Then, I can set my other subsearch query texts either to empty or to whatever is most relevant, with whatever weights make sense.
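The idea of searching "around" a reference movie can be illustrated with a toy example. The sketch below is only a conceptual picture, not how Superlinked implements with_vector: it assumes each movie has a stored embedding, uses the reference movie's vector as the query, optionally blends in a text query vector, and ranks the other movies by cosine similarity. The four-dimensional vectors, movie ids, and the function name `search_around` are all hypothetical, and leaving the reference movie out of the results is a choice made for this sketch.

```python
import math


def unit(vector: list[float]) -> list[float]:
    """Scale a vector to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def search_around(index, item_id, text_vector=None, text_weight=0.0, limit=3):
    """Rank other items by similarity to a reference item, optionally nudged by text."""
    query = unit(index[item_id])
    if text_vector is not None and text_weight > 0:
        # Blend the reference item's direction with the text query's direction.
        query = [q + text_weight * t for q, t in zip(query, unit(text_vector))]
    scored = [
        (other_id, cosine(query, vector))
        for other_id, vector in index.items()
        if other_id != item_id  # leave the reference item out of its own results
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scored[:limit]]


# Hypothetical embeddings; dimensions loosely mean
# (romantic comedy, stage show, family, crime).
INDEX = {
    "white_christmas": [0.8, 0.6, 0.2, 0.0],
    "stage_musical": [0.6, 0.8, 0.0, 0.0],
    "family_holiday": [0.7, 0.1, 0.7, 0.0],
    "crime_thriller": [0.1, 0.0, 0.0, 0.9],
}
FAMILY_QUERY = [0.0, 0.0, 1.0, 0.0]  # stand-in for an embedded "family" text

print(search_around(INDEX, "white_christmas"))
print(search_around(INDEX, "white_christmas", FAMILY_QUERY, text_weight=1.0))
```

With the reference movie alone, the stage-oriented title ranks first; blending in the "family" direction moves the family-oriented title to the top, which mirrors the kind of steering done with query texts and weights in the queries that follow.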
Let's say my first query returns results that reflect the stage performance/band aspect of White Christmas (see cell 24), but I want to watch a movie that is more family-oriented. I can enter a description_query_text to skew my results in the desired direction.

    result = app.query(
        with_movie_query,
        description_query_text="family",
        title_query_text="",
        genre_query_text="",
        description_weight=1,
        title_weight=0,
        genre_weight=0,
        recency_weight=0,
        description_query_weight=1,
        movie_id="tm16479",
        limit=TOP_N,
    )
    present_result(result)

But now that I see my results, I realize I'm actually more in the mood for something light-hearted and funny. Let's adjust my query accordingly:

    # assign to `result` (lowercase) so that present_result shows this query's output
    result = app.query(
        with_movie_query,
        description_query_text="",
        title_query_text="",
        genre_query_text="comedy",
        description_weight=1,
        title_weight=0,
        genre_weight=2,
        recency_weight=0,
        description_query_weight=1,
        movie_id="tm16479",
        limit=TOP_N,
    )
    present_result(result)

Okay, those results are better. I'll pick one of these. Put the popcorn on!

Conclusion

Superlinked makes it easier to test, iterate, and improve your retrieval quality. Above, we've walked you through how to use the Superlinked library to do a semantic search on a vector space, the way Netflix does, and return accurate, relevant movie results. We also saw how to fine-tune our results, tweaking weights and search terms until we arrived at just the right result.

Now, try out the notebook yourself, and see what you can achieve!

Try It Yourself - Get the Code & Demo!

- Grab the Code: Check out the full implementation in our GitHub repo. Fork it, tweak it, and make it your own!
- See It in Action: Want to see this working in a real-world setup? Book a quick demo and explore how Superlinked can supercharge your recommendations. Get a demo now!

Recommendation engines are shaping the way we discover content. Whether it's movies, music, or products, vector search is the future, and now you have the tools to build your own.

Author: Mór Kapronczay