
How to Build Live Image Search with a Vision Model and Natural Language Queries

by LJ · 7m · 2025/05/22

Too Long; Didn't Read

In this blog, we build live image search that can be queried in natural language. For example, you can search for "an elephant" or "a cute animal" with a list of images as input. We use a multimodal embedding model to understand and embed the images, and build a vector index for efficient retrieval. We use CocoIndex to build the indexing flow.
We use a multimodal embedding model to understand and embed the images, and build a vector index for efficient retrieval. We use CocoIndex, an ultra-performant real-time data transformation framework, to build the indexing flow. While it is running, you can add new files to the folder; only the new or changed files are processed, and they are indexed within a minute.

It would mean a lot to us if you could give CocoIndex a star on GitHub, if you find this tutorial helpful.


Technologies

CocoIndex

CocoIndex is an ultra-performant real-time data transformation framework for AI.

CLIP ViT-L/14

CLIP ViT-L/14 is a powerful vision-language model that can understand both images and text. It is trained to align visual and textual representations in a shared embedding space, which makes it a good fit for our image search use case.

In our project, we use CLIP to:

  1. Generate embeddings of the images directly
  2. Convert natural language search queries into the same embedding space
  3. Enable semantic search by comparing query embeddings with image embeddings
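
To make the third point concrete, here is a minimal sketch of ranking by cosine similarity in a shared embedding space. The three-dimensional vectors and file names are made up for illustration; real CLIP ViT-L/14 embeddings have 768 dimensions and come from the model, not from hand-written lists.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec: list[float], image_vecs: dict[str, list[float]], limit: int = 5):
    """Return (filename, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical toy embeddings (not real CLIP output).
toy_image_vecs = {
    "elephant.jpg": [0.9, 0.1, 0.0],
    "squirrel.jpg": [0.1, 0.9, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}
toy_query_vec = [0.8, 0.2, 0.1]  # stands in for the embedding of "an elephant"

ranking = rank_images(toy_query_vec, toy_image_vecs, limit=2)
```

With these toy values the elephant image ranks first because its vector points in nearly the same direction as the query vector. In the real pipeline this comparison is done by the vector database rather than in Python.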

Qdrant

Qdrant is a high-performance vector database. We use it to store and query the embeddings.

FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. We use it to build the web API for image search.

Prerequisites

  • Install Postgres. CocoIndex uses Postgres to keep track of data lineage for incremental processing.
  • Install Qdrant.

Define Indexing Flow

Flow Design


The flow diagram illustrates how we process our images:

  1. Read image files from the local file system
  2. Use CLIP to understand and embed each image
  3. Store the embeddings in a vector database for retrieval

1. Ingest the images.

import datetime

import cocoindex

@cocoindex.flow_def(name="ImageObjectEmbedding")
def image_object_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    data_scope["images"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="img", included_patterns=["*.jpg", "*.jpeg", "*.png"], binary=True),
        refresh_interval=datetime.timedelta(minutes=1)  # Poll for changes every 1 minute
    )
    img_embeddings = data_scope.add_collector()

flow_builder.add_source creates a table with sub fields (filename, content). See the documentation for more details.
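
As a rough mental model of those sub fields, the following sketch builds the same kind of rows by hand with the standard library. It is not how CocoIndex implements LocalFile; the function name list_image_rows is hypothetical, and only the patterns and the field names are taken from the flow above.

```python
import fnmatch
from pathlib import Path

INCLUDED_PATTERNS = ["*.jpg", "*.jpeg", "*.png"]  # same patterns as the flow above


def list_image_rows(root: str, patterns: list[str] = INCLUDED_PATTERNS) -> list[dict]:
    """Return one row per matching file, with `filename` and binary `content`."""
    rows = []
    root_path = Path(root)
    # Recursing into subfolders is an assumption of this sketch.
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and any(fnmatch.fnmatch(path.name, p) for p in patterns):
            rows.append({
                "filename": path.relative_to(root_path).as_posix(),
                "content": path.read_bytes(),  # raw bytes, as with binary=True
            })
    return rows
```

Calling list_image_rows("img") would give one dictionary per image, which is the shape the later steps rely on when they read img["filename"] and img["content"].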


2. Process each image and collect the information.

2.1 Embed the image with CLIP

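import functools
from transformers import CLIPModel, CLIPProcessor

CLIP_MODEL_NAME = "openai/clip-vit-large-patch14"  # Hugging Face ID for CLIP ViT-L/14

@functools.cache
def get_clip_model() -> tuple[CLIPModel, CLIPProcessor]:
    model = CLIPModel.from_pretrained(CLIP_MODEL_NAME)
    processor = CLIPProcessor.from_pretrained(CLIP_MODEL_NAME)
    return model, processor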

The @functools.cache decorator caches the result of a function call. In this case, it ensures that we load the CLIP model and processor only once.
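
The effect is easy to see with a stand-in loader. In this small sketch, get_expensive_resource and load_count are hypothetical names used only to show that the function body runs once, however many times it is called.

```python
import functools

load_count = 0  # counts how many times the loader body actually runs


@functools.cache
def get_expensive_resource() -> dict:
    """Stand-in for loading a large model; the body runs only on the first call."""
    global load_count
    load_count += 1
    return {"name": "fake-model"}


first = get_expensive_resource()
second = get_expensive_resource()  # returned from the cache, body not re-run
```

Both calls return the very same object, which is why repeated calls to get_clip_model() do not reload the weights.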

import io
from typing import Literal

import torch
from PIL import Image

@cocoindex.op.function(cache=True, behavior_version=1, gpu=True)
def embed_image(img_bytes: bytes) -> cocoindex.Vector[cocoindex.Float32, Literal[768]]:
    """
    Convert image to embedding using CLIP model.
    CLIP ViT-L/14 produces 768-dimensional embeddings, matching the Qdrant collection size.
    """
    model, processor = get_clip_model()
    image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features[0].tolist()

embed_image is a custom function that uses the CLIP model to convert an image into a vector embedding. It accepts image data as bytes and returns a list of floating-point numbers representing the image's embedding.

The function supports caching through the cache parameter. When enabled, the executor stores the function's results for reuse during reprocessing, which is particularly useful for computationally intensive operations. For more information about custom function parameters, refer to the documentation.
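
One way to picture this kind of caching is a store keyed by the function's behavior version and a hash of its input. The sketch below is an illustration under that assumption, not CocoIndex's actual implementation; cached_transform, fake_embed, and the in-memory dictionary are hypothetical.

```python
import hashlib
from typing import Callable

# Hypothetical in-memory cache: (behavior_version, input hash) -> result
_cache: dict[tuple[int, str], list[float]] = {}


def cached_transform(fn: Callable[[bytes], list[float]], data: bytes, behavior_version: int) -> list[float]:
    """Run fn(data) only if this input has not been seen for this behavior version."""
    key = (behavior_version, hashlib.sha256(data).hexdigest())
    if key not in _cache:
        _cache[key] = fn(data)
    return _cache[key]


calls = 0  # counts real executions of the stand-in embedding function


def fake_embed(data: bytes) -> list[float]:
    """Stand-in for an expensive embedding call."""
    global calls
    calls += 1
    return [float(len(data))]


cached_transform(fake_embed, b"abc", behavior_version=1)
cached_transform(fake_embed, b"abc", behavior_version=1)  # same input and version: cache hit
cached_transform(fake_embed, b"abc", behavior_version=2)  # new version: recomputed
```

In this model, changing the behavior version is what signals that old cached results should no longer be trusted, for example after switching to a different model.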

Then we process each image and collect the information.

with data_scope["images"].row() as img:
    img["embedding"] = img["content"].transform(embed_image)
    img_embeddings.collect(
        id=cocoindex.GeneratedField.UUID,
        filename=img["filename"],
        embedding=img["embedding"],
    )



2.2 Collect the embeddings

Export the embeddings to a collection in Qdrant.

img_embeddings.export(
    "img_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="image_search",
        grpc_url=QDRANT_GRPC_URL,
    ),
    primary_key_fields=["id"],
    setup_by_user=True,
)
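
Because the flow passes setup_by_user=True, the collection is presumably expected to exist already (it is created with curl later in this post), so the vector size configured there has to match what embed_image returns. A small guard like the following can catch a mismatch early. The helper name and constant are hypothetical; 768 is the size used when the collection is created.

```python
EXPECTED_DIM = 768  # must equal the "size" of the "embedding" vector in the Qdrant collection


def check_embedding_dim(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Return the embedding unchanged, or raise if its length is unexpected."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected_dim}"
        )
    return embedding
```

A check like this could be called on the result of the embedding function during development, so that a wrong model or a wrong collection size fails loudly instead of at export time.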

3. Query the index

Embed the query with CLIP, which maps both text and images into the same embedding space, allowing cross-modal similarity search.

def embed_query(text: str) -> list[float]:
    model, processor = get_clip_model()
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return features[0].tolist()

Define a FastAPI endpoint, /search, that performs semantic image search.

@app.get("/search")
def search(q: str = Query(..., description="Search query"), limit: int = Query(5, description="Number of results")):
    # Get the embedding for the query
    query_embedding = embed_query(q)
    
    # Search in Qdrant
    search_results = app.state.qdrant_client.search(
        collection_name="image_search",
        query_vector=("embedding", query_embedding),
        limit=limit
    )

This searches the Qdrant vector database for similar embeddings and returns the top limit results.

    # Format results
    out = []
    for result in search_results:
        out.append({
            "filename": result.payload["filename"],
            "score": result.score
        })
    return {"results": out}

This endpoint enables semantic image search: users can find images by describing them in natural language rather than relying on exact keyword matches.
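
For a quick check from a script, the endpoint can be called with the standard library. This is a sketch that assumes the backend is running locally on port 8000, as in the run instructions later in this post; build_search_url and search_images are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: backend started with uvicorn on port 8000


def build_search_url(query: str, limit: int = 5, base_url: str = BASE_URL) -> str:
    """Build the /search URL with the q and limit query parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search_images(query: str, limit: int = 5) -> list[dict]:
    """Call the endpoint and return the list stored under "results"."""
    with urlopen(build_search_url(query, limit)) as response:
        return json.load(response)["results"]
```

With the server running, search_images("a cute animal", limit=3) should return up to three dictionaries, each with a filename and a score.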

Application

FastAPI

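from fastapi import FastAPI, Query
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# Serve images from the 'img' directory at /img
app.mount("/img", StaticFiles(directory="img"), name="img")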

The FastAPI application is set up with CORS middleware and static file serving. The app is configured to:

  • Allow cross-origin requests from any origin
  • Serve static image files from the 'img' directory
  • Handle API endpoints for image search functionality

from dotenv import load_dotenv
from qdrant_client import QdrantClient

@app.on_event("startup")
def startup_event():
    load_dotenv()
    cocoindex.init()
    # Initialize Qdrant client
    app.state.qdrant_client = QdrantClient(
        url=QDRANT_GRPC_URL,
        prefer_grpc=True
    )
    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
    app.state.live_updater.start()

The startup event handler initializes the application when it first starts. Here is what each part does:

  1. load_dotenv(): Loads environment variables from a .env file, which is useful for configuration such as API keys and URLs
  2. cocoindex.init(): Initializes the CocoIndex framework, setting up the necessary components and configuration
  3. Qdrant Client Setup:
    • Creates a new QdrantClient instance
    • Configures it to use the gRPC URL specified in environment variables
    • Enables gRPC preference for better performance
    • Stores the client in the FastAPI app state for access across requests
  4. Live Updater Setup:
    • Creates a FlowLiveUpdater instance for the image_object_embedding_flow
    • This enables real-time updates to the image search index
    • Starts the live updater to begin monitoring for changes

This initialization ensures that all necessary components are properly configured and running when the application starts.
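
Conceptually, polling a folder for changes comes down to comparing two snapshots. The sketch below shows that idea with content hashes; it is only an illustration, since this post does not describe how FlowLiveUpdater detects changes internally, and snapshot and diff_snapshots are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(folder: str) -> dict[str, str]:
    """Map each file name in `folder` to a SHA-256 hash of its content."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(folder).iterdir()
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, changed, or removed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        "removed": sorted(old.keys() - new.keys()),
    }
```

Taking a snapshot of the img folder at each refresh interval and diffing it against the previous one yields exactly the files that need to be embedded again, which is the kind of work an incremental indexer avoids repeating for unchanged files.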

Frontend

You can find the frontend code in the frontend directory. We intentionally kept it simple and minimalistic to focus on the image search functionality.

Time to have fun!

  • Create a collection in Qdrant

    curl -X PUT 'http://localhost:6333/collections/image_search' \
      -H 'Content-Type: application/json' \
      -d '{
        "vectors": {
          "embedding": {
            "size": 768,
            "distance": "Cosine"
          }
        }
      }'

  • Setup indexing flow

    cocoindex setup main.py

    It is set up with a live updater, so you can add new files to the folder and they will be indexed within a minute.

  • Run backend

    uvicorn main:app --reload --host 0.0.0.0 --port 8000

  • Run frontend

    cd frontend
    npm install
    npm run dev

Go to http://localhost:5174 to search.
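
The collection-creation step in the list above can also be scripted. The following sketch builds the same HTTP request as the curl command using only the standard library; the helper names are hypothetical, and the URL, collection name, vector name, size, and distance are copied from that command.

```python
import json
from urllib.request import Request, urlopen

QDRANT_HTTP_URL = "http://localhost:6333"  # same host and port as the curl command


def build_create_collection_request(name: str = "image_search", size: int = 768) -> Request:
    """Build the PUT request that creates a collection with one named vector."""
    body = {"vectors": {"embedding": {"size": size, "distance": "Cosine"}}}
    return Request(
        url=f"{QDRANT_HTTP_URL}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_collection() -> dict:
    """Send the request to a running Qdrant instance and return its JSON reply."""
    with urlopen(build_create_collection_request()) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it easy to confirm that the size matches the embedding dimension before anything is sent.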


Example Search


More Example Search


Now add another image to the img folder, for example a cute squirrel, or any picture you like. Wait a minute for the new image to be processed and indexed.


Squirrel Search


If you want to monitor the indexing progress, you can view it in CocoInsight by running cocoindex server -ci main.py.


Indexing status


Finally - we are constantly improving, and more features and examples are coming soon. If you love this article, please give us a star ⭐ at GitHub to help us grow. Thanks for reading!

