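In this blog, we will build a live image search and query it with natural language. For example, you can search for "elephant" or "cute animals" over a list of input images.

We will use a multimodal embedding model to understand and embed the images, and build a vector index for efficient retrieval. The indexing flow is built with CocoIndex, an ultra-performant real-time data transformation framework.

It would mean a lot to us if you could drop a star at CocoIndex on GitHub if this tutorial is helpful.

Technologies

CocoIndex

CocoIndex is an ultra-performant real-time data transformation framework for AI.

CLIP ViT-L/14

CLIP ViT-L/14 is a powerful vision-language model that understands both images and text. It is trained to align visual and textual representations in a shared embedding space, which makes it a good fit for our image search use case.

In our project, we use CLIP to:

- Generate embeddings for images directly
- Convert natural language search queries into the same embedding space
- Enable semantic search by comparing query embeddings with the stored image embeddings

Qdrant

Qdrant is a high-performance vector database. We use it to store and query the embeddings.

FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+, based on standard Python type hints.

Prerequisites
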
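- Install Postgres. CocoIndex uses Postgres to keep track of data lineage for incremental processing.
- Install Qdrant.

Define Indexing Flow

Flow Design

The flow diagram illustrates how we'll process our images:
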
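- Read image files from the local filesystem
- Use CLIP to understand and embed each image
- Store the embeddings in a vector database for retrieval

1. Ingest the images.

import datetime

import cocoindex

@cocoindex.flow_def(name="ImageObjectEmbedding")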
def image_object_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    data_scope["images"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="img", included_patterns=["*.jpg", "*.jpeg", "*.png"], binary=True),
        refresh_interval=datetime.timedelta(minutes=1)  # Poll for changes every 1 minute
    )
    img_embeddings = data_scope.add_collector()
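
To make the effect of `included_patterns` concrete, here is a minimal sketch that filters file names with the standard library's `fnmatch`. It is only an illustration: the helper `is_included` is hypothetical, and CocoIndex's own matching rules (case sensitivity, nested paths) may differ.

```python
from fnmatch import fnmatchcase

# Same patterns as passed to cocoindex.sources.LocalFile above.
INCLUDED_PATTERNS = ["*.jpg", "*.jpeg", "*.png"]


def is_included(filename: str, patterns: list[str] = INCLUDED_PATTERNS) -> bool:
    """Return True if the file name matches at least one glob pattern.

    Hypothetical helper for illustration; not part of the CocoIndex API.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


candidates = ["elephant.jpg", "cat.png", "notes.txt", "photo.jpeg"]
selected = [name for name in candidates if is_included(name)]
print(selected)
```

Files that match none of the patterns, such as `notes.txt` here, would simply never enter the flow.
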
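`flow_builder.add_source` creates a table with sub fields (`filename`, `content`). See the documentation for more details.

2. Process each image and collect the information.

2.1 Embed the image with CLIP

import functools

from transformers import CLIPModel, CLIPProcessor

# CLIP_MODEL_NAME is a module-level constant holding the model ID; it is not defined in these snippets.
@functools.cache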
def get_clip_model() -> tuple[CLIPModel, CLIPProcessor]:
    model = CLIPModel.from_pretrained(CLIP_MODEL_NAME)
    processor = CLIPProcessor.from_pretrained(CLIP_MODEL_NAME)
    return model, processor
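
`get_clip_model` is wrapped in `@functools.cache`, so repeated calls reuse the first result. The sketch below shows that behaviour with a stand-in loader; `load_model` and the `load_calls` list are hypothetical names used only to make the effect visible without downloading CLIP.

```python
import functools

load_calls: list[str] = []  # one entry per real execution of the loader body


@functools.cache
def load_model() -> dict:
    """Stand-in for an expensive load such as CLIPModel.from_pretrained."""
    load_calls.append("loaded")
    return {"name": "stand-in model"}


first = load_model()
second = load_model()

# Both calls return the very same object, and the body ran only once.
print(first is second, len(load_calls))
```
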
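The `@functools.cache` decorator caches the function's return value. Here, it ensures that we only load the CLIP model and processor once.

import io
from typing import Literal

import torch
from PIL import Image

# The vector length (768) matches the size of the Qdrant collection created later.
@cocoindex.op.function(cache=True, behavior_version=1, gpu=True)
def embed_image(img_bytes: bytes) -> cocoindex.Vector[cocoindex.Float32, Literal[768]]: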
    """
    Convert image to embedding using CLIP model.
    """
    model, processor = get_clip_model()
    image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features[0].tolist()
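`embed_image` is a custom function that uses the CLIP model to convert an image into a vector embedding. It accepts image data as bytes and returns a list of floating-point numbers representing the image's embedding.

The function supports caching through the `cache` parameter. When enabled, the executor stores the function's results for reuse during reprocessing, which is particularly useful for computationally intensive operations. See the documentation for more about custom function parameters.

Then we process each image and collect the information.

# Continues inside image_object_embedding_flow.
with data_scope["images"].row() as img: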
    img["embedding"] = img["content"].transform(embed_image)
    img_embeddings.collect(
        id=cocoindex.GeneratedField.UUID,
        filename=img["filename"],
        embedding=img["embedding"],
    )
2.3 Collect the embeddings

Export the embeddings to a collection in Qdrant.

img_embeddings.export(
    "img_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="image_search",
        grpc_url=QDRANT_GRPC_URL,
    ),
    primary_key_fields=["id"],
    setup_by_user=True,
)
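
The export step refers to `QDRANT_GRPC_URL`, which the snippets never define. One plausible definition is sketched below, assuming the value comes from an environment variable of the same name and falls back to a local Qdrant instance on its standard gRPC port (6334). Both the variable name and the default are assumptions; adjust them to your setup.

```python
import os

# Assumption: default to a local Qdrant instance on its standard gRPC port (6334).
DEFAULT_QDRANT_GRPC_URL = "http://localhost:6334"


def get_qdrant_grpc_url() -> str:
    """Read the Qdrant gRPC URL from the environment, falling back to the default."""
    return os.getenv("QDRANT_GRPC_URL", DEFAULT_QDRANT_GRPC_URL)


QDRANT_GRPC_URL = get_qdrant_grpc_url()
print(QDRANT_GRPC_URL)
```

If the value lives in a .env file, `load_dotenv()` has to run before the variable is read, otherwise the default is used.
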
3. Query the index

Embed the query with CLIP, which maps both text and images into the same embedding space, allowing cross-modal similarity search.

def embed_query(text: str) -> list[float]:
    model, processor = get_clip_model()
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return features[0].tolist()
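
Because text and image vectors live in the same space, ranking reduces to a vector similarity computation; the Qdrant collection in this tutorial uses cosine distance. Here is a minimal pure-Python sketch of cosine similarity on made-up 3-dimensional vectors. Real CLIP embeddings are much longer, and the numbers below are invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings; the values are invented.
query = [1.0, 0.0, 1.0]
images = {
    "elephant.jpg": [0.9, 0.1, 0.8],
    "car.jpg": [0.0, 1.0, 0.1],
}

ranked = sorted(
    images,
    key=lambda name: cosine_similarity(query, images[name]),
    reverse=True,
)
print(ranked)
```

The vector pointing in nearly the same direction as the query ranks first, which is what the vector database does at scale with an index instead of a full scan.
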
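Define a FastAPI endpoint `/search` that performs semantic image search.

from fastapi import Query

# `app` is the FastAPI instance created in the Application section.
@app.get("/search")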
def search(q: str = Query(..., description="Search query"), limit: int = Query(5, description="Number of results")):
    # Get the embedding for the query
    query_embedding = embed_query(q)
    
    # Search in Qdrant
    search_results = app.state.qdrant_client.search(
        collection_name="image_search",
        query_vector=("embedding", query_embedding),
        limit=limit
    )
    
This searches the Qdrant vector database for similar embeddings and returns the top `limit` results.

    # Format results (still inside search())
    out = []
    for result in search_results:
        out.append({
            "filename": result.payload["filename"],
            "score": result.score
        })
    return {"results": out}
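
Once the backend is running, the endpoint can be called from any HTTP client. The sketch below builds the request URL with the standard library and shows how a response of the shape returned by the endpoint could be read. The base URL assumes the backend listens on localhost port 8000, `build_search_url` and `search_images` are hypothetical helpers, and `sample_response` is an invented payload, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, limit: int = 5) -> str:
    """Build the /search URL with properly escaped query parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search_images(base_url: str, query: str, limit: int = 5) -> list[dict]:
    """Call the endpoint and return its "results" list (requires a running backend)."""
    with urlopen(build_search_url(base_url, query, limit)) as response:
        return json.load(response)["results"]


url = build_search_url("http://localhost:8000", "cute animals", limit=3)
print(url)

# Invented example of the JSON shape produced by the endpoint.
sample_response = '{"results": [{"filename": "elephant.jpg", "score": 0.27}]}'
filenames = [item["filename"] for item in json.loads(sample_response)["results"]]
print(filenames)
```
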
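This endpoint enables semantic image search: users can find images by describing them in natural language rather than relying on exact keyword matches.

Application

FastAPI

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles

app = FastAPI()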
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# Serve images from the 'img' directory at /img
app.mount("/img", StaticFiles(directory="img"), name="img")
This sets up the FastAPI application with CORS middleware and static file serving. The app is configured to:

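- Allow cross-origin requests from any origin
- Serve static image files from the 'img' directory
- Handle API endpoints for image search functionality

from dotenv import load_dotenv
from qdrant_client import QdrantClient

@app.on_event("startup")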
def startup_event():
    load_dotenv()
    cocoindex.init()
    # Initialize Qdrant client
    app.state.qdrant_client = QdrantClient(
        url=QDRANT_GRPC_URL,
        prefer_grpc=True
    )
    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
    app.state.live_updater.start()
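The startup event handler initializes the application when it first starts. Here's what each part does:

- `load_dotenv()`: loads environment variables from a .env file, which is useful for configuration such as API keys and URLs.
- `cocoindex.init()`: initializes the CocoIndex framework, setting up the necessary components and configuration.
- Qdrant client setup:
  - Creates a new `QdrantClient` instance
  - Configures it to use the gRPC URL specified in environment variables
  - Enables gRPC preference for better performance
  - Stores the client in the FastAPI app state for access across requests
- Live updater setup:
  - Creates a `FlowLiveUpdater` instance for `image_object_embedding_flow`
  - This enables real-time updates to the image search index
  - Starts the live updater to begin monitoring for changes

This initialization ensures that all necessary components are properly configured and running when the application starts.

Frontend

The frontend code is available separately. We intentionally kept it simple and minimalistic to focus on the image search functionality.

Time to have fun!
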
Create a collection in Qdrant:

curl -X PUT 'http://localhost:6333/collections/image_search' \
-H 'Content-Type: application/json' \
-d '{
    "vectors": {
    "embedding": {
        "size": 768,
        "distance": "Cosine"
    }
    }
}'
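
The same request can be prepared from Python. This sketch only uses the standard library and mirrors the curl call above; it assumes Qdrant's REST API is reachable at localhost:6333, as in that command, and `build_create_collection_request` is a hypothetical helper. Note that `size` must equal the length of the vectors the flow produces, since Qdrant rejects vectors whose dimension differs from the collection's configuration.

```python
import json
from urllib.request import Request, urlopen

QDRANT_REST_URL = "http://localhost:6333"  # same host and port as the curl command


def build_create_collection_request(collection: str, vector_name: str, size: int) -> Request:
    """Prepare a PUT request that creates a collection with one named cosine vector."""
    payload = {"vectors": {vector_name: {"size": size, "distance": "Cosine"}}}
    return Request(
        url=f"{QDRANT_REST_URL}/collections/{collection}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_collection_request("image_search", "embedding", 768)
print(request.get_method(), request.full_url)

# Uncomment to actually send the request to a running Qdrant instance:
# with urlopen(request) as response:
#     print(json.load(response))
```
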
 
 
 
 
Set up the indexing flow:

cocoindex setup main.py

The flow runs with a live updater, so new files added to the folder are indexed within about a minute.

Run the backend:

uvicorn main:app --reload --host 0.0.0.0 --port 8000

Run the frontend:

cd frontend
npm install
npm run dev

Go to http://localhost:5174 to search.

Now add another image to the `img` folder, for example this cute spider, or any image you like. Wait a minute for the new image to be processed and indexed.

If you want to monitor the indexing progress, you can view it in CocoInsight:

cocoindex server -ci main.py

Finally, we are constantly improving, and more features and examples are coming soon. If you love this article, please give us a star ⭐ on GitHub to help us grow. Thanks for reading!

How to Build Live Image Search with a Vision Model and Query It with Natural Language
