CocoIndex now provides robust and flexible support for typed vector data, from simple numeric arrays to deeply nested multi-dimensional vectors. This support is designed for seamless integration with high-performance vector databases such as Qdrant, and enables advanced indexing, embedding, and retrieval workflows across diverse data modalities.

In this post, we’ll break down:

- What types of vectors CocoIndex supports
- How they are represented in Python
- How we handle dimensionality
- How they map to Qdrant vectors or payloads
- Practical examples of usage

✅ Supported Python Vector Types

CocoIndex accepts a range of vector types with strong typing guarantees. Note that CocoIndex automatically infers types, so when you define a flow you don’t need to specify any of these types explicitly. You only need to specify types explicitly when you define custom functions.

✅ 1. One-dimensional vectors

    from cocoindex import Vector, Float32
    from typing import Literal

    Vector[Float32]  # dynamic dimension
    Vector[Float32, Literal[384]]  # fixed dimension of 384

These types are interpreted as:

- Underlying Python type: numpy.typing.NDArray[np.float32] or list[float]
- Dimension-aware: when used with Literal[N], the dimension size is explicitly enforced
- Supported number types:
  - Python native types: float, int
  - CocoIndex type aliases: Float32, Float64, Int64. They're aliases for Python native types, annotated with CocoIndex type information to indicate the exact bit size.
Using them as custom function return types lets you control the exact type used downstream, for example within the engine and in the target.
  - numpy number types: numpy.float32, numpy.float64, numpy.int64

✅ 2. Two-dimensional vectors (multi-vectors)

    Vector[Vector[Float32, Literal[3]], Literal[2]]

This declares a vector with two rows of 3-element vectors.
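To make the declared shape concrete, here is a minimal sketch in plain Python that checks a nested list against per-level dimensions. The helper name matches_shape is hypothetical and is not part of CocoIndex; it only illustrates what "two rows of 3-element vectors" means, with None standing in for a dynamic dimension.

```python
from typing import Optional, Sequence


def matches_shape(value: object, dims: Sequence[Optional[int]]) -> bool:
    """Check a (possibly nested) list against per-level sizes; None means dynamic."""
    if not dims:
        # No vector levels left: the value must be a plain number.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if not isinstance(value, (list, tuple)):
        return False
    expected = dims[0]
    if expected is not None and len(value) != expected:
        return False
    # Every element must satisfy the remaining (inner) dimensions.
    return all(matches_shape(item, dims[1:]) for item in value)


# Vector[Vector[Float32, Literal[3]], Literal[2]]: outer size 2, inner size 3.
two_by_three = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

fixed_ok = matches_shape(two_by_three, (2, 3))
inner_only_ok = matches_shape(two_by_three, (None, 3))  # outer size left dynamic
wrong_inner = matches_shape([[1.0, 2.0], [3.0, 4.0]], (2, 3))
```

The same value satisfies both the fully fixed shape and the variant where only the inner dimension is fixed, while rows of the wrong length are rejected.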
Two-dimensional vectors are treated as multi-vectors (e.g., multiple embeddings per point).

- Underlying Python type: numpy.typing.NDArray[np.float32, Literal[3, 2]] or nested list[list[float]]
- Semantics: useful for scenarios like comparing sets of embeddings per item, multi-view representations, or batched encodings

🧠 What is a Multi-Dimensional Vector?

A multi-dimensional vector is simply a vector whose elements are themselves vectors, essentially a matrix or a nested list. In CocoIndex, we represent this using Vector[Vector[T, N], M], meaning M vectors, each of dimension N. M and N are optional: CocoIndex doesn't require them to be fixed, although some targets do. For example, a multi-vector exported to Qdrant needs a fixed inner dimension, i.e. Vector[Vector[T, N]].

This concept is crucial in deep learning and multimodal applications. For example:

- An image might be represented as a collection of patch-level embeddings.
- A document could be broken down into paragraph-level vectors.
- A user session might be a sequence of behavioral vectors.

Instead of flattening these rich representations, CocoIndex supports them natively through nested Vector typing.

🧭 When to Use Multi-Vector Embeddings

Here are scenarios where multi-vector embeddings shine:

🖼️ 1. Vision: Patch Embeddings

Imagine a 1024x1024 image processed by a Vision Transformer (ViT).
Instead of one global image vector, a ViT outputs a vector per patch (e.g. a 14x14 grid of patches → 196 vectors).

Use case:

    Vector[Vector[Float32, Literal[768]]]

- 196 patches, each with a 768-dim embedding
- Enables local feature matching and region-aware retrieval

📄 2. Text: Document with Paragraphs

A long document split into paragraphs or sentences:

    Vector[Vector[Float32, Literal[384]]]

- 10 paragraph vectors of 384 dims each
- Better than averaging all vectors: preserves context and structure

You can now:

- Search within sub-parts of a document
- Retrieve documents based on a single relevant paragraph
- Implement hierarchical search or reranking

👤 3. User Behavior: Sessions or Time Series

Each user session can be viewed as a sequence of actions, clicks, or states:

    Vector[Vector[Float32, Literal[16]]]

- 20 steps in a session, each represented by a 16-dim vector

Useful in:

- E-commerce: click sequences
- Finance: time series embeddings
- UX analytics: multi-step interactions

🧬 4. Scientific & Biomedical: Multi-view Embeddings

A molecule or protein might have multiple conformations, graphs, or projections:

    Vector[Vector[Float32, Literal[128]]]

- 5 different views or measurement modalities
- Compare molecules not just by a single vector, but across their latent spaces

🔍 Why Not Just Use a Flat Vector?

You could flatten a 2x3 multi-vector like [[1,2,3],[4,5,6]] into [1,2,3,4,5,6], but you’d lose semantic boundaries.

With true multi-vectors:

- Each sub-vector retains its meaning
- You can match/query at the sub-vector level
- Vector DBs like Qdrant can score across multiple embeddings, choosing the best match (e.g., closest patch, paragraph, or step)
- The inner vector can have a fixed dimension even if the outer dimension is variable. This is quite common: the number of patches in an image or paragraphs in an article depends on the specific data. Most vector databases require vectors to have a fixed dimension, because it's essential for building a vector index.

🧬 CocoIndex → Qdrant Type Mapping

Qdrant is a popular vector database that supports both dense vectors and multi-vectors.
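As a rough picture of what scoring a multi-vector by its best-matching sub-vector means, the sketch below compares a query against each sub-vector of [[1,2,3],[4,5,6]] and keeps the closest one. This is plain Python for illustration only, not how Qdrant or CocoIndex implements scoring; the helper names cosine and best_match are hypothetical, and cosine similarity is an assumed metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, multi_vector):
    """Return (index, similarity) of the sub-vector most similar to the query."""
    scores = [cosine(query, sub) for sub in multi_vector]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Two sub-vectors with a fixed inner dimension of 3.
multi = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
query = [4.0, 5.0, 6.0]

# The second sub-vector (index 1) points in the same direction as the query.
index, score = best_match(query, multi)
```

With the flattened form [1,2,3,4,5,6] there is no per-sub-vector score to pick from, and the 3-dimensional query cannot even be compared against it directly.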
CocoIndex maps vector types to Qdrant-compatible formats as follows:

| CocoIndex Type | Qdrant Type |
| --- | --- |
| Vector[Float32, Literal[N]] | Dense Vector |
| Vector[Vector[Float32, Literal[N]]] | MultiVector |
| Anything else | Stored as part of Qdrant’s JSON payload |

Qdrant only accepts vectors with fixed dimension. CocoIndex automatically detects vector shapes and maps unsupported or dynamic-dimension vectors into payloads instead.
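The mapping rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not CocoIndex's actual implementation: VectorSpec and qdrant_slot are hypothetical names, element types are modeled as plain strings, and only Float32 elements are treated as vector-eligible because that is all the table lists.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VectorSpec:
    """Hypothetical description of a declared vector type."""
    element: Union[str, "VectorSpec"]  # "float32", or a nested VectorSpec
    dim: Optional[int] = None          # None means a dynamic dimension


def qdrant_slot(field_type: Union[str, VectorSpec]) -> str:
    """Classify a field type as 'dense', 'multivector', or 'payload'."""
    if not isinstance(field_type, VectorSpec):
        return "payload"  # not a vector at all
    inner = field_type.element
    if inner == "float32":
        # Vector[Float32, Literal[N]] is a dense vector only when N is fixed.
        return "dense" if field_type.dim is not None else "payload"
    if (
        isinstance(inner, VectorSpec)
        and inner.element == "float32"
        and inner.dim is not None
    ):
        # Vector[Vector[Float32, Literal[N]]]: only the inner dimension must be fixed.
        return "multivector"
    return "payload"


dense = qdrant_slot(VectorSpec("float32", 384))
multi = qdrant_slot(VectorSpec(VectorSpec("float32", 768)))
dynamic = qdrant_slot(VectorSpec("float32"))
text = qdrant_slot("str")
```

Here a fixed-size Float32 vector lands in the dense slot, a variable-length list of fixed-size Float32 vectors lands in the multi-vector slot, and both the dynamic-dimension vector and the non-vector field fall back to the payload.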
🧾 CocoIndex Data Model Recap

In CocoIndex, data is structured as rows and fields:

- A row corresponds to a Qdrant point
- A field can either:
  - Be a named vector, if it fits Qdrant’s vector constraints
  - Be part of the payload, for non-conforming types

This hybrid approach gives you the flexibility of structured metadata with the power of vector search.

Final Thoughts

With the ability to support deeply typed vectors and multi-dimensional embeddings, CocoIndex brings structure and semantic clarity to vector data pipelines. Whether you're indexing images, text, audio, or abstract graph representations, our typing system ensures compatibility, debuggability, and correctness at scale.

Ready to bring your own LEGO to the vector world? Start building with CocoIndex today.