How Vector Search Cracks the Code on Contract Analytics

by DataStax, December 9th, 2024

Too Long; Didn't Read

A look at the application architecture of wealthAPI, a data analytics provider for the financial sector that has created a highly accurate way to identify recurring payment entries.


At wealthAPI, we’ve always believed financial analytics should be smarter and faster, especially when identifying recurring payments hidden in transaction data. We’ve built a solution that transforms raw transaction data into actionable insights by leveraging AI. Our system uses vector embeddings to group transactions into recurring payment patterns, ensuring accuracy even when recurring payment entries contain subtle wording differences.


From subscriptions to insurance payments, our platform delivers reliable results while maintaining the speed and scalability financial companies need.


Here, we’ll show how we designed our architecture to solve these challenges, from data ingestion and vector embeddings to clustering transactions into meaningful groups. We’ll also explore how AI powers advanced features like semantic search, allowing users to find and analyze financial data effortlessly.

What The Application Does

wealthAPI tackles a common yet challenging problem for financial companies: identifying recurring payments, such as subscriptions, in bank transaction histories. Traditional methods scaled poorly and often relied on exact matches, so they missed entries that differ only slightly (e.g., “Spotify” vs. “Spotify AB”).


wealthAPI addresses this problem with an AI-driven approach that delivers accuracy and speed. At the heart of this solution lies DataStax Astra DB, a database platform purpose-built for modern, scalable, and AI-integrated workflows.

Architecture

wealthAPI’s system takes raw bank transactions, converts them into embeddings, and groups them into recurring payment patterns, all powered by Astra DB’s vector similarity search. The architecture is designed to stay scalable and responsive at each stage, even under high data volumes.


Here’s a simplified flow of the process:

  1. Data ingestion - When bank transactions are received, the wealthAPI backend publishes them on a message queue for asynchronous processing.


  2. Embedding creation - Each transaction (e.g., “Spotify, -10€, 22.10.24”) is transformed into a numeric vector (e.g., [0.12, 0.65, 0.78, ..., 0.23]) using Astra DB’s vectorize feature.


  3. Vector storage and search in Astra DB - The embeddings are stored in Astra DB, where vector similarity search lets the system quickly find and cluster similar transactions.


  4. Regularity analysis - The clusters are analyzed to identify recurring payments, categorizing them as contracts like “Spotify - music service - monthly” or “Health insurance - Health - yearly.”
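
The article does not describe how the regularity analysis in step 4 works internally. As a minimal sketch of one possible approach, the snippet below labels a cluster by the typical gap between its booking dates. The function name, the cadence bands, and the tolerance are all assumptions made for illustration, not wealthAPI's actual logic.

```python
from datetime import date
from statistics import median

# Hypothetical tolerance bands (in days) per cadence; not taken from wealthAPI.
CADENCES = {
    "weekly": (6, 8),
    "monthly": (27, 32),
    "yearly": (358, 372),
}


def classify_cadence(dates, max_drift=5):
    """Return a cadence label for one cluster's booking dates, or None."""
    if len(dates) < 3:
        return None  # too few payments to call the pattern recurring
    ordered = sorted(dates)
    # Days between each pair of consecutive payments.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    for label, (low, high) in CADENCES.items():
        # The typical gap must fit the band, and no gap may drift far from it.
        if low <= typical <= high and all(abs(g - typical) <= max_drift for g in gaps):
            return label
    return None


spotify_dates = [date(2024, 8, 22), date(2024, 9, 23), date(2024, 10, 22)]
print(classify_cadence(spotify_dates))  # monthly
```

Using the median gap rather than the mean keeps a single late or early booking from shifting the label.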


Astra DB ensures the entire process is scalable and responsive, even with high volumes of data. The process also adheres to strict data security measures to ensure that end users and their transactions remain anonymous and protected from external access.
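
The asynchronous hand-off in step 1 can be illustrated with a small sketch. The article does not name the message queue wealthAPI uses, so Python's in-process `queue.Queue` stands in for it here; the transaction field names and the `ingest` and `worker` helpers are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def worker(inbox, processed):
    """Consume transactions from the queue until the sentinel arrives."""
    while True:
        tx = inbox.get()
        if tx is SENTINEL:
            break
        # Placeholder for embedding creation and storage: this sketch only
        # builds the text that would be embedded.
        processed.append(f"{tx['purpose']}, {tx['amount']}€, {tx['date']}")


def ingest(transactions):
    """Publish transactions to the queue and wait for the worker to drain it."""
    inbox = queue.Queue()
    processed = []
    consumer = threading.Thread(target=worker, args=(inbox, processed))
    consumer.start()
    for tx in transactions:
        inbox.put(tx)  # publishing returns immediately; work happens elsewhere
    inbox.put(SENTINEL)
    consumer.join()
    return processed


print(ingest([{"purpose": "Spotify", "amount": -10, "date": "22.10.24"}]))
# ['Spotify, -10€, 22.10.24']
```

Decoupling the producer from the consumer this way means a burst of incoming transactions fills the queue instead of blocking the backend.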



Technical Implementation

Clustering Transactions into Contracts

Grouping transactions has always been a core challenge. Previous tools depended on exact matches (e.g., vendor name or payment amount), which often failed to capture variations and were slow to scale.


In the past, we at wealthAPI searched for patterns among millions of transactions using traditional databases. That approach was slow and error-prone: even small variations in transaction details broke the clustering logic.


Because we’re using Astra DB, we can store embeddings and efficiently search for similar transactions, even with minor variations in details.


Here’s an example: A payment labeled “Spotify AB” for €10 on one billing date and “Spotify” for €10 on the next is correctly grouped as the same recurring payment.
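
To make the grouping idea concrete, here is a minimal sketch of threshold-based clustering on cosine similarity. It is not wealthAPI's implementation: `toy_embed` uses character trigram counts as a crude stand-in for real embeddings (which in the article come from Astra DB's vectorize feature), and the greedy strategy and the 0.6 threshold are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def toy_embed(text):
    """Stand-in for a real embedding model: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(labels, threshold=0.6):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_labels)
    for label in labels:
        vec = toy_embed(label)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(label)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vec, [label]))
    return [members for _, members in clusters]


print(cluster(["Spotify AB", "Netflix", "Spotify", "NETFLIX.COM"]))
# [['Spotify AB', 'Spotify'], ['Netflix', 'NETFLIX.COM']]
```

An exact-match comparison would treat all four labels as distinct vendors. Trigram counts only capture spelling overlap, though; semantic embeddings can also relate labels that share no characters.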

Handling Large Data Volumes

With thousands of transactions processed daily, wealthAPI required a database that could scale seamlessly while maintaining speed and accuracy.


Astra DB’s foundation is Apache Cassandra, so it’s built for scalability. It also integrates with AI workflows, enabling wealthAPI to maintain fast queries without compromising precision.

Transaction Search Engine

Because embeddings capture the underlying meaning of transactions, wealthAPI can also implement a search feature. Users can type a keyword like “health” to retrieve all health-related transactions without relying on predefined tags or categories.


The system generates an embedding from the user query and runs a simple similarity search using Astra DB; its vector search capability makes this kind of semantic search fast and accurate.


A user typing “health,” for example, will see all payments for health-related services, like insurance or gym memberships, even if the vendor names differ.
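
The ranking step behind such a search can be sketched in a few lines. The snippet below is an in-memory stand-in, not the Astra DB query itself: the three-dimensional vectors are hand-made placeholders for real embeddings, and the store contents, the `search` helper, and the `min_score` cutoff are hypothetical.

```python
from math import sqrt

# Hand-made 3-d vectors standing in for real embeddings.
# The axes loosely mean: health, media, housing.
STORE = {
    "Health insurance premium": [0.9, 0.0, 0.1],
    "Gym membership": [0.7, 0.1, 0.0],
    "Spotify": [0.0, 0.9, 0.0],
    "Rent": [0.1, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, store, min_score=0.5):
    """Return stored texts ranked by similarity to the query, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store.items()]
    return [text for score, text in sorted(scored, reverse=True) if score >= min_score]


health_query = [1.0, 0.0, 0.0]  # assumed embedding of the query "health"
print(search(health_query, STORE))
# ['Health insurance premium', 'Gym membership']
```

Neither result contains the word “health”; they match because their vectors point in a similar direction to the query's. A linear scan like this only works for tiny datasets, which is why a vector database handles the search at scale.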

Wrapping Up

wealthAPI’s use of Astra DB demonstrates how advanced database technology can drive innovation in financial analytics. From precise transaction clustering to enabling a cutting-edge semantic search engine, Astra DB’s vector search and scalability empower wealthAPI to deliver faster, smarter solutions to its clients.


By integrating AI workflows directly into Astra DB’s architecture, wealthAPI has enhanced financial data processing and introduced a valuable new capability for contract analytics.


By Belkacem Berchiche, machine learning engineer, wealthAPI, and Dieter Flick, solution engineer, DataStax


Learn more about Astra DB and wealthAPI.