🧩 The Challenge: Drowning in Data During Incidents

In the critical moments of a production incident, engineering teams face a formidable challenge: navigating a deluge of log data to find the needle in the haystack. Traditional log analysis demands that engineers formulate precise, often complex queries in specialized languages. That works well when you know what to look for, but the real difficulty lies in diagnosing the "unknown unknowns" - unexpected failures that simple keyword searches never surface. What if you could ask questions in plain English, like, "What were the most common errors for the checkout service in the last 15 minutes?"

This article demonstrates how to build a powerful, serverless AIOps pipeline on AWS that creates a natural language interface for your application logs, transforming log analysis from a rigid, query-based task into an intuitive, conversational experience.

💬 The Solution: Conversational AIOps with RAG

This solution leverages a powerful pattern in generative AI known as Retrieval-Augmented Generation (RAG). RAG enhances the capabilities of Large Language Models (LLMs) by connecting them to external knowledge sources - in this case, your real-time application logs. The approach is highly cost-effective because it avoids expensive model retraining; instead, it supplies the LLM with relevant, live context so it can answer questions accurately.

High-Level Architecture

The system is composed of a series of integrated, serverless AWS services that form a complete AIOps pipeline, from ingestion to a conversational response. The data flows as follows:

1. Ingestion & Embedding: Logs are streamed to an Amazon OpenSearch Ingestion pipeline. The pipeline uses an AWS Lambda function to call Amazon Bedrock's Titan Text Embeddings model, converting the semantic content of each log into a numerical vector.
2. Indexing: The original log, now enriched with its vector embedding, is stored in an Amazon OpenSearch Serverless collection configured for high-performance vector search.
3. Query & Retrieval: A user asks a question through a simple web app. The app converts the question into a vector using the same Titan model and performs a k-Nearest Neighbors (k-NN) similarity search against the OpenSearch collection to find the most semantically relevant logs.
4. Synthesis & Response: The retrieved logs are passed as context, along with the original question, to a powerful generative LLM like Anthropic's Claude on Amazon Bedrock. Claude analyzes the logs, synthesizes the information, and generates a coherent, human-readable answer.
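To make step 3 (Query & Retrieval) concrete, here is a minimal sketch of embedding the user's question with the same Titan model and running a k-NN query against the collection. The region, collection endpoint, index name, and helper names are illustrative assumptions, not taken from the reference repository; the synthesis step (step 4) is shown later in the article.

```python
import json

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"  # assumption: adjust to your deployment region
COLLECTION_ENDPOINT = "your-collection-id.us-east-1.aoss.amazonaws.com"  # assumption
INDEX_NAME = "application-logs"  # assumption: must match your ingestion pipeline's sink index

bedrock_runtime = boto3.client("bedrock-runtime", region_name=REGION)

# Sign requests with SigV4 for the OpenSearch Serverless ("aoss") service.
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, REGION, "aoss")
opensearch = OpenSearch(
    hosts=[{"host": COLLECTION_ENDPOINT, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

def embed_question(question: str) -> list[float]:
    """Convert the user's question into a vector with Titan Text Embeddings."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": question}),
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

def get_relevant_logs(question: str, k: int = 5) -> list[str]:
    """Run a k-NN similarity search and return the raw log messages."""
    query = {
        "size": k,
        "query": {"knn": {"log_embedding": {"vector": embed_question(question), "k": k}}},
    }
    results = opensearch.search(index=INDEX_NAME, body=query)
    # Assumption: each indexed document keeps the original text in a "message" field.
    return [hit["_source"]["message"] for hit in results["hits"]["hits"]]
```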
🔧 The AIOps Pipeline: Key Components

How Ingestion and Embedding Work Together

The core of the data processing is a seamless, serverless flow between the Amazon OpenSearch Ingestion pipeline and the embedding_lambda function. This is how raw logs are enriched with semantic meaning before they are ever stored.

Here's a step-by-step breakdown of their interaction:

1. Data Arrives at the Pipeline: An application sends a log entry to the OpenSearch Ingestion pipeline's HTTP endpoint.
2. Pipeline Invokes the Lambda Processor: The pipeline's configuration includes a processor stage that points to our embedding_lambda function. When the pipeline receives log data, it automatically invokes this Lambda, passing it the batch of log records.
3. Lambda Generates Embeddings: The embedding_lambda function iterates through each log, extracts the text, and makes an API call to Amazon Bedrock's Titan Text Embeddings model. Bedrock returns a numerical vector (the embedding) that captures the log's meaning.
4. Lambda Enriches the Data: The Lambda function adds this new vector as a field (e.g., log_embedding) to the original log record.
5. Pipeline Sends Data to the Sink: The Lambda returns the modified, enriched log records back to the pipeline. The pipeline then sends these complete documents to its configured sink - the OpenSearch Serverless vector collection - where they are indexed and become available for semantic search.
The Embedding Lambda: Adding Semantic Context

The embedding_lambda is a small but critical piece of the pipeline. Its sole job is to enrich the log data with semantic meaning. Triggered by the OpenSearch Ingestion pipeline for every new batch of logs, it performs three key steps:

1. Receives Logs: It accepts a batch of raw log entries from the ingestion pipeline.
2. Generates Vectors: It extracts the text from each log and calls the Amazon Bedrock API, requesting an embedding from the Titan Text Embeddings model. Bedrock returns a numerical vector (e.g., a list of 1,024 numbers) that represents the log's meaning.
3. Returns Enriched Logs: The function adds this vector to the original log data under a new field, like log_embedding, and returns the modified batch to the ingestion pipeline, which then stores it in OpenSearch.

This function acts as a serverless, on-demand transformation engine, making our logs "smart" before they are even indexed.

```python
import json

import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

def generate_embedding(text):
    body = json.dumps({"inputText": text})
    model_id = 'amazon.titan-embed-text-v2:0'
    try:
        response = bedrock_runtime.invoke_model(
            body=body,
            modelId=model_id,
            accept='application/json',
            contentType='application/json'
        )
        response_body = json.loads(response.get('body').read())
        return response_body.get('embedding')
    except Exception as e:
        print(f"Error generating embedding: {e}")
        return None

def lambda_handler(event, context):
    for record in event:
        log_data = record.get('data', {})
        log_message = log_data.get('message', '')
        if log_message:
            embedding = generate_embedding(log_message)
            if embedding:
                # Add the new embedding vector to the log data
                log_data['log_embedding'] = embedding
    ...  # the full handler returns the enriched records to the pipeline
```
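To sanity-check the handler, you could invoke it locally with a hand-built batch that mimics the record shape the code above reads (a list of records, each with a data dict containing a message field). This is a hedged sketch derived only from the handler itself; the exact event format OpenSearch Ingestion delivers to a Lambda processor may differ, so treat the shape as an assumption and run it in the same module as the handler (or import it).

```python
# Minimal local harness for the handler above. The event shape mirrors what
# lambda_handler reads (record['data']['message']); it is an assumption, not the
# documented OpenSearch Ingestion event contract.
sample_event = [
    {"data": {"message": "ERROR checkout-service: payment gateway timeout after 30s"}},
    {"data": {"message": "INFO checkout-service: order confirmed"}},
]

if __name__ == "__main__":
    lambda_handler(sample_event, context=None)
    for record in sample_event:
        embedding = record["data"].get("log_embedding")
        # Each enriched record should now carry a 1,024-dimension vector.
        print(record["data"]["message"][:40], "->", len(embedding) if embedding else "no embedding")
```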
OpenSearch Serverless: The Vector Store

We use an Amazon OpenSearch Serverless collection as our vector database. Its Vector search collection type is optimized for the high-performance similarity searches (k-NN) we need.

For this to work, we must configure the index mapping to treat the log_embedding field as a vector. This tells OpenSearch how to index the field for efficient searching. Here is a sample index mapping, which you would typically define in your Terraform configuration:

```json
"log_embedding": {
  "type": "knn_vector",
  "dimension": 1024,
  "method": {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "ef_construction": 512,
      "m": 16
    }
  }
}
```

💡 Key Configuration Details:

- "type": "knn_vector": This explicitly defines the log_embedding field for k-NN search.
- "dimension": 1024: This must match the output dimension of your embedding model; Amazon Titan Text Embeddings V2 generates 1,024-dimension vectors by default.
- "method": We specify the hnsw (Hierarchical Navigable Small World) algorithm, which provides an excellent balance of search speed and accuracy for large datasets.
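If you prefer to create the index from a bootstrap script rather than Terraform, a minimal sketch using the opensearch-py client might look like the following. The endpoint, index name, and the index.knn setting are assumptions to verify against your collection's configuration.

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"                                                      # assumption
COLLECTION_ENDPOINT = "your-collection-id.us-east-1.aoss.amazonaws.com"   # assumption
INDEX_NAME = "application-logs"                                           # assumption

# Same SigV4-signed client setup as in the retrieval sketch earlier.
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, REGION, "aoss")
client = OpenSearch(
    hosts=[{"host": COLLECTION_ENDPOINT, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

index_body = {
    # Assumption: k-NN indexing is enabled at the index level, as on managed OpenSearch.
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "message": {"type": "text"},
            "log_embedding": {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {"ef_construction": 512, "m": 16},
                },
            },
        }
    },
}

if not client.indices.exists(index=INDEX_NAME):
    client.indices.create(index=INDEX_NAME, body=index_body)
```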
🛠️ Practical Implementation Guide

The Git repository is structured using a modular approach, a best practice that promotes reusability and maintainability.

```
├── README.md
├── envs/
│   └── dev/
│       ├── main.tf
│       └── terraform.tfvars
├── modules/
│   ├── iam/
│   ├── ingestion_pipeline/
│   ├── embedding_lambda/
│   └── opensearch/
└── src/
    ├── embedding_lambda/
    └── streamlit_app/
```

💡 The repository separates the definition of the infrastructure (in the modules/ directory) from the configuration for specific deployments (in the envs/ directory). An engineer can deploy a complete development environment by simply running terraform apply within the envs/dev/ directory.

See the Complete Code Repository for reference.

The User Interface and Prompt Engineering

A simple web application built with Streamlit serves as the user-facing component. The quality of the final answer depends heavily on the quality of the prompt sent to the Claude model. A bare "Answer the question" prompt is insufficient; instead, a robust prompt template is used to guide the model's behavior.

File: src/streamlit_app/app.py (logic for generating the answer)

```python
import json

import boto3

bedrock_runtime = boto3.client('bedrock-runtime')
BEDROCK_MODEL_ID_CLAUDE = 'anthropic.claude-sonnet-4-20250514-v1:0'  # see the model ID note below

def get_llm_response(question, logs):
    log_context = "\n".join(logs)
    prompt = f"""
You are an expert AIOps assistant. Your task is to answer questions about
application behavior based *only* on the provided log entries. Do not use any
prior knowledge. If the answer cannot be found in the logs, you must state
'I cannot answer the question based on the provided logs.'

Here are the relevant log entries retrieved:
<logs>
{log_context}
</logs>

Based on the logs above, please answer the following question:
<question>
{question}
</question>
"""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock_runtime.invoke_model(body=body, modelId=BEDROCK_MODEL_ID_CLAUDE)
    response_body = json.loads(response.get('body').read())
    # The Messages API returns a list of content blocks; take the text of the first one.
    return response_body['content'][0]['text']
```

💡 Prompting Tip: This prompt uses several best practices for Claude: it assigns a persona ("expert AIOps assistant"), provides clear constraints to prevent hallucination, and uses XML tags (<logs> and <question>) to structure the context, which significantly improves the model's ability to follow instructions.

Here are the latest model IDs we would use in 2025 (replace BEDROCK_MODEL_ID_CLAUDE in the Python code):

- For the highest capability (Opus): anthropic.claude-opus-4-1-20250805-v1:0
- For a balance of performance and cost (Sonnet): anthropic.claude-sonnet-4-20250514-v1:0
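Putting the user-facing pieces together, the Streamlit glue might look roughly like the sketch below. Here get_relevant_logs is the hypothetical k-NN helper from the retrieval sketch earlier in this article, and get_llm_response is the function shown above; the real repository may organize this differently.

```python
import streamlit as st

st.title("Conversational Log Analysis")

question = st.text_input("Ask a question about your application logs")

if st.button("Ask") and question:
    with st.spinner("Searching logs and generating an answer..."):
        # Retrieve the most semantically relevant logs, then let Claude synthesize an answer.
        logs = get_relevant_logs(question)          # hypothetical helper from the earlier sketch
        answer = get_llm_response(question, logs)   # function shown above
    st.subheader("Answer")
    st.write(answer)
    with st.expander("Retrieved log context"):
        st.code("\n".join(logs))
```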
A New Paradigm for Observability

This serverless RAG solution represents a new approach to log analysis, with different strategic considerations compared to traditional tools.

Cost Model: Query vs. Ingestion

The AIOps RAG architecture shifts the cost model. The cost of ingesting logs and creating their embeddings is relatively low; the primary cost driver is LLM inference at query time. Each user question triggers an API call to the Claude model with a context of retrieved logs, so the system's operational cost is driven not by log volume but by query volume and complexity. This makes the system ideal for high-value, deep-investigation queries during incidents, rather than high-frequency, dashboard-style monitoring.

The Future of Observability: Beyond Q&A

The vector embeddings generated during ingestion are a valuable data asset that can be leveraged for capabilities far beyond simple question-answering.

- Automated Semantic Anomaly Detection: By applying clustering algorithms to the stream of log embeddings, the system can identify the emergence of new clusters of logs that are semantically distinct from the normal baseline. This can detect novel error types or subtle shifts in application behavior that keyword-based alerting would miss (a minimal clustering sketch follows this list).
- Automated Incident Summary Generation: The summarization capabilities of LLMs can be used to automatically generate a first draft of an incident summary. By retrieving logs from an incident's timeframe, the system can provide a timeline of key events, a likely root cause, and customer impact, drastically reducing the manual effort required for post-mortem analysis.
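As an illustration of the anomaly-detection idea, here is a hedged sketch of density-based clustering over a window of log embeddings. It uses scikit-learn's DBSCAN, which labels points that belong to no cluster as -1; treating those outliers as "semantically novel" logs is one simple way to surface alerting candidates. The eps and min_samples values are assumptions to tune against your own data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_novel_logs(embeddings: list[list[float]], messages: list[str]) -> list[str]:
    """Flag log messages whose embeddings do not fall into any dense cluster.

    embeddings: vectors for a recent window of logs (e.g., the log_embedding field
    pulled from OpenSearch); messages: the corresponding raw log messages.
    """
    X = np.array(embeddings)
    # eps and min_samples are illustrative; calibrate them on a known-good baseline window.
    labels = DBSCAN(eps=0.8, min_samples=5, metric="euclidean").fit_predict(X)
    # DBSCAN assigns the label -1 to points outside every cluster.
    return [msg for msg, label in zip(messages, labels) if label == -1]
```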
Conclusion

The serverless RAG architecture presented here offers a transformative approach to log analysis on AWS. By combining the scalable vector search of Amazon OpenSearch Serverless with the advanced reasoning of foundation models on Amazon Bedrock, organizations can build powerful, conversational interfaces for their observability data. This approach lowers the barrier to deep log analysis, empowers a wider range of team members to participate in incident investigation, and opens the door to a new class of intelligent AIOps tools.

Resources

- Complete Code Repository
- What is Retrieval-Augmented Generation (RAG)? https://aws.amazon.com/what-is/retrieval-augmented-generation/
- Anthropic's Claude on Amazon Bedrock https://aws.amazon.com/bedrock/anthropic/
- Vector Engine for Amazon OpenSearch Serverless https://aws.amazon.com/blogs/aws/vector-engine-for-amazon-opensearch-serverless-is-now-generally-available/
- Amazon OpenSearch Ingestion https://aws.amazon.com/opensearch-service/features/ingestion/
- Prompt Engineering for Anthropic's Claude https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock/