How to Build a Production RAG System on AWS From Scratch (Complete Beginner’s Guide)
RAG (Retrieval-Augmented Generation) lets AI answer questions using YOUR organisation’s documents, not just what it was trained on. This guide teaches you to build a production-ready RAG system on AWS Bedrock from scratch. No ML experience needed. Every command included.
The Problem RAG Solves, and Why Every Organisation Needs It
Here is a scenario that plays out in every organisation.
A new employee joins your company. They have a question: “What is our policy on expense reimbursements?” They search the internal wiki. They get 47 results. They ask a colleague. The colleague is not sure and points them to a SharePoint folder with 200 documents. They spend 45 minutes reading through PDFs before finding the answer buried in paragraph 12 of a document called HR-Policy-V3-FINAL-updated-2024.pdf.
Now multiply that by every new employee, every contractor, and every team member who needs to find information they know exists somewhere. A widely cited McKinsey estimate is that knowledge workers spend close to a fifth of their working week, roughly 1.8 hours per day, searching for and gathering information. In an organisation of 500 people, that is around 900 hours of lost productivity every single day.
RAG fixes this. With a properly built RAG system, that same employee types: “What is our expense reimbursement policy?” and gets an accurate, cited answer in under 3 seconds, drawn directly from your actual policy documents.
This is not theoretical. This is deployed and working at enterprises globally right now. And by the end of this article, you will have built exactly this for your organisation.
What Is RAG? (Explained Simply)
RAG stands for Retrieval-Augmented Generation. The name sounds complicated. The concept is simple.
A standard AI model (like Claude or GPT-4) knows only what it was trained on: information from public sources on the internet, up to its training cutoff date. It knows nothing about your company’s internal documents, your products, your policies, or your customers.
RAG solves this by adding a retrieval step before the AI generates an answer:
WITHOUT RAG:
User asks question → AI answers from training data only
Problem: AI knows nothing about your organisation
WITH RAG:
User asks question
↓
Search your documents for relevant content
↓
Give relevant content + question to AI
↓
AI answers using YOUR documents
Result: Accurate answers grounded in your actual information
The AI does not have to guess. It reads the relevant parts of your documents and answers based on what it finds. With a suitably strict prompt (see Part 5), it will usually say when the answer is not in your documents rather than making something up.
What We Are Building
A complete, production-ready RAG system that:
- Ingests your documents (PDFs, Word files, text files) from S3
- Chunks and embeds them into a searchable vector knowledge base
- Accepts natural language questions via an API
- Retrieves the most relevant document sections
- Generates accurate answers with citations showing which document the answer came from
- Runs serverlessly on AWS, no servers to manage
Architecture:
Your Documents (PDF, Word, TXT)
↓
Amazon S3
(document storage)
↓
Bedrock Knowledge Base
- Chunks documents into sections
- Embeds each section into vectors
- Stores vectors in OpenSearch Serverless
↓
Query API (Lambda + API Gateway)
↓
User gets answer + citations
AWS services used:
- Amazon S3: stores your documents
- Amazon Bedrock Knowledge Bases: managed RAG (chunking, embedding, and retrieval)
- Amazon OpenSearch Serverless: vector database (created automatically by Bedrock)
- AWS Lambda: handles queries and formats responses
- Amazon API Gateway: gives the Lambda an HTTPS endpoint
- Amazon Titan Embeddings: converts text to vectors for search
What you need:
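- AWS account (a free-tier account works, but note that OpenSearch Serverless is billed at roughly $0.24 per OCU-hour for as long as the collection exists, even when idle, so delete it when you finish testing)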
- AWS CLI configured
- Some PDF or text documents to test with
- About 90 minutes
Part 1: Understanding the Key Concepts
Before writing a single command, let us understand the three concepts that make RAG work. You do not need to understand the math, just what each step does.
Concept 1: Chunking
Your documents are too long to fit in a single AI prompt. A 50-page policy document might be 25,000 words. AI models have context limits (how much text they can process at once), and more importantly, sending 50 pages for every question is expensive and slow.
Chunking splits your documents into smaller pieces, typically 300–500 words each, with a small overlap between chunks so no sentence loses its context at a boundary.
Original document (50 pages):
"Section 1: Introduction... Section 2: Policy... Section 3: Procedures..."
After chunking (each ~400 words with 50-word overlap):
Chunk 1: "Section 1: Introduction... [first 400 words]"
Chunk 2: "[last 50 words of chunk 1]... Section 2: Policy... [next 350 words]"
Chunk 3: "[last 50 words of chunk 2]... [next 400 words]"
...and so on
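To make the overlap idea concrete, here is a minimal sketch of word-based chunking in Python. It is only an illustration: Bedrock Knowledge Bases does the chunking for you (and counts tokens rather than words), and the function name `chunk_words` and its defaults are assumptions chosen to match the numbers in the example above.

```python
def chunk_words(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Each chunk after the first starts with the last `overlap` words of the
    previous chunk, so a sentence cut at a boundary keeps some context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the document
    return chunks


if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(1000))
    for number, chunk in enumerate(chunk_words(document), start=1):
        print(f"Chunk {number}: {len(chunk.split())} words")
```

A larger overlap keeps more context at each boundary but stores more duplicated text, so it is a trade-off rather than a setting with one right answer.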
Concept 2: Embeddings
Once your document is chunked, each chunk is converted into a vector, a list of numbers that represents its meaning mathematically.
The magic is that text with similar meaning produces similar vectors, even if the words are different. So the chunk about “expense reimbursement policy” will have a vector close to the question “how do I get reimbursed for travel costs”, even though those exact words do not appear together.
This is what makes semantic search possible: finding relevant content by meaning, not just keyword matching.
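"Close" vectors are usually compared with cosine similarity. The sketch below shows the calculation on three tiny, hand-made vectors. The numbers are invented purely for illustration; real embeddings come from the embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (illustrative values only)
expense_policy_chunk = [0.9, 0.1, 0.0]
travel_question = [0.8, 0.3, 0.1]
annual_leave_chunk = [0.0, 0.1, 0.9]

if __name__ == "__main__":
    print("question vs expense chunk:", cosine_similarity(travel_question, expense_policy_chunk))
    print("question vs leave chunk:  ", cosine_similarity(travel_question, annual_leave_chunk))
```

The question scores much higher against the expense chunk than against the leave chunk, which is the behaviour semantic search relies on.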
Concept 3: Retrieval
When a user asks a question, the question is also converted to a vector. The system then searches the vector database for the chunks whose vectors are closest to the question vector. These are the most semantically relevant chunks.
The top 3-5 chunks are retrieved and included in the prompt sent to the AI model.
Question: "How do I claim expenses for a business trip?"
↓
Question → vector → [0.234, -0.891, 0.127, ...]
↓
Search vector DB for similar vectors
↓
Top 3 matching chunks retrieved:
- Chunk from HR-Policy.pdf: "Section 4.2: Travel Expense Claims..."
- Chunk from Finance-Guide.pdf: "Business Travel Reimbursement..."
- Chunk from FAQ.pdf: "Q: What receipts do I need for expense claims..."
↓
AI reads these 3 chunks + question → generates answer with citations
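The retrieval step in the flow above can be sketched as a brute-force top-k search. This is a toy version under stated assumptions: the vectors are invented, `Leave-Policy.pdf` and `Remote-Work.pdf` are hypothetical file names added for contrast, and a real vector database such as OpenSearch Serverless uses an approximate index rather than scoring every chunk.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vector: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = ((cosine(query_vector, vector), chunk) for chunk, vector in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy index of (chunk label, made-up embedding) pairs
INDEX = [
    ("HR-Policy.pdf: Travel Expense Claims", [0.9, 0.1, 0.0]),
    ("Finance-Guide.pdf: Business Travel Reimbursement", [0.8, 0.3, 0.1]),
    ("FAQ.pdf: receipts for expense claims", [0.7, 0.2, 0.3]),
    ("Leave-Policy.pdf: carry over", [0.0, 0.1, 0.9]),
    ("Remote-Work.pdf: core hours", [0.1, 0.9, 0.2]),
]

if __name__ == "__main__":
    question_vector = [1.0, 0.2, 0.1]  # pretend embedding of the expenses question
    for score, chunk in retrieve(question_vector, INDEX, k=3):
        print(f"{score:.3f}  {chunk}")
```

Only the chunks returned here would be placed in the prompt, which is why the quality of the embeddings matters more than the size of the document collection.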
Now you understand RAG. Let us build it.
Part 2: Prepare Your Documents
Step 1: Create the S3 Bucket
bash
# Set your variables — change these to match your setup
REGION="eu-west-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET_NAME="my-rag-documents-$(date +%s)"
# Create the S3 bucket
aws s3 mb s3://$BUCKET_NAME --region $REGION
# Block all public access — documents should never be public
aws s3api put-public-access-block \
  --bucket "$BUCKET_NAME" \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
echo "S3 bucket created: $BUCKET_NAME"
echo "Save this: export BUCKET_NAME=$BUCKET_NAME"
Step 2: Upload Your Documents
If you have your own documents (PDFs, Word files, text files), upload them now. If not, create some sample documents to test with:
bash
# Create sample documents if you do not have your own
mkdir -p sample-docs
cat > sample-docs/expense-policy.txt << 'EOF'
EXPENSE REIMBURSEMENT POLICY
Last Updated: January 2026
1. OVERVIEW
This policy governs the reimbursement of business expenses incurred by employees
in the course of their work duties. All expenses must be pre-approved where
indicated and submitted within 30 days of being incurred.
2. ELIGIBLE EXPENSES
The following expenses are eligible for reimbursement:
- Business travel (flights, trains, taxis to/from client sites)
- Accommodation (up to £150 per night in London, £100 elsewhere in the UK)
- Business meals (up to £50 per person, must have 2+ attendees)
- Client entertainment (pre-approval required, up to £100 per person)
- Home office equipment (pre-approval required for items over £200)
- Professional development courses (pre-approval required)
3. HOW TO SUBMIT CLAIMS
All expense claims must be submitted through the Expenses portal at
expenses.company.internal within 30 days of the expense being incurred.
Required documentation:
- Original receipts for all expenses over £10
- Business justification for each expense
- Names of attendees for meals and entertainment
- Manager approval for expenses over £500
4. PAYMENT TIMELINE
Approved expenses are reimbursed in the next monthly payroll run, provided
the claim is submitted by the 15th of the month. Claims submitted after the
15th will be processed the following month.
5. INELIGIBLE EXPENSES
The following will not be reimbursed:
- Personal travel or accommodation
- Alcohol consumed outside of approved client entertainment
- Fines, penalties, or legal fees
- Personal mobile phone contracts (BYOD allowance is separate)
- First-class travel without VP-level approval
EOF
cat > sample-docs/remote-work-policy.txt << 'EOF'
REMOTE WORK POLICY
Last Updated: March 2026
1. ELIGIBILITY
All permanent employees who have completed their 3-month probationary period
are eligible for remote work arrangements. Contractors and temporary staff
require manager approval on a case-by-case basis.
2. HYBRID WORK ARRANGEMENT
The company operates a hybrid model requiring employees to be in the office:
- Minimum 3 days per week for team members
- Minimum 2 days per week for senior individual contributors
- As required for managers (typically 4 days per week)
Office days must include Tuesday and Wednesday (core collaboration days).
3. HOME OFFICE REQUIREMENTS
Employees working remotely must have:
- A dedicated workspace free from significant distractions
- Reliable broadband connection (minimum 25 Mbps download)
- Company-issued laptop (personal devices not permitted for security reasons)
- A webcam and headset suitable for video calls
4. EQUIPMENT AND EXPENSES
The company provides:
- Laptop and peripherals (mouse, keyboard) upon joining
- £400 home office setup allowance (one-time, claim through expenses)
- £30 per month broadband contribution (add to monthly expenses)
Employees are responsible for their own desk and chair.
5. AVAILABILITY REQUIREMENTS
Remote employees must:
- Be available during core hours: 9am-5pm in their local timezone
- Respond to messages within 2 hours during working hours
- Attend all required meetings with camera on unless exceptional circumstances
- Notify their manager in advance if unavailable during core hours
EOF
cat > sample-docs/annual-leave-policy.txt << 'EOF'
ANNUAL LEAVE POLICY
Last Updated: February 2026
1. ENTITLEMENT
Full-time permanent employees receive:
- 25 days annual leave per year (pro-rated for part-time employees)
- 8 UK bank holidays (fixed days off)
- 1 additional day for each year of service, up to 5 additional days
- Birthday leave (1 day, to be taken within the birthday month)
2. HOW TO REQUEST LEAVE
Annual leave must be requested through the HR portal at hr.company.internal.
Notice requirements:
- Up to 3 days: minimum 1 week notice
- 4-9 days: minimum 2 weeks notice
- 10+ consecutive days: minimum 4 weeks notice
Leave requests are subject to manager approval and team capacity.
3. CARRY OVER
Up to 5 days of unused annual leave may be carried over to the following year.
Carried-over leave must be used by 31 March of the following year or it is forfeited.
Employees may purchase up to 5 additional days of leave per year through salary
sacrifice (request by 1 December for the following year).
4. SICKNESS DURING ANNUAL LEAVE
If an employee falls ill during a period of annual leave and provides a medical
certificate, the days of illness may be recredited as annual leave. The employee
must notify their manager on the first day of illness.
5. LEAVING THE COMPANY
On leaving the company, employees will be paid for any unused annual leave accrued
in the current leave year. Employees who have taken more leave than accrued will
have the excess deducted from their final salary.
EOF
# Upload documents to S3
aws s3 cp sample-docs/ s3://$BUCKET_NAME/documents/ --recursive
echo "Documents uploaded:"
aws s3 ls s3://$BUCKET_NAME/documents/
Part 3: Create the Bedrock Knowledge Base
This is the core of the RAG system. Bedrock Knowledge Bases handles everything: chunking, embedding, and storing your documents in a searchable vector database.
Step 1: Create the IAM Role for Bedrock
Bedrock needs permission to read from your S3 bucket.
bash
# Create trust policy for Bedrock
cat > bedrock-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"Service": "bedrock.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "YOUR_ACCOUNT_ID"
}
}
}]
}
EOF
# Replace placeholder with actual account ID
sed -i "s/YOUR_ACCOUNT_ID/$ACCOUNT_ID/g" bedrock-trust-policy.json
# Create the role
aws iam create-role \
  --role-name bedrock-knowledge-base-role \
  --assume-role-policy-document file://bedrock-trust-policy.json
# Create permissions policy
cat > bedrock-kb-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::$BUCKET_NAME",
"arn:aws:s3:::$BUCKET_NAME/*"
]
},
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": "arn:aws:bedrock:$REGION::foundation-model/amazon.titan-embed-text-v2:0"
},
{
"Effect": "Allow",
"Action": [
"aoss:APIAccessAll"
],
"Resource": "*"
}
]
}
EOF
aws iam put-role-policy \
  --role-name bedrock-knowledge-base-role \
  --policy-name bedrock-kb-permissions \
  --policy-document file://bedrock-kb-policy.json
KB_ROLE_ARN=$(aws iam get-role \
  --role-name bedrock-knowledge-base-role \
  --query 'Role.Arn' \
  --output text)
echo "Bedrock role ARN: $KB_ROLE_ARN"
Step 2: Create the Knowledge Base via AWS Console
The Knowledge Base creation is easiest via the console because it automatically sets up OpenSearch Serverless for you:
1. Open AWS Console → Amazon Bedrock → Knowledge Bases → Create knowledge base
2. Knowledge base details:
Name: company-knowledge-base
Description: Internal company policies and documentation
IAM Role: bedrock-knowledge-base-role (select the one you created)
3. Data source:
Type: Amazon S3
S3 URI: s3://YOUR_BUCKET_NAME/documents/
Name: company-documents
4. Embeddings model:
Select: Titan Text Embeddings V2
(This converts your text into vectors)
5. Vector store:
Select: Quick create a new vector store
Type: Amazon OpenSearch Serverless
(Bedrock creates and configures this automatically)
6. Review and create → Create knowledge base
Wait 3-5 minutes for creation to complete.
Get the Knowledge Base ID:
bash
KB_ID=$(aws bedrock-agent list-knowledge-bases \
  --region "$REGION" \
  --query 'knowledgeBaseSummaries[?name==`company-knowledge-base`].knowledgeBaseId' \
  --output text)
echo "Knowledge Base ID: $KB_ID"
echo "Save this: export KB_ID=$KB_ID"
Step 3: Sync Your Documents
bash
# Get the data source ID
DATA_SOURCE_ID=$(aws bedrock-agent list-data-sources \
  --knowledge-base-id "$KB_ID" \
  --region "$REGION" \
  --query 'dataSourceSummaries[0].dataSourceId' \
  --output text)
echo "Data Source ID: $DATA_SOURCE_ID"
# Start the ingestion job (chunks, embeds, and indexes your documents)
INGESTION_JOB_ID=$(aws bedrock-agent start-ingestion-job \
  --knowledge-base-id "$KB_ID" \
  --data-source-id "$DATA_SOURCE_ID" \
  --region "$REGION" \
  --query 'ingestionJob.ingestionJobId' \
  --output text)
echo "Ingestion job started: $INGESTION_JOB_ID"
echo "Waiting for ingestion to complete..."
# Poll until complete
while true; do
  STATUS=$(aws bedrock-agent get-ingestion-job \
    --knowledge-base-id "$KB_ID" \
    --data-source-id "$DATA_SOURCE_ID" \
    --ingestion-job-id "$INGESTION_JOB_ID" \
    --region "$REGION" \
    --query 'ingestionJob.status' \
    --output text)
  echo "Status: $STATUS"
  if [ "$STATUS" = "COMPLETE" ]; then
    echo "Ingestion complete. Your documents are now searchable."
    break
  elif [ "$STATUS" = "FAILED" ]; then
    echo "Ingestion failed. Check the AWS console for details."
    exit 1
  fi
  sleep 15
done
Part 4: Test the Knowledge Base Directly
Before building the API, test that the knowledge base works:
bash
# Test: ask a question directly using the AWS CLI
aws bedrock-agent-runtime retrieve-and-generate \
  --region "$REGION" \
  --input '{"text": "What is the expense reimbursement policy for business travel?"}' \
  --retrieve-and-generate-configuration "{
    \"type\": \"KNOWLEDGE_BASE\",
    \"knowledgeBaseConfiguration\": {
      \"knowledgeBaseId\": \"$KB_ID\",
      \"modelArn\": \"arn:aws:bedrock:$REGION::foundation-model/anthropic.claude-3-haiku-20240307-v1:0\"
    }
  }" | python3 -m json.tool
You should see an answer like:
json
{
"output": {
"text": "For business travel, you can claim reimbursement for flights, trains, and taxis to client sites. Accommodation is reimbursed up to £150 per night in London and £100 per night elsewhere in the UK. All claims must be submitted within 30 days through the Expenses portal."
},
"citations": [
{
"retrievedReferences": [
{
"content": {
"text": "Business travel (flights, trains, taxis to/from client sites)..."
},
"location": {
"s3Location": {
"uri": "s3://your-bucket/documents/expense-policy.txt"
}
}
}
]
}
]
}
The citation shows exactly which document the answer came from. This is one of the most valuable features of RAG: your users can verify the source.
Part 5: Build the Query Lambda Function
Now we wrap the knowledge base in a Lambda function that handles validation, formats responses cleanly, and logs everything.
python
# rag_handler.py
# Production RAG query handler
import boto3
import json
import logging
import os
import time
from datetime import datetime, timezone

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Lambda sets AWS_REGION automatically; fall back to the region used in this guide
REGION = os.environ.get('AWS_REGION', 'eu-west-1')

# Clients: initialised outside the handler for warm reuse
bedrock_agent = boto3.client('bedrock-agent-runtime', region_name=REGION)

# Configuration from environment variables
KNOWLEDGE_BASE_ID = os.environ.get('KNOWLEDGE_BASE_ID', '')
MODEL_ARN = f"arn:aws:bedrock:{REGION}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
MAX_QUESTION_LENGTH = 1000
MIN_QUESTION_LENGTH = 3
NUM_RESULTS = 5  # How many document chunks to retrieve

def log_event(level: str, event: str, **kwargs):
    """Structured JSON logging for CloudWatch"""
    entry = {
        "level": level,
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **kwargs
    }
    getattr(logger, level.lower(), logger.info)(json.dumps(entry))

def validate_question(question: str) -> tuple[bool, str]:
    """Validate the user's question"""
    if not question or not isinstance(question, str):
        return False, "question must be a non-empty string"
    question = question.strip()
    if len(question) < MIN_QUESTION_LENGTH:
        return False, f"question must be at least {MIN_QUESTION_LENGTH} characters"
    if len(question) > MAX_QUESTION_LENGTH:
        return False, f"question must not exceed {MAX_QUESTION_LENGTH} characters"
    return True, question

def format_citations(citations: list) -> list:
    """
    Extract and format citation information from Bedrock response.
    Returns a clean list of sources that users can reference.
    """
    formatted = []
    seen_sources = set()  # Avoid duplicate citations
    for citation in citations:
        for ref in citation.get('retrievedReferences', []):
            # Get source document location
            location = ref.get('location', {})
            s3_uri = location.get('s3Location', {}).get('uri', '')
            # Extract just the filename from the full S3 URI
            # s3://bucket-name/documents/expense-policy.txt → expense-policy.txt
            if s3_uri and s3_uri not in seen_sources:
                seen_sources.add(s3_uri)
                filename = s3_uri.split('/')[-1]
                # Get a short excerpt from the retrieved content
                content_text = ref.get('content', {}).get('text', '')
                excerpt = content_text[:200].strip()
                if len(content_text) > 200:
                    excerpt += '...'
                formatted.append({
                    'document': filename,
                    'source_uri': s3_uri,
                    'excerpt': excerpt
                })
    return formatted

def query_knowledge_base(question: str) -> dict:
    """
    Query the Bedrock Knowledge Base and return answer with citations.
    """
    # Custom prompt template to improve answer quality
    # This instructs Claude on how to use the retrieved context
    prompt_template = """You are a helpful assistant for company employees.
Answer the question using ONLY the information provided in the search results below.
Important rules:
- If the answer is clearly in the search results, provide it directly and concisely
- If the search results do not contain enough information to answer the question, say:
"I don't have specific information about that in the available documents. Please contact HR or your manager."
- Never make up information not found in the search results
- Keep answers professional and easy to understand
- If there are specific numbers, dates, or limits mentioned, include them exactly
$search_results$
Question: $query$
Answer:"""
    response = bedrock_agent.retrieve_and_generate(
        input={'text': question},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': NUM_RESULTS
                    }
                },
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 800,
                            'temperature': 0.1  # Low temperature = factual, consistent answers
                        }
                    }
                }
            }
        }
    )
    answer = response['output']['text']
    citations = format_citations(response.get('citations', []))
    return {
        'answer': answer,
        'citations': citations,
        'session_id': response.get('sessionId', '')
    }

def build_response(status_code: int, body: dict, request_id: str) -> dict:
    """Build a consistent HTTP response"""
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type': 'application/json',
            'X-Request-ID': request_id,
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Headers': 'Content-Type',
            'Access-Control-Allow-Methods': 'POST,OPTIONS'
        },
        'body': json.dumps(body)
    }

def lambda_handler(event, context):
    """Main Lambda handler"""
    request_id = context.aws_request_id
    start_time = time.time()

    # Handle CORS preflight
    http_method = event.get('requestContext', {}).get('http', {}).get('method', '')
    if http_method == 'OPTIONS':
        return build_response(200, {}, request_id)

    log_event("INFO", "query_received", request_id=request_id)

    # Parse request body (it may be missing or None when the request has no body)
    try:
        body = json.loads(event.get('body') or '{}')
    except (json.JSONDecodeError, TypeError):
        return build_response(400, {
            'success': False,
            'error': 'Request body must be valid JSON'
        }, request_id)
    if not isinstance(body, dict):
        return build_response(400, {
            'success': False,
            'error': 'Request body must be a JSON object'
        }, request_id)
    question = body.get('question', '')

    # Validate
    is_valid, result = validate_question(question)
    if not is_valid:
        return build_response(400, {
            'success': False,
            'error': result
        }, request_id)
    question = result  # cleaned question

    # Query the knowledge base
    try:
        rag_result = query_knowledge_base(question)
        duration_ms = int((time.time() - start_time) * 1000)
        log_event("INFO", "query_completed",
                  request_id=request_id,
                  question_length=len(question),
                  answer_length=len(rag_result['answer']),
                  citation_count=len(rag_result['citations']),
                  duration_ms=duration_ms)
        return build_response(200, {
            'success': True,
            'answer': rag_result['answer'],
            'citations': rag_result['citations'],
            'metadata': {
                'citation_count': len(rag_result['citations']),
                'duration_ms': duration_ms,
                'request_id': request_id
            }
        }, request_id)
    except bedrock_agent.exceptions.ThrottlingException:
        # "WARNING" maps to logger.warning (logger.warn is deprecated)
        log_event("WARNING", "throttling", request_id=request_id)
        return build_response(503, {
            'success': False,
            'error': 'Service temporarily busy. Please retry in a moment.'
        }, request_id)
    except Exception as e:
        log_event("ERROR", "query_failed",
                  request_id=request_id,
                  error_type=type(e).__name__,
                  error_message=str(e))
        return build_response(500, {
            'success': False,
            'error': 'Something went wrong. Please try again.'
        }, request_id)
Part 6: Deploy the Lambda and API Gateway
bash
# Package and deploy Lambda
zip -j rag-function.zip rag_handler.py
# Create Lambda IAM role
cat > lambda-rag-trust.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Service": "lambda.amazonaws.com"},
"Action": "sts:AssumeRole"
}]
}
EOF
aws iam create-role \
  --role-name rag-lambda-role \
  --assume-role-policy-document file://lambda-rag-trust.json
aws iam attach-role-policy \
  --role-name rag-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
# Add Bedrock Knowledge Base permission
cat > rag-lambda-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"bedrock:Retrieve",
"bedrock:RetrieveAndGenerate",
"bedrock:InvokeModel"
],
"Resource": [
"arn:aws:bedrock:$REGION:$ACCOUNT_ID:knowledge-base/$KB_ID",
"arn:aws:bedrock:$REGION::foundation-model/*"
]
}]
}
EOF
aws iam put-role-policy \
  --role-name rag-lambda-role \
  --policy-name rag-bedrock-access \
  --policy-document file://rag-lambda-policy.json
LAMBDA_ROLE_ARN=$(aws iam get-role \
  --role-name rag-lambda-role \
  --query 'Role.Arn' --output text)
# Wait for role propagation
sleep 10
# Create Lambda function
aws lambda create-function \
  --function-name rag-query-handler \
  --runtime python3.12 \
  --role "$LAMBDA_ROLE_ARN" \
  --handler rag_handler.lambda_handler \
  --zip-file fileb://rag-function.zip \
  --timeout 30 \
  --memory-size 512 \
  --region "$REGION" \
  --environment "Variables={KNOWLEDGE_BASE_ID=$KB_ID}"
echo "Lambda deployed"
# Create API Gateway
API_ID=$(aws apigatewayv2 create-api \
  --name "rag-api" \
  --protocol-type HTTP \
  --cors-configuration '{"AllowOrigins":["*"],"AllowHeaders":["Content-Type"],"AllowMethods":["POST","OPTIONS"]}' \
  --region "$REGION" \
  --query 'ApiId' --output text)
# Create integration
INTEGRATION_ID=$(aws apigatewayv2 create-integration \
  --api-id "$API_ID" \
  --integration-type AWS_PROXY \
  --integration-uri "arn:aws:lambda:$REGION:$ACCOUNT_ID:function:rag-query-handler" \
  --payload-format-version 2.0 \
  --region "$REGION" \
  --query 'IntegrationId' --output text)
# Create route
aws apigatewayv2 create-route \
  --api-id "$API_ID" \
  --route-key 'POST /ask' \
  --target "integrations/$INTEGRATION_ID" \
  --region "$REGION"
# Deploy
aws apigatewayv2 create-stage \
  --api-id "$API_ID" \
  --stage-name production \
  --auto-deploy \
  --region "$REGION"
# Permission for API Gateway to invoke Lambda
aws lambda add-permission \
  --function-name rag-query-handler \
  --statement-id allow-api-gateway \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:$REGION:$ACCOUNT_ID:$API_ID/*/*" \
  --region "$REGION"
API_URL=$(aws apigatewayv2 get-api \
  --api-id "$API_ID" \
  --region "$REGION" \
  --query 'ApiEndpoint' --output text)
echo ""
echo "==============================="
echo "RAG API is live at:"
echo "$API_URL/production/ask"
echo "==============================="
Part 7: Test Your RAG System
bash
API_ENDPOINT="$API_URL/production/ask"
echo "=== Test 1: Expense policy question ==="
curl -s -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the maximum hotel rate I can claim for a trip to London?"}' \
  | python3 -m json.tool
echo ""
echo "=== Test 2: Annual leave question ==="
curl -s -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"question": "How many days notice do I need to give for a 2-week holiday?"}' \
  | python3 -m json.tool
echo ""
echo "=== Test 3: Remote work question ==="
curl -s -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"question": "Do I need to be in the office on Tuesdays?"}' \
  | python3 -m json.tool
echo ""
echo "=== Test 4: Question not in documents ==="
curl -s -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the capital of France?"}' \
  | python3 -m json.tool
Expected response for Test 1:
json
{
"success": true,
"answer": "For business trips to London, accommodation is reimbursed up to £150 per night. For other UK locations, the limit is £100 per night. All accommodation claims must be submitted through the Expenses portal within 30 days.",
"citations": [
{
"document": "expense-policy.txt",
"source_uri": "s3://your-bucket/documents/expense-policy.txt",
"excerpt": "Accommodation (up to £150 per night in London, £100 elsewhere in the UK)..."
}
],
"metadata": {
"citation_count": 1,
"duration_ms": 2341,
"request_id": "abc123"
}
}
Expected response for Test 4 (not in documents):
json
{
"success": true,
"answer": "I don't have specific information about that in the available documents. Please contact HR or your manager.",
"citations": [],
"metadata": {
"citation_count": 0,
"duration_ms": 1823,
"request_id": "xyz789"
}
}
This is the critical difference from a standard AI: when the answer is not in your documents, the system says so honestly rather than guessing.
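If you would rather script these checks in Python than curl, the sketch below calls the same endpoint using only the standard library. The helper names (`build_request`, `ask`, `summarise`) are illustrative, and the endpoint URL is a placeholder you would replace with your own. Note that `urllib` raises `HTTPError` for 4xx and 5xx responses, so wrap the call if you want to inspect error bodies.

```python
import json
import urllib.request


def build_request(api_endpoint: str, question: str) -> urllib.request.Request:
    """Build the POST request the /ask route expects: a JSON body with a 'question' field."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        api_endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(api_endpoint: str, question: str, timeout: float = 30.0) -> dict:
    """Send the question and return the parsed JSON response."""
    request = build_request(api_endpoint, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise(response: dict) -> str:
    """Render the answer followed by the names of the cited documents."""
    sources = ", ".join(c["document"] for c in response.get("citations", [])) or "no sources"
    return f"{response.get('answer', '')} [{sources}]"


if __name__ == "__main__":
    # Placeholder: replace with the URL printed at the end of Part 6
    endpoint = "https://YOUR_API_ID.execute-api.eu-west-1.amazonaws.com/production/ask"
    print(summarise(ask(endpoint, "Do I need to be in the office on Tuesdays?")))
```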
Part 8: Add More Documents and Keep It Current
The RAG system only knows about documents you have ingested. Add new documents any time and re-sync:
bash
# Upload a new document
aws s3 cp new-policy.pdf s3://$BUCKET_NAME/documents/
# Trigger a new ingestion job to index the new document
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id "$KB_ID" \
  --data-source-id "$DATA_SOURCE_ID" \
  --region "$REGION"
echo "New document ingestion started"
For production, set up an automatic sync using EventBridge:
bash
# Create a rule that syncs every night at midnight UTC
aws events put-rule \
  --name rag-nightly-sync \
  --schedule-expression "cron(0 0 * * ? *)" \
  --state ENABLED \
  --region "$REGION"
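On its own, this rule only defines the schedule. To have new documents indexed overnight without manual intervention, attach a target to the rule, for example a small Lambda function that calls StartIngestionJob for your knowledge base and data source.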
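One way to give the nightly rule something to run is a small Lambda function that starts an ingestion job. The sketch below is a minimal version under a few assumptions: the `DATA_SOURCE_ID` environment variable and the `start_sync` helper are names introduced here for illustration, the function's role would need the `bedrock:StartIngestionJob` permission, and you would still attach the function to the rule with `aws events put-targets` and allow EventBridge to invoke it.

```python
import os


def start_sync(client, knowledge_base_id: str, data_source_id: str) -> str:
    """Start a Bedrock ingestion job and return its ID.

    `client` is a boto3 'bedrock-agent' client (or any object exposing the
    same start_ingestion_job method, which keeps this easy to test).
    """
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]


def lambda_handler(event, context):
    """Entry point for the scheduled EventBridge invocation."""
    import boto3  # imported here so start_sync can be exercised without AWS access

    client = boto3.client("bedrock-agent")
    job_id = start_sync(
        client,
        os.environ["KNOWLEDGE_BASE_ID"],
        os.environ["DATA_SOURCE_ID"],  # assumed variable name, set it on the function
    )
    return {"ingestionJobId": job_id}
```

Ingestion jobs only process documents that were added, changed, or removed since the last sync, so a nightly run stays cheap when little has changed.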
Part 9: Build a Simple Web Interface
Your API is working. Now give it a user interface so anyone in your organisation can use it without writing code:
html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Company Knowledge Base</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
background: #f8fafc;
min-height: 100vh;
padding: 40px 20px;
}
.container { max-width: 720px; margin: 0 auto; }
h1 { font-size: 1.75rem; color: #0f172a; margin-bottom: 6px; }
.subtitle { color: #64748b; margin-bottom: 32px; }
.search-box {
display: flex;
gap: 10px;
margin-bottom: 24px;
}
input {
flex: 1;
padding: 14px 16px;
border: 1px solid #e2e8f0;
border-radius: 8px;
font-size: 1rem;
outline: none;
transition: border-color 0.2s;
}
input:focus { border-color: #3b82f6; box-shadow: 0 0 0 3px rgba(59,130,246,0.1); }
button {
padding: 14px 24px;
background: #3b82f6;
color: white;
border: none;
border-radius: 8px;
font-size: 1rem;
font-weight: 600;
cursor: pointer;
white-space: nowrap;
}
button:hover { background: #2563eb; }
button:disabled { background: #94a3b8; cursor: not-allowed; }
.answer-card {
background: white;
border: 1px solid #e2e8f0;
border-radius: 10px;
overflow: hidden;
display: none;
}
.answer-card.visible { display: block; }
.answer-header {
padding: 14px 20px;
background: #f1f5f9;
border-bottom: 1px solid #e2e8f0;
font-size: 0.85rem;
font-weight: 600;
color: #475569;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.answer-body {
padding: 20px;
line-height: 1.7;
color: #334155;
}
.citations {
padding: 16px 20px;
border-top: 1px solid #e2e8f0;
background: #f8fafc;
}
.citations h3 {
font-size: 0.8rem;
color: #64748b;
text-transform: uppercase;
letter-spacing: 0.5px;
margin-bottom: 10px;
}
.citation-item {
padding: 10px 12px;
background: white;
border: 1px solid #e2e8f0;
border-radius: 6px;
margin-bottom: 8px;
font-size: 0.85rem;
}
.citation-doc { font-weight: 600; color: #3b82f6; margin-bottom: 4px; }
.citation-excerpt { color: #64748b; font-style: italic; }
.loading { color: #3b82f6; padding: 20px; text-align: center; display: none; }
.error { padding: 16px 20px; background: #fef2f2; border: 1px solid #fecaca; border-radius: 8px; color: #dc2626; display: none; }
.no-citations { color: #64748b; font-size: 0.85rem; font-style: italic; }
</style>
</head>
<body>
<div class="container">
<h1> Company Knowledge Base</h1>
<p class="subtitle">Ask any question about company policies, procedures, and guidelines.</p>
<div class="search-box">
<input
type="text"
id="questionInput"
placeholder="e.g. How many days notice do I need for annual leave?"
onkeypress="if(event.key==='Enter') askQuestion()"
/>
<button onclick="askQuestion()" id="askBtn">Ask</button>
</div>
<div class="error" id="errorDiv"></div>
<div class="loading" id="loadingDiv"> Searching company documents...</div>
<div class="answer-card" id="answerCard">
<div class="answer-header">Answer</div>
<div class="answer-body" id="answerBody"></div>
<div class="citations" id="citationsDiv">
<h3> Sources</h3>
<div id="citationsList"></div>
</div>
</div>
</div>
<script>
// Replace with your actual API URL
const API_URL = 'YOUR_API_GATEWAY_URL/production/ask';
async function askQuestion() {
const question = document.getElementById('questionInput').value.trim();
if (!question) return;
const btn = document.getElementById('askBtn');
const loading = document.getElementById('loadingDiv');
const error = document.getElementById('errorDiv');
const card = document.getElementById('answerCard');
btn.disabled = true;
loading.style.display = 'block';
error.style.display = 'none';
card.classList.remove('visible');
try {
const res = await fetch(API_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ question })
});
const data = await res.json();
if (!res.ok || !data.success) throw new Error(data.error || 'Request failed');
document.getElementById('answerBody').textContent = data.answer;
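const citationsList = document.getElementById('citationsList');
// Build citation nodes with textContent so document text is never parsed as HTML
citationsList.innerHTML = '';
if (data.citations && data.citations.length > 0) {
data.citations.forEach(c => {
const item = document.createElement('div');
item.className = 'citation-item';
const doc = document.createElement('div');
doc.className = 'citation-doc';
doc.textContent = `📄 ${c.document}`;
const excerpt = document.createElement('div');
excerpt.className = 'citation-excerpt';
excerpt.textContent = `"${c.excerpt}"`;
item.append(doc, excerpt);
citationsList.appendChild(item);
});
} else {
citationsList.innerHTML = '<p class="no-citations">No specific sources cited for this answer.</p>';
}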
card.classList.add('visible');
} catch (err) {
error.textContent = `Error: ${err.message}`;
error.style.display = 'block';
} finally {
btn.disabled = false;
loading.style.display = 'none';
}
}
</script>
</body>
</html>
Save this as index.html, replace YOUR_API_GATEWAY_URL, and host it on S3 static website hosting (covered in Article 4 of this series).
What You Have Built, and What It Means for Your Organisation
Let us step back and look at what this system does:
Before RAG: Employee has a question → 45 minutes searching documents → finds the answer (if lucky)
After RAG: Employee has a question → types it → gets a cited answer in 3 seconds
For a 100-person organisation where employees each save 30 minutes per day searching for information, that is 50 person-hours saved per day, roughly six full-time employees’ worth of time (at eight hours per day) redirected from searching to actual work.
The citations are not just nice to have. They are essential for enterprise trust. Your employees can see exactly which document the answer came from and verify it themselves. The AI is not guessing; it is reading your actual documents and reporting what they say.
Common Issues and How to Fix Them
“The knowledge base is not finding relevant documents”
The chunking strategy might not be optimal for your document types. Chunking is fixed when a data source is created, so in the AWS console add a new data source to the knowledge base with the chunking strategy set to “Semantic chunking”, then re-sync. This often gives better results with long documents.
“The AI is making up answers not in the documents”
Increase the strictness of the prompt template. Add: “If you are not 100% certain the answer is in the provided context, say you do not know.”
“Ingestion is taking a long time”
Large PDF files with many images can be slow to process. For best performance, use text-based PDFs or convert Word documents to plain text before uploading.
“I get throttling errors”
Amazon Bedrock has per-account quotas. For production scale, request a quota increase in the AWS Service Quotas console.
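While waiting for a quota increase, retrying throttled calls with exponential backoff usually smooths over short bursts. The helper below is a generic sketch, not part of the handler above: `call_with_backoff` and its parameters are illustrative names, and with boto3 you would typically pass an `is_retryable` check that looks for the `ThrottlingException` error code on a `ClientError`. boto3 also has built-in retry modes you can configure on the client instead.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying retryable errors with exponential backoff and jitter.

    func:         zero-argument callable to run
    is_retryable: function(exception) -> bool deciding whether to retry
    sleep:        injectable so tests can avoid real waiting
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise  # out of attempts, or an error that retrying will not fix
            # Delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))  # "full jitter" spreads out retries


if __name__ == "__main__":
    class Throttled(Exception):
        """Stand-in for a throttling error from the service."""

    attempts = {"count": 0}

    def flaky_query():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise Throttled()
        return "answer"

    print(call_with_backoff(flaky_query, lambda e: isinstance(e, Throttled)))
```

Keep the total retry time below the Lambda timeout (30 seconds in this guide), otherwise the function is cut off before the last attempt completes.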