In my last article How I Delivered a Cross‑Account, Cross‑Region, Multilingual, Multi‑Modal AWS Bedrock Solution in a Zero Trust Environment, I walked through the architecture that makes secure, multilingual, multi‑modal AI processing possible across AWS accounts and regions. That foundation solved the infrastructure challenge, but it left one critical question unanswered: how does the system actually understand real documents?
This article picks up exactly where the last one ended. Now that the pipeline is built, it’s time to open the black box and examine the Lambda Python function that turns raw files into structured, multilingual insights. This is where OCR, model selection, prompt engineering, and Claude Sonnet 4.6 all come together to form the intelligence layer of the entire solution.
Selecting the Right LLM Model
The core requirement was clear: the model must handle multilingual content (English + Chinese) and multi‑modal inputs including text files, PDFs, and images. Several AWS‑native and third‑party models were evaluated.
Evaluation Results
- AWS Textract + Titan performed OCR well for English, but Titan consistently ignored Chinese content, making the combination unsuitable for bilingual documents.
- Amazon Nova attempted to interpret Chinese text but produced random guesses with a very low success rate.
- Qwen3 handled Chinese better, with an acceptable success rate for mixed‑language documents.
- Claude Sonnet 4.6 delivered 5× higher accuracy than Qwen3 while also offering lower total cost. It consistently extracted structured fields correctly across English‑only, Chinese‑only, and mixed‑language documents.
Final Decision
Claude Sonnet 4.6 was selected due to its superior multilingual reasoning, stable multi‑modal performance, and cost efficiency.
Activating Claude Sonnet 4.6 in AWS Bedrock
Before the Lambda function can invoke Claude, the model must be activated in the AWS account.
Step 1: Submit the Use Case
Submit the required use case form in the Bedrock console. Approval typically takes 15–30 minutes.
Step 2: Test in the Model Playground
Run a first inference in the Claude Sonnet 4.6 playground to confirm access.
Step 3: Resolve Marketplace Permission Errors
Some accounts encounter the following error during the first invocation:
“Model access is denied due to IAM user or service role not authorized to perform the required AWS Marketplace actions (aws-marketplace:ViewSubscriptions, aws-marketplace:Subscribe)...”
To resolve this, attach the following IAM policy to the IAM user or role:
```json
{
  "Effect": "Allow",
  "Action": [
    "aws-marketplace:Subscribe",
    "aws-marketplace:ViewSubscriptions"
  ],
  "Resource": "*"
}
```
After applying the policy, retry the model activation. Once the subscription completes, the Lambda function can invoke Claude normally.
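Once the subscription completes, a single throwaway invocation is an easy way to confirm access before wiring up the full pipeline. The sketch below is illustrative: the request-body builder is pure and testable offline, while `smoke_test` performs the real call (pass the exact model ID shown in your Bedrock console).

```python
import json

def build_smoke_test_body() -> str:
    """Minimal Messages-API body used only to confirm model access."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "ping"}]},
        ],
    })

def smoke_test(model_id: str, region: str = "us-east-1") -> str:
    """Invoke the model once; an AccessDeniedException here usually means
    the Marketplace subscription has not finished propagating yet."""
    import boto3  # deferred so build_smoke_test_body() runs without AWS
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=build_smoke_test_body(),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```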
Lambda Function Architecture
The Lambda function is the operational heart of the pipeline. It performs ingestion, OCR, reasoning, and structured extraction.
Handler
The handler contains the main execution logic:
- Receive request payload from the EC2 caller
- Download the uploaded file from the S3 bucket
- Perform OCR text extraction
- Invoke Claude Sonnet 4.6 for reasoning and field extraction
- Clean and normalize the output
- Return the structured JSON result back to the EC2 instance
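The steps above can be sketched as a thin orchestrator. Note that `extract_fields_with_claude` and the event shape (`bucket`, `key`, `content_type`) are illustrative assumptions, not the production module layout:

```python
import json

def handler(event, context):
    """Sketch of the handler flow. Helper names and the event shape are
    assumptions for illustration only."""
    import boto3  # deferred so the pure helper below runs without AWS
    s3 = boto3.client("s3")
    # 1. Download the uploaded file from S3
    raw_bytes = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"].read()
    # 2-5. OCR, Claude reasoning, field extraction, and cleanup (hypothetical helper)
    fields = extract_fields_with_claude(raw_bytes, event["content_type"])
    # 6. Return the structured JSON result to the EC2 caller
    return build_response(fields)

def build_response(fields: dict, status: int = 200) -> dict:
    """Wrap the extracted fields in the JSON envelope returned to the caller."""
    return {"statusCode": status, "body": json.dumps(fields, ensure_ascii=False)}
```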
Claude handles PDFs and images differently during OCR and target‑field extraction, so the Lambda logic must normalize and clean the model’s output to ensure consistent, structured results:
```python
import base64
import json

# s3_doc_in_bytes: the S3 document as bytes
# content_type: the content type matched by the S3 document's extension

# Convert the bytes to Claude-supported base64
doc_base64 = base64.standard_b64encode(s3_doc_in_bytes).decode("utf-8")

# Build the content block based on document type
if content_type == "application/pdf":
    doc_config = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": content_type,
            "data": doc_base64,
        },
    }
else:
    doc_config = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": content_type,
            "data": doc_base64,
        },
    }

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "temperature": 0,  # deterministic output, no creative answers
    "system": OCR_PROMPT,
    "messages": [
        {
            "role": "user",
            "content": [
                doc_config,
                {"type": "text", "text": TARGET_FIELD_PROMPT},
            ],
        },
    ],
}

response = bedrock_client.invoke_model(
    modelId=BEDROCK_MODEL_ID,
    contentType="application/json",
    accept="application/json",
    body=json.dumps(request_body),
)

response_body = json.loads(response["body"].read())
assistant_text = response_body["content"][0]["text"]

# Remove markdown code fences from Claude's answer, if present
cleaned = assistant_text.strip()
if cleaned.startswith("```"):
    cleaned = cleaned.split("\n", 1)[1]
if cleaned.endswith("```"):
    cleaned = cleaned.rsplit("```", 1)[0]
cleaned = cleaned.strip()

# other logic
```
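Even after fence stripping, the answer is occasionally not pure JSON (for example, a short remark before the object). A defensive parse is worth the few extra lines; this sketch, including the first-to-last-brace fallback heuristic, is an assumption rather than the production code:

```python
import json

def parse_claude_json(cleaned: str) -> dict:
    """Parse the fence-stripped answer; fall back to the first {...} span
    if the model prepended or appended stray text."""
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            return json.loads(cleaned[start:end + 1])
        raise
```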
Prompts
Two categories of prompts guide the LLM’s behavior.
- OCR Prompt: Defines the system role, extraction rules, and reasoning instructions. It explains how the LLM should interpret different document scenarios.
- Target Field Prompt: Defines how to handle different file types (PDF, image, text) and provides a list of target fields with descriptions and common multilingual pattern examples, like:
| Field | Description |
| --- | --- |
| total_deposit | Total deposit collected in HKD (numeric). Include ALL deposits: deposit (按金), electricity deposit (電費按金), renovation deposit, etc. Use the grand total from the receipt/table if available. |
It also specifies strict output format rules, such as:
- no markdown
- no explanation
- return only a JSON object
These prompts ensure deterministic, repeatable extraction across diverse document types.
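For illustration only, here is a pared-down sketch of what such prompt constants might look like. The wording below is invented for this article; the production prompts are longer and tuned through iteration:

```python
# Illustrative sketches only -- not the production prompts.
OCR_PROMPT = (
    "You are a bilingual (English / Traditional Chinese) document analyst. "
    "Read the attached document carefully and extract the requested fields "
    "exactly as they appear, whether the source text is English, Chinese, "
    "or mixed."
)

TARGET_FIELD_PROMPT = (
    "Extract the target fields and return ONLY a JSON object: "
    "no markdown, no explanation.\n"
    "- total_deposit: total deposit collected in HKD (numeric); include "
    "ALL deposits: deposit (按金), electricity deposit (電費按金), "
    "renovation deposit, etc.\n"
    "If a field cannot be found, use null."
)
```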
Deploying the Lambda Function
Deployment is straightforward but requires attention to packaging:
- Zip only the Python files, not the parent folder.
- The zip file must contain the `.py` files at the root level.
- Use the Upload ZIP button in the Lambda console.
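The packaging rule above (files at the archive root, not nested inside a folder) can be enforced with a small script. This sketch uses the standard library's `zipfile`; the `arcname` argument is what strips the parent directory:

```python
import zipfile
from pathlib import Path

def package_lambda(src_dir: str, zip_path: str = "function.zip") -> list:
    """Zip the .py files so they sit at the archive root, which is what
    the Lambda runtime expects for a plain-Python deployment package."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py in sorted(Path(src_dir).glob("*.py")):
            zf.write(py, arcname=py.name)  # arcname drops the parent folder
            names.append(py.name)
    return names
```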
Configuring the Handler
In the Lambda configuration tab, set the handler to match your file and function name. For example, `document_handler.handler`, where:
- `document_handler.py` is the file
- `handler` is the function inside that file
Closing Thoughts
This architecture demonstrates how to build a multilingual, multi‑modal document analysis pipeline using AWS Lambda and Claude Sonnet 4.6. The key is not just choosing the right model, but designing the prompts, IAM permissions, and Lambda workflow so the system behaves predictably under real production workloads.
About the Author
Jonathan Wong is an IT and AI consultant with 20+ years of experience leading engineering teams across Vancouver and Hong Kong. He specializes in modernizing legacy platforms, cloud security, and building AI-ready systems for startups and large enterprises while advising leadership on using strategic technology to drive business growth.
Connect with me on LinkedIn