In my last article How I Delivered a Cross‑Account, Cross‑Region, Multilingual, Multi‑Modal AWS Bedrock Solution in a Zero Trust Environment, I walked through the architecture that makes secure, multilingual, multi‑modal AI processing possible across AWS accounts and regions. That foundation solved the infrastructure challenge, but it left one critical question unanswered: how does the system actually understand real documents?
This article picks up exactly where the last one ended. Now that the pipeline is built, it’s time to open the black box and examine the Lambda Python function that turns raw files into structured, multilingual insights. This is where OCR, model selection, prompt engineering, and Claude Sonnet 4.6 all come together to form the intelligence layer of the entire solution.
Selecting the Right LLM Model
The core requirement was clear: the model must handle multilingual content (English + Chinese) and multi‑modal inputs including text files, PDFs, and images. Several AWS‑native and third‑party models were evaluated.
Evaluation Results
- AWS Textract + Titan performed OCR well for English, but Titan consistently ignored Chinese content, making the combination unsuitable for bilingual documents.
- Amazon Nova attempted to interpret Chinese text but produced random guesses with a very low success rate.
- Qwen3 handled Chinese better, with an acceptable success rate for mixed‑language documents.
- Claude Sonnet 4.6 delivered 5× higher accuracy than Qwen3 while also offering lower total cost. It consistently extracted structured fields correctly across English‑only, Chinese‑only, and mixed‑language documents.
Final Decision
Claude Sonnet 4.6 was selected due to its superior multilingual reasoning, stable multi‑modal performance, and cost efficiency.
Activating Claude Sonnet 4.6 in AWS Bedrock
Before the Lambda function can invoke Claude, the model must be activated in the AWS account.
Step 1: Submit the Use Case
Submit the required use case form in the Bedrock console. Approval typically takes 15–30 minutes.
Step 2: Test in the Model Playground
Run a first inference in the Claude Sonnet 4.6 playground to confirm access.
Step 3: Resolve Marketplace Permission Errors
Some accounts encounter the following error during the first invocation:
“Model access is denied due to IAM user or service role not authorized to perform the required AWS Marketplace actions (aws-marketplace:ViewSubscriptions, aws-marketplace:Subscribe)...”
To resolve this, attach the following IAM policy to the IAM user or role:
```json
{
  "Effect": "Allow",
  "Action": [
    "aws-marketplace:Subscribe",
    "aws-marketplace:ViewSubscriptions"
  ],
  "Resource": "*"
}
```
After applying the policy, retry the model activation. Once the subscription completes, the Lambda function can invoke Claude normally.
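Once the subscription completes, a single throwaway invocation is an easy way to confirm access before wiring up the full pipeline. The sketch below is illustrative: the request-body builder is pure and testable offline, while `smoke_test` performs the real call (pass the exact model ID shown in your Bedrock console).

```python
import json

def build_smoke_test_body() -> str:
    """Minimal Messages-API body used only to confirm model access."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "ping"}]},
        ],
    })

def smoke_test(model_id: str, region: str = "us-east-1") -> str:
    """Invoke the model once; an AccessDeniedException here usually means
    the Marketplace subscription has not finished propagating yet."""
    import boto3  # deferred so build_smoke_test_body() runs without AWS
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=build_smoke_test_body(),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```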
Lambda Function Architecture
The Lambda function is the operational heart of the pipeline. It performs ingestion, OCR, reasoning, and structured extraction.
Handler
The handler contains the main execution logic:
- Receive request payload from the EC2 caller
- Download the uploaded file from the S3 bucket
- Perform OCR text extraction
- Invoke Claude Sonnet 4.6 for reasoning and field extraction
- Clean and normalize the output
- Return the structured JSON result back to the EC2 instance
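The steps above can be sketched as a thin orchestrator. Note that `extract_fields_with_claude` and the event shape (`bucket`, `key`, `content_type`) are illustrative assumptions, not the production module layout:

```python
import json

def handler(event, context):
    """Sketch of the handler flow. Helper names and the event shape are
    assumptions for illustration only."""
    import boto3  # deferred so the pure helper below runs without AWS
    s3 = boto3.client("s3")
    # 1. Download the uploaded file from S3
    raw_bytes = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"].read()
    # 2-5. OCR, Claude reasoning, field extraction, and cleanup (hypothetical helper)
    fields = extract_fields_with_claude(raw_bytes, event["content_type"])
    # 6. Return the structured JSON result to the EC2 caller
    return build_response(fields)

def build_response(fields: dict, status: int = 200) -> dict:
    """Wrap the extracted fields in the JSON envelope returned to the caller."""
    return {"statusCode": status, "body": json.dumps(fields, ensure_ascii=False)}
```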
Claude handles PDFs and images differently during OCR and target‑field extraction, so the Lambda logic must normalize and clean the model’s output to ensure consistent, structured results:
```python
import base64
import json

# s3_doc_in_bytes: the S3 document as bytes
# content_type: the content type matched by the S3 document's extension

# Convert the bytes to Claude-supported base64
doc_base64 = base64.standard_b64encode(s3_doc_in_bytes).decode("utf-8")

# Build the content block based on document type
if content_type == "application/pdf":
    doc_config = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": content_type,
            "data": doc_base64,
        },
    }
else:
    doc_config = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": content_type,
            "data": doc_base64,
        },
    }

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "temperature": 0,  # deterministic output, no creative answers
    "system": OCR_PROMPT,
    "messages": [
        {
            "role": "user",
            "content": [
                doc_config,
                {"type": "text", "text": TARGET_FIELD_PROMPT},
            ],
        },
    ],
}

response = bedrock_client.invoke_model(
    modelId=BEDROCK_MODEL_ID,
    contentType="application/json",
    accept="application/json",
    body=json.dumps(request_body),
)

response_body = json.loads(response["body"].read())
assistant_text = response_body["content"][0]["text"]

# Remove markdown code fences from Claude's answer, if present
cleaned = assistant_text.strip()
if cleaned.startswith("```"):
    cleaned = cleaned.split("\n", 1)[1]
if cleaned.endswith("```"):
    cleaned = cleaned.rsplit("```", 1)[0]
cleaned = cleaned.strip()

# other logic
```
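Even after fence stripping, the answer is occasionally not pure JSON (for example, a short remark before the object). A defensive parse is worth the few extra lines; this sketch, including the first-to-last-brace fallback heuristic, is an assumption rather than the production code:

```python
import json

def parse_claude_json(cleaned: str) -> dict:
    """Parse the fence-stripped answer; fall back to the first {...} span
    if the model prepended or appended stray text."""
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            return json.loads(cleaned[start:end + 1])
        raise
```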
Prompts
Two categories of prompts guide the LLM’s behavior.
- OCR Prompt: Defines the system role, extraction rules, and reasoning instructions. It explains how the LLM should interpret different document scenarios.
- Target Field Prompt: Defines how to handle different file types (PDF, image, text) and provides a list of target fields with descriptions and common multilingual pattern examples, like:
| Field | Description |
| --- | --- |
| total_deposit | Total deposit collected in HKD (numeric). Include ALL deposits: deposit (按金), electricity deposit (電費按金), renovation deposit, etc. Use the grand total from the receipt/table if available. |
It also specifies strict output format rules, such as:
- no markdown
- no explanation
- return only a JSON object
These prompts ensure deterministic, repeatable extraction across diverse document types.
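For illustration only, here is a pared-down sketch of what such prompt constants might look like. The wording below is invented for this article; the production prompts are longer and tuned through iteration:

```python
# Illustrative sketches only -- not the production prompts.
OCR_PROMPT = (
    "You are a bilingual (English / Traditional Chinese) document analyst. "
    "Read the attached document carefully and extract the requested fields "
    "exactly as they appear, whether the source text is English, Chinese, "
    "or mixed."
)

TARGET_FIELD_PROMPT = (
    "Extract the target fields and return ONLY a JSON object: "
    "no markdown, no explanation.\n"
    "- total_deposit: total deposit collected in HKD (numeric); include "
    "ALL deposits: deposit (按金), electricity deposit (電費按金), "
    "renovation deposit, etc.\n"
    "If a field cannot be found, use null."
)
```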
Deploying the Lambda Function
Deployment is straightforward but requires attention to packaging:
- Zip only the Python files, not the parent folder.
- The zip file must contain the `.py` files at the root level.
- Use the Upload ZIP button in the Lambda console.
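The packaging rule above (files at the archive root, not nested inside a folder) can be enforced with a small script. This sketch uses the standard library's `zipfile`; the `arcname` argument is what strips the parent directory:

```python
import zipfile
from pathlib import Path

def package_lambda(src_dir: str, zip_path: str = "function.zip") -> list:
    """Zip the .py files so they sit at the archive root, which is what
    the Lambda runtime expects for a plain-Python deployment package."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py in sorted(Path(src_dir).glob("*.py")):
            zf.write(py, arcname=py.name)  # arcname drops the parent folder
            names.append(py.name)
    return names
```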
Configuring the Handler
In the Lambda configuration tab, set the handler to match your file and function name. For example, `document_handler.handler`, where:
- `document_handler.py` is the file
- `handler` is the function inside that file
Closing Thoughts
This architecture demonstrates how to build a multilingual, multi‑modal document analysis pipeline using AWS Lambda and Claude Sonnet 4.6. The key is not just choosing the right model, but designing the prompts, IAM permissions, and Lambda workflow so the system behaves predictably under real production workloads.
About the Author
Jonathan Wong is an IT and AI consultant with 20+ years of experience leading engineering teams across Vancouver and Hong Kong. He specializes in modernizing legacy platforms, cloud security, and building AI-ready systems for startups and large enterprises while advising leadership on using strategic technology to drive business growth.
Connect with me on LinkedIn