Incoming transmission · Bengaluru, Earth

Amal Shaji

Senior Data Scientist @ Veracode · ex-AWS Bedrock

I build autonomous AI systems — multi-agent architectures, retrieval-augmented generation, and the ML infrastructure that keeps them honest. Forged on AWS Bedrock, now pointed at application security.

01 About

I'm a Senior Data Scientist at Veracode with ~7 years of engineering experience and an M.Tech in Data Science from BITS Pilani. Before this I spent 3½ years on AWS A.I. / Bedrock, working at the seam between research and production — taking ideas like multi-agent coordination and retrieval-augmented generation and turning them into systems that run unattended, at AWS scale.

The move to application security wasn't a detour. At Amazon I served as an Application Security Guardian alongside my day job; at Veracode, securing the world's software is the day job — with machine learning as the instrument.

The thread connecting all of it is the same one that makes Interstellar my favourite film: a fascination with systems bigger than ourselves — black holes, orbital mechanics, and machines that can reason. I'm drawn to Research Engineer and Applied Scientist problems where engineering rigor meets state-of-the-art AI.

  • ~7 years across Veracode, AWS & TCS
  • 2 wks → 4 hrs: AWS region build time, automated end-to-end with agents
  • 95% of data-quality issues caught automatically in RLHF pipelines
  • 8.53 CGPA (M.Tech Data Science, BITS Pilani)

02 Mission Log · experience

LOG 03 · CURRENT ORBIT · Apr 2026 – Present · Bengaluru

Veracode · via Accion Labs

Senior Data Scientist

  • Applying machine learning to application security: the discipline I practised as a guardian at Amazon, now as the mission itself.

LOG 02 · Oct 2022 – Apr 2026 · Bengaluru

Amazon Web Services · AWS A.I. / Bedrock

System Development Engineer · Nov 2024 – Apr 2026

Application Engineer · Oct 2022 – Nov 2024

  • Architected MARES, a Coordinator–Delegator–Worker multi-agent system on Amazon Bedrock that autonomously manages Bedrock region expansion — cutting build time from 2 weeks of manual effort to 4 hours.
  • Designed VISAR, a production-grade RAG system (SageMaker embeddings + OpenSearch Serverless k-NN) supporting unlimited document volume and multi-format ingestion — retrieval latency down from hours to seconds.
  • Built an end-to-end data-quality pipeline for RLHF training datasets with custom transformers (BERTScore similarity, Detoxify filtering), reducing manual review by 40% and catching 95% of quality issues.
  • Served as Application Security Guardian, embedding secure design and risk mitigation into the development lifecycle through design reviews.

LOG 01 · LAUNCH · Jul 2019 – Oct 2022 · Chennai

Tata Consultancy Services

System Engineer · Apr 2021 – Oct 2022

Assistant System Engineer · Jul 2019 – Mar 2021

  • Developed automated anomaly-detection scripts with Elasticsearch and Kibana, reducing manual log-analysis time by 40%.
  • Led a team maintaining 100% SLA compliance for critical production systems.
  • Recognised with the “TCS Digital High Talent” tag for top-tier technical performance.

03 Systems I've Built

MARES

multi-agent

Multi-Agent Region Expansion System

A novel Coordinator–Delegator–Worker architecture on Amazon Bedrock that autonomously runs AWS region expansion. Agents retrieve historical deployment issues via RAG and resolve infrastructure failures without human intervention — a hybrid mesh of LLM reasoning and Lambda execution.

IMPACT 2 weeks of manual effort → 4 hours, fully autonomous.

  • Bedrock
  • Knowledge Bases
  • Lambda
  • RAG
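
The Coordinator–Delegator–Worker flow described for MARES can be sketched roughly as below. This is a minimal, standard-library illustration under stated assumptions, not the MARES implementation: every name (`Task`, `Worker`, `Delegator`, `Coordinator`, `KNOWN_FIXES`, `provision`, `validate`) is hypothetical, a dictionary lookup stands in for RAG retrieval over historical deployment issues, and plain Python callables stand in for Lambda executions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for RAG retrieval over historical deployment issues:
# maps an error code to a previously successful remediation.
KNOWN_FIXES: Dict[str, str] = {"quota_exceeded": "request_quota_increase"}


@dataclass
class Task:
    name: str
    kind: str  # used by the delegator for routing
    applied_fixes: List[str] = field(default_factory=list)


class Worker:
    """Executes one kind of task; a callable stands in for a Lambda function."""

    def __init__(self, action: Callable[[Task], str]):
        self.action = action

    def run(self, task: Task) -> str:
        return self.action(task)  # "ok" or an error code


class Delegator:
    """Routes each task to the worker registered for its kind."""

    def __init__(self, workers: Dict[str, Worker]):
        self.workers = workers

    def dispatch(self, task: Task) -> str:
        return self.workers[task.kind].run(task)


class Coordinator:
    """Runs the plan in order and looks up a known fix when a task fails."""

    def __init__(self, delegator: Delegator, max_retries: int = 2):
        self.delegator = delegator
        self.max_retries = max_retries

    def execute(self, plan: List[Task]) -> Dict[str, str]:
        report: Dict[str, str] = {}
        for task in plan:
            status = self.delegator.dispatch(task)
            retries = 0
            while status != "ok" and retries < self.max_retries:
                fix = KNOWN_FIXES.get(status)
                if fix is None:
                    break  # unknown failure: stop retrying and escalate
                task.applied_fixes.append(fix)
                status = self.delegator.dispatch(task)
                retries += 1
            report[task.name] = "done" if status == "ok" else f"escalated:{status}"
        return report


def provision(task: Task) -> str:
    # Fails until the quota fix has been applied, then succeeds.
    return "ok" if "request_quota_increase" in task.applied_fixes else "quota_exceeded"


def validate(task: Task) -> str:
    return "ok"


coordinator = Coordinator(Delegator({"infra": Worker(provision), "check": Worker(validate)}))
report = coordinator.execute([Task("create-endpoints", "infra"), Task("smoke-test", "check")])
print(report)
```

In this sketch the coordinator owns ordering and retry policy, the delegator owns routing, and workers stay stateless, so a failure with no known fix is reported as escalated rather than retried blindly.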

VISAR

retrieval

Vector Integrated Search & Retrieval

A production-grade RAG system built past the limits of native knowledge bases: SageMaker for embedding generation, OpenSearch Serverless for k-NN indexing, and an AWS Batch ingestion pipeline that chunks and preprocesses large corpora across PDF, Excel, and text.

IMPACT Unlimited document volume; retrieval latency from hours → seconds.

  • SageMaker
  • OpenSearch
  • AWS Batch
  • k-NN
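
The chunk, embed, and k-NN retrieval steps named for VISAR can be illustrated with a small standard-library sketch. It is only a toy under stated assumptions: `chunk`, `embed`, `cosine`, and `BruteForceIndex` are hypothetical names, a bag-of-words counter stands in for SageMaker-generated embeddings, and an exact in-memory scan stands in for an OpenSearch Serverless k-NN index.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class BruteForceIndex:
    """Exact k-NN over stored chunk vectors (a real index would be approximate)."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add_document(self, text: str) -> None:
        for piece in chunk(text):
            self.items.append((piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), piece) for piece, vec in self.items]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


index = BruteForceIndex()
index.add_document("Region build checklists cover networking quotas and endpoint validation steps")
index.add_document("Quarterly finance report with revenue tables and headcount planning notes")
top_score, top_chunk = index.search("endpoint validation", k=1)[0]
print(round(top_score, 3), "|", top_chunk)
```

Overlapping windows are used so that a phrase falling on a chunk boundary still appears whole in at least one chunk, which is a common reason to overlap during ingestion.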

RLHF Data Quality Pipeline

ml-infra

Reproducible sanitisation for LLM training data

A modular preprocessing pipeline for high-volume RLHF datasets, with custom transformers (TransformerMixin pattern) encapsulating BERTScore similarity and Detoxify filtering, plus automated outlier and noise handling.

IMPACT −40% manual review overhead; 95% of data-quality issues caught.

  • Python
  • BERTScore
  • Detoxify
  • scikit-learn
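
The transformer-style filtering described for this pipeline can be sketched as below. This is a standard-library approximation under stated assumptions, not the production code: `FilterStep` only mirrors the fit/transform convention that scikit-learn's `TransformerMixin` builds on, `token_overlap` is a crude stand-in for BERTScore, `blocklist_score` is a crude stand-in for a Detoxify score, and all class names, thresholds, and record fields are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


class FilterStep:
    """Minimal fit/transform interface, mirroring the scikit-learn transformer convention."""

    def fit(self, records: List[Record]) -> "FilterStep":
        return self  # these steps are stateless, so there is nothing to learn

    def transform(self, records: List[Record]) -> List[Record]:
        raise NotImplementedError

    def fit_transform(self, records: List[Record]) -> List[Record]:
        return self.fit(records).transform(records)


def token_overlap(candidate: str, reference: str) -> float:
    """Jaccard overlap of lowercase tokens; a crude stand-in for BERTScore."""
    a, b = set(candidate.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def blocklist_score(text: str, blocklist: Iterable[str] = ("idiot", "stupid")) -> float:
    """Fraction of tokens on a blocklist; a crude stand-in for a Detoxify score."""
    blocked = set(blocklist)
    tokens = text.lower().split()
    return sum(t in blocked for t in tokens) / len(tokens) if tokens else 0.0


class SimilarityFilter(FilterStep):
    """Keeps records whose response is close enough to its reference."""

    def __init__(self, scorer: Callable[[str, str], float] = token_overlap, min_score: float = 0.3):
        self.scorer = scorer
        self.min_score = min_score

    def transform(self, records: List[Record]) -> List[Record]:
        return [r for r in records if self.scorer(r["response"], r["reference"]) >= self.min_score]


class ToxicityFilter(FilterStep):
    """Drops records whose response scores above the toxicity threshold."""

    def __init__(self, scorer: Callable[[str], float] = blocklist_score, max_score: float = 0.0):
        self.scorer = scorer
        self.max_score = max_score

    def transform(self, records: List[Record]) -> List[Record]:
        return [r for r in records if self.scorer(r["response"]) <= self.max_score]


def run_pipeline(steps: List[FilterStep], records: List[Record]) -> List[Record]:
    for step in steps:
        records = step.fit_transform(records)
    return records


raw = [
    {"response": "Paris is the capital of France", "reference": "The capital of France is Paris"},
    {"response": "I like turtles", "reference": "The capital of France is Paris"},
    {"response": "Paris is the capital you idiot", "reference": "Paris is the capital"},
]
clean = run_pipeline([SimilarityFilter(), ToxicityFilter()], raw)
print(len(raw), "->", len(clean))
```

Because each scorer is injected as a callable, a real similarity or toxicity model could be swapped in without changing the step classes, which is the main appeal of wrapping each check behind a uniform transform interface.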

Legal Document Summarisation

research

M.Tech thesis · BITS Pilani

An abstractive summarisation model for Indian legal and constitutional documents, built on bidirectional LSTM and GRU networks with an attention mechanism to preserve context across long-form legal text.

IMPACT Bridged NLP research and a hard real-world domain — long, dense legal language.

  • Bi-LSTM
  • GRU
  • Attention
  • NLP
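
To illustrate how an attention step lets a decoder weigh encoder states from a long input, here is a small standard-library sketch. The attention variant used in the thesis is not stated on this page, so dot-product scoring is an assumption, and the names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Return attention weights and the weighted context vector.

    Scoring is a plain dot product (an assumption for this sketch).
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Three toy encoder states; the decoder state aligns best with the first one.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([2.0, 0.0], encoder_states)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The context vector is recomputed at every decoding step, so the decoder can draw on any part of the source text rather than relying on a single fixed-size summary of it.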

04 Skill Constellations

AI / ML

  • Multi-Agent Systems
  • RAG
  • LLM Evaluation
  • NLP
  • Deep Learning
  • PyTorch
  • TensorFlow
  • CNNs
  • LSTM / GRU
  • Attention Mechanisms
  • RLHF Data Pipelines
  • Recommendation Systems

Cloud & Infrastructure

  • Amazon Bedrock
  • SageMaker
  • OpenSearch Serverless
  • AWS Lambda
  • AWS Batch
  • Serverless Architecture
  • Elasticsearch
  • Kibana

Engineering

  • Python
  • Distributed Systems
  • Data Pipelines
  • Application Security
  • Production Operations
  • Information Retrieval

05 Education & Certifications

Education

M.Tech, Data Science & Engineering

BITS Pilani · 2020 — 2022 · CGPA 8.53/10

Thesis: AI-based legal document summarisation (LSTM/GRU). Coursework: Deep Learning, NLP, Information Retrieval.

B.Tech, Mechanical Engineering

NSS College of Engineering · 2015 — 2019 · CGPA 8.19/10

Certifications

  • Google Cloud Professional Machine Learning Engineer
  • Certified A.I. Professional · Defence Institute of Advanced Technology (DRDO)
  • Applied Data Science with Python Specialization · University of Michigan
  • TensorFlow for AI & ML · DeepLearning.AI

Recognition

  • Application Security Guardian — Amazon
  • TCS Digital High Talent

06 Open a Channel

Hiring for a Research Engineer or Applied Scientist role? I'd love to talk. Email is the fastest channel — I usually reply within a day.

Bengaluru, India · UTC+5:30 · no time dilation observed