Amal Shaji — Senior Data Scientist · AI/ML, Multi-Agent Systems & RAG

01 About

I'm a Senior Data Scientist at Veracode with ~7 years of engineering experience and an M.Tech in Data Science from BITS Pilani. Before this I spent 3½ years on AWS A.I. / Bedrock, working at the seam between research and production — taking ideas like multi-agent coordination and retrieval-augmented generation and turning them into systems that run unattended, at AWS scale.

The move to application security wasn't a detour. At Amazon I served as an Application Security Guardian alongside my day job; at Veracode, securing the world's software is the day job — with machine learning as the instrument.

The thread connecting all of it is the same one that makes Interstellar my favourite film: a fascination with systems bigger than ourselves — black holes, orbital mechanics, and machines that can reason. I'm drawn to Research Engineer and Applied Scientist problems where engineering rigor meets state-of-the-art AI.

~7 years across Veracode, AWS & TCS
2 wks → 4 hrs · AWS region build time, automated end-to-end with agents
95% of data-quality issues caught automatically in RLHF pipelines
8.53 CGPA · M.Tech Data Science, BITS Pilani

02 Mission Log · experience

LOG 03 · CURRENT ORBIT · Apr 2026 – Present · Bengaluru

Veracode · via Accion Labs

Senior Data Scientist

Applying machine learning to application security — the discipline I practised as a guardian at Amazon, now as the mission itself.

LOG 02 · Oct 2022 – Apr 2026 · Bengaluru

Amazon Web Services · AWS A.I. / Bedrock

System Development Engineer · Nov 2024 – Apr 2026

Application Engineer · Oct 2022 – Nov 2024

Architected MARES, a Coordinator–Delegator–Worker multi-agent system on Amazon Bedrock that autonomously manages Bedrock region expansion — cutting build time from 2 weeks of manual effort to 4 hours.
Designed VISAR, a production-grade RAG system (SageMaker embeddings + OpenSearch Serverless k-NN) supporting unlimited document volume and multi-format ingestion — retrieval latency down from hours to seconds.
Built an end-to-end data-quality pipeline for RLHF training datasets with custom transformers (BERTScore similarity, Detoxify filtering), reducing manual review by 40% and catching 95% of quality issues.
Served as Application Security Guardian — embedding secure design and risk mitigation into the development lifecycle through design reviews.

LOG 01 · LAUNCH · Jul 2019 – Oct 2022 · Chennai

Tata Consultancy Services

System Engineer · Apr 2021 – Oct 2022

Assistant System Engineer · Jul 2019 – Mar 2021

Developed automated anomaly-detection scripts with Elasticsearch and Kibana, reducing manual log-analysis time by 40%.
Led a team maintaining 100% SLA compliance for critical production systems.
Recognised with the “TCS Digital High Talent” tag for top-tier technical performance.

03 Systems I've Built

MARES

multi-agent

Multi-Agent Region Expansion System

A novel Coordinator–Delegator–Worker architecture on Amazon Bedrock that autonomously runs AWS region expansion. Agents retrieve historical deployment issues via RAG and resolve infrastructure failures without human intervention — a hybrid mesh of LLM reasoning and Lambda execution.

IMPACT 2 weeks of manual effort → 4 hours, fully autonomous.

Bedrock
Knowledge Bases
Lambda
RAG
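
The Coordinator–Delegator–Worker split described above can be illustrated with a minimal sketch. This is not the MARES implementation: every name here (`Task`, `Coordinator`, `Delegator`, the `provision` and `validate` workers, the `region-x` target) is hypothetical, the plan is hard-coded where a real system might ask an LLM to produce it, and the workers are plain functions standing in for whatever actually executes the step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    kind: str    # selects which worker handles the task
    target: str  # what the task acts on (a hypothetical region name here)


# Hypothetical workers: plain functions standing in for real execution steps.
def provision(task: Task) -> str:
    return f"provisioned {task.target}"


def validate(task: Task) -> str:
    return f"validated {task.target}"


class Delegator:
    """Routes each task to the worker registered for its kind."""

    def __init__(self, workers: Dict[str, Callable[[Task], str]]):
        self.workers = workers

    def dispatch(self, task: Task) -> str:
        worker = self.workers.get(task.kind)
        if worker is None:
            raise KeyError(f"no worker for task kind {task.kind!r}")
        return worker(task)


class Coordinator:
    """Breaks a goal into ordered tasks and hands them to the delegator."""

    def __init__(self, delegator: Delegator):
        self.delegator = delegator

    def plan(self, region: str) -> List[Task]:
        # Fixed plan for illustration only.
        return [Task("provision", region), Task("validate", region)]

    def run(self, region: str) -> List[str]:
        return [self.delegator.dispatch(task) for task in self.plan(region)]


coordinator = Coordinator(Delegator({"provision": provision, "validate": validate}))
results = coordinator.run("region-x")
```

Keeping routing in the delegator means a new kind of step only needs a new entry in the worker registry; the coordinator's planning logic stays untouched.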

VISAR

retrieval

Vector Integrated Search & Retrieval

A production-grade RAG system built to go beyond the limits of native knowledge bases: SageMaker for embedding generation, OpenSearch Serverless for k-NN indexing, and an AWS Batch ingestion pipeline that chunks and preprocesses large corpora across PDF, Excel, and text.

IMPACT Unlimited document volume; retrieval latency from hours → seconds.

SageMaker
OpenSearch
AWS Batch
k-NN
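
The chunk, embed, and k-NN retrieval flow described above can be sketched in a few lines of standard-library Python. This is only an illustration under loud assumptions: `chunk_words`, `embed`, `cosine`, and `knn` are hypothetical helpers, the "embedding" is a toy word-count vector rather than a SageMaker-hosted model, and the search is exact brute-force cosine similarity rather than an OpenSearch index.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word windows of `size` that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

For example, `knn("region build", docs, k=1)` scores every chunk in `docs` against the query and returns the single best match with its similarity. A production index trades this exhaustive scan for an approximate search structure so that query time does not grow linearly with the corpus.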

RLHF Data Quality Pipeline

ml-infra

Reproducible sanitisation for LLM training data

A modular preprocessing pipeline for high-volume RLHF datasets, with custom transformers (TransformerMixin pattern) encapsulating BERTScore similarity and Detoxify filtering, plus automated outlier and noise handling.

IMPACT −40% manual review overhead; 95% of data-quality issues caught.

Python
BERTScore
Detoxify
scikit-learn
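
The transformer pattern mentioned above can be sketched without any dependencies. The assumptions are significant: `FitTransformMixin` is a hand-written stand-in that mirrors how scikit-learn's `TransformerMixin` derives `fit_transform` from `fit` and `transform`, and the scorers (`toy_toxicity`, `too_short`) are hypothetical placeholders for model-based checks such as Detoxify or BERTScore, not those libraries' APIs.

```python
from typing import Callable, List


class FitTransformMixin:
    """Derives fit_transform from fit + transform (mirrors TransformerMixin)."""

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class ScoreFilter(FitTransformMixin):
    """Keeps only records whose score is strictly below a threshold."""

    def __init__(self, scorer: Callable[[str], float], threshold: float):
        self.scorer = scorer
        self.threshold = threshold

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X: List[str]) -> List[str]:
        return [x for x in X if self.scorer(x) < self.threshold]


BLOCKLIST = {"idiot", "stupid"}  # hypothetical word list for the toy scorer


def toy_toxicity(text: str) -> float:
    """Fraction of words on the blocklist (placeholder for a real classifier)."""
    words = text.lower().split()
    return sum(w in BLOCKLIST for w in words) / max(len(words), 1)


def too_short(text: str) -> float:
    """Scores 1.0 for records with fewer than two words, else 0.0."""
    return 1.0 if len(text.split()) < 2 else 0.0


def run_pipeline(steps: List[ScoreFilter], X: List[str]) -> List[str]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        X = step.fit_transform(X)
    return X


steps = [ScoreFilter(toy_toxicity, 0.2), ScoreFilter(too_short, 0.5)]
cleaned = run_pipeline(steps, ["thanks for the help", "you idiot", "ok"])
```

Because each check hides behind the same `fit`/`transform` interface, swapping the toy scorer for a real model changes one constructor argument rather than the pipeline itself.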

Legal Document Summarisation

research

M.Tech thesis · BITS Pilani

An abstractive summarisation model for Indian legal and constitutional documents, built on bidirectional LSTM and GRU networks with an attention mechanism to preserve context across long-form legal text.

IMPACT Bridged NLP research and a hard real-world domain — long, dense legal language.

Bi-LSTM
GRU
Attention
NLP

04 Skill Constellations

AI / ML

Multi-Agent Systems
RAG
LLM Evaluation
NLP
Deep Learning
PyTorch
TensorFlow
CNNs
LSTM / GRU
Attention Mechanisms
RLHF Data Pipelines
Recommendation Systems

Cloud & Infrastructure

Amazon Bedrock
SageMaker
OpenSearch Serverless
AWS Lambda
AWS Batch
Serverless Architecture
Elasticsearch
Kibana

Engineering

Python
Distributed Systems
Data Pipelines
Application Security
Production Operations
Information Retrieval

05 Education & Certifications

Education

M.Tech, Data Science & Engineering

BITS Pilani · 2020 — 2022 · CGPA 8.53/10

Thesis: AI-based legal document summarisation (LSTM/GRU). Coursework: Deep Learning, NLP, Information Retrieval.

B.Tech, Mechanical Engineering

NSS College of Engineering · 2015 — 2019 · CGPA 8.19/10

Certifications

Google Cloud Professional Machine Learning Engineer
Certified A.I. Professional — Defense Institute of Advanced Technology (DRDO)
Applied Data Science with Python Specialization — University of Michigan
TensorFlow for AI & ML — DeepLearning.AI

Recognition

Application Security Guardian — Amazon
TCS Digital High Talent

06 Open a Channel

Hiring for a Research Engineer or Applied Scientist role? I'd love to talk. Email is the fastest channel — I usually reply within a day.

amalshajiprof@gmail.com · LinkedIn ↗ · Résumé ↓

Bengaluru, India · UTC+5:30 · no time dilation observed