Recruiter View

Siddhant Bhujade

Full-stack engineer building backend systems, AI workflows, and data platforms at scale

MS Information Science - UT Dallas (GPA 3.97)
3+ years full-time experience
Open to relocation across the U.S.
F-1 STEM OPT, requires H-1B sponsorship
Hiring Manager View

Siddhant Bhujade

I build systems that do not fall over, and AI features that actually work in production.

Project 01 - Distributed Backend Platform
Multi-Tenant Data Processing and Analytics Engine for Loan Origination System (LOS) Data.
A backend-heavy full-stack platform for loan data ingestion, asynchronous processing, human-in-the-loop validation, and automated report generation with real-time job monitoring across 100+ institutions. Key challenges included strict tenant data isolation, orchestrating background workflows with indefinite human review checkpoints, and offloading heavy analytical computations — including geocoding and report generation — without impacting user-facing API performance.
Why was Temporal chosen over a simple queue or Airflow?
Loan ingestion workflows have long-running human-in-the-loop review steps that could pause for hours or days. Temporal's durable execution model gave us replay and retry semantics for free — without custom retry logic or polling — cutting failure recovery from 3 hours to 15 seconds. Airflow was ruled out because it's built for scheduled, predictable DAGs, not event-driven workflows with indefinite human checkpoints and dynamic branching logic.
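As an illustration of what that durable execution replaces, here is a hand-rolled retry-with-backoff wrapper (a pure-Python sketch; the `ingest_batch` step and delay values are hypothetical, and Temporal provides this declaratively via activity retry policies, persisted across process restarts):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=0.01):
    """Exponential-backoff retry: the custom logic Temporal's
    activity retry policies make unnecessary."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky ingestion step: fails twice, then succeeds.
calls = {"n": 0}

def ingest_batch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ingested"

result = with_retries(ingest_batch)
print(result)  # prints "ingested" after two retried failures
```

The difference in production is that Temporal persists this retry state durably, so a worker crash mid-workflow resumes from the last checkpoint instead of restarting the whole run.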
How did you scale heavy tasks like report generation and geocoding?
Both were too slow and resource-intensive to run inline. Report generation was offloaded as async Celery tasks, queued via RabbitMQ and processed by dedicated workers — keeping the API responsive while reports generated in the background. Geocoding was batched using the Census API with PostGIS caching, so repeated address lookups across 700+ institutions never hit the external API twice.
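The dedupe-then-cache idea behind the geocoding batch can be sketched as follows (addresses, coordinates, and both backends are stubbed; the real system cached in PostGIS and called the Census batch API):

```python
# Cache-aside batching: a repeated address never hits the external
# geocoder twice, within a batch or across batches.

external_calls = []

def geocode_external(address):
    """Stand-in for the Census geocoding API."""
    external_calls.append(address)
    return (40.0, -75.0)  # dummy lat/lon

cache = {}  # stand-in for the PostGIS-backed cache

def geocode_batch(addresses):
    # Dedupe first, then fetch only the cache misses.
    for addr in set(addresses):
        if addr not in cache:
            cache[addr] = geocode_external(addr)
    return {addr: cache[addr] for addr in addresses}

geocode_batch(["1 Main St", "2 Oak Ave", "1 Main St"])
geocode_batch(["1 Main St"])      # fully served from cache
print(len(external_calls))        # prints 2
```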
How was tenant isolation enforced?
SQL Server row-level security scoped every query to the tenant context at the DB layer rather than the application layer. This was combined with dynamic PII masking at the API boundary to enforce zero-trust handling of cross-tenant access.
Key Outcome and Impact
Scaled the system to automate loan data processing and compliance reporting across 100+ institutions — reducing manual file reviews from days to hours, with real-time job monitoring and strict data isolation — directly enabling credit unions to achieve CDFI certification and secure grants worth millions of dollars.
ASP.NET Core · FastAPI · Temporal · Celery · RabbitMQ · SQL Server RLS · Redis · Azure
Project 02 - AI / LLM Platform
Platform to enable AI-powered Financial Analytics and Performance Benchmarking
Built an AI personalization layer on top of the analytics platform, including a multi-tenant LLM gateway, a hybrid retrieval pipeline, and an evaluation framework to measure and improve response quality. The hard part was not getting an LLM to answer questions. It was making it reliably accurate on deterministic financial data where hallucinations are a regulatory risk.
How does the LLM gateway work?
The gateway routes requests across multiple OpenAI and Claude models based on institution subscription tier and prompt-complexity classification, with API rate limiting, priority queuing, and exponential-backoff retries, cutting API failures from ~20% to under 5% and reducing inference costs by 35% via smart model routing.
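A minimal sketch of tier-and-complexity routing (the model names, tiers, and toy classifier below are illustrative, not the production configuration):

```python
# Hypothetical routing table: (tier, complexity) -> model.
ROUTES = {
    ("premium", "complex"): "claude-opus",
    ("premium", "simple"):  "gpt-4o-mini",
    ("basic",   "complex"): "gpt-4o",
    ("basic",   "simple"):  "gpt-4o-mini",
}

def classify(prompt):
    # Toy complexity classifier: long or multi-question prompts are "complex".
    return "complex" if len(prompt) > 200 or prompt.count("?") > 1 else "simple"

def route(tier, prompt):
    return ROUTES[(tier, classify(prompt))]

print(route("basic", "What was Q3 net income?"))  # prints gpt-4o-mini
```

The payoff of this shape is that cheap models absorb the simple traffic, so the expensive models are reserved for prompts that actually need them.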
How was response quality improved?
Built an LLM-as-Judge evaluation layer combined with real user feedback loops, driving iterative prompt engineering and few-shot optimization. Structured API contracts enforce deterministic, schema-validated responses, and continuously acting on feedback signals lifted response accuracy from 71% to 94% over successive refinement cycles.
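The contract-plus-judge idea can be sketched like this (field names, the response shape, and the exact-match scoring rule are all illustrative stand-ins for the real schema and judge prompt):

```python
# Responses must match a fixed schema before they are served, and a
# judge scores each one so regressions surface in the feedback loop.

REQUIRED = {"answer": str, "sources": list, "confidence": float}

def validate(resp):
    """Structured-contract check: every required field, correct type."""
    return all(isinstance(resp.get(k), t) for k, t in REQUIRED.items())

def judge(resp, expected_answer):
    # Toy LLM-as-Judge stand-in: exact-match scoring against a
    # reference answer; the real judge is itself a model call.
    return 1.0 if resp["answer"] == expected_answer else 0.0

resp = {"answer": "4.2%", "sources": ["q3_report"], "confidence": 0.9}
assert validate(resp)
print(judge(resp, "4.2%"))  # prints 1.0
```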
Why hybrid RAG (vector + SQL)?
Pure vector search surfaces semantically similar content but cannot enforce precise numerical constraints from financial datasets. Grounding responses in SQL-verified facts reduced hallucinations by 85%, which was critical in a regulated financial context.
Scale and latency
Created dedicated microservices to serve AI-related API traffic and embedding workloads, decoupling them from the core platform backend — cutting P95 latency from 22s to 5s across 500K+ monthly queries with Redis response caching.
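The Redis response-caching piece follows a standard cache-aside pattern, sketched here with a dict and TTL standing in for Redis (the key scheme and TTL value are illustrative):

```python
import time

TTL = 300.0   # hypothetical freshness window, in seconds
_cache = {}   # stand-in for Redis

def answer_query(query, compute):
    """Cache-aside: serve a fresh cached response, else compute and store."""
    now = time.time()
    hit = _cache.get(query)
    if hit and now - hit[1] < TTL:
        return hit[0], True            # (response, served_from_cache)
    resp = compute(query)              # the expensive LLM/embedding path
    _cache[query] = (resp, now)
    return resp, False

r1 = answer_query("top lenders?", lambda q: "computed")
r2 = answer_query("top lenders?", lambda q: "computed")
print(r1, r2)  # prints ('computed', False) ('computed', True)
```

For repeated analytics questions this kind of cache is what pulls tail latency down: the second identical query skips model inference entirely.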
FastAPI (async) · pgvector · OpenAI API · Claude API · LangChain · Redis cache · LLM-as-Judge