Tuesday, June 23, 2026

Interview at Dentsu for Lead AI Engineer Role (2026 Jun 18)


Interview Reconstruction & Critical Analysis

Lead AI Engineer — Interview Report

Candidate: Ashish
Interviewer: Swaroop
Format: Video Call
Total Topics: 13
Section I

Organized Candidate Transcript

T-01  ·  Self Introduction
13 years of experience: 11 in AI/ML, 2 in software engineering. Progression from scikit-learn, TensorFlow, PyTorch (traditional ML) through generative AI to agentic systems (LangGraph, CrewAI, AI Suite). Last project was a Text2SQL-based AI capability suite — deployed as Network Engineering Assistant (telecom), Business Intelligence via NLQ (healthcare), Data Analytics Platform (telco), and AOBI — comprising four capabilities: Text2SQL generation, RAG, a generic fallback bot, and a visualization + narrator agent, all routed by an orchestrating agent.
T-02  ·  Career History & IBM Tenure
Confirmed 7 companies in 13 years. Currently at IBM since 24th April; serving notice until 29th June. Stated reason: was promised a project, client did not agree, was moved to CSR work which felt stressful, decided to seek outside opportunities. No offer currently in hand.
T-03  ·  Cloud Technologies
Has Azure familiarity; has not specifically used Azure AI Foundry or GCP Vertex AI. Explained that in most use cases, LLMs are accessed via API directly. When pressed on Azure resources: Azure Functions / App Services / AKS for backend; Azure PostgreSQL for database; Azure Blob Storage for RAG document storage; Azure DevOps for project management; Azure AI Search for vector retrieval; Azure Databricks for data integrations.
T-04  ·  Multi-Agent Frameworks
Has worked with LangGraph and CrewAI. LangGraph preference over cloud ADKs argued on four grounds: (1) stateful graph model with shared state across agent nodes vs. LangChain's sequential chains; (2) company platform (AI Refinery) was already built on LangGraph; (3) architectural fit: LangGraph suits defined workflows, while CrewAI and AutoGen suit autonomous multi-agent swarms; (4) client preferences on open-source, budget, and deployment environment.
T-05  ·  Figma AI
Has not personally used Figma AI. Identified it as a design, editing, and animation tool. Stated willingness to learn quickly if required.
T-06  ·  LLM Cost Reduction (50%)
Initial answers: downgrade model variant (nano/mini), switch to cheaper provider (DeepSeek), use self-hosted LLMs. Raised fine-tuning but noted it increases upfront cost. After interviewer prompt: acknowledged that reducing token count per request — by embedding repetitive behavioral instructions into a fine-tuned model — directly reduces cost at scale.
T-07  ·  Governance for a $10K Claims Approval Agent
Proposed three governance layers: (1) Guardrails — PII/confidential data protection, output formatting validation; (2) Human-in-the-loop — review node in the workflow; (3) Monitoring — accuracy tracking over time, detection of data drift (e.g., claim value thresholds shifting), logging to database or log files.
T-08  ·  Azure AI Search vs. Custom RAG Stack
Offered cost, non-Azure client deployment, and open-source preference as reasons to build custom. Could not independently reach the architecture-level answer. Interviewer supplied it: the need for advanced hybrid retrieval customization, domain-specific re-ranking, and large-scale knowledge indexing strategies beyond managed vector search capabilities.
T-09  ·  When NOT to Use Agentic AI
Four examples: (1) Autonomous coding agents that can corrupt codebases; (2) Defense — autonomous lethal weapons raise accountability questions; (3) Medicine — robotic surgery still needs human oversight; (4) Legal — courts and juries should not be autonomous agents. Framed as "how far can we take agentic AI" rather than engineering trade-off scenarios.
T-10  ·  RAG Pipeline Degradation at Scale
Approach: collect erroneous production cases, sample 100, perform manual error analysis, identify most frequent failure type (prompt/context/hallucination/logic), prioritize, trace to responsible pipeline component, build a golden dataset, iterate. Interviewer redirected to architecture-level answer: retrieval metrics analysis, multi-stage retrieval, semantic caching, context optimization, chunking strategy revision, vector DB overload management.
T-11  ·  Coding Experience
Writes 70-80% of code independently; uses documentation or AI assistance for the remaining 20-30%. Primary languages: Python and SQL.
T-12  ·  Candidate Questions
Asked about the project and company. Interviewer explained: AI innovation team delivering POCs across domains — media, creative, ad design, market research, data science — either as short-cycle innovation work or as longer client-embedded engagements (1.8–2.5 years).
Section II

Reconstructed Q&A — Full Interview

Could you start with a brief introduction about yourself?
13 years in AI/ML and software engineering. Built production systems from traditional ML through to agentic AI — LangGraph, CrewAI. Last project: a multi-modal Text2SQL platform deployed across telecom, healthcare, and other enterprise clients under different product names, combining Text2SQL, RAG, a generic fallback bot, and a visualization agent, all orchestrated by a router.
I see you've worked at 7 companies in 13 years — how do you account for that?
Yes, that's correct. [No narrative offered to contextualize the moves.]
You're currently at IBM — what's your tenure, and what's your situation there?
Joined 24th April, serving notice until 29th June. Was promised a project, client declined, was moved to CSR work, which I found stressful. No offer currently in hand.
What's your experience with cloud AI platforms — specifically Azure AI Foundry and GCP Vertex AI?
I have not used those specifically. In my work, we access LLMs directly via API. My Azure experience has been at the service level — AKS, Blob Storage, Azure AI Search, Azure DevOps, Databricks.
Can you walk me through the Azure AI resources you've worked with and how they fit together architecturally?
For backend hosting: Azure Functions, App Services, or AKS. For storage: Azure PostgreSQL and Blob Storage (used Blob for RAG document ingestion). For retrieval: Azure AI Search as vector database. For project management: Azure DevOps. For data integrations: Azure Databricks.
What multi-agent frameworks have you worked with, and why would a developer choose LangGraph over a cloud-native Agent Development Kit?
LangGraph and CrewAI. Four reasons for LangGraph: its stateful graph model is architecturally superior to LangChain's sequential chains; our internal platform (AI Refinery) was built on it; LangGraph fits structured workflows better than autonomous swarms, for which CrewAI or AutoGen is preferable; and client preferences on open-source and budget.
Have you worked with Figma AI or Figma-to-HTML use cases?
Not personally. I understand it's for design, editing, and animations. I learn quickly if that's a requirement here.
How would you reduce LLM cost by 50% without reducing answer quality?
Options include downgrading to a lighter model variant after benchmarking, switching to a lower-cost provider like DeepSeek, using self-hosted LLMs, or fine-tuning the model to compress prompt tokens — so repetitive behavioral instructions don't need to be sent on every request.
I have an AI agent approving insurance claims up to $10,000. How would you enforce a governance layer?
Three layers: guardrails for PII protection and output formatting/validation; human-in-the-loop as a review node for edge cases; and monitoring for accuracy drift, data drift, and full audit logging.
Under what circumstances would you avoid Azure AI Search — or GCP/AWS equivalents — and build your own custom RAG stack?
Cost, client's non-Azure deployment requirement, or preference for open-source tooling. [Could not independently arrive at the retrieval-architecture reason; interviewer provided: advanced hybrid search customization and domain-specific re-ranking needs.]
You're being hired for a senior agentic AI role. When would you tell a client NOT to use an agentic AI solution? Give me real-world examples.
Autonomous coding agents that modify codebases without oversight. Autonomous defense weapons with no human accountability. Robotic surgery without a human in the room. Autonomous legal representation in courtrooms. [Framed as societal/ethical limits rather than engineering trade-offs.]
A customer support RAG system performed at 90%+ accuracy at pilot (500–1,000 users), but at 50,000–100,000 (1 lakh) users: hallucinations increased, latency rose from 3s to 15s, retrieval quality dropped, and LLM cost increased 8–10x. As lead architect, how do you diagnose and redesign?
Collect erroneous production instances, sample 100, manually classify failure type (prompt / context / hallucination / logic), identify the most frequent failure, trace it to the responsible pipeline component, build a golden dataset, and iterate. [Interviewer redirected: should have started with retrieval metrics, multi-stage retrieval, semantic caching, chunking strategy, and vector DB overload.]
What's your hands-on coding experience? Do you write from scratch or rely on AI assistance?
I write 70–80% of code independently in Python and SQL. The remaining 20–30% I look up in documentation or use AI assistance.
Section III

Critical Analysis & Better Answers

The critique below evaluates each answer against the altitude expected of a Lead / Principal AI Engineer — where the standard is architectural decision-making, not implementation narration.
Q-01 Self Introduction
What Went Wrong

This was a portfolio recitation, not a leadership pitch. The answer listed tools (scikit-learn → PyTorch → LangGraph) and project names without anchoring any of it to outcomes, decisions, or the scale of problems solved. Over two minutes of speaking produced no memorable claim the interviewer could hold onto. The question "why hire you?" was left unanswered.

Better Answer

Lead with impact, not inventory. Open with one high-signal sentence about what you built and why it was hard.

"I'm Ashish — Lead AI Engineer with 13 years of experience, the last four focused on designing production agentic and RAG systems. Most recently I led the architecture for a multi-client Text2SQL platform that processed natural language queries over both structured and unstructured data, deployed across telecom and healthcare enterprises. My M.Tech from BITS Pilani gave me the mathematical foundation; building these systems at production scale gave me everything else."
Q-02 7 Companies in 13 Years
What Went Wrong

Confirming "yes, that's correct" and staying silent is the worst possible response to this pattern. It leaves the interviewer to fill the silence with skepticism. No career arc was offered, no intentionality was demonstrated.

Better Answer

Pre-empt with a story. Name the turning point. Show that recent moves were deliberate, not reactive.

"The first few were early-career exploration until I found my domain in AI around 2015-16. Since then, every move has been toward increasing ownership of AI architecture. My IBM role is an exception — I've been bench-allocated and I've decided not to wait. Every other move has been a promotion of scope."
Q-03 Leaving IBM / No Offer in Hand
What Went Wrong

Volunteering "no offer in hand" weakens every subsequent negotiation. Explaining departure as "CSR work was stressful" frames the move as running away from a problem rather than toward an opportunity. Both facts, while honest, cost leverage.

Better Answer

Keep the reason forward-looking and factual without over-sharing.

"At IBM I've been on the bench — the client engagement I was being prepared for didn't materialize. Rather than wait indefinitely for allocation, I decided to be proactive. I'm looking for a role with genuine architectural ownership from day one, which is what this position seems to offer."
Q-04 Azure AI Resources — Architectural Walkthrough
What Went Wrong

The answer was a laundry list of Azure service names without any architectural rationale. "We used AKS" is not a decision — the decision is why AKS over Azure Functions for this workload, or why Azure AI Search over a standalone Pinecone deployment. The interviewer was testing architectural reasoning, not Azure documentation recall.

Better Answer

Organize the answer as a layered architecture with at least one explicit trade-off decision at each layer.

"Our stack had four layers: compute (AKS for the FastAPI orchestration layer — chosen over Azure Functions because our LangGraph workflows exceeded Function timeout limits); storage (Blob for raw documents, PostgreSQL for structured client data); retrieval (Azure AI Search as the vector index — we chose it over standalone Pinecone because of native Azure AD integration and data residency compliance); and observability (Azure Monitor plus custom logging). The hardest call was Azure OpenAI vs. direct OpenAI API — we chose Azure OpenAI for the enterprise data security guarantees."
Q-05 LangGraph vs. Cloud-Native ADKs
What Went Wrong

This was one of the stronger answers — four structured reasons, a genuine architectural insight about stateful graphs vs. sequential chains. Minor issue: the LangChain history detour consumed too much time before reaching the core insight. The answer also buried the most important reason (stateful graph model) after the less important ones.
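The stateful-graph idea credited above can be illustrated with a minimal sketch in plain Python. It deliberately does not use the LangGraph API; the node names (`draft`, `human_review`), the state fields, the 0.7 confidence cutoff, and the `run_graph` helper are all hypothetical and exist only to show nodes sharing one state object with a conditional edge between them.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


def draft(state: State) -> State:
    # Each node reads the shared state and returns an updated copy.
    return {**state, "answer": f"draft for: {state['question']}", "confidence": 0.4}


def human_review(state: State) -> State:
    # Stand-in for a human-in-the-loop interrupt (hypothetical behaviour).
    return {**state, "answer": state["answer"] + " (reviewed)", "confidence": 1.0}


def route_after_draft(state: State) -> Optional[str]:
    # Conditional edge: low-confidence drafts go to review, others finish.
    return "human_review" if state["confidence"] < 0.7 else None


def finish(state: State) -> Optional[str]:
    # Terminal edge: no next node.
    return None


NODES: Dict[str, Callable[[State], State]] = {
    "draft": draft,
    "human_review": human_review,
}
EDGES: Dict[str, Callable[[State], Optional[str]]] = {
    "draft": route_after_draft,
    "human_review": finish,
}


def run_graph(entry: str, state: State) -> State:
    # Walk the graph until an edge function returns None.
    current: Optional[str] = entry
    path: List[str] = []
    while current is not None:
        path.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return {**state, "path": path}


result = run_graph("draft", {"question": "refund policy?"})
```

Because the routing decision is a function of the shared state, the control flow stays explicit and inspectable, which is the property a sequential chain lacks.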

Better Answer

Lead with the architectural differentiator, then supporting context.

"LangGraph's defining advantage is its stateful graph model — each node is an agent with read/write access to a shared state object, which makes conditional flows, human-in-the-loop interrupts, and multi-step tool-use tractable in a way LangChain's sequential chains never were. Cloud ADKs are tightly coupled to their vendor's runtime and observability stack, which matters when clients need cloud-neutral contracts or full auditability. For fully autonomous multi-agent swarms with back-and-forth communication, CrewAI or AutoGen is the stronger fit — LangGraph excels when the control flow is known and needs to be deterministic."
Q-06 LLM Cost Reduction by 50%
What Went Wrong

The first three answers — model downgrade, provider switch, self-hosting — are procurement and operations decisions, not engineering solutions. Fine-tuning was raised but immediately self-contradicted ("it elevates cost"). The correct engineering answer — semantic caching, prompt compression, intelligent model routing — only emerged after the interviewer had to prompt for it. An architect should have led with the highest-leverage technical levers.

Better Answer

"The fastest path to 50% cost reduction in a production RAG system is semantic caching: store embedding-matched responses and serve cache hits for semantically similar queries. In high-volume customer support, 30–50% of queries cluster around the same 20–30 canonical intents. Second: prompt compression via fine-tuning. If your system prompt is 800 tokens of behavioral instructions repeated on every request, one fine-tuning run amortizes that cost across millions of calls. Third: intelligent model routing. Classify query complexity and route straightforward queries to a mini/nano model, reserving the full model for ambiguous or multi-step reasoning. Model-switching and self-hosting are last-resort levers because they introduce latency and reliability risk that typically outweigh the savings at moderate scale."
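As a rough illustration of the semantic caching lever, the sketch below serves a stored answer when a new query is close enough to an earlier one. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, the 0.8 threshold is arbitrary, and `fake_llm` replaces the actual model call.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value; tune on real traffic
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        # Return the stored answer of the most similar query, if similar enough.
        q = toy_embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((toy_embed(query), answer))


def answer_with_cache(
    query: str, cache: SemanticCache, call_llm: Callable[[str], str]
) -> Tuple[str, bool]:
    # Returns (answer, served_from_cache).
    cached = cache.lookup(query)
    if cached is not None:
        return cached, True
    answer = call_llm(query)
    cache.store(query, answer)
    return answer, False


llm_calls: List[str] = []


def fake_llm(query: str) -> str:
    # Hypothetical model call; records how often the "LLM" is invoked.
    llm_calls.append(query)
    return f"answer to: {query}"


cache = SemanticCache()
first = answer_with_cache("How do I reset my password?", cache, fake_llm)
second = answer_with_cache("how do i reset my password", cache, fake_llm)
third = answer_with_cache("Where is my invoice?", cache, fake_llm)
```

In this toy run only the first and third queries reach the model; the second is served from the cache. A production version would use a vector index for the lookup and would need an invalidation policy for stale answers.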
Q-07 Governance for a $10K Claims Approval Agent
What Went Wrong

The answer wandered into PII guardrails and data formatting before reaching the core of the question. Governance for a financial approval agent is fundamentally about accountability, auditability, and decision boundaries — none of which were named clearly. Missing entirely: approval thresholds with hard business rules, explainability requirements, role-based escalation paths, and regulatory compliance dimensions.

Better Answer

"Three layers. First, hard decision boundaries: the agent cannot override certain rules regardless of model confidence; for example, any claim above $8,000 must pass through a human review node. Second, full auditability: every approval decision must be logged with the input claim, retrieved evidence, the agent's reasoning chain, and the output, to satisfy SOC 2 and internal audit requirements. Third, drift monitoring: track the approval rate distribution over rolling windows. If the model begins approving 92% of claims where the historical baseline is 75%, that's a signal of data drift, prompt injection, or adversarial input, and it should trigger automatic hold and review. I'd also add a confidence gate: structured output with a confidence score below a threshold automatically routes to human review."
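The three layers and the confidence gate can be sketched as plain rules that run outside the model. This is a minimal illustration, not a compliance-ready design: the $8,000 review limit, $10,000 authority limit, and 75% baseline come from the scenario above, while `MIN_CONFIDENCE`, the 10-point drift tolerance, and all field names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List

HARD_REVIEW_LIMIT = 8_000   # dollars; claims above this always get a human
AUTHORITY_LIMIT = 10_000    # dollars; the agent's maximum approval authority
MIN_CONFIDENCE = 0.85       # assumed confidence gate


@dataclass
class Decision:
    claim_id: str
    amount: float
    model_verdict: str   # "approve" or "reject"
    confidence: float
    reasoning: str


def route(decision: Decision) -> str:
    # Hard business rules are checked before any model output is trusted.
    if decision.amount > AUTHORITY_LIMIT:
        return "escalate"
    if decision.amount > HARD_REVIEW_LIMIT:
        return "human_review"
    if decision.confidence < MIN_CONFIDENCE:
        return "human_review"
    return decision.model_verdict


def audit_record(decision: Decision, outcome: str, evidence: List[str]) -> Dict[str, Any]:
    # One record per decision: input, evidence, reasoning, and final outcome.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "claim_id": decision.claim_id,
        "amount": decision.amount,
        "model_verdict": decision.model_verdict,
        "confidence": decision.confidence,
        "reasoning": decision.reasoning,
        "evidence": evidence,
        "outcome": outcome,
    }


def approval_rate_drifted(
    outcomes: List[str], baseline: float = 0.75, tolerance: float = 0.10
) -> bool:
    # Compare the approval rate in a rolling window against the baseline.
    if not outcomes:
        return False
    rate = outcomes.count("approve") / len(outcomes)
    return abs(rate - baseline) > tolerance


large = Decision("c1", 9_000, "approve", 0.99, "policy covers water damage")
unsure = Decision("c2", 1_200, "approve", 0.60, "receipt partially legible")
routine = Decision("c3", 1_200, "approve", 0.95, "matches prior approved claims")
record = audit_record(routine, route(routine), ["policy.pdf#p4"])
```

Keeping the amount checks ahead of the confidence check means a highly confident model still cannot approve a claim above the review limit.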
Q-08 Azure AI Search vs. Custom RAG Stack
What Went Wrong

All three answers (cost, cloud preference, open-source vs. closed-source) are procurement/policy decisions — none are engineering reasons. The interviewer had to provide the answer directly. This was a significant gap: a Lead AI Engineer should know immediately that the architectural reason to build custom is when retrieval quality is the primary differentiator and managed vector search doesn't support the required retrieval strategies.

Better Answer

"The core engineering reason to build custom is when your retrieval logic exceeds what managed vector search can configure. Azure AI Search offers hybrid dense+sparse search with configurable blend weights, which covers most enterprise RAG cases. But if you need late-interaction re-ranking (ColBERT trained on your domain corpus), multi-hop iterative query refinement, or a custom scoring function incorporating document freshness, entity salience, and proprietary business rules, a managed service gives you limited control at best. You build custom when retrieval quality is your primary differentiator and the algorithm is part of the IP."
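To make the "custom scoring function" concrete, here is a small sketch of a second-stage re-ranker that blends similarity, freshness, entity overlap, and one business rule. The weights, the 90-day half-life, the `is_policy_doc` boost, and the `Candidate` fields are all illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    doc_id: str
    similarity: float      # first-stage retrieval score, assumed in [0, 1]
    age_days: float        # time since the document was last updated
    entity_overlap: float  # share of query entities found in the doc, [0, 1]
    is_policy_doc: bool    # example of a proprietary business rule


def custom_score(
    c: Candidate,
    half_life_days: float = 90.0,
    w_sim: float = 0.6,
    w_fresh: float = 0.2,
    w_entity: float = 0.2,
    policy_boost: float = 0.1,
) -> float:
    # Freshness decays exponentially: it halves every half_life_days.
    freshness = 0.5 ** (c.age_days / half_life_days)
    score = w_sim * c.similarity + w_fresh * freshness + w_entity * c.entity_overlap
    if c.is_policy_doc:
        score += policy_boost
    return score


def rerank(candidates: List[Candidate], top_k: int = 3) -> List[Candidate]:
    # Second-stage ordering applied to first-stage retrieval results.
    return sorted(candidates, key=custom_score, reverse=True)[:top_k]


candidates = [
    Candidate("old_but_similar", 0.90, 720, 0.2, False),
    Candidate("fresh_policy", 0.80, 9, 0.5, True),
    Candidate("fresh_offtopic", 0.30, 1, 0.0, False),
]
ranked_ids = [c.doc_id for c in rerank(candidates)]
```

With these assumed weights the fresh policy document outranks the older document despite a lower raw similarity, which is the kind of behaviour that is hard to express when the ranking function is owned by the platform.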
Q-09 When NOT to Use Agentic AI — Real-World Examples
What Went Wrong

The examples given (autonomous weapons, robotic surgery, courtroom lawyers) are philosophical and societal — macro-level ethical questions about AI in civilization, not engineering trade-off decisions. The interviewer explicitly said "I'm looking for real-world examples from a senior agentic AI position," which means: scenarios where you would tell an engineering client that agentic AI is the wrong technical choice for their specific problem.

Better Answer

"Agentic AI is the wrong choice when: (1) the problem is fully deterministic and bounded, such as a tax calculation engine or a document format validator; agents introduce latency, cost, and non-determinism where a decision tree is faster, cheaper, and auditable. (2) Sub-100ms latency is required: real-time fraud scoring at transaction time cannot have an LLM in the critical path. (3) High-frequency, low-complexity classification: routing 10,000 support tickets per hour to the right team is a fine-tuned classifier problem, not an agent problem. (4) Regulated determinism: a medical device calculating drug dosage requires certified, deterministic software, not probabilistic inference. The general principle: if the solution space is fully specified and bounded, deterministic code wins on every dimension. Agents earn their cost when the task requires open-ended reasoning, multi-step planning, or natural language understanding."
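The four criteria above can be written down as a simple pre-design checklist. This is only a sketch of the heuristic: the 100 ms and 10,000-per-hour figures come from the examples in the answer, and the `Workload` fields and wording of each reason are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    fully_specified_rules: bool
    latency_budget_ms: int
    requests_per_hour: int
    needs_open_ended_reasoning: bool
    regulated_determinism: bool


def reasons_against_agent(w: Workload) -> List[str]:
    # Each check mirrors one of the four criteria; an empty list means
    # none of them rules out an agentic design.
    reasons: List[str] = []
    if w.fully_specified_rules and not w.needs_open_ended_reasoning:
        reasons.append("deterministic and bounded: use rules or plain code")
    if w.latency_budget_ms < 100:
        reasons.append("sub-100ms budget: no LLM in the critical path")
    if w.requests_per_hour >= 10_000 and not w.needs_open_ended_reasoning:
        reasons.append("high-frequency, low-complexity: use a classifier")
    if w.regulated_determinism:
        reasons.append("regulated determinism: use certified deterministic software")
    return reasons


fraud_scoring = Workload(False, 50, 1_000_000, False, False)
research_assistant = Workload(False, 30_000, 50, True, False)

fraud_reasons = reasons_against_agent(fraud_scoring)
research_reasons = reasons_against_agent(research_assistant)
```

Here the fraud-scoring workload trips the latency and volume checks, while the research assistant trips none, matching the principle that agents earn their cost only when open-ended reasoning is required.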
Q-10 RAG Pipeline Degradation at Scale (3s → 15s latency, 8-10x cost)
What Went Wrong

The answer addressed the problem from a QA / data-science perspective (error sampling, error classification, golden dataset) rather than an infrastructure and retrieval architecture perspective. At a Lead Architect level, the first question is not "which errors are most common?" — it's "which layer is the bottleneck?" The interviewer had to intervene and supply the correct altitude: retrieval metrics, multi-stage retrieval, semantic caching, chunking, vector DB overload.

Better Answer

"I'd diagnose in layers. Latency (3s → 15s): separate the LLM call latency from the vector DB query latency using distributed tracing. At 50–100x the pilot user count, the vector index is the prime suspect, since an overloaded index degrades query time. Fix: switch from a flat index to HNSW, add read replicas, or shard the index. Retrieval quality: measure recall@5 and MRR at current scale. Crowded indices push relevant chunks below the top-k cutoff. Fix: add a cross-encoder re-ranker as a second retrieval stage, tighten similarity thresholds, and revisit chunk size, since smaller, denser chunks improve precision at scale. Cost (8–10x): usually points to a lack of caching. Implement semantic caching with a similarity threshold, since a large share of support queries cluster around 20–30 intents. Hallucinations: typically a symptom of poor context, meaning the LLM is receiving irrelevant chunks. Tightening retrieval and adding metadata filters usually resolves this without touching the LLM at all."
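The retrieval metrics named above, recall@5 and MRR, are straightforward to compute once a small labeled set exists. The sketch below assumes a hypothetical golden set mapping each query to its relevant chunk IDs and a `runs` mapping of each query to the ranked IDs the retriever returned; the IDs themselves are made up.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of the relevant chunks that appear in the top-k results.
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    # 1 / rank of the first relevant chunk, or 0 if none was retrieved.
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5
) -> Dict[str, float]:
    # Average both metrics over every query in the golden set.
    recalls = [recall_at_k(runs[q], gold[q], k) for q in gold]
    rranks = [reciprocal_rank(runs[q], gold[q]) for q in gold]
    return {
        "recall@k": sum(recalls) / len(recalls),
        "mrr": sum(rranks) / len(rranks),
    }


gold = {"q1": {"d1", "d2"}, "q2": {"d9"}}
runs = {
    "q1": ["d3", "d1", "d7", "d8", "d4", "d2"],  # d2 falls outside the top 5
    "q2": ["d9", "d5"],
}
metrics = evaluate(runs, gold, k=5)
```

Running the same evaluation against pilot-scale and production-scale indices gives a direct before-and-after comparison of retrieval quality, independent of the LLM.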
Overall Assessment

The interview demonstrated genuine hands-on depth — the LangGraph answer was well-structured, the governance and monitoring instincts were sound, and the coding honesty was appropriate. The recurring failure pattern, however, was consistent: answers were delivered at the altitude of an engineer describing what was built, not an architect explaining why decisions were made.

The interviewer explicitly redirected on Q-06 (cost reduction), Q-08 (custom RAG), Q-09 (anti-patterns), and Q-10 (scaling diagnosis) — all four times asking for architecture-level reasoning and receiving implementation-level narration. This pattern is the single most important thing to correct before the next interview.

The practical fix: for every technical question, answer the "why this, not that" question before the "what we built" question. Lead with the decision and the trade-off, not the outcome.

✓ Strong: LangGraph rationale
✓ Strong: Monitoring & governance instincts
✓ Strong: Coding honesty
✗ Weak: Cost reduction (missed caching)
✗ Weak: Custom RAG trade-offs
✗ Weak: Scale diagnosis altitude
△ Mid: Introduction (no impact framing)
△ Mid: Career narrative (no story)
