AI Architect Interview – Structured Report
Based on a one-sided candidate recording | Role: AI Architect | 19 Mar 2026
Section 1 – Organized One-sided Transcript (Candidate’s Answers)
The following is the candidate’s side of the conversation, grouped by topic and lightly cleaned of filler words for readability while preserving the original ideas.
1.1 Introduction & Project Overview
I’m with Accenture, working on a project called AIOBI — a Digital Data Analytics Platform / Business Intelligence using Natural Language Query. It’s an agentic system with sub‑agents: RAG agent, Text‑to‑SQL agent, and a visualization agent, all managed by an orchestrator. Built using LangGraph. The RAG backend uses Azure AI Search (vector search), and the Text‑to‑SQL backend is PostgreSQL.
The architecture is straightforward: databases at the back (vector DB for RAG, PostgreSQL for Text‑to‑SQL), an LLM like GPT‑5.1 in the middle, and an API wrapper — we used FastAPI. Frontend in React or Next.js.
1.2 Orchestrator Behaviour
The orchestrator takes a natural language query and classifies whether it should go to the Text‑to‑SQL agent or the RAG agent. We give it a role, task description, input/output descriptions. The output is a routing decision — like an if‑else node in LangGraph. We also pass examples: some indicating the knowledge base (PDFs for RAG) and some showing sample queries that should be routed to each agent.
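The routing node described above can be sketched in plain Python. This is a hypothetical stand-in: the real system uses a LangGraph conditional edge and an LLM classifier, whereas here `classify` is a keyword stub standing in for the model call, and the prompt text is illustrative.

```python
# Hypothetical router sketch; in the real system, ROUTER_PROMPT plus the
# query would be sent to the LLM, whose one-word answer drives the edge.
ROUTER_PROMPT = """You are a router. Given a user query, answer with one
label: 'text_to_sql' for questions over structured tables, or 'rag' for
questions answered from the PDF knowledge base.
Examples:
Q: "Average temperature per city last month" -> text_to_sql
Q: "What does the onboarding guide say about VPN access?" -> rag
"""

def classify(query: str) -> str:
    """Stand-in for the LLM call (keyword heuristic, for illustration only)."""
    sql_hints = ("average", "sum", "count", "per ", "top", "total")
    return "text_to_sql" if any(h in query.lower() for h in sql_hints) else "rag"

def route(query: str) -> str:
    # The if-else node: hand the query to the matching sub-agent.
    return classify(query)

print(route("Average temperature per city last month"))       # text_to_sql
print(route("What does the policy document say about PII?"))  # rag
```

The role, task description, and few-shot examples the candidate mentions all live in the prompt; the code around it only parses the decision and branches.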
1.3 Text‑to‑SQL Agent Flow
The flow in points:
- Input node receives the query.
- Rewriting node: LLM adds context using tables/columns. If something is unclear, it pushes back to the UI for the user to clarify. If clear, it converts the raw NL into a meta‑prompt.
- Meta‑prompt is passed to the Text‑to‑SQL agent, formatted with all needed information to generate the SQL without ambiguity.
- SQL is tested in two ways:
  - Static check: run with WHERE 1=0 (or WHERE 1=1 with a LIMIT clause) to verify the query is valid without scanning real data.
  - Dynamic test: actually execute with LIMIT 1 or LIMIT 3 to inspect sample results.
- Before final execution, we ask the LLM: “Does this query meet all requirements of the original user request?”
- If errors occur, we send them back to the LLM in a feedback loop (retry up to 3‑5 times). If still failing, we return the error to the user with a note that something seems missing.
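The check-and-retry loop above can be sketched end to end. This is a minimal illustration using an in-memory SQLite database (the real system targets PostgreSQL and sends errors back to the LLM for repair; here the `fix` callback stands in for that repair step).

```python
import sqlite3

# Toy database standing in for the real PostgreSQL backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.execute("INSERT INTO cities VALUES ('Oslo', 700000)")

def static_check(sql: str) -> bool:
    """Validate syntax and column names by forcing an empty result set."""
    try:
        conn.execute(f"SELECT * FROM ({sql}) WHERE 1=0")
        return True
    except sqlite3.Error:
        return False

def dynamic_check(sql: str):
    """Execute with a small LIMIT to inspect a sample of real rows."""
    return conn.execute(f"SELECT * FROM ({sql}) LIMIT 3").fetchall()

def run_with_retries(sql: str, fix, max_retries: int = 3):
    for _ in range(max_retries):
        if static_check(sql):
            return dynamic_check(sql)
        sql = fix(sql)  # feedback loop: hand the error back for repair
    raise RuntimeError("query still failing; surfacing error to the user")

rows = run_with_retries("SELECT name FROM cities", fix=lambda s: s)
print(rows)  # [('Oslo',)]
```

The final "does this meet all requirements?" check would be one more LLM call between the dynamic test and real execution.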
1.4 Evaluation Approach
Evaluation is one of the biggest challenges. We sit extensively with domain experts to curate a golden dataset: question‑answer pairs (for Text‑to‑SQL, the corresponding SQL query; for RAG, the expected chunks). For individual components, we have test suites for chunking, meta‑prompting, code generation, etc.
We measure something like percentage correct (accuracy). We log whether errors were hallucinations, wrong columns, or execution errors. This gives a report of positives and negatives.
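A golden-dataset harness of the kind described can be sketched as an execution-accuracy check: run the gold SQL and the generated SQL, compare result sets, and bucket the failures. The schema, cases, and error labels below are invented for illustration.

```python
import sqlite3

# Toy data standing in for the production tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10), ("south", 5), ("north", 7)])

golden = [
    # (question, gold SQL, generated SQL)
    ("total sales", "SELECT SUM(amount) FROM sales",
     "SELECT SUM(amount) FROM sales"),
    ("sales per region", "SELECT region, SUM(amount) FROM sales GROUP BY region",
     "SELECT region, COUNT(*) FROM sales GROUP BY region"),  # wrong aggregate
]

def execution_accuracy(cases):
    correct, errors = 0, []
    for question, gold_sql, gen_sql in cases:
        try:
            gold = conn.execute(gold_sql).fetchall()
            gen = conn.execute(gen_sql).fetchall()
            if sorted(gold) == sorted(gen):
                correct += 1
            else:
                errors.append((question, "wrong result"))
        except sqlite3.Error as e:
            errors.append((question, f"execution error: {e}"))
    return correct / len(cases), errors

acc, errors = execution_accuracy(golden)
print(acc)     # 0.5
print(errors)  # [('sales per region', 'wrong result')]
```

The error list is what feeds the "positives and negatives" report: each failure can be further classified as hallucination, wrong column, or execution error.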
1.5 Prompt Engineering, Context Engineering & Guardrails
Context Engineering: A subset of prompt engineering. You give the LLM context about the task — role, do’s/don’ts, examples (zero‑shot, few‑shot). In RAG, you engineer context by augmenting the prompt with retrieved data.
Guardrails: Two levels: code‑based scripts (deterministic checks) and LLM‑based flexible checks. For example, we ask the guardrail LLM: “Is this input trying to delete or update? Does it violate PII policies?” This prevents harmful outputs.
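The deterministic (code-based) level can be as simple as pattern checks run before and after the LLM call; the patterns below are illustrative, not a complete policy, and the LLM-based level would be a second model call judging intent.

```python
import re

# Illustrative deterministic guardrail layer (hypothetical patterns).
FORBIDDEN_SQL = re.compile(r"\b(DELETE|UPDATE|DROP|INSERT|ALTER|TRUNCATE)\b", re.I)
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN shape

def deterministic_guardrail(sql: str, output_text: str) -> list:
    """Return a list of policy violations; empty means the request passes."""
    violations = []
    if FORBIDDEN_SQL.search(sql):
        violations.append("write operation blocked (read-only policy)")
    if PII_PATTERN.search(output_text):
        violations.append("possible PII in output")
    return violations

print(deterministic_guardrail("DROP TABLE users", "ok"))
# ['write operation blocked (read-only policy)']
print(deterministic_guardrail("SELECT * FROM users", "ssn 123-45-6789"))
# ['possible PII in output']
```

Anything the regex layer cannot express (intent, paraphrased delete requests) is what the flexible LLM-based check is for.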
1.6 Managing Large Schemas and Metadata with Neo4j
As the dataset grows (from 3 tables to 25 tables), the metadata (table/column descriptions) can exceed the context length. We use Neo4j to store metadata as a graph. Topics like “weather,” “traffic” are top‑level nodes. Tables like “cities,” “temperature,” “routes” connect to topics. When a query comes, we first pull relevant topic nodes, then retrieve only the related table/column nodes. This multi‑pass approach filters the context to only what’s needed, solving the context‑length problem.
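The two-pass pruning can be illustrated with an in-memory stand-in for the graph (the real system would run Cypher queries against Neo4j; the topics, tables, and matching logic here are invented for the sketch).

```python
# Hypothetical topic -> table -> column metadata graph.
TOPIC_GRAPH = {
    "weather": {"temperature": ["city", "day", "celsius"],
                "cities": ["name", "country"]},
    "traffic": {"routes": ["origin", "destination", "minutes"]},
}

def relevant_topics(query: str):
    # Pass 1: pull only the topic nodes the query touches.
    return [t for t in TOPIC_GRAPH if t in query.lower()]

def pruned_schema(query: str):
    """Pass 2: return only table/column metadata under the matched topics,
    keeping the prompt well under the context limit."""
    schema = {}
    for topic in relevant_topics(query):
        schema.update(TOPIC_GRAPH[topic])
    return schema

print(pruned_schema("average weather temperature per city"))
# {'temperature': ['city', 'day', 'celsius'], 'cities': ['name', 'country']}
```

With 25 tables, only the handful under a matched topic ever reach the meta-prompt; the rest of the metadata stays in the graph.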
1.7 Scaling and Deployment
Scaling is via an API gateway in front of a Kubernetes cluster with auto‑scaling. I don’t have hands‑on details of the K8s setup, but architects described that approach.
1.8 LLM Upgrades and Model Selection
We use Azure OpenAI, so we upgrade regularly — from GPT‑3.5 to 4o to 4.1, etc. Newer models require retesting, but they improve reasoning and reduce hallucinations. For cost‑efficient tasks we use older or “mini” models. For self‑hosted alternatives we consider DeepSeek, Qwen, Mistral.
1.9 Technical Definitions (Quick‑fire Questions)
Top‑k vs Top‑p: Top‑k restricts sampling to the k highest‑probability next tokens. Top‑p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability ≥ p. Example: if token probabilities are 70%, 25%, 4%… and top‑p = 0.9, we take the first two, because 70% + 25% = 95%, which is ≥ 90%.
Temperature: Controls randomness. Low (approaching 0) → near‑greedy decoding (almost always the highest‑probability token); high → more exploratory.
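These three knobs can be made concrete with a toy next-token distribution matching the 70/25/4/1 example above (the token strings and the distribution itself are invented for illustration).

```python
import math

# Hand-made next-token distribution matching the 70/25/4/1 example.
probs = {"the": 0.70, "a": 0.25, "an": 0.04, "this": 0.01}

def top_k(dist, k):
    """Keep only the k highest-probability tokens."""
    return dict(sorted(dist.items(), key=lambda kv: -kv[1])[:k])

def top_p(dist, p):
    """Keep the smallest high-probability set whose cumulative mass >= p."""
    kept, total = {}, 0.0
    for tok, pr in sorted(dist.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return kept

def apply_temperature(dist, t):
    """Rescale probabilities: t -> 0 approaches greedy, t > 1 flattens."""
    scaled = {tok: math.exp(math.log(pr) / t) for tok, pr in dist.items()}
    z = sum(scaled.values())
    return {tok: pr / z for tok, pr in scaled.items()}

print(list(top_k(probs, 2)))    # ['the', 'a']
print(list(top_p(probs, 0.9)))  # ['the', 'a']  (0.70 + 0.25 = 0.95 >= 0.9)
```

Note that top-p adapts its set size to the shape of the distribution, while top-k is fixed; that is the distinction a quick-fire answer should land on.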
1.10 SQL Join Types
Left join: all rows from the left table, plus matching rows from the right table; non‑matching right‑side columns get NULLs.
Right join: all rows from right table, plus matching rows from left.
Full outer join: all rows from both tables, with NULLs where no match exists.
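The LEFT JOIN case can be demonstrated in a few lines with SQLite (RIGHT and FULL OUTER JOIN require SQLite 3.39+, so only the left case is shown; the tables are invented for the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
CREATE TABLE customers (id INTEGER, name TEXT);
INSERT INTO orders VALUES (1, 10), (2, 99);  -- customer 99 does not exist
INSERT INTO customers VALUES (10, 'Ada');
""")

rows = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(rows)  # [(1, 'Ada'), (2, None)] -- unmatched right side becomes NULL
```

Both orders survive the join; the one without a matching customer carries NULL (Python `None`) in the right-side column, exactly the behaviour described above.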
1.11 Fibonacci Coding Exercise
The candidate wrote pseudocode in a thinking‑aloud style:
“Fibonacci is f(n) = f(n‑1) + f(n‑2). We’ll start from 0 and 1. I think a list would work. For i in range(n): if i==0: append 0; elif i==1: append 1; else: append list[-1] + list[-2]. I tried to run it and it gave output but needed debugging. Reason it didn’t print correctly: range wasn’t set up properly.”
1.12 Wrap‑up: Career Motivation
“I’ve been on this project for 1.5 years. It’s now in maintenance mode — mainly ServiceNow tickets. I want to explore more cutting‑edge agentic stuff, not just maintain what’s built.”
Section 2 – Reconstructed Interviewer Questions
Based on the candidate’s responses, the following questions were likely asked. They are presented in a logical order, paired with the relevant answer summary.
Section 3 – Critique and Improved Answers
Below is a constructive evaluation of the candidate’s responses, highlighting weaknesses and offering a more polished, architect‑level answer.
3.1 Overall Delivery NEEDS WORK
- Excessive fillers & rambling: The transcript contained many “yeah,” “I mean,” “like,” and tangential loops. An AI Architect must communicate with clarity and conciseness.
- Lack of structure: Answers often wandered. For example, explaining the Text‑to‑SQL flow jumped between validation, rewriting, and guardrails without a clear narrative.
- Vagueness on depth: When asked about scaling, the candidate said “I lack details” — unacceptable for an architect role. Better to say “While I haven’t provisioned the K8s cluster myself, the standard pattern we follow is…” and then describe the pattern confidently.
3.2 Architecture Walkthrough FAIR
The candidate mentioned LangGraph, FastAPI, React, but left out crucial architectural diagrams and trade‑offs. As an architect, one should discuss why these choices were made.
3.3 Evaluation Answer INSUFFICIENT
The candidate only mentioned “accuracy” and “golden dataset”. An architect should name specific metrics: Execution Accuracy (EX) and Exact Set Match (ESM) for Text‑to‑SQL (with ROUGE‑L or BLEU at most as rough surface‑similarity checks), validation‑set coverage, hallucination rate, and for RAG, context precision/recall, faithfulness, and answer relevancy. The answer lacked method naming and benchmark references.
3.4 Context Engineering vs Prompt Engineering DECENT
The candidate correctly called context engineering a subset, but the distinction was fuzzy. He should have explained that prompt engineering is the overarching practice of designing the entire prompt structure, while context engineering specifically deals with injecting relevant external information (retrieved chunks, metadata, user intent tags).
3.5 Guardrails Answer ADVANCED
The answer touched on code‑based vs LLM‑based guardrails, which is good. But an architect should mention concrete libraries (Guardrails AI, NVIDIA NeMo Guardrails) and cite examples like PII scrubbing, SQL injection prevention, and output schema enforcement. Also, the candidate missed the importance of input guardrails (e.g., refusing “DROP TABLE” instructions).
3.6 Large Schema Handling with Neo4j GOOD CONCEPT, POOR EXPLANATION
The idea of a topic‑driven metadata graph is innovative and architect‑level. However, the candidate struggled to articulate it clearly, using confusing “hierarchy in a graph” metaphors and failing to name the standard technique involved: schema linking (aligning question tokens to table and column names). An architect would also mention alternatives such as table selection via dense retrieval, and why Neo4j was chosen over them (explicit relationship traversal, no exposure to embedding drift).
3.7 Fibonacci Coding Exercise MISMATCHED
The interviewer explicitly said “you have to use recursion.” The candidate wrote an iterative solution with a list and debugged it aloud. This shows a failure to listen and to translate a requirement into code. The correct recursive approach (with memoization, since naive recursion is exponential in n) would be:
from functools import lru_cache

@lru_cache(None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def fib_sequence(n):
    return [fib(i) for i in range(n)]

print(fib_sequence(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
The candidate should have clarified the requirement (e.g., “first n numbers” vs “up to a maximum number”) and
then presented a clean recursive solution, discussing time complexity and the importance of memoization.
3.8 SQL Joins SOLID
The explanation was accurate. However, the candidate hesitated and asked for the question to be repeated. For an architect, the immediate answer should have been crisp: “LEFT JOIN returns all rows from the left table and only the matches from the right; RIGHT JOIN is its mirror; FULL OUTER JOIN returns all rows from both, with NULLs where no match exists.” No need for the extra qualifiers. Still, the content was correct.
3.9 Career Motivation HONEST BUT NEGATIVE
“Maintenance mode… ServiceNow tickets” sounds like complaining. An architect should position the reason positively: “I’m eager to work on more complex, large‑scale agentic systems where I can apply my design skills to solve novel problems, and I see this role as aligned with that growth.”
3.10 Missing Topics GAPS
The candidate did not proactively discuss:
- Observability tools: Only mentioned Phoenix and LangFuse vaguely. An architect should know OpenTelemetry, tracing, and metrics like faithfulness.
- Cost optimization: No mention of token‑usage reduction, caching, semantic caching, or prompt compression.
- Multi‑agent patterns: Although the project is multi‑agent, the candidate didn’t discuss debate, reflection, or plan‑execute patterns — all highly relevant for an agentic architect.
- Security: Beyond guardrails, no discussion of RBAC, row‑level security in NLQ, or tenant isolation.
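On the cost-optimization gap, even a minimal caching sketch shows the pattern an interviewer listens for. The version below is an exact-match cache on normalized prompts (a true semantic cache would key on embedding similarity instead); the function names and the fake LLM call are invented for illustration.

```python
import hashlib

cache = {}
llm_calls = 0

def expensive_llm(prompt: str) -> str:
    """Stand-in for a paid model call; counts invocations for the demo."""
    global llm_calls
    llm_calls += 1
    return f"answer-to:{prompt}"

def cached_llm(prompt: str) -> str:
    """Skip the model entirely when a normalized duplicate was seen before."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in cache:
        cache[key] = expensive_llm(prompt)
    return cache[key]

cached_llm("Total sales per region?")
cached_llm("total sales per region?")  # normalization -> cache hit
print(llm_calls)  # 1
```

Replacing the hash key with a nearest-neighbour lookup over query embeddings turns this into semantic caching, which also catches paraphrases.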
3.11 Suggested Talking Points for Future Interviews
- Use concrete numbers: “Improved SQL accuracy from 82% to 93% by introducing table‑graph schema linking.”
- Mention standard benchmarks: “We track BIRD, Spider, or WikiSQL metrics internally.”
- Show impact: “Reduced prompt tokens per query by 60% using Neo4j metadata pruning.”
- Discuss failure modes: “We handle ambiguous terms by engaging the user in a clarification loop, which improved first‑attempt success by 20%.”
- Always bring the conversation back to architecture trade‑offs: why agentic vs single‑call, why LangGraph vs semantic kernel, why Azure vs AWS.
End of Report — Prepared by AI Interview Evaluator