AI Architect Interview – Structured Report
Based on a one-sided candidate recording | Role: AI Architect | 19 Mar 2026
Section 1 – Organized One-sided Transcript (Candidate’s Answers)
The following is the candidate’s side of the conversation, grouped by topic and lightly cleaned of filler words for readability while preserving the original ideas.
1.1 Introduction & Project Overview
I’m with Accenture, working on a project called AIOBI — a Digital Data Analytics Platform / Business Intelligence using Natural Language Query. It’s an agentic system with sub‑agents: RAG agent, Text‑to‑SQL agent, and a visualization agent, all managed by an orchestrator. Built using LangGraph. The RAG backend uses Azure AI Search (vector search), and the Text‑to‑SQL backend is PostgreSQL.
The architecture is straightforward: databases at the back (vector DB for RAG, PostgreSQL for Text‑to‑SQL), an LLM like GPT‑5.1 in the middle, and an API wrapper — we used FastAPI. Frontend in React or Next.js.
1.2 Orchestrator Behaviour
The orchestrator takes a natural language query and classifies whether it should go to the Text‑to‑SQL agent or the RAG agent. We give it a role, task description, input/output descriptions. The output is a routing decision — like an if‑else node in LangGraph. We also pass examples: some indicating the knowledge base (PDFs for RAG) and some showing sample queries that should be routed to each agent.
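The routing node described above can be sketched in plain Python. This is a hypothetical stand-in: the real system uses a LangGraph conditional edge and an LLM classifier, whereas here `classify` is a keyword stub standing in for the model call, and the prompt text is illustrative.

```python
# Hypothetical router sketch; in the real system, ROUTER_PROMPT plus the
# query would be sent to the LLM, whose one-word answer drives the edge.
ROUTER_PROMPT = """You are a router. Given a user query, answer with one
label: 'text_to_sql' for questions over structured tables, or 'rag' for
questions answered from the PDF knowledge base.
Examples:
Q: "Average temperature per city last month" -> text_to_sql
Q: "What does the onboarding guide say about VPN access?" -> rag
"""

def classify(query: str) -> str:
    """Stand-in for the LLM call (keyword heuristic, for illustration only)."""
    sql_hints = ("average", "sum", "count", "per ", "top", "total")
    return "text_to_sql" if any(h in query.lower() for h in sql_hints) else "rag"

def route(query: str) -> str:
    # The if-else node: hand the query to the matching sub-agent.
    return classify(query)

print(route("Average temperature per city last month"))       # text_to_sql
print(route("What does the policy document say about PII?"))  # rag
```

The role, task description, and few-shot examples the candidate mentions all live in the prompt; the code around it only parses the decision and branches.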
1.3 Text‑to‑SQL Agent Flow
The flow in points:
- Input node receives the query.
- Rewriting node: LLM adds context using tables/columns. If something is unclear, it pushes back to the UI for the user to clarify. If clear, it converts the raw NL into a meta‑prompt.
- Meta‑prompt is passed to the Text‑to‑SQL agent, formatted with all needed information to generate the SQL without ambiguity.
- SQL is tested in two ways:
  - Static check: run with WHERE 1=0 (or WHERE 1=1 with a LIMIT clause) to verify the query is valid without scanning real data.
  - Dynamic test: actually execute with LIMIT 1 or LIMIT 3 to inspect sample results.
- Before final execution, we ask the LLM: “Does this query meet all requirements of the original user request?”
- If errors occur, we send them back to the LLM in a feedback loop (retry up to 3‑5 times). If still failing, we return the error to the user with a note that something seems missing.
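The check-and-retry loop above can be sketched end to end. This is a minimal illustration using an in-memory SQLite database (the real system targets PostgreSQL and sends errors back to the LLM for repair; here the `fix` callback stands in for that repair step).

```python
import sqlite3

# Toy database standing in for the real PostgreSQL backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.execute("INSERT INTO cities VALUES ('Oslo', 700000)")

def static_check(sql: str) -> bool:
    """Validate syntax and column names by forcing an empty result set."""
    try:
        conn.execute(f"SELECT * FROM ({sql}) WHERE 1=0")
        return True
    except sqlite3.Error:
        return False

def dynamic_check(sql: str):
    """Execute with a small LIMIT to inspect a sample of real rows."""
    return conn.execute(f"SELECT * FROM ({sql}) LIMIT 3").fetchall()

def run_with_retries(sql: str, fix, max_retries: int = 3):
    for _ in range(max_retries):
        if static_check(sql):
            return dynamic_check(sql)
        sql = fix(sql)  # feedback loop: hand the error back for repair
    raise RuntimeError("query still failing; surfacing error to the user")

rows = run_with_retries("SELECT name FROM cities", fix=lambda s: s)
print(rows)  # [('Oslo',)]
```

The final "does this meet all requirements?" check would be one more LLM call between the dynamic test and real execution.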
1.4 Evaluation Approach
Evaluation is one of the biggest challenges. We sit extensively with domain experts to curate a golden dataset: question‑answer pairs (for Text‑to‑SQL, the corresponding SQL query; for RAG, the expected chunks). For individual components, we have test suites for chunking, meta‑prompting, code generation, etc.
We measure something like percentage correct (accuracy). We log whether errors were hallucinations, wrong columns, or execution errors. This gives a report of positives and negatives.
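A golden-dataset harness of the kind described can be sketched as an execution-accuracy check: run the gold SQL and the generated SQL, compare result sets, and bucket the failures. The schema, cases, and error labels below are invented for illustration.

```python
import sqlite3

# Toy data standing in for the production tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10), ("south", 5), ("north", 7)])

golden = [
    # (question, gold SQL, generated SQL)
    ("total sales", "SELECT SUM(amount) FROM sales",
     "SELECT SUM(amount) FROM sales"),
    ("sales per region", "SELECT region, SUM(amount) FROM sales GROUP BY region",
     "SELECT region, COUNT(*) FROM sales GROUP BY region"),  # wrong aggregate
]

def execution_accuracy(cases):
    correct, errors = 0, []
    for question, gold_sql, gen_sql in cases:
        try:
            gold = conn.execute(gold_sql).fetchall()
            gen = conn.execute(gen_sql).fetchall()
            if sorted(gold) == sorted(gen):
                correct += 1
            else:
                errors.append((question, "wrong result"))
        except sqlite3.Error as e:
            errors.append((question, f"execution error: {e}"))
    return correct / len(cases), errors

acc, errors = execution_accuracy(golden)
print(acc)     # 0.5
print(errors)  # [('sales per region', 'wrong result')]
```

The error list is what feeds the "positives and negatives" report: each failure can be further classified as hallucination, wrong column, or execution error.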
1.5 Prompt Engineering, Context Engineering & Guardrails
Context Engineering: A subset of prompt engineering. You give the LLM context about the task — role, do’s/don’ts, examples (zero‑shot, few‑shot). In RAG, you engineer context by augmenting the prompt with retrieved data.
Guardrails: Two levels: code‑based scripts (deterministic checks) and LLM‑based flexible checks. For example, we ask the guardrail LLM: “Is this input trying to delete or update? Does it violate PII policies?” This prevents harmful outputs.
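The deterministic (code-based) level can be as simple as pattern checks run before and after the LLM call; the patterns below are illustrative, not a complete policy, and the LLM-based level would be a second model call judging intent.

```python
import re

# Illustrative deterministic guardrail layer (hypothetical patterns).
FORBIDDEN_SQL = re.compile(r"\b(DELETE|UPDATE|DROP|INSERT|ALTER|TRUNCATE)\b", re.I)
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN shape

def deterministic_guardrail(sql: str, output_text: str) -> list:
    """Return a list of policy violations; empty means the request passes."""
    violations = []
    if FORBIDDEN_SQL.search(sql):
        violations.append("write operation blocked (read-only policy)")
    if PII_PATTERN.search(output_text):
        violations.append("possible PII in output")
    return violations

print(deterministic_guardrail("DROP TABLE users", "ok"))
# ['write operation blocked (read-only policy)']
print(deterministic_guardrail("SELECT * FROM users", "ssn 123-45-6789"))
# ['possible PII in output']
```

Anything the regex layer cannot express (intent, paraphrased delete requests) is what the flexible LLM-based check is for.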
1.6 Managing Large Schemas and Metadata with Neo4j
As the dataset grows (from 3 tables to 25 tables), the metadata (table/column descriptions) can exceed the context length. We use Neo4j to store metadata as a graph. Topics like “weather,” “traffic” are top‑level nodes. Tables like “cities,” “temperature,” “routes” connect to topics. When a query comes, we first pull relevant topic nodes, then retrieve only the related table/column nodes. This multi‑pass approach filters the context to only what’s needed, solving the context‑length problem.
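The two-pass pruning can be illustrated with an in-memory stand-in for the graph (the real system would run Cypher queries against Neo4j; the topics, tables, and matching logic here are invented for the sketch).

```python
# Hypothetical topic -> table -> column metadata graph.
TOPIC_GRAPH = {
    "weather": {"temperature": ["city", "day", "celsius"],
                "cities": ["name", "country"]},
    "traffic": {"routes": ["origin", "destination", "minutes"]},
}

def relevant_topics(query: str):
    # Pass 1: pull only the topic nodes the query touches.
    return [t for t in TOPIC_GRAPH if t in query.lower()]

def pruned_schema(query: str):
    """Pass 2: return only table/column metadata under the matched topics,
    keeping the prompt well under the context limit."""
    schema = {}
    for topic in relevant_topics(query):
        schema.update(TOPIC_GRAPH[topic])
    return schema

print(pruned_schema("average weather temperature per city"))
# {'temperature': ['city', 'day', 'celsius'], 'cities': ['name', 'country']}
```

With 25 tables, only the handful under a matched topic ever reach the meta-prompt; the rest of the metadata stays in the graph.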
1.7 Scaling and Deployment
Scaling is via an API gateway in front of a Kubernetes cluster with auto‑scaling. I don’t have hands‑on details of the K8s setup, but architects described that approach.
1.8 LLM Upgrades and Model Selection
We use Azure OpenAI, so we upgrade regularly — from GPT‑3.5 to 4o to 4.1, etc. Newer models require retesting, but they improve reasoning and reduce hallucinations. For cost‑efficient tasks we use older or “mini” models. For self‑hosted alternatives we consider DeepSeek, Qwen, Mistral.
1.9 Technical Definitions (Quick‑fire Questions)
Top‑k vs Top‑p: Top‑k restricts sampling to the k highest‑probability next tokens. Top‑p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability ≥ p. Example: if token probabilities are 70%, 25%, 4%… and top‑p = 0.9, we take the first two, because 70% + 25% = 95%, which is ≥ 90%.
Temperature: Controls randomness. Low (approaching 0) → near‑greedy decoding (almost always the highest‑probability token); high → more exploratory.
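These three knobs can be made concrete with a toy next-token distribution matching the 70/25/4/1 example above (the token strings and the distribution itself are invented for illustration).

```python
import math

# Hand-made next-token distribution matching the 70/25/4/1 example.
probs = {"the": 0.70, "a": 0.25, "an": 0.04, "this": 0.01}

def top_k(dist, k):
    """Keep only the k highest-probability tokens."""
    return dict(sorted(dist.items(), key=lambda kv: -kv[1])[:k])

def top_p(dist, p):
    """Keep the smallest high-probability set whose cumulative mass >= p."""
    kept, total = {}, 0.0
    for tok, pr in sorted(dist.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return kept

def apply_temperature(dist, t):
    """Rescale probabilities: t -> 0 approaches greedy, t > 1 flattens."""
    scaled = {tok: math.exp(math.log(pr) / t) for tok, pr in dist.items()}
    z = sum(scaled.values())
    return {tok: pr / z for tok, pr in scaled.items()}

print(list(top_k(probs, 2)))    # ['the', 'a']
print(list(top_p(probs, 0.9)))  # ['the', 'a']  (0.70 + 0.25 = 0.95 >= 0.9)
```

Note that top-p adapts its set size to the shape of the distribution, while top-k is fixed; that is the distinction a quick-fire answer should land on.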
1.10 SQL Join Types
Left join: all rows from the left table, plus matching rows from the right table; non‑matching right‑side columns get NULLs.
Right join: all rows from right table, plus matching rows from left.
Full outer join: all rows from both tables, with NULLs where no match exists.
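The LEFT JOIN case can be demonstrated in a few lines with SQLite (RIGHT and FULL OUTER JOIN require SQLite 3.39+, so only the left case is shown; the tables are invented for the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
CREATE TABLE customers (id INTEGER, name TEXT);
INSERT INTO orders VALUES (1, 10), (2, 99);  -- customer 99 does not exist
INSERT INTO customers VALUES (10, 'Ada');
""")

rows = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(rows)  # [(1, 'Ada'), (2, None)] -- unmatched right side becomes NULL
```

Both orders survive the join; the one without a matching customer carries NULL (Python `None`) in the right-side column, exactly the behaviour described above.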
1.11 Fibonacci Coding Exercise
The candidate wrote pseudocode in a thinking‑aloud style:
“Fibonacci is f(n) = f(n‑1) + f(n‑2). We’ll start from 0 and 1. I think a list would work. For i in range(n): if i==0: append 0; elif i==1: append 1; else: append list[-1] + list[-2]. I tried to run it and it gave output but needed debugging. Reason it didn’t print correctly: range wasn’t set up properly.”
1.12 Wrap‑up: Career Motivation
“I’ve been on this project for 1.5 years. It’s now in maintenance mode — mainly ServiceNow tickets. I want to explore more cutting‑edge agentic stuff, not just maintain what’s built.”
Section 2 – Reconstructed Interviewer Questions
Based on the candidate’s responses, the following questions were likely asked. They are presented in a logical order, paired with the relevant answer summary.
Section 3 – Critique and Improved Answers
Below is a constructive evaluation of the candidate’s responses, highlighting weaknesses and offering a more polished, architect‑level answer.
3.1 Overall Delivery NEEDS WORK
- Excessive fillers & rambling: The transcript contained many “yeah,” “I mean,” “like,” and tangential loops. An AI Architect must communicate with clarity and conciseness.
- Lack of structure: Answers often wandered. For example, explaining the Text‑to‑SQL flow jumped between validation, rewriting, and guardrails without a clear narrative.
- Vagueness on depth: When asked about scaling, the candidate said “I lack details” — unacceptable for an architect role. Better to say “While I haven’t provisioned the K8s cluster myself, the standard pattern we follow is…” and then describe the pattern confidently.
3.2 Architecture Walkthrough FAIR
The candidate mentioned LangGraph, FastAPI, React, but left out crucial architectural diagrams and trade‑offs. As an architect, one should discuss why these choices were made.
3.3 Evaluation Answer INSUFFICIENT
The candidate only mentioned “accuracy” and “golden dataset”. An architect should name specific metrics: Execution Accuracy (EX) and Exact Set Match (ESM) for Text‑to‑SQL (with ROUGE‑L or BLEU at most as rough surface‑similarity checks), validation‑set coverage, hallucination rate, and for RAG, context precision/recall, faithfulness, and answer relevancy. The answer lacked method naming and benchmark references.
3.4 Context Engineering vs Prompt Engineering DECENT
The candidate correctly called context engineering a subset, but the distinction was fuzzy. He should have explained that prompt engineering is the overarching practice of designing the entire prompt structure, while context engineering specifically deals with injecting relevant external information (retrieved chunks, metadata, user intent tags).
3.5 Guardrails Answer ADVANCED
The answer touched on code‑based vs LLM‑based guardrails, which is good. But an architect should mention concrete libraries (Guardrails AI, NVIDIA NeMo Guardrails) and cite examples like PII scrubbing, SQL injection prevention, and output schema enforcement. Also, the candidate missed the importance of input guardrails (e.g., refusing “DROP TABLE” instructions).
3.6 Large Schema Handling with Neo4j GOOD CONCEPT, POOR EXPLANATION
The idea of a topic‑driven metadata graph is innovative and architect‑level. However, the candidate struggled to articulate it clearly, using confusing “hierarchy in a graph” metaphors and failing to name the standard technique involved: schema linking (aligning question tokens to table and column names). An architect would also mention alternatives such as table selection via dense retrieval, and why Neo4j was chosen over them (explicit relationship traversal, no exposure to embedding drift).
3.7 Fibonacci Coding Exercise MISMATCHED
The interviewer explicitly said “you have to use recursion.” The candidate wrote an iterative solution with a list and debugged it aloud. This shows a failure to listen and to translate a requirement into code. The correct recursive approach (with memoization, since naive recursion is exponential in n) would be:
from functools import lru_cache

@lru_cache(None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def fib_sequence(n):
    return [fib(i) for i in range(n)]

print(fib_sequence(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
The candidate should have clarified the requirement (e.g., “first n numbers” vs “up to a maximum number”) and
then presented a clean recursive solution, discussing time complexity and the importance of memoization.
3.8 SQL Joins SOLID
The explanation was accurate. However, the candidate hesitated and asked for the question to be repeated. For an architect, the immediate answer should have been crisp: “LEFT JOIN returns all rows from the left table and only the matches from the right; RIGHT JOIN is its mirror; FULL OUTER JOIN returns all rows from both, with NULLs where no match exists.” No need for the extra qualifiers. Still, the content was correct.
3.9 Career Motivation HONEST BUT NEGATIVE
“Maintenance mode… ServiceNow tickets” sounds like complaining. An architect should position the reason positively: “I’m eager to work on more complex, large‑scale agentic systems where I can apply my design skills to solve novel problems, and I see this role as aligned with that growth.”
3.10 Missing Topics GAPS
The candidate did not proactively discuss:
- Observability tools: Only mentioned Phoenix and LangFuse vaguely. An architect should know OpenTelemetry, tracing, and metrics like faithfulness.
- Cost optimization: No mention of token‑usage reduction, caching, semantic caching, or prompt compression.
- Multi‑agent patterns: Although the project is multi‑agent, the candidate didn’t discuss debate, reflection, or plan‑execute patterns — all highly relevant for an agentic architect.
- Security: Beyond guardrails, no discussion of RBAC, row‑level security in NLQ, or tenant isolation.
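On the cost-optimization gap, even a minimal caching sketch shows the pattern an interviewer listens for. The version below is an exact-match cache on normalized prompts (a true semantic cache would key on embedding similarity instead); the function names and the fake LLM call are invented for illustration.

```python
import hashlib

cache = {}
llm_calls = 0

def expensive_llm(prompt: str) -> str:
    """Stand-in for a paid model call; counts invocations for the demo."""
    global llm_calls
    llm_calls += 1
    return f"answer-to:{prompt}"

def cached_llm(prompt: str) -> str:
    """Skip the model entirely when a normalized duplicate was seen before."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in cache:
        cache[key] = expensive_llm(prompt)
    return cache[key]

cached_llm("Total sales per region?")
cached_llm("total sales per region?")  # normalization -> cache hit
print(llm_calls)  # 1
```

Replacing the hash key with a nearest-neighbour lookup over query embeddings turns this into semantic caching, which also catches paraphrases.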
3.11 Suggested Talking Points for Future Interviews
- Use concrete numbers: “Improved SQL accuracy from 82% to 93% by introducing table‑graph schema linking.”
- Mention standard benchmarks: “We track BIRD, Spider, or WikiSQL metrics internally.”
- Show impact: “Reduced prompt tokens per query by 60% using Neo4j metadata pruning.”
- Discuss failure modes: “We handle ambiguous terms by engaging the user in a clarification loop, which improved first‑attempt success by 20%.”
- Always bring the conversation back to architecture trade‑offs: why agentic vs single‑call, why LangGraph vs semantic kernel, why Azure vs AWS.
End of Report — Prepared by AI Interview Evaluator