Thursday, May 14, 2026

Interview at Cognizant for Lead AI Engineer Role (2024 May 22)

INTERVIEW INTELLIGENCE REPORT

Lead AI Engineer
Call Reconstruction & Critique

Position: Lead AI Engineer
Format: Telephonic Screening
Analyst: Claude Sonnet

Organised & Structured Transcript

The raw call recording was fragmented and conversational. Below is the cleaned, logically sequenced account of what the interviewee communicated, grouped by topic.

Topic A

The Anomaly Detection Project — Amex Loyalty Platform

  • Project involved detecting anomalies in credit card transaction data on the American Express loyalty platform.
  • Anomaly categories targeted: unusually large-amount transactions, unusually small-amount transactions, and anomalies by merchant type.
  • Business outcome: the client used these flagged anomalies to generate alerts in their platform and decide whether to block suspicious transactions.
  • New data was provided on a quarterly basis for ongoing inference.
Topic B

Data Engineering & Infrastructure

  • Historical training data spanned approximately one to two years of credit card transactions.
  • Data originated from Amex's mainframe systems.
  • A dedicated data engineering team was responsible for extracting and loading data from mainframes into Hive-based databases.
  • The data science team consumed this data via PySpark, running in a Jupyter-like notebook environment on Cornerstone, a managed compute platform (see the PySpark sketch after this list).
  • Data was entirely structured (tabular credit card transaction records).
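For concreteness, a minimal PySpark sketch of the consumption pattern described above. The table and column names are illustrative placeholders, not the actual Amex schema, and the Cornerstone environment is assumed to behave like any Hive-enabled Spark session.

```python
# Hypothetical sketch: reading the Hive-loaded transaction data via PySpark.
# Table and column names are placeholders, not the real Amex schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("loyalty-anomaly-etl")
    .enableHiveSupport()                  # the notebook environment exposes Hive-backed tables
    .getOrCreate()
)

# Quarterly batch loaded by the data engineering team (hypothetical table name).
txns = spark.table("loyalty_db.card_transactions")

# Keep the handful of modelling features discussed under Topic C.
features = (
    txns.select("txn_amount", "merchant_type", "txn_timestamp")
        .where(F.col("txn_amount").isNotNull())
)

features.describe("txn_amount").show()
```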
Topic C

Feature Engineering & Modelling Approach

  • Although the raw data had many columns, the team narrowed its focus to four or five key features, chiefly transaction amount, merchant type, and time (used primarily for visualisation).
  • The problem was framed as unsupervised learning — no ground-truth labels existed.
  • Three model architectures were evaluated (a minimal sketch of the first follows this list):
    1. Isolation Forest
    2. Autoencoder (neural network based)
    3. K-Medians clustering
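A minimal sketch of the first candidate, using scikit-learn's IsolationForest on synthetic stand-in features; the contamination value shown here is a placeholder for the GMM-derived estimate described under Topic D.

```python
# Minimal Isolation Forest sketch on synthetic stand-in features.
# The contamination value is a placeholder; Topic D covers how it was estimated.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))          # stand-in for (amount, encoded merchant type)

iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
iso.fit(X)

scores = -iso.score_samples(X)            # higher score = more anomalous
flags = iso.predict(X)                    # -1 = anomaly, 1 = inlier
print(f"flagged {np.sum(flags == -1)} of {len(X)} transactions")
```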
Topic D

Contamination Factor & Model Validation

  • Because there were no labels, a Gaussian Mixture Model (GMM) was used to estimate the contamination factor — the expected proportion of anomalies in the dataset.
  • Anomaly scores from Isolation Forest and the Autoencoder were plotted in a scatter plot. Density analysis revealed two regions: a high-density core (normal) and a sparse periphery (anomalous).
  • The sparse cluster's percentage of total points became the contamination factor fed into the final models (a sketch of this estimation step follows this list).
  • A human-in-the-loop existed: the loyalty/transaction team monitored alerts raised by the system, each alert triggering a ticket for review.
  • Precision was reported as above 75–80%, with acknowledged volatility during trend shifts (e.g., Christmas peak spend causing a temporary spike in false positives).
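A minimal sketch of that estimation step as described: fit a two-component Gaussian Mixture on the joint (Isolation Forest score, autoencoder reconstruction error) space and take the share of points assigned to the sparse component as the contamination factor. The score arrays below are synthetic placeholders for the real model outputs.

```python
# Sketch of the GMM-based contamination estimate; scores are synthetic placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
core = rng.normal(loc=[0.3, 0.3], scale=0.05, size=(9_800, 2))   # dense inlier mass
tail = rng.normal(loc=[0.8, 0.8], scale=0.15, size=(200, 2))     # diffuse anomalous periphery
scores = np.vstack([core, tail])          # columns: IF anomaly score, AE reconstruction error

gmm = GaussianMixture(n_components=2, random_state=42).fit(scores)

# The more spread-out component is treated as the anomalous periphery; the share of
# points assigned to it becomes the contamination factor fed to the final models.
spreads = [np.trace(cov) for cov in gmm.covariances_]
anomalous = int(np.argmax(spreads))
contamination = float(np.mean(gmm.predict(scores) == anomalous))
print(f"estimated contamination factor: {contamination:.3f}")
```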
Topic E

Generative AI Experience — Semantic Search POC

  • While on the bench at Cognizant (first two months), developed an internal semantic search POC.
  • Source corpus: issues and Q&A threads scraped from GitHub and Stack Overflow.
  • Questions and answers were converted to vector embeddings and stored in a vector database.
  • At query time, the input was embedded and compared against the stored embeddings using similarity search to retrieve the closest matches (a conceptual sketch follows this list).
  • Self-characterised as a "POC-level" GenAI engagement — not a production deployment.
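Conceptually, the query-time step reduces to cosine similarity over normalised embeddings. A toy sketch with random placeholder vectors (not the actual embedding model or vector database used in the POC):

```python
# Conceptual retrieval sketch: cosine similarity over unit-normalised embeddings.
# Vectors are random placeholders for the real Q&A embeddings.
import numpy as np

rng = np.random.default_rng(0)
stored = rng.normal(size=(1_000, 384))                       # corpus embeddings
stored /= np.linalg.norm(stored, axis=1, keepdims=True)

query = rng.normal(size=384)
query /= np.linalg.norm(query)

similarities = stored @ query                                # cosine similarity after normalisation
top_k = np.argsort(similarities)[::-1][:5]                   # five closest entries
print(top_k, similarities[top_k])
```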
Topic F

Location Preferences & HR Discussion

  • Based in Delhi (Inderlok area), commuting to Tikri Sector 48 office; also stays at Sector 79.
  • Family constraints (mother) tie him to Delhi NCR.
  • First preference: Delhi. Acceptable: Gurgaon, Noida. Difficult: Pune, Indore. Not preferred: Bangalore, Chennai.
  • Asked about his current HCM (Shivam Shrivastav, from the AML CoE); was told the HCM may change upon project allocation.

Reconstructed Q&A — Full Dialogue

The interviewer's questions have been inferred from context and the interviewee's responses. Each exchange is presented as a coherent dialogue unit.

Q: Could you walk me through your most recent project?
A: My last project was anomaly detection for the Amex loyalty platform — detecting anomalies in credit card transactions. We looked at large-amount transactions, small-amount transactions, and merchant type patterns. The client used the flagged anomalies to raise alerts in their system and decide whether to block those transactions. We got new data to run inference on every quarter.
Q: Can you elaborate on the training data — how did you start model training? What historical data did you use?
A: We had one to two years of historical transaction data — I believe it was closer to two years. The data came from Amex's mainframe systems. Their data engineering team was responsible for pulling it from the mainframe into Hive-based databases. From there, we connected using PySpark in a Jupyter-like environment on a managed platform called Cornerstone.
Q: Was the data structured or unstructured? And what were the data challenges — how did you handle data cleaning?
A: It was structured data. The main challenge was feature selection — there were many available columns, so we narrowed down to about four or five features: merchant type, transaction amount, and time. Time was used more for visualisation and trend plotting. For anomaly detection itself, we treated the data as a batch.
Q: Since there were no labels, how did you validate that your model was correctly detecting anomalies? What was the validation mechanism?
A: We used a contamination factor approach. We ran both the Autoencoder and the Isolation Forest models, obtained anomaly scores from each, and plotted them in a scatter plot. Using density analysis — essentially a Gaussian Mixture Model — we identified the dense normal cluster and the sparse anomalous periphery. The proportion of points in the sparse cluster gave us the contamination factor, which we used to tune how aggressive our anomaly threshold should be.
Q: Was there any human validation loop involved?
A: Yes — the loyalty and transaction monitoring team at Amex reviewed every alert raised. Each flag generated a ticket, so there was a human-in-the-loop reviewing the outputs.
Q: What was the accuracy — or in this case, the precision of the model?
A: Since it's unsupervised, accuracy isn't the right metric — precision is more appropriate. We were above 75–80%. The number did fluctuate, especially during trend shifts like Christmas, where genuine transaction spikes initially generated more false positives before the model adjusted.
Q: Was there any attempt to improve that 75% number?
A: It's hard to pin down a single improvement target because precision fluctuates naturally with evolving spending behaviour. The contamination factor approach helps it self-adjust over time, but trend breaks do cause temporary dips.
Q: Can you walk me through a Generative AI project you've worked on?
A: While on bench at Cognizant, I built a semantic search POC. We collected questions and answers from GitHub issues and Stack Overflow, converted them into embeddings, and stored them in a vector database. When a new query came in, we embedded it and retrieved the closest matching entries from the database using similarity search.
Q: Are you open to relocation, or do you have a location preference?
A: My first preference is Delhi NCR — I have family here. Gurgaon and Noida are also acceptable. Pune and Indore are more difficult due to distance. Bangalore and Chennai would be quite challenging for me at this point.

Critique & Better Answers

A frank, point-by-point evaluation of the responses — identifying weaknesses in communication, technical depth, and strategic framing, with the sharper answer each question deserved.

On introducing the Amex anomaly detection project
Needs Work
What Went Wrong

The introduction was rambling and repetitive — "large transactions or very small number of transactions, large amount transaction or small amount transactions" was said almost verbatim twice. The business impact was buried and vague ("should we block them"). There was no STAR-style framing: no clear statement of scale, no team context, no timeline, and no outcome lead. An interviewer for a Lead role expects structured, confident narration — not a stream-of-consciousness recall.

The Better Answer
"At Amex's loyalty platform, I led the data science effort on an unsupervised anomaly detection system for credit card transactions. The problem had three anomaly signals: outlier transaction amounts — both unusually large and unusually small — and abnormal merchant-type patterns that deviated from a cardholder's historical behaviour. The business use case was operational risk: alerts fed directly into the platform's transaction-blocking pipeline. We processed roughly two years of historical mainframe data, built our pipeline in PySpark on Hive, and ran quarterly inference cycles. I drove the model selection, contamination factor calibration, and the human-review integration with the loyalty operations team."
On data challenges and feature engineering
Superficial
What Went Wrong

You reduced an inherently rich challenge to "we zeroed down on four or five features." For a Lead role, the interviewer wants to understand how you chose those features — what was your methodology? Was there domain knowledge involved? Did you run correlation analysis, VIF, or feature importance from a supervised proxy? You also glossed over data quality issues entirely — mainframe-sourced transaction data is notoriously messy (encoding issues, missing fields, schema drift). Saying "it was structured data" and moving on was a missed opportunity.

The Better Answer
"The raw data had 30-plus columns from the mainframe. Feature selection was a deliberate process — we started by eliminating PII and low-variance fields, then used domain knowledge from the Amex loyalty team to shortlist candidates. We settled on transaction amount, merchant category code (MCC), transaction frequency per time window, time-of-day, and days since last transaction. One non-trivial challenge was schema drift — the mainframe schemas had evolved and we had to handle column remapping across data batches. We also had to normalise amounts for currency and seasonal effects before any modelling."
On the model architecture and contamination factor
Strong, But Unclear
What Went Wrong

The technical substance here was genuinely solid — the GMM-based contamination estimation, ensemble of Isolation Forest and Autoencoder anomaly scores, and density-based threshold-setting is a legitimate and thoughtful methodology. However, the explanation was confused and hard to follow. The phrase "we create two clusters… not two clusters, basically one cluster and outside" is almost incoherent when spoken. For a Lead AI Engineer, clarity of technical communication is as important as the technical knowledge itself. You also never explained why you chose Isolation Forest + Autoencoder specifically, or why you dropped K-Medians.

The Better Answer
"We chose Isolation Forest for its efficiency on high-dimensional tabular data — it's interpretable and handles sparse anomalies well. The Autoencoder complemented it by capturing non-linear feature interactions through reconstruction error. K-Medians was explored but dropped because it was sensitive to our choice of K and the clusters weren't semantically meaningful. To set the contamination threshold — the proportion of anomalies to expect — we used a Gaussian Mixture Model on the joint anomaly score distribution from both models. The GMM naturally separated the dense inlier mass from the diffuse anomalous tail, giving us an empirically grounded contamination estimate rather than a hand-tuned guess."
On precision of ~75% and improvement efforts
Defensive & Incomplete
What Went Wrong

When pushed on why 75% precision wasn't higher, the response became defensive and wandered into an explanation of seasonal false positives — which, while valid, sounded like excuse-making rather than engineering problem-solving. A Lead Engineer should respond to a precision ceiling by describing active remediation strategies: retraining cadence, concept drift detection, ensemble re-weighting, or feedback loop design. You also never mentioned whether you measured recall or framed a precision-recall tradeoff, which is critical in fraud/anomaly contexts where false negatives (missed frauds) are often costlier than false positives.

The Better Answer
"75–80% precision was our baseline. We tracked precision alongside alert-review conversion rates from the operations team to estimate recall indirectly. To address precision degradation during trend shifts, we implemented a quarterly model retraining pipeline where the contamination factor was recalibrated using the previous quarter's confirmed anomaly tickets as soft labels. We also explored a sliding-window retraining scheme for faster adaptation to spend pattern shifts — though that was still in progress when the engagement ended. The goal was to reach and sustain above 85% precision without increasing the operations team's alert review load."
On GenAI experience — semantic search POC
Significantly Undersold
What Went Wrong

This is the most damaging part of the interview. You were interviewing for a Lead AI Engineer role in 2024–25 — a role that almost certainly has significant GenAI expectations. You self-described your GenAI background as "not much experience" and spent fewer than five sentences on the only GenAI project you mentioned. You did not name the embedding model used, the vector database, the chunking strategy, the retrieval method (cosine similarity? FAISS? ANN?), or any evaluation approach. You also did not mention any current reading, self-directed learning, or projects in LLMs, RAG pipelines, or LangChain/LangGraph — all of which you've actually explored. This is a credibility-damaging gap for a Lead role.

The Better Answer
"The semantic search POC used sentence-transformers — specifically the all-MiniLM-L6-v2 model — to embed GitHub and Stack Overflow Q&A pairs. We stored embeddings in FAISS with an IVF index for efficient approximate nearest-neighbour retrieval. Beyond this POC, I've been deepening my GenAI stack — I've worked with RAG pipeline architectures, explored LangGraph for agentic workflows, and studied LLM evaluation frameworks including RAGAS for retrieval quality measurement. My core ML background in anomaly detection gives me strong fundamentals in embedding spaces and distance-based reasoning, which translates well into modern GenAI retrieval problems."
On location preferences — HR discussion
Manageable, But Risky
What Went Wrong

Raising the location constraint repeatedly and with evident anxiety signals to the interviewer that you may be inflexible. Asking whether it would "impact your availability to the client" reveals a self-awareness about the disadvantage — which, when voiced aloud, reinforces it. In a bench situation, flexibility is a competitive advantage. The better approach is to state a preference clearly and confidently once, without revisiting it or asking the interviewer to manage it for you.

The Better Answer
"My primary preference is Delhi NCR — Delhi, Gurgaon, or Noida. I can make that work immediately. For the right opportunity, I'm open to discussing other locations on a case-by-case basis — especially if there's flexibility in terms of hybrid or project-phase travel. I'd appreciate it if that preference is noted, but I don't want it to be a limiting factor in the evaluation."
Overall Assessment
Technical Depth: 70%
Communication Clarity: 45%
GenAI Readiness (as presented): 30%
Leadership Signalling: 35%
Problem-Solving Framing: 60%
HR / Positioning: 55%
Verdict

The underlying expertise is real — the contamination factor methodology, the ensemble approach, and the PySpark/Hive stack show genuine ML engineering experience. But the presentation of that expertise was significantly below what a Lead AI Engineer role demands. Two changes would have materially improved the outcome: (1) preparing structured, confident narration for each project using a STAR framework, and (2) leading with GenAI competence rather than apologising for its limits. The knowledge is there — the packaging needs work.
