C4 · Interview Pack

PEFT and RAG Interview Questions

This file targets high-depth interviews on adaptation strategy, retrieval architecture, and production-safe optimization.

advanced 28 questions vunknown

Not tracked yet

Jump to section

Scope
How To Use This File
Interviewer Probe Map
Question Clusters
Foundations
Q1: Prompting vs RAG vs LoRA decision
Q2: How do you choose LoRA rank?
Q3: Diagnose weak RAG faithfulness
Q4: When should you use hybrid retrieval?
Q5: How do you build a RAG eval set?
Q6: Why can larger chunks hurt relevance?
Q7: How do you reduce RAG latency without quality collapse?
Q8: Context-window expansion vs better retrieval quality
Q9: Multi-tenant access control in RAG
Q10: Metadata schema for traceable citations
Systems and Rollout
Q11: Embedding model migration plan with low risk
Q12: ANN index parameter tuning strategy
Q13: Retrieval latency budget decomposition
Q14: Query rewriting policy and safety controls
Q15: When PEFT beats prompt-only methods
Q16: QLoRA production caveats
Q17: Offline improvements but online regressions
Q18: Citation verification architecture
Q19: Cold-start strategy for new corpus
Q20: Multilingual retrieval architecture choices
Debugging and Incidents
Q21: Faithfulness dropped but recall@k is stable
Q22: Recall dropped after ingestion pipeline change
Q23: Increasing top-k made answers worse
Q24: Reranker helps offline but hurts p95 online
Q25: Model returns citations that look valid but are wrong
Q26: Tenant leakage incident in retrieval logs
Q27: Is regression from model swap or retriever drift?
Q28: Rollback criteria for RAG deployment
Rapid-Fire Round
Company Emphasis
References

Jump to question

Q1: Prompting vs RAG vs LoRA decision
Q2: How do you choose LoRA rank?
Q3: Diagnose weak RAG faithfulness
Q4: When should you use hybrid retrieval?
Q5: How do you build a RAG eval set?
Q6: Why can larger chunks hurt relevance?
Q7: How do you reduce RAG latency without quality collapse?
Q8: Context-window expansion vs better retrieval quality
Q9: Multi-tenant access control in RAG
Q10: Metadata schema for traceable citations
Q11: Embedding model migration plan with low risk
Q12: ANN index parameter tuning strategy
Q13: Retrieval latency budget decomposition
Q14: Query rewriting policy and safety controls
Q15: When PEFT beats prompt-only methods
Q16: QLoRA production caveats
Q17: Offline improvements but online regressions
Q18: Citation verification architecture
Q19: Cold-start strategy for new corpus
Q20: Multilingual retrieval architecture choices
Q21: Faithfulness dropped but recall@k is stable
Q22: Recall dropped after ingestion pipeline change
Q23: Increasing top-k made answers worse
Q24: Reranker helps offline but hurts p95 online
Q25: Model returns citations that look valid but are wrong
Q26: Tenant leakage incident in retrieval logs
Q27: Is regression from model swap or retriever drift?
Q28: Rollback criteria for RAG deployment

PEFT and RAG Interview Questions

Scope

This file targets high-depth interviews on adaptation strategy, retrieval architecture, and production-safe optimization.

How To Use This File

Practice top questions with four layers:
1. short answer
2. deep answer
3. follow-up ladder
4. anti-pattern answer to avoid
Use retrieval metrics and rollout controls in every systems answer.

Interviewer Probe Map

Can you separate knowledge problems from behavior problems?
Can you debug RAG by stage instead of guessing at prompts?
Can you justify cost, latency, and quality tradeoffs with metrics?

flowchart TD
    A[Need quality improvement] --> B{Knowledge missing or stale?}
    B -- Yes --> C[Use RAG baseline]
    B -- No --> D{Behavior/style mismatch?}
    D -- Yes --> E[Use prompt then PEFT if needed]
    D -- No --> F[Fix orchestration or data path]
    C --> G[Measure retrieval and faithfulness]
    E --> H[Measure behavior delta and drift risk]
    G --> I[Select least complex winning path]
    H --> I

Figure: Adaptation choice path for interviews and design rounds.

Question Clusters

Foundations: Q1 to Q10
Systems and Rollout: Q11 to Q20
Debugging and Incidents: Q21 to Q28

Foundations

Q1: Prompting vs RAG vs LoRA decision

What interviewer is probing:

Strategy selection under changing constraints.

Direct answer: Use prompting for lightweight behavior shaping, RAG for dynamic knowledge grounding, and LoRA/PEFT for durable behavior adaptation when prompt-only performance plateaus.

Deep answer:

Classify gap: missing facts, weak reasoning pattern, or style/control issue.
Start with the least invasive option that can be evaluated quickly.
If facts are stale or private, choose RAG first.
If behavior is consistently wrong despite strong context, evaluate PEFT.
Define rollback criteria before rollout.

Follow-up variants:

How would you compare options with the same latency budget?
When is PEFT a mistake for enterprise knowledge tasks?

Common mistakes and red flags: "Fine-tune first because it is more powerful" without diagnosing problem type.

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q2: How do you choose LoRA rank?

What interviewer is probing:

Practical PEFT tuning and overfitting awareness.

Direct answer: Treat rank as a capacity knob: increase until marginal quality gain flattens relative to latency and memory cost.

Deep answer: Run a small rank sweep (for example low, medium, high) on a fixed eval set. Track quality gain, variance across slices, and inference overhead. Prefer the smallest rank that meets target metrics and remains stable across domains.

Follow-up variants:

What signs indicate rank is too low?
How do you detect PEFT overfitting on narrow data?

Common mistakes and red flags: Choosing rank by convention with no eval evidence.

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q3: Diagnose weak RAG faithfulness

What interviewer is probing:

Retrieval-first debugging discipline.

Direct answer: Separate retrieval quality from generation behavior before changing prompts or models.

Deep answer:

Measure retrieval recall@k and precision@k on failing queries.
Validate citation correctness and context ordering.
Inspect chunk boundaries and metadata filters.
Add or tune reranker only if first-pass retrieval is noisy.
Tighten grounded prompting and abstention policy.

Follow-up variants:

If recall is high but faithfulness is low, where do you look next?
What online metric catches this regression early?

Common mistakes and red flags: Blindly increasing top-k and token budget.

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q4: When should you use hybrid retrieval?

What interviewer is probing:

Dense and lexical tradeoff reasoning.

Direct answer: Use hybrid when traffic includes IDs, product names, acronyms, or exact-match constraints that dense retrieval misses.

Deep answer: Dense handles semantic recall, lexical preserves exact constraints. Enterprise corpora usually require both. Use rank fusion and evaluate by query slice. Hybrid should be justified by measurable gains on ID-heavy and troubleshooting queries.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q5: How do you build a RAG eval set?

What interviewer is probing:

Evaluation design and regression safety.

Direct answer: Create a slice-balanced set with gold evidence references and expected answer properties.

Deep answer: Include high-frequency tasks, edge cases, and adversarial prompts. Label relevant chunks and define graders for retrieval and faithfulness. Keep the set versioned and tie each production change to eval deltas.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q6: Why can larger chunks hurt relevance?

What interviewer is probing:

Chunking granularity intuition.

Direct answer: Use a clear, constraint-first decision for why can larger chunks hurt relevance?, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q7: How do you reduce RAG latency without quality collapse?

What interviewer is probing:

Optimization under SLO pressure.

Direct answer: Use a clear, constraint-first decision for how do you reduce rag latency without quality collapse?, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q8: Context-window expansion vs better retrieval quality

What interviewer is probing:

Cost-aware architecture judgment.

Direct answer: Use a clear, constraint-first decision for context-window expansion vs better retrieval quality, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q9: Multi-tenant access control in RAG

What interviewer is probing:

Security and isolation in retrieval systems.

Direct answer: Use a clear, constraint-first decision for multi-tenant access control in rag, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q10: Metadata schema for traceable citations

What interviewer is probing:

Data modeling for auditability.

Systems and Rollout

Direct answer: Use a clear, constraint-first decision for metadata schema for traceable citations, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q11: Embedding model migration plan with low risk

What interviewer is probing:

Safe migration design.

Direct answer: Use a clear, constraint-first decision for embedding model migration plan with low risk, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q12: ANN index parameter tuning strategy

What interviewer is probing:

Recall-latency tuning under constraints.

Direct answer: Use a clear, constraint-first decision for ann index parameter tuning strategy, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q13: Retrieval latency budget decomposition

What interviewer is probing:

Component-level performance ownership.

Direct answer: Use a clear, constraint-first decision for retrieval latency budget decomposition, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q14: Query rewriting policy and safety controls

What interviewer is probing:

Correctness and observability of query transforms.

Direct answer: Use a clear, constraint-first decision for query rewriting policy and safety controls, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q15: When PEFT beats prompt-only methods

What interviewer is probing:

Long-term adaptation strategy.

Direct answer: Use a clear, constraint-first decision for when peft beats prompt-only methods, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q16: QLoRA production caveats

What interviewer is probing:

Quantization plus adaptation risk awareness.

Direct answer: Use a clear, constraint-first decision for qlora production caveats, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q17: Offline improvements but online regressions

What interviewer is probing:

Distribution-shift diagnosis.

Direct answer: Use a clear, constraint-first decision for offline improvements but online regressions, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q18: Citation verification architecture

What interviewer is probing:

Grounding enforcement beyond formatting checks.

Direct answer: Use a clear, constraint-first decision for citation verification architecture, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q19: Cold-start strategy for new corpus

What interviewer is probing:

Pragmatic bootstrapping decisions.

Direct answer: Use a clear, constraint-first decision for cold-start strategy for new corpus, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q20: Multilingual retrieval architecture choices

What interviewer is probing:

Cross-language search quality reasoning.

Debugging and Incidents

Direct answer: Use a clear, constraint-first decision for multilingual retrieval architecture choices, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q21: Faithfulness dropped but recall@k is stable

What interviewer is probing:

Stage isolation discipline.

Direct answer: Use a clear, constraint-first decision for faithfulness dropped but recall@k is stable, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q22: Recall dropped after ingestion pipeline change

What interviewer is probing:

Data pipeline regression debugging.

Direct answer: Use a clear, constraint-first decision for recall dropped after ingestion pipeline change, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q23: Increasing top-k made answers worse

What interviewer is probing:

Context-noise tradeoff understanding.

Direct answer: Use a clear, constraint-first decision for increasing top-k made answers worse, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q24: Reranker helps offline but hurts p95 online

What interviewer is probing:

Production gating and selective routing.

Direct answer: Use a clear, constraint-first decision for reranker helps offline but hurts p95 online, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q25: Model returns citations that look valid but are wrong

What interviewer is probing:

Semantic citation verification design.

Direct answer: Use a clear, constraint-first decision for model returns citations that look valid but are wrong, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q26: Tenant leakage incident in retrieval logs

What interviewer is probing:

Incident handling and containment.

Direct answer: Use a clear, constraint-first decision for tenant leakage incident in retrieval logs, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q27: Is regression from model swap or retriever drift?

What interviewer is probing:

Layered attribution methodology.

Direct answer: Use a clear, constraint-first decision for is regression from model swap or retriever drift?, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Q28: Rollback criteria for RAG deployment

What interviewer is probing:

Operational discipline under pressure.

flowchart TD
    A[RAG Failure Reported] --> B{Recall@k dropped?}
    B -- Yes --> C[Inspect ingestion freshness chunking filters]
    B -- No --> D{Citation correctness dropped?}
    D -- Yes --> E[Audit context builder and citation validator]
    D -- No --> F{Latency spike?}
    F -- Yes --> G[Tune top-k rerank thresholds and index params]
    F -- No --> H[Inspect prompt template and model changes]

Figure: Fast triage path for retrieval and grounding incidents.

Rapid-Fire Round

Two signs reranking is worth its latency tax.
Three causes of citation mismatch despite high recall.
One case where PEFT clearly outperforms prompt-only adaptation.
Two reasons query rewriting can reduce trust if unobserved.

Company Emphasis

Amazon:
- explicit cost and operational ownership.
- measurable rollback criteria.
Google:
- stronger retrieval metric fluency and ablation rigor.
- deeper follow-ups on embedding/index tradeoffs.
Startup:
- fast iterative loops and pragmatic architecture decisions.
- clear prioritization under small-team constraints.

References

lora-and-qlora-practical-guide.md
rag-pipeline-and-retrieval-optimization.md
RAG paper: https://arxiv.org/abs/2005.11401
BEIR benchmark: https://arxiv.org/abs/2104.08663

Direct answer: Use a clear, constraint-first decision for rollback criteria for rag deployment, then state one production tradeoff (latency, cost, or reliability).

Deep answer:

State assumptions, constraints, and success metric.
Explain the chosen design or algorithm and why alternatives are weaker.
Cover failure handling, observability, and rollback criteria.

Common mistakes and red flags:

Naming tools or algorithms without mapping them to constraints.
Ignoring edge cases, failure modes, or rollback triggers.
Skipping metrics needed to prove the design works in production.

Follow-up variants:

What changes if throughput doubles or latency budget is cut in half?
Which single metric would trigger rollback after deployment?

Sample code or pseudocode (when relevant):

# Interview outline
1) Validate inputs and constraints
2) Apply core strategy
3) Add failure handling and observability hooks

Related Explainers

Use these explainers to reinforce weak interview topics.

PEFT and RAG Interview Questions

PEFT and RAG Interview Questions

Scope

How To Use This File

Interviewer Probe Map

Question Clusters

Foundations

Q1: Prompting vs RAG vs LoRA decision

Q2: How do you choose LoRA rank?

Q3: Diagnose weak RAG faithfulness

Q4: When should you use hybrid retrieval?

Q5: How do you build a RAG eval set?

Q6: Why can larger chunks hurt relevance?

Q7: How do you reduce RAG latency without quality collapse?

Q8: Context-window expansion vs better retrieval quality

Q9: Multi-tenant access control in RAG

Q10: Metadata schema for traceable citations

Systems and Rollout

Q11: Embedding model migration plan with low risk

Q12: ANN index parameter tuning strategy

Q13: Retrieval latency budget decomposition

Q14: Query rewriting policy and safety controls

Q15: When PEFT beats prompt-only methods

Q16: QLoRA production caveats

Q17: Offline improvements but online regressions

Q18: Citation verification architecture

Q19: Cold-start strategy for new corpus

Q20: Multilingual retrieval architecture choices

Debugging and Incidents

Q21: Faithfulness dropped but recall@k is stable

Q22: Recall dropped after ingestion pipeline change

Q23: Increasing top-k made answers worse

Q24: Reranker helps offline but hurts p95 online

Q25: Model returns citations that look valid but are wrong

Q26: Tenant leakage incident in retrieval logs

Q27: Is regression from model swap or retriever drift?

Q28: Rollback criteria for RAG deployment

Rapid-Fire Round

Company Emphasis

References

Related Explainers

LoRA and QLoRA Practical Guide

RAG Pipeline and Retrieval Optimization