hiring 2025-12-29 · Updated 2025-12-29

FSIQ vs. GAI in the Age of GPT — Rethinking Tech Hiring


A memo for teams that want builders, not answer sheets.


TL;DR

  • GPT now produces canonical coding answers in ~30 seconds. If your loop measures recall under pressure, you’re optimizing for what machines already do well.
  • FSIQ ≈ speed + working memory + puzzle fluency. GAI ≈ judgment, transfer, and synthesis across contexts. EQ keeps solutions grounded in real customer needs.
  • Today’s interview monoculture overweights Family Feud–style trivia and underweights creativity, prioritization, and remix.
  • Redesign your loop to measure framing, prioritization, cross‑domain adaptation, and empathy—not just reproduction.

1) The Paradox We’re Living In

Give GPT: “Two arrays of characters. Find the longest common contiguous substring.” In ~30 seconds it returns:

  • Brute force, dynamic programming, and even a cross‑correlation / FFT approach.
  • Complexity analysis, edge cases, and runnable tests.

Meanwhile, humans are asked to spend 15 minutes restating the problem, walking the naïve path, then deriving the textbook optimum while narrating Big‑O. We’ve turned interviews into a quiz show—and then wonder why the results feel shallow.
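
For reference, the textbook optimum the loop is fishing for fits in a dozen lines of dynamic programming. A minimal Python sketch (the example strings are purely illustrative):

    def longest_common_substring(a: str, b: str) -> str:
        # Classic DP: curr[j] is the length of the longest common suffix
        # of a[:i] and b[:j]; the global maximum is the answer.
        # O(len(a) * len(b)) time, O(len(b)) extra space.
        best_len, best_end = 0, 0
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            curr = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    curr[j] = prev[j - 1] + 1
                    if curr[j] > best_len:
                        best_len, best_end = curr[j], i
            prev = curr
        return a[best_end - best_len:best_end]

    assert longest_common_substring("whiteboard", "scoreboard") == "eboard"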


2) FSIQ vs. GAI (and Why It Matters)

  • FSIQ (Full‑Scale IQ): A composite of processing speed, working memory, verbal reasoning, and spatial reasoning—think horsepower.
  • GAI (General Ability Index): Derived from the Verbal Comprehension and Perceptual Reasoning subtests, reducing the influence of Working Memory and Processing Speed—think judgment, transfer, synthesis.
  • EQ: Empathy and communication that keep elegant solutions from becoming premature cleverness.

Interviews rarely use these labels, but most loops test FSIQ proxies: speed on a whiteboard and short‑term juggling under stress. In 2025, that’s the automatable part.


3) What Interviews Actually Measure (Mapped to WAIS‑IV)

Wechsler subtests → common tech loop behaviors:

  • Processing Speed (Coding, Symbol Search) → rapid array/linked‑list manipulation, typing speed, whiteboard fluency.
  • Working Memory (Digit Span, Arithmetic) → holding invariants in your head, mentally unrolling loops, juggling pointers.
  • Perceptual Reasoning (Block Design, Matrix Reasoning) → pattern spotting, algorithmic “A‑ha.”
  • Verbal Comprehension (Vocabulary, Similarities) → clarity of explanation, naming invariants, API design rationale.

Loop design today overweights the first two. That’s exactly where AI excels. (Alternatives exist—see Stripe’s laptop‑based interviews.)


4) Talent Profiles (and How to Hire Them)

  • Medium FSIQ • High GAI — Not the fastest on contrived puzzles, but exceptional at problem selection, simplification, and cross‑domain transfer. They may stumble on a timed board; given production context, they quietly build what users actually need.
  • High FSIQ • Medium GAI — Puzzle wizards. They ace loops but may over‑optimize the unimportant. They shine with clear product constraints and a partner who anchors trade‑offs.
  • High FSIQ • High GAI — The unicorn. Beware confusing rehearsal with range; validate with remix drills and framing.
  • Medium FSIQ • Medium GAI • High Ramp — The underpriced bet. Six months of real reps often beat a spotless whiteboard.

Add the third axis: EQ. Empathy for customers and the problem space keeps elegant code from becoming premature cleverness. In a post‑AI shop, EQ prevents very smart teams from building the wrong thing faster.

Worked Examples (from live hiring signals)

  • Medium FSIQ • High GAI → Passes ~50% of puzzles, but chooses the 20% of work that drives 80% of impact. Proposes a cache instead of micro‑optimizing a substring routine.
  • High FSIQ • Medium GAI → Demolishes puzzles; ships a pristine but over‑scoped service. Improves dramatically when paired with a PM who enforces ruthless scope cuts.
  • Medium‑Medium • High Ramp → Starts slow; by month 6 has automated ops toil and removed two brittle systems.
  • All‑High → Extends an existing codebase with tests, then reframes the feature to halve maintenance.

5) What AI Automates vs. Where It Fails (Today)

Strong at:

  • Regurgitating canonical algorithms and patterns, with tests.
  • Explaining standard trade‑offs already in the literature.

Weaker at:

  • Remix: pulling a tool from a far domain and bending it coherently.
  • Prioritization: deciding what not to build; spotting leverage points.
  • Value‑laden judgment: making calls under ambiguity and human constraints.

If your loop selects for reproduction, you’re selecting against your edge.


6) The Monoculture Problem (and the FFT Cameo)

You can walk into a FAANG interview, drop an FFT‑based cross‑correlation, summon a Dirac delta spike, and look like a savage—and still fail. Why? Because the system isn’t selecting for creative challengers; it’s selecting for meek, obedient, controllable candidates who align with the answer sheet.
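
For the curious, the cameo is real: encode each character as an indicator vector, cross‑correlate via FFT, and the alignment with the most matching characters shows up as a spike. A minimal numpy sketch (illustrative, not an answer‑sheet entry):

    import numpy as np

    def best_alignment(a: str, b: str):
        # For each shared character, cross-correlate the indicator vectors of
        # a and b via FFT; summing over characters gives, for every shift, the
        # number of matching positions. The peak is the "delta spike."
        n = len(a) + len(b) - 1
        size = 1 << (n - 1).bit_length()          # zero-pad to a power of two
        total = np.zeros(size)
        for ch in set(a) & set(b):
            fa = np.fft.rfft([float(c == ch) for c in a], size)
            fb = np.fft.rfft([float(c == ch) for c in b[::-1]], size)
            total += np.fft.irfft(fa * fb, size)
        shift = int(np.argmax(total[:n])) - (len(b) - 1)
        return shift, int(round(total[:n].max()))

    print(best_alignment("xxxxxhello", "hello"))  # (5, 5): offset 5, five matches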

Once upon a time we rejected candidates who regurgitated the textbook. A decade later, we inverted the rule. Predictable results followed: employment diversity shrank, and most talent now concentrates on propping up monopolies—the same ones laying off engineers by the thousands. Personally empowering projects (BitTorrent, early P2P, the wild hope of the early internet) got crowded out by engagement‑maximizing platforms.

If you optimize for answer-sheet obedience, you won’t just miss creative people—you’ll also ship AI initiatives that look busy but don’t move the needle.


7) Incentives and Outcomes

It’s not a skill shortage; it’s an incentive design problem:

  • Businesses optimize for speed, conformity, and answer‑sheet alignment—not dwell time, synthesis, or dissent.
  • MIT (2025): ~95% of GenAI pilots show no measurable P&L impact—workflow/integration failure.
  • Gartner (2025): >40% of agentic AI projects likely canceled by 2027—cost, governance, unclear value.
  • As Inc. (Sep 2025) frames it: organizations love innovation but hate innovators—celebrated after success, marginalized during the messy middle.

8) A Better Loop: Measure What Matters Now

Design your interview to surface judgment, transfer, and empathy. You can adopt this template tomorrow:

Keep a 10–15 minute fundamentals check (yes, reverse a linked list), then spend the bulk of the loop on the sections below:

A. Frame & Hypothesize (20–30 min)

Give a fuzzy, real constraint (throughput target, capex/opex bounds, compliance wrinkle). Ask:

  • What would you build, and why?
  • What would you not build yet?
  • What are the fastest falsifiable assumptions?

Score on: clarity of goals, risk‑based prioritization, cost sense, ability to say “no.”

B. Remix from Other Domains (15–20 min)

Prompt: “Borrow a tool from an unrelated domain to de‑risk this.” (Examples: search ranking for abuse triage, error‑correcting codes for data repair, control theory for autoscaling; the last one is sketched below.)

Score on: relevance of the analogy, awareness of its limits, pragmatic adaptation.
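
To make the last example concrete, here is a hedged sketch of the control‑theory borrow: a proportional‑integral controller nudging replica count toward a CPU‑utilization target. The function name, gains, and scenario are all illustrative, not a production recipe.

    def autoscale_step(cpu_util, target_util, replicas, integral,
                       kp=8.0, ki=2.0, dt=1.0):
        # PI controller: the error is how far utilization sits above target;
        # the integral term remembers sustained overload or underload.
        error = cpu_util - target_util
        integral += error * dt
        desired = replicas + kp * error + ki * integral
        return max(1, round(desired)), integral

    # Sustained 80% utilization against a 60% target scales out over a few ticks.
    replicas, integral = 4, 0.0
    for _ in range(3):
        replicas, integral = autoscale_step(0.80, 0.60, replicas, integral)
    print(replicas)  # 11 in this toy run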

C. Relentlessly Simplify for Prod (10–15 min)

Take a shiny architecture and ask the candidate to remove components until a 2‑sprint MVP remains.

Score on: bias to leverage, ruthless scope cuts that preserve value.

D. Run a Customer & Ethics Pass (10 min)

Who gets harmed by the “optimal” solution? What usability or fairness risks appear? What would you instrument first?

Score on: EQ, foreseeability, experiment design.

E. Code in Context (with Tests) (25–35 min)

Small, relevant task (e.g., build a thin vertical slice; extend an existing codebase with a constraint). Pair for 10 minutes on tests and interfaces. No trick puzzles.

Score on: taste in interfaces, test sensibility, communication.

Optional: 24–48h Take‑Home (Strictly Scoped)

A single‑evening assignment with a clear rubric, paid when possible. Prioritize explanation over lines of code.

How to use this: Run A–E in one loop or split across panels; weight dimensions to match role (infra vs product).


9) Hiring Rubric (0–3 Scale)

Scale: 0 = Miss (doesn’t demonstrate the skill) · 1 = Basic (meets minimum) · 2 = Strong (clear, job‑ready skill) · 3 = Exceptional (teaches others; shows taste/judgment).

A ‘3’ is rare; repeated ‘3’s on a dimension are promotion-level signals.

Bar guidance: IC3/“mid” ≈ 9–11 aggregate · Senior ≈ 12–14 · Staff+ ≈ 15–18 (teams may weight dimensions differently).

Per‑dimension anchors:

  • Problem Framing: 0 = Restates prompt only · 1 = Lists tasks without goals · 2 = Identifies constraints & risks · 3 = Prioritizes, defines success metrics, proposes falsifiable plan
  • Remix & Transfer: 0 = No analogy · 1 = Forced/fragile analogy · 2 = Plausible cross‑domain borrow · 3 = Coherent adaptation incl. limits & rollback plan
  • Prioritization: 0 = Builds everything · 1 = Hand‑waves scope cuts · 2 = Cuts with rationale · 3 = Ruthless MVP that preserves value & learning
  • Customer/EQ: 0 = Ignores users · 1 = Mentions personas · 2 = Flags risks & usability pitfalls · 3 = Surfaces trade‑offs, mitigation, and measurement plan
  • Code & Tests: 0 = Compiles only · 1 = Solves toy; minimal tests · 2 = Idiomatic; focused tests · 3 = Extendable; instrumented; tests that pin behavior & regressions
  • Communication & Collaboration: 0 = Disorganized or defensive · 1 = Understandable but rigid · 2 = Clear, collaborative, invites critique · 3 = Persuasive, crisp, adjusts in real time without losing rigor
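
If you want to operationalize the “weight dimensions to match role” note, here is a toy aggregation sketch; the dimension keys, weights, and cutoffs are illustrative (the cutoffs mirror the bar guidance above):

    # Toy scorer: weighted sum over the six rubric dimensions, mapped to a level.
    # Weights are placeholders; an infra loop might upweight code_tests, a
    # product loop might upweight customer_eq.
    WEIGHTS = {"framing": 1.0, "remix": 1.0, "prioritization": 1.0,
               "customer_eq": 1.0, "code_tests": 1.0, "communication": 1.0}
    BARS = [(15, "Staff+"), (12, "Senior"), (9, "IC3/mid")]

    def level(scores: dict) -> str:
        total = sum(WEIGHTS[d] * s for d, s in scores.items())
        return next((name for cutoff, name in BARS if total >= cutoff), "No hire")

    print(level({"framing": 2, "remix": 3, "prioritization": 2,
                 "customer_eq": 2, "code_tests": 2, "communication": 2}))  # Senior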

10) Disclaimer (Because Basics Still Matter)

None of this absolves candidates from knowing fundamentals. You should be able to reverse a linked list live and reason clearly about time/space. The critique is about weights: today’s loops reward Family Feud–style trivia far more than creative judgment.


11) Conclusion

Dijkstra himself probably wouldn’t pass a FAANG interview today—not for lack of brilliance, but because brilliance needs dwell time and freedom to synthesize. Our loops optimize for speed and compliance. Our companies reward conformity and call it rigor.

Yes, AI can regurgitate the textbook answer instantly—that’s the point. Where it still struggles is the remix: pulling from far domains, reframing problems, and deciding what actually matters without an answer sheet. That’s where human GAI + EQ live.

That’s why your edge lives where FSIQ tapers off and GAI + EQ take over.

If the interview is a zoo, most teams aren’t actually looking for the tiger. They’re looking for the housecat.

Measure what machines can’t. Hire for judgment. Design for dwell.


Appendix: Quick Prompts You Can Try Tomorrow

  • “You have $50k and 6 weeks to improve onboarding conversion by 10%. What do you test first and why?”
  • “Redesign our abuse pipeline for a 10× spike with a 2× budget. What do you not build?”
  • “Pick a failure we’ve had. Show me two root causes from different domains (org/process/infra/product).”

References

  • GAI/FSIQ definitions: Pearson Clinical, Wechsler GAI Overview (WISC‑IV/WAIS‑IV). Confirms GAI is derived from Verbal Comprehension + Perceptual Reasoning and reduces the influence of Working Memory & Processing Speed.
  • Clinical perspective on GAI vs FSIQ: Kahalley et al., Utility of the General Ability Index and Cognitive Proficiency Index as Predictors of Academic and Psychosocial Outcomes (2016, PMC). Notes GAI is less influenced by WMI/PSI than FSIQ.
  • Interview escalation trend: WIRED, Why Tech Job Interviews Became Such a Nightmare (Mar 2024). Reports rising difficulty/intensity of tech interviews post‑pandemic/layoffs.
  • Alternative interview model: Business Insider, Former Stripe CTO shares the company’s technical interview process — and it doesn’t include a whiteboard (Aug 2025). Laptop‑based, realistic coding vs. whiteboards.
  • Enterprise AI outcomes: Fortune/Yahoo Finance coverage of MIT report (Aug 2025): ~95% of GenAI pilots show no measurable P&L impact; failures tied to workflow integration, not models.
  • Agentic AI cancellations forecast: Reuters on Gartner (Jun 2025): >40% of agentic AI projects expected to be scrapped by 2027 due to cost/governance/ROI.
  • On innovators vs. innovation: Inc. (Sep 2025), Jeff DeGraff, Why Organizations Love Innovation, but Hate Their Innovators.

Companion post: the FFT/Dirac‑delta parody lives on LinkedIn; this essay is the payload. If you came from that post: welcome to the part where we build better loops.