I spent some time with Andrej Karpathy’s excellent “2025 LLM Year in Review,” and I think it is one of the clearest windows into where large language models are actually headed. You can read the original post here: 2025 LLM Year in Review, by Andrej Karpathy.

Karpathy is writing for a broad technical audience, not specifically for commercial real estate. In this post I want to translate his key ideas into what they mean for CRE professionals, and especially for clients of CRE Agents.

At a high level, his message is simple: LLMs in 2025 are extremely useful, very weird, and nowhere near their ceiling.

Below is how I would summarize his article through a CRE lens, followed by the three biggest takeaways for CRE Agents clients.

1. LLMs Learned To “Play For Points,” Not Just “Sound Smart”

Karpathy’s first and most important theme is the rise of Reinforcement Learning from Verifiable Rewards (RLVR).

Historically, LLMs were trained in three main stages:

- Pretraining on internet-scale text
- Supervised finetuning to follow instructions
- RLHF (Reinforcement Learning from Human Feedback) to be more helpful and safe

In 2025, a new stage became central: RLVR, where models are trained against objective, automatically checkable rewards in domains like math and code. Instead of “please sound like a helpful assistant,” the model is pushed to “get this answer exactly correct” across millions of small environments.
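To make “verifiable reward” concrete, here is a minimal Python sketch. It is not how any lab actually implements RLVR; it only illustrates the idea of a reward that a program can check without human judgment. The task (a debt service coverage ratio), the function names, and the tolerance are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def extract_final_number(model_output: str) -> Optional[float]:
    """Return the last number that appears in the model's free-text answer."""
    # Strip thousands separators so "1,200,000" parses as one number.
    cleaned = model_output.replace(",", "")
    matches = re.findall(r"-?\d+(?:\.\d+)?", cleaned)
    return float(matches[-1]) if matches else None


def verifiable_reward(model_output: str, noi: float, debt_service: float,
                      tol: float = 0.005) -> float:
    """Hypothetical reward: 1.0 if the stated DSCR is correct, else 0.0.

    The expected answer is computed directly (DSCR = NOI / debt service),
    so the check is automatic and does not depend on how the answer sounds.
    """
    expected = noi / debt_service
    answer = extract_final_number(model_output)
    if answer is None:
        return 0.0
    return 1.0 if abs(answer - expected) <= tol else 0.0


# Illustrative answers a model might produce for the same task.
good = "NOI of 1,200,000 divided by debt service of 960,000 gives a DSCR of 1.25"
bad = "The DSCR looks healthy, roughly 1.4x"

print(verifiable_reward(good, noi=1_200_000, debt_service=960_000))
print(verifiable_reward(bad, noi=1_200_000, debt_service=960_000))
```

The point of the sketch is that a confident, well-written answer earns nothing if the number is wrong. Training against many checks of this kind is what “playing for points” refers to.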
Over long runs, the model starts to develop behaviors we would casually call “reasoning”:

- Breaking problems into intermediate steps
- Trying multiple solution paths
- Checking and revising its own work

Why this matters for CRE Agents clients

RLVR is exactly the kind of training that makes AI useful for work like:

- Building and checking complex underwriting models
- Debugging Excel logic or code for internal tools
- Systematically exploring scenarios and sensitivities

Practically, it means the “digital coworker” you use is not just parroting patterns from text; it has been hardened against tasks where there is a right and wrong answer. You still have to review its work, but the baseline quality and consistency keep climbing because the underlying models are being trained to win at games where correctness is rewarded directly.

2. “Ghosts,” Not “Animals”: Why LLMs Feel Brilliant And Broken At The Same Time

Karpathy argues that we should stop thinking about LLMs as “robots getting smarter” and start thinking of them as “ghosts” we summon with text.

Humans and LLMs are optimized for completely different things:

- Humans: survival and social success in the physical world
- LLMs: predicting text, solving verifiable tasks, and getting rewarded in synthetic environments

The result is jagged intelligence:

- In some domains (coding, math puzzles, some forms of writing) they are shockingly capable
- In others (basic common sense, subtle security awareness, avoiding traps) they can be childlike or worse

Karpathy notes that benchmarks are increasingly unreliable signals of “general intelligence.” Labs can overfit models to beat specific tests, which produces impressive scores without solving broad reasoning.
Why this matters for CRE Agents clients

You should treat LLMs as:

- Specialist savants in certain workflows (data extraction, modeling, coding, document drafting)
- Unreliable generalists in others (unguided judgment, security, unsupervised autonomy)

For you, that implies:

- Use AI heavily where the work is structured, checkable, and repeatable: underwriting, data cleaning, pipeline maintenance, memo drafting, process documentation
- Keep a human in the loop where the work is ambiguous, political, or irreversible: investment committee decisions, capital partner communication strategy, key negotiation positions

Karpathy’s “ghosts vs. animals” framing is a good safety rail. You are not raising a junior analyst who will “grow up.” You are operating a non-human intelligence that will always have sharp spikes and deep holes.

3. A New Layer: “Cursor For X” And The Rise Of Vertical AI Apps

Karpathy highlights Cursor as a turning point: not just a wrapper around an LLM, but a new kind of product that:

- Engineers context for a specific domain
- Orchestrates multiple LLM calls in a directed acyclic graph (DAG)
- Provides a domain-specific UI
- Gives the user an “autonomy slider”

This is exactly the pattern CRE Agents is built on, just pointed at commercial real estate instead of software development.

Karpathy’s view is that:

- Foundation model labs will ship “generally capable college students”
- Vertical apps will organize them into professionals in specific industries by adding data, tools, feedback loops, and workflow logic

Why this matters for CRE Agents clients

You should expect your AI stack to look less like “one chat window” and more like:

- A CRE-native front end that understands cap rates, DSCR, lease-up, expense ratios, tax reassessments
- A tool orchestration layer that chains models with your spreadsheets, data rooms, email, and calendar
- A governed autonomy