So Long and Thanks for All the Context – O’Reilly

I got a really interesting question last week from Mike Loukides, my editor at Radar, after he read the third part of this trilogy on context management. “Another issue I’ve read about,” Mike asked, “is the tendency for a model to ignore the middle of the context. I’ve seen that particularly for the models with very large context windows. Is there anything to be said about that?”

Excellent question, Mike, and yes, there is. In that same email he pointed out that clearing the context and reloading it with just what’s important does a pretty good job of dealing with this “ignore the middle” problem when it happens, but that’s clearly a stopgap.

It’s worth a deeper dive into what’s actually happening when an AI starts forgetting what’s in the middle of its context, because the problem is deeper (and more interesting!) than it might seem at first. It turns out that there’s a basic problem that’s fundamental to how LLMs manage context, and we’re still learning about it as an industry. That problem is called the U-shape. There’s been a lot of really interesting research into the U-shape problem recently, and several useful techniques have emerged that can help you manage it. And it’s probably not a coincidence that I’ve had to use all of them in my ongoing experiments with AI-driven development and agentic engineering (even if I didn’t always realize that’s what I was doing at the time).

A few weeks ago, in fact, I ran into the exact failure mode that Mike described. I was running the Quality Playbook, my open source code quality engineering skill, and ran into trouble with one of its phases: the one that writes up the bugs the earlier phases find. At one point in the bug writeup process, it had just created a file called BUGS.md with an overview of each of the bugs, and it needed to create individual writeups for each bug it found. But instead of filling in the details correctly, it produced skeletal-looking stub files, with a generic template that had blank values instead of populated ones.

The thing is, the instructions for how to write a populated writeup were in the prompt. The actual bug data was in BUGS.md. I was absolutely certain that everything the agent needed was sitting in its context window, because I could see that it hadn’t compacted yet, and the skill’s intermediate artifacts let me see that earlier phases had read and reasoned about both files (which I talked about in my last article in this series). But the agent was producing stubs anyway. It really looked like the agent had everything it needed sitting in plain sight, and just wasn’t using the information it had. Frustrating!

I assumed at the time that the model was just an idiot (which, arguably, was true but irrelevant). It turns out that I had run directly into the U-shaped context problem.

In the previous three articles I covered what context is and why it disappears, how to keep important information in files instead of leaving it in the agent’s context window, and how to detect and recover when context has been compacted out from under you. All three were about losing context: through fragmentation, through compaction, through long sessions that overrun the window. This article is about the entirely different U-shaped failure mode, where the context is still sitting in the window and the model just isn’t using it.

The U-shape failure, and why bigger windows don’t fix it

The U-shape is an active area of academic investigation, so I’m going to start by going into a little bit of that research, because I think it will actually help us pin down what’s going on. I’ll start with an experiment run by Nelson Liu, an AI researcher at Stanford, who tested how language models actually use the contents of long inputs by giving them documents with the relevant answer placed at different positions and measuring whether the model could still find it. An interesting thing his findings show is that the U-shape didn’t appear to be a quirk of a single model. The U-shape showed up across model families, and even models with larger context windows still exhibited it.

If you have time, it’s actually worth taking a look at the paper that Liu and his team wrote, called “Lost in the Middle: How Language Models Use Long Contexts.” (It’s surprisingly readable for an academic paper.) The result they reported was a robust U-shape: The model performed best when the relevant information was at the beginning of its context window or at the recent end and worst when it was in the middle. Performance on questions where the answer was buried mid-context fell off sharply, even when the answer was sitting right there in plain sight. The field now uses the terms primacy bias and recency bias for these two preferences, and the U-shape is what you get when you plot them together against position.
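To make the shape of that experiment concrete, here is a minimal Python sketch of a position probe. It is not the harness Liu’s team used; the function names (place_needle, probe_prompts) and the “Document N:” prompt layout are assumptions made for illustration. The idea is only to generate one prompt per possible position of the relevant document, so that accuracy can later be plotted against position.

```python
def place_needle(distractors: list[str], needle: str, position: int) -> list[str]:
    """Return the documents with the needle inserted at the given index."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position out of range")
    return distractors[:position] + [needle] + distractors[position:]


def probe_prompts(distractors: list[str], needle: str, question: str):
    """Yield (position, prompt) pairs, one per possible needle position."""
    for position in range(len(distractors) + 1):
        docs = place_needle(distractors, needle, position)
        # Hypothetical layout: numbered documents followed by the question.
        body = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
        yield position, f"{body}\n\nQuestion: {question}"
```

Sending each prompt to a model and scoring the answers by position is left out on purpose, since that part depends on whichever model API you use.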

I’m going to lean a little into academia here, because a lot of researchers are still learning about how LLM context actually works and what behavior has emerged in it.

One reason the U-shape matters more than “just another LLM quirk” is that recent research has started showing it’s a structural property of how transformers work, not a learned artifact. A 2025 ICML paper called “On the Emergence of Position Bias in Transformers” explained it as the equilibrium between two opposing forces inside the model: The causal mask amplifies the influence of the first few tokens (the primacy bias), while position encodings like RoPE heavily weight the tokens closest to where the model is generating (the recency bias). The middle is where these two forces cancel out. A 2026 paper by Borun Chowdhury, a researcher at Meta, called “Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias,” took the argument even further by proving mathematically that the U-shape exists at the moment of initialization, before any training has occurred, with random weights.

That matters because the natural assumption about large context windows is that more room means fewer problems. Most of today’s frontier models give you a million tokens or more, with some pushing well past two million, and some have made real progress on the simplest version of the lost-in-the-middle test, the needle-in-a-haystack benchmark, where the model has to retrieve a single sentence buried in a long document. Google’s Gemini 1.5 Pro reported near-perfect single-needle recall at 1M tokens, and current Gemini 3 models are comparable.

So the accurate version of “bigger windows don’t fix it” is this: Bigger windows have made simple single-fact retrieval much better. They haven’t made long-context agent work reliable by default. A two-million-token window means a bigger middle to fall into.

The important idea that’s emerging here is that it’s increasingly looking like the U-shape isn’t just a bug in today’s models that will eventually be worked out or trained away by more data or better fine-tuning. Instead, it seems like the U-shape may actually be a geometric property of the LLM architecture itself.

In other words, we’re all going to have to deal with the U-shape. That means we need techniques for managing it, and any effective technique we use isn’t likely to become obsolete any time soon. And that’s my goal in this article: to show you the techniques that have emerged for managing U-shaped context memory loss that you can use today in your own work.

Five techniques to help with U-shaped context problems

The previous article in this series laid out a pattern for detecting and recovering from context loss, which I called externalize-recognize-rehydrate. The techniques below extend the same discipline to the lost-in-the-middle problem. The principle I keep coming back to is that working memory is untrustworthy, and the discipline that follows from it is to externalize what matters, curate what stays in context, and verify what the agent claims to know against what’s on disk. The five techniques are how I do that in practice, and each one is drawn from a real moment in the Quality Playbook’s development.

Curate, don’t accumulate

This is the technique which, in its most brute-force form, is exactly what Mike mentioned in his email to me: clear the context and reload it with just what matters, periodically and deliberately. In other words, don’t trust an accumulated session to stay coherent; build the artifact, then start fresh against it. And if you have the AI write down the important parts of the context (like we’ve talked about throughout this series), then you can start a new session with a refreshed AI that has a more targeted, curated context as a starting point.

I ran into this during the v1.5.2 release prep for the Quality Playbook. I was using a long Claude Code session that had been working through a series of fixes. But I noticed that it was just starting to show its age: It had forgotten a couple of things it should know, and its thinking times were starting to grow.

When it came time to land the final four fixes for the release, I worked with the AI to write a context brief, a separate document with everything the implementing session needed. The question was whether to keep using the existing session, which already “knew” the codebase from the earlier work, or open a fresh CLI session and point it at the brief. I asked another session what to do:

Should we run that in a new cli session rather than continue my current
claude code session that has the current context?

The AI gave me a good answer (start a fresh session, using a starting prompt to read the brief), and it gave three reasons that have stuck with me. First, the brief was self-contained, including file paths, line numbers, exact diffs, regression test bodies, and preflight greps. Anything the new session needed to know was already there, and continuing context bought nothing. Second, fresh context is stricter about adherence. A session that already “knows” the codebase tends to skim the new instructions and improvise from prior assumptions. Surgical fixes are exactly the case where you want the agent to read the brief carefully rather than rely on memory of what felt right last round. And third, the audit trail: The brief is the artifact, and the implementing session is reproducible from just the brief. If the same work needs to be redone in six months by a different model, you point at the brief and say, “This is the input.”

The approach worked really well. I was able to pick up development seamlessly, and the model’s memory problems disappeared.

Place important information at the edges

The U-shape says the model attends best to the beginning and end of its context. The natural move is to put your most load-bearing information in those positions and keep the middle for things you don’t need the model to focus on. Anything important that lives only in the middle of an accumulated context tends to slide out of attention.

The other side of this technique is what not to put in the middle. If something matters, don’t bury it in a long preamble of context you’ve been accumulating; move it to the edges, restate it where the model will act on it, and let the middle absorb the less important material. Luckily, there’s a useful technique that can help with this problem.

In Claude Code, for example, one really clean way to put information at the beginning of context is to use the system prompt. The CLI gives you --append-system-prompt for exactly this. (Most of the other providers’ CLI tools have similar options.) If you put your brief (or selected parts of it) there, the agent will attend to it strongly throughout the session, and that in turn will help keep the per-turn user prompt focused on the action you want the agent to take right now.
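When you assemble a prompt yourself rather than relying on a CLI flag, the same idea can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the Quality Playbook: the function name assemble_context and the section labels are made up for the example. The brief goes first, the bulk material goes in the middle, and the key constraints are restated right before the action at the end.

```python
def assemble_context(brief: str, background: list[str], action: str,
                     key_constraints: list[str]) -> str:
    """Put load-bearing text at the edges; let the middle hold bulk material."""
    # Leading edge: the brief the whole session depends on.
    opening = f"BRIEF (read carefully):\n{brief.strip()}"
    # Middle: logs, prior output, and other lower-priority material.
    middle = "\n\n".join(chunk.strip() for chunk in background)
    # Trailing edge: constraints restated next to the action that needs them.
    reminders = "\n".join(f"- {c}" for c in key_constraints)
    closing = (f"KEY CONSTRAINTS (restated):\n{reminders}\n\n"
               f"ACTION FOR THIS TURN:\n{action.strip()}")
    return "\n\n".join(part for part in (opening, middle, closing) if part)
```

The labels themselves are arbitrary; what matters in this sketch is the ordering, which keeps anything the model must act on out of the middle.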

Short sessions over long ones

Don’t run one long session. Run many short ones, each reading fresh from disk. This will help you iterate on your brief and your external development context, so instead of relying on an opaque context window, you have a visible and constantly changing set of documents that give you a lot more visibility into (and control over) your AI’s context.

Something useful I started doing was taking all my chat history from Gemini, ChatGPT, Claude, and Cowork and putting it into a single folder I could keep updated and indexed for fast search. I built out an entire system to manage this, which turns out to be a great tool when I’m writing articles like this, because I can search through my development history for specific examples and techniques that I’ve used. The system uses Haiku 4.5 to read through chat history, summarize what happened, and create an index. Haiku turned out to be a smart enough model to read each individual interaction in a chat and write a useful index entry for it. But the model being smart enough to do one summary didn’t mean its context management could keep up across all 18,000 records. I ran smack into the U-shape problem.

The first attempt tried to keep dedupe state and progress counts in the model’s head, and it failed spectacularly. The model really didn’t want to keep track of specific deterministic things like accurate numbers or the current state. Haiku 4.5, in particular, seems especially bad at this. What worked was reframing the architecture entirely. Here’s the exact prompt that I gave it to fix the problem:

okay, so we need context management. it doesn't need to remember things,
it just needs to write them down as they go. we had this same context
management problem with Quality Playbook, when it was running out of
context. Just write down after each message.

The protocol I greenlit for the full run made the short-session discipline explicit:

1. Resume processing from the cursor recorded in progress.json, working through each input file in order.
2. Update progress.json after every line.
3. Expect to run out of context well before finishing; that’s fine. Just stop cleanly after each step (or a group of steps), then spin up a fresh session that reads progress.json and continues.
4. When all files are complete, set status: “complete” in progress.json and report back.

Item 3 is the technique in one line: expect context loss, so make sure you’ve written your state down, and build fresh restarts into the process. The technical details, like spinning up subagents, orchestrating with scripts, etc., will change, but the core idea stays the same. In a lot of ways, you can think of it as treating the agent like a pipe, not a database. The state lives on disk, and the session is something you throw away and replace.
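The protocol can be sketched as a small Python loop. This is an illustration under simplifying assumptions, not the summarizer’s actual code: it tracks a single flat list of records instead of multiple input files, the keys in progress.json (“cursor”, “status”) are guesses at a reasonable layout, and summarize stands in for the model call. The budget parameter plays the role of running out of context.

```python
import json
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read saved state from disk, or start from scratch if none exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"cursor": 0, "status": "running"}


def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file first, then swap it in, so a crash can't truncate state."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(progress), encoding="utf-8")
    tmp.replace(path)


def run_session(records, progress_path: Path, out_path: Path, summarize, budget: int) -> dict:
    """Process at most `budget` records, persisting the cursor after every line."""
    progress = load_progress(progress_path)
    done = 0
    with out_path.open("a", encoding="utf-8") as out:
        while progress["cursor"] < len(records) and done < budget:
            idx = progress["cursor"]
            out.write(f"[{idx}] {summarize(records[idx])}\n")
            out.flush()
            progress["cursor"] = idx + 1
            save_progress(progress_path, progress)  # state lives on disk, not in memory
            done += 1
    if progress["cursor"] >= len(records):
        progress["status"] = "complete"
        save_progress(progress_path, progress)
    return progress
```

Calling run_session repeatedly, each time as if it were a brand-new session, picks up exactly where the previous call stopped, because nothing about the position is held anywhere except progress.json.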

Restate key facts close to the point of use

When the model needs a constraint to apply right now, repeat it right now. Don’t trust an instruction from earlier in the session to carry forward through the middle of the context.

This is the technique that fixed the problem I opened the article with, where the Quality Playbook seemed to forget everything it had just written into a file called BUGS.md. When it needed to write the same information into more detailed files, it produced stubs instead: generic templates with the bug-specific fields left blank.

The fix was to restate the read-the-source rule right before the action that needed it, using this prompt:

Before writing BUG-NNN.md, re-read the BUG-NNN entry in BUGS.md.
Copy the Spec basis, Minimal reproduction, Location, Expected behavior,
Actual behavior, Regression test name, and Patches fields
from that entry into the writeup. Don't paraphrase from memory.

“Don’t paraphrase from memory” is the line that did the real work. The instruction couldn’t trust the agent’s memory of what BUGS.md said, even though BUGS.md was sitting right there in the context window. So the instruction forced a fresh read of the file at the moment of writing. The restatement and the fresh read together fixed the bug.

The same pattern applies any time a rule was stated earlier in the session and the model needs to act on it now. Restate the rule next to the action, and force the model back to the source rather than letting it work from memory.
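If an orchestrating script builds the per-bug prompt, the fresh read and the restated rule can be bundled together. The Python sketch below assumes a layout the article does not specify: each entry in BUGS.md is taken to start with a heading line such as “## BUG-001”. The function names read_bug_entry and writeup_prompt are hypothetical as well.

```python
import re
from pathlib import Path

RULE = "Copy the fields from this entry into the writeup. Do not paraphrase from memory."


def read_bug_entry(bugs_md: str, bug_id: str) -> str:
    """Return one entry's text, assuming each entry starts with '## BUG-NNN'."""
    # Match from this bug's heading up to the next '## ' heading or end of file.
    pattern = rf"^## {re.escape(bug_id)}\b.*?(?=^## |\Z)"
    match = re.search(pattern, bugs_md, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        raise KeyError(f"{bug_id} not found in BUGS.md")
    return match.group(0).strip()


def writeup_prompt(bugs_path: Path, bug_id: str) -> str:
    """Re-read the source entry from disk and restate the rule next to the action."""
    entry = read_bug_entry(bugs_path.read_text(encoding="utf-8"), bug_id)
    return (f"Source entry from {bugs_path.name} (read from disk just now):\n"
            f"{entry}\n\n"
            f"Write {bug_id}.md. {RULE}")
```

Because the entry is pulled from disk at prompt-building time, the model sees the source text at the trailing edge of its context instead of having to recall it from somewhere in the middle.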

Test the middle

The previous four techniques are about avoiding lost-in-the-middle failures. This one is about catching them. If you don’t know whether the agent is actually using the information you think it’s using, find out, with a deterministic check rather than a judgment call.

The pattern is the one I used in the Haiku summarizer that I described earlier: check what the agent claims to know against what’s on disk. You have something the agent claims to know (its progress, its current state, the latest version of a rule), and you have something on disk that’s the ground truth (a file, a log, a database record). At the moment the agent’s claim needs to be trusted, you check it.

In the summarizer’s resume protocol, every new session started by cross-checking progress.json against the actual last line written to the summary file, and the agent printed a checkpoint report when it did, at session start and periodically through the run. A representative one looked like this:

Checkpoint Report:
✓ progress.json confirmed: cursor for cowork_04_06 is at 238, status is "running"
✓ Disk state verified: Last line in summaries/cowork_04_06.md is [237]
assistant: Tool invocation repeating chat file read.
⚠ Discrepancy noted: The prior session left a bulk note claiming records
238–296 are duplicates but did not write individual lines for them. Per
your instructions, I must write one line per record, even for duplicates,
in the format [idx] : Duplicate of record [X] ().
Status: Cursor matches disk state. Ready to resume from record 238.

The agent doesn’t need to introspect whether it lost context, only to compare two files. When they agree, the agent proceeds; when they disagree, the agent flags the discrepancy and stops before adding any new work on top of a broken state. Disagreement is the signal.

You can build this kind of check into any agent that does multistep work. Pick something the agent has to track, pick the file that’s the source of truth for it, and have the agent compare the two at every session start. When the agent’s view of the world drifts from the file, you find out before the drift becomes a buried bug.
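A deterministic version of that comparison is short enough to sketch in Python. The details here are assumptions for illustration: progress.json is taken to hold a single “cursor” key (the real summarizer tracked a cursor per file), summary lines are taken to start with “[N]”, and the cursor is expected to be one past the last index written, which matches the report above (cursor 238, last line [237]).

```python
import json
import re
from pathlib import Path


def last_written_index(summary_path: Path):
    """Index from the last '[N] ...' line in the summary file, or None if there is none."""
    if not summary_path.exists():
        return None
    last = None
    for line in summary_path.read_text(encoding="utf-8").splitlines():
        m = re.match(r"\[(\d+)\]", line)
        if m:
            last = int(m.group(1))
    return last


def checkpoint(progress_path: Path, summary_path: Path):
    """Compare the claimed cursor with what is actually on disk.

    Returns (ok, message). The caller should stop when ok is False.
    """
    cursor = json.loads(progress_path.read_text(encoding="utf-8"))["cursor"]
    last = last_written_index(summary_path)
    expected = 0 if last is None else last + 1
    return cursor == expected, f"cursor={cursor}, last line on disk={last}"
```

Running this in a script before each session, instead of asking the model to do it, takes the comparison out of the model’s working memory entirely.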

The discipline behind these techniques

When I built the Quality Playbook’s multi-phase architecture, I was solving the compaction problem. Long pipeline runs were filling the context window and triggering silent compaction in the middle of work. Breaking the pipeline into separate phases that read fresh from disk and stopped after each phase fixed it.

What I didn’t realize until later was that the same architecture also helps with the lost-in-the-middle problem. Each phase has its own short, focused context, with the phase brief at the beginning and the latest progress update at the end, so there’s almost no middle for information to fall into. The architectural move that helped with working memory disappearing turns out to also help with working memory being there and unused.

That’s the lesson I want to land. Both failure modes, context loss and lost-in-the-middle, are problems of working-memory unreliability, and the discipline that addresses them is the same: keep the working set small, put the load-bearing information at the edges of the window, and check the agent’s claims against ground truth on disk when it matters.

Context windows will keep getting bigger, and compaction will get smarter. Some of the techniques in these four articles may eventually be unnecessary. But the underlying constraint won’t disappear. After all, we’ve added a lot more RAM to our computers since the 1MB 286 I wrote about in the last article, and memory management has gotten far more complex since then. And many of these problems are structural; for example, it’s increasingly looking like the U-shape itself is a geometric property of the transformer architecture, not a training artifact that more compute will smooth out.

The bottom line is that if your agent’s ability to do its job depends on information, that information needs to live somewhere more durable than working memory. That was true for my dad’s 32 kilobytes of core memory at Princeton in the 1970s, it was true for my 640 kilobytes of conventional RAM on my 286 in the 1980s, it was true for the 200K-token windows in last year’s models, and it will be true for whatever comes next.
