AI, MCP, and the Hidden Costs of Data Hoarding – O’Reilly

The Model Context Protocol (MCP) is genuinely useful. It gives people who develop AI tools a standardized way to call functions and access data from external systems. Instead of building custom integrations for each data source, you can expose databases, APIs, and internal tools through a common protocol that any AI can understand.

However, I’ve been watching teams adopt MCP over the past year, and I’m seeing a disturbing pattern. Developers are using MCP to quickly connect their AI assistants to every data source they can find (customer databases, support tickets, internal APIs, document stores) and dumping it all into the AI’s context. And because the AI is smart enough to sort through a massive blob of data and pick out the parts that are relevant, it all just works! Which, counterintuitively, is actually a problem. The AI cheerfully processes massive amounts of data and produces reasonable answers, so nobody even thinks to question the approach.

This is data hoarding. And just as physical hoarders can’t throw anything away until their homes become so cluttered they’re unliveable, data hoarding has the potential to cause serious problems for our teams. Developers learn they can fetch far more data than the AI needs and provide it with little planning or structure, and the AI is smart enough to deal with it and still give good results.

When connecting a new data source takes hours instead of days, many developers don’t take the time to ask what data actually belongs in the context. That’s how you end up with systems that are expensive to run and impossible to debug, while an entire cohort of developers misses the chance to learn the critical data architecture skills they need to build robust and maintainable applications.

How Teams Learn to Hoard

Anthropic released MCP in late 2024 to give developers a universal way to connect AI assistants to their data. Instead of maintaining separate connector code to let AI access data from, say, S3, OneDrive, Jira, ServiceNow, and your internal DBs and APIs, you use the same simple protocol to provide the AI with all sorts of data to include in its context. It quickly gained traction. Companies like Block and Apollo adopted it, and teams everywhere started using it. The promise is real; in many cases, the work of connecting data sources to AI agents that used to take weeks can now take minutes. But that speed can come at a cost.

Let’s start with an example: a small team working on an AI tool that reads customer support tickets, categorizes them by urgency, suggests responses, and routes them to the right department. They needed to get something working quickly but faced a challenge: They had customer data spread across multiple systems. After spending a morning arguing about what data to pull, which fields were important, and how to structure the integration, one developer decided to just build it, creating a single getCustomerData(customerId) MCP tool that pulls everything they’d discussed (40 fields from three different systems) into one big response object. To the team’s relief, it worked! The AI happily consumed all 40 fields and started answering questions, and no more discussions or decisions were needed. The AI handled all the new data just fine, and everyone felt like the project was on the right track.

Day two, someone added order history so the assistant could explain refunds. Soon the tool pulled Zendesk status, CRM status, eligibility flags that contradicted each other, three different name fields, four timestamps for “last seen,” plus entire conversation threads, and combined them all into an ever-growing data object.

The assistant kept producing reasonable-looking answers, even as the data it ingested kept growing in scale. However, the model now had to wade through thousands of irrelevant tokens before answering simple questions like “Is this customer eligible for a refund?” The team ended up with a data architecture that buried the signal in noise. That extra load put stress on the AI to dig out that signal, setting up serious long-term problems. But they didn’t realize it yet, because the AI kept producing reasonable-looking answers. As they added more data sources over the following weeks, the AI started taking longer to respond. Hallucinations crept in that they couldn’t track down to any specific data source. What had been a really helpful tool became a bear to maintain.

The team had fallen into the data hoarding trap: Their early quick wins created a culture where people just threw whatever they needed into the context, and eventually it grew into a maintenance nightmare that only got worse as they added more data sources.

The Skills That Never Develop

There are as many opinions on data architecture as there are developers, and there are usually many ways to solve any one problem. One thing that almost everyone agrees on is that it takes careful choices and lots of experience. But it’s also the subject of a lot of debate, especially within teams, precisely because there are so many ways to design how your application stores, transmits, encodes, and uses data.

Most of us fall into just-in-case thinking at one time or another, especially early in our careers: pulling all the data we might possibly need just in case we need it rather than fetching only what we need when we actually need it (which is an example of the opposite, just-in-time thinking). Normally when we’re designing our data architecture, we’re dealing with immediate constraints: ease of access, size, indexing, performance, network latency, and memory usage. But when we use MCP to provide data to an AI, we can often sidestep many of those trade-offs…temporarily.
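
To make the contrast concrete, here is a minimal sketch of the two fetching styles. The three backend functions, their field names, and the returned values are all hypothetical stand-ins invented for illustration; nothing here comes from a real system or from the MCP specification.

```python
from typing import Callable, Dict, List

# Records which hypothetical backend was queried, so the two styles can be compared.
calls: List[str] = []

def _crm(customer_id: str) -> Dict[str, str]:
    calls.append("crm")
    return {"name": "Ada Example", "crm_status": "active"}

def _billing(customer_id: str) -> Dict[str, str]:
    calls.append("billing")
    return {"last_payment": "2025-01-15", "refund_eligible": "yes"}

def _support(customer_id: str) -> Dict[str, str]:
    calls.append("support")
    return {"open_tickets": "2", "last_ticket": "Login failure"}

# Hypothetical registry of backend systems (assumed names).
SYSTEMS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "crm": _crm,
    "billing": _billing,
    "support": _support,
}

def fetch_just_in_case(customer_id: str) -> Dict[str, str]:
    """Query every system up front and merge everything into one blob."""
    blob: Dict[str, str] = {}
    for fetch in SYSTEMS.values():
        blob.update(fetch(customer_id))
    return blob

def fetch_just_in_time(customer_id: str, system: str) -> Dict[str, str]:
    """Query only the system the current question depends on."""
    return SYSTEMS[system](customer_id)
```

Answering a refund question with fetch_just_in_time("c-1", "billing") touches one system and returns two fields, while fetch_just_in_case("c-1") touches all three and returns six. The second style is easier to write, which is exactly why it is tempting.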

The more we work with data, the better we get at designing how our apps use it. The more early-career developers are exposed to it, the more they learn through experience why, for example, System A should own customer status while System B owns payment history. Healthy debate is an important part of this learning process. Through all of these experiences, we develop an intuition for what “too much data” looks like, and for how to handle all of those tricky but critical trade-offs that create friction throughout our projects.

MCP can remove the friction that comes from those trade-offs by letting us avoid having to make those decisions at all. If a developer can wire up everything in just a few minutes, there’s no need for discussion or debate about what’s actually needed. The AI seems to handle whatever data you throw at it, so the code ships without anyone questioning the design.

Without all of that experience making, discussing, and debating data design choices, developers miss the chance to build critical mental models about data ownership, system boundaries, and the cost of moving unnecessary data around. They spend their formative years connecting instead of architecting. This is another example of what I call the cognitive shortcut paradox: AI tools that make development easier can prevent developers from building the very skills they need to use those tools effectively. Developers who rely solely on MCP to handle messy data never learn to recognize when data architecture is problematic, just like developers who rely solely on tools like Copilot or Claude Code to generate code never learn to debug what it creates.

The Hidden Costs of Data Hoarding

Teams use MCP because it works. Many teams carefully plan their MCP data architecture, and even teams that do fall into the data hoarding trap still ship successful products. But MCP is still relatively new, and the hidden costs of data hoarding take time to surface.

Teams often don’t discover the problems with a data hoarding approach until they need to scale their applications. That bloated context that barely registered as a cost for your first hundred queries starts showing up as a real line item in your cloud bill when you’re handling millions of requests. Every unnecessary field you’re passing to the AI adds up, and you’re paying for all that redundant data on every single AI call.
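
A rough back-of-the-envelope calculation shows how this scales. The price per million tokens and the monthly call volume below are assumptions chosen only to make the arithmetic easy to follow, and the two context sizes are illustrative; substitute your own provider’s rates and your own traffic.

```python
def monthly_context_cost(tokens_per_call: int, calls_per_month: int,
                         usd_per_million_tokens: float) -> float:
    """Input-token cost, in USD, of the context sent on every call."""
    return tokens_per_call * calls_per_month * usd_per_million_tokens / 1_000_000

PRICE = 3.00        # ASSUMED: USD per million input tokens
CALLS = 2_000_000   # ASSUMED: requests per month

hoarded = monthly_context_cost(5_000, CALLS, PRICE)  # bloated context per call
focused = monthly_context_cost(200, CALLS, PRICE)    # only what the task needs

print(f"hoarded context: ${hoarded:,.0f} per month")  # $30,000
print(f"focused context: ${focused:,.0f} per month")  # $1,200
```

Under these assumed numbers the same feature costs 25 times more to run, and nothing about the answers the AI gives would reveal the difference.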

Any developer who’s dealt with tightly coupled classes knows that when something goes wrong (and it always does, eventually) it’s a lot harder to debug. You often end up dealing with shotgun surgery, that really unpleasant situation where fixing one small problem requires changes that cascade across multiple parts of your codebase. Hoarded data creates the same kind of technical debt in your AI systems: When the AI gives a wrong answer, tracking down which field it used or why it trusted one system over another is difficult, often impossible.

There’s also a security dimension to data hoarding that teams often miss. Every piece of data you expose through an MCP tool is a potential vulnerability. If an attacker finds an unprotected endpoint, they can pull everything that tool provides. If you’re hoarding data, that’s your entire customer database instead of just the three fields actually needed for the task. Teams that fall into the data hoarding trap find themselves violating the principle of least privilege: Applications should have access to the data they need, but no more. That can bring an enormous security risk to their entire organization.
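
One simple way to apply least privilege at the tool boundary is an explicit allowlist, sketched below. The field names and the sample record are hypothetical, and a real deployment would also need authentication and authorization, which this sketch does not attempt to cover.

```python
from typing import Any, Dict, FrozenSet

# ASSUMED allowlist: the only three fields the refund task is taken to need.
REFUND_FIELDS: FrozenSet[str] = frozenset(
    {"customer_id", "refund_eligible", "last_order_date"}
)

def project_fields(record: Dict[str, Any], allowed: FrozenSet[str]) -> Dict[str, Any]:
    """Return only allowlisted fields; everything else never leaves the tool."""
    return {key: value for key, value in record.items() if key in allowed}

# Hypothetical full record as it might sit in a backend system.
full_record = {
    "customer_id": "c-1",
    "refund_eligible": True,
    "last_order_date": "2025-02-01",
    "email": "ada@example.com",
    "home_address": "1 Example St",
    "card_last4": "4242",
}

exposed = project_fields(full_record, REFUND_FIELDS)
```

Because the allowlist is explicit, adding a field to the tool’s output becomes a visible code change that someone has to justify, rather than a side effect of returning the whole record.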

In an extreme case of data hoarding infecting an entire company, you might discover that every team in your organization is building their own blob. Support has one version of customer data, sales has another, product has a third. The same customer looks completely different depending on which AI assistant you ask. New teams come along, see what appears to be working, and copy the pattern. Now you’ve got data hoarding as organizational culture.

Each team thought they were being pragmatic, shipping fast, and avoiding unnecessary arguments about data architecture. But the hoarding pattern spreads through an organization the same way technical debt spreads through a codebase. It starts small and manageable. Before you know it, it’s everywhere.

Practical Tools for Avoiding the Data Hoarding Trap

It can be really difficult to coach a team away from data hoarding when they’ve never experienced the problems it causes. Developers are very practical: they want to see evidence of problems and aren’t going to sit through abstract discussions about data ownership and system boundaries when everything they’ve done so far has worked just fine.

In Learning Agile, Jennifer Greene and I wrote about how teams resist change because they know that what they’re doing today works. To the person trying to get developers to change, it may seem like irrational resistance, but it’s actually pretty rational to push back against someone from the outside telling them to throw out what works today for something unproven. Still, just as developers eventually learn that taking time for refactoring speeds them up in the long run, teams need to learn the same lesson about deliberate data design in their MCP tools.

Here are some practices that can make those discussions easier, by starting with constraints that even skeptical developers can see the value in:

Build tools around verbs, not nouns. Create checkEligibility() or getRecentTickets() instead of getCustomer(). Verbs force you to think about specific actions and naturally limit scope.
Talk about minimizing data needs. Before anyone creates an MCP tool, discuss the smallest piece of data the AI needs to do its job, and what experiments the team can run to figure out what the AI truly needs.
Break reads apart from reasoning. Separate data fetching from decision-making when you design your MCP tools. A simple findCustomerId() tool that returns just an ID uses minimal tokens, and might not even need to be an MCP tool at all if a simple API call will do. Then getCustomerDetailsForRefund(id) pulls only the specific fields needed for that decision. This pattern keeps context focused and makes it obvious when someone’s trying to fetch everything.
Dashboard the waste. The best argument against data hoarding is showing the waste. Track the ratio of tokens fetched versus tokens used and display it in an “information radiator” style dashboard that everyone can see. When a tool pulls 5,000 tokens but the AI only references 200 in its answer, everyone can see the problem. Once developers see they’re paying for tokens they never use, they get very interested in fixing it.
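
As a rough illustration of the first and third practices, the sketch below splits one refund question into two narrow reads and one pure decision. The functions are snake_case Python counterparts of findCustomerId(), getCustomerDetailsForRefund(), and checkEligibility(); the in-memory data, the field names, and the 30-day refund window are all assumptions made up for this example. Registering the functions with an MCP server is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical in-memory data standing in for real backend systems.
_IDS_BY_EMAIL: Dict[str, str] = {"ada@example.com": "c-1"}
_ORDERS: Dict[str, Dict[str, Any]] = {
    "c-1": {
        "days_since_purchase": 12,
        "already_refunded": False,
        "order_total": 80.0,          # present in the backend, never returned
        "marketing_segment": "B",     # present in the backend, never returned
    },
}
REFUND_WINDOW_DAYS = 30  # ASSUMED policy, for illustration only

@dataclass(frozen=True)
class RefundDetails:
    days_since_purchase: int
    already_refunded: bool

def find_customer_id(email: str) -> Optional[str]:
    """Read: resolve an email to a customer ID and return nothing else."""
    return _IDS_BY_EMAIL.get(email)

def get_customer_details_for_refund(customer_id: str) -> RefundDetails:
    """Read: fetch only the two fields the refund decision depends on."""
    order = _ORDERS[customer_id]
    return RefundDetails(
        days_since_purchase=int(order["days_since_purchase"]),
        already_refunded=bool(order["already_refunded"]),
    )

def check_eligibility(details: RefundDetails) -> bool:
    """Reasoning: a pure decision over the narrow payload, with no fetching."""
    within_window = details.days_since_purchase <= REFUND_WINDOW_DAYS
    return within_window and not details.already_refunded
```

Because each function has one job, a request to add an unrelated field to get_customer_details_for_refund() stands out in review, and check_eligibility() can be tested without touching any data source.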

Quick smell test for data hoarding

Tool names are nouns (getCustomer()) instead of verbs (checkEligibility()).
Nobody’s ever asked, “Do we really need all these fields?”
You can’t tell which system owns which piece of data.
Debugging requires detective work across multiple data sources.
Your team rarely or never discusses the data design of MCP tools before building them.
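
One way to back this checklist with numbers is the fetched-versus-used token ratio described earlier. The sketch below assumes you already have some way to estimate how many fetched tokens an answer actually drew on, which is the hard part and is not addressed here; the tool names, the second tool’s counts, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class ToolUsage:
    tool: str
    tokens_fetched: int  # tokens the tool placed into the context
    tokens_used: int     # tokens the answer drew on (estimation method assumed)

def waste_ratio(usage: ToolUsage) -> float:
    """Fraction of fetched tokens that went unused, between 0.0 and 1.0."""
    if usage.tokens_fetched == 0:
        return 0.0
    return 1 - usage.tokens_used / usage.tokens_fetched

def flag_hoarders(usages: Iterable[ToolUsage], threshold: float = 0.8) -> List[str]:
    """Names of tools whose waste ratio exceeds the (assumed) threshold."""
    return [u.tool for u in usages if waste_ratio(u) > threshold]

# Hypothetical measurements for two tools.
usages = [
    ToolUsage("get_customer_data", tokens_fetched=5_000, tokens_used=200),
    ToolUsage("check_eligibility", tokens_fetched=300, tokens_used=240),
]
flagged = flag_hoarders(usages)
```

With these sample numbers only get_customer_data is flagged, at a waste ratio of about 0.96. Plotting that ratio per tool over time is one possible shape for the dashboard.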

Looking Forward

MCP is a simple but powerful tool with enormous potential for teams. But because it can be a critically important pillar of your entire application architecture, problems you introduce at the MCP level ripple throughout your project. Small mistakes have huge consequences down the road.

The very simplicity of MCP encourages data hoarding. It’s an easy trap to fall into, even for experienced developers. But what worries me most is that developers learning with these tools right now might never learn why data hoarding is a problem, and they won’t develop the architectural judgment that comes from having to make hard choices about data boundaries. Our job, especially as leaders and senior engineers, is to help everyone avoid the data hoarding trap.

When you treat MCP decisions with the same care you give any core interface (keeping context lean, setting boundaries, revisiting them as you learn), MCP stays what it should be: a simple, reliable bridge between your AI and the systems that power it.
