The tragedy of AI memory
The Context Rot Problem
I was excited when ChatGPT introduced memory. I even wrote about it here, and for a while, I loved using it.
The thesis is simple to love: what if an AI doesn’t just know everything you’ve told it in a single thread of conversation, but remembers all previous threads? I’ve already written some questions you could ask such an AI with memory enabled, and they all still hold up.
But today, I’m less excited about memory than I was.
What if it’s really like the sci-fi notion of AIs we grew up reading about, where, just through conversation, an AI knows you intimately, understands you far more deeply than any human around you does, and always gives you the best advice and suggestions?
Future breakthroughs in continual learning may enable that future, so I remain optimistic. But current AI “memory” is not memory.
Rather, if you have it turned on, you’re actively getting a worse experience out of any model you’re using than you would with it turned off.
Here’s why.
II.
Today’s AI “memory” is not memory.
It’s rather a sophisticated form of context injection, most commonly implemented through Retrieval-Augmented Generation (RAG), a technique I’ve written about in a previous essay. It sounds elegant in principle but reveals its mechanical crudeness under scrutiny.
When you ask about ice cream shops in Hawaii, the system doesn’t “recall” your aversion to mint or your passion for strawberry.
Instead, it performs a vector search through your conversation history, embedding your past words into mathematical space and retrieving whatever clusters nearby. This is less like human memory and more like having a research assistant who, every time you ask a question, dumps a pile of photocopied diary pages on your desk, some relevant, most not, and expects you to sort through them while simultaneously answering the question.
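To make the mechanics concrete, here’s a minimal sketch of that retrieval loop in Python. The `embed()` function is a toy stand-in for a real embedding model, and none of this is ChatGPT’s actual implementation; it’s just the general RAG shape:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for an embedding model: a hashed bag-of-words vector.
    A real system would call an actual embedding model here."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_memories(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    """Naive RAG-style lookup: embed everything, return whatever clusters
    nearest to the query -- relevant or not, it all gets injected into the prompt."""
    q = embed(query)
    return sorted(memories, key=lambda m: float(np.dot(q, embed(m))), reverse=True)[:top_k]

past_chats = [
    "I hate mint ice cream.",
    "Strawberry is my favourite flavour.",
    "Debugging a race condition in my Go server.",
]
print(retrieve_memories("good ice cream shops in Hawaii", past_chats))
```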
In theory, the system should identify your flavor preferences, cross-reference them with Hawaiian establishments, and deliver a pristine recommendation.
For simple queries, this actually works tolerably well, as I’ve previously mentioned.
But the moment you need the raw, crystalline intelligence of the model itself, the rot begins to show.
I also think we don’t really understand or agree on a definition of “memory,” because with LLMs the expectations are different from what we expect of human memory. From LLMs we expect 100% infallible memory, and perhaps not naively: we’re used to apps serving billions of people and trillions of data points, all organized neatly in SQL tables and fetched on demand. Unlike with humans, we don’t want the AI’s memories of the past to degrade, because then, what’s the point? If a year from now you can’t tell me that I watched Episode 1 of a show I’ve since forgotten about, the memory functionality is useless. LLM memory is not supposed to degrade the way ours does with time.
Or maybe it should degrade? Things happening right now and in the recent past are obviously more important than what happened 3 or 10 years ago, and if you have to pick which memories to keep fresh, you’d pick the recent ones over the old, with the obvious exception of important life events, which shape us subtly despite being in the past.
This tension over whether LLM memories should degrade is a technological problem rather than a philosophical one: in a world of infinite context and intelligent models, you could just pass in every memory with a timestamp and let the model decide what’s relevant and what’s not.
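In that world, the whole question collapses into prompt construction, roughly like the sketch below. The memory store and prompt format are my own assumptions, not any vendor’s scheme:

```python
from datetime import datetime, timezone

# Hypothetical memory store: (timestamp, memory) pairs accumulated over years of chats.
memories = [
    (datetime(2022, 5, 1, tzinfo=timezone.utc), "Watched Episode 1 of a new sci-fi show."),
    (datetime(2025, 1, 10, tzinfo=timezone.utc), "Started designing a platformer level."),
]

def build_prompt(user_query: str) -> str:
    """No pre-filtering at all: hand the model every memory with its timestamp
    and let it weigh recency and relevance itself."""
    lines = [f"[{ts.date().isoformat()}] {text}" for ts, text in sorted(memories)]
    return "Memories (oldest first):\n" + "\n".join(lines) + f"\n\nUser: {user_query}"

print(build_prompt("Which show did I start watching a while back?"))
```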
Unfortunately, today’s implementations of that idea are lazy.
III.
For some reason, ChatGPT seems to always include context from your recent chats if you have memory turned on, even if they’re irrelevant. The likely reason: when people ask ChatGPT something and open multiple chats, they’re probably talking about the same thing across them.
But what if they’re not? What if they’re talking about something different in different chats? What if one chat is about a software issue and a new chat is about a philosophical text? You’d expect the LLM to be smart enough to ignore the included context, but it often isn’t, and it tries to reference it as if it were obligated to.
The feature feels bolted on, as if someone somewhere realized that people are mostly talking about the same thing at the same time and decided to include the context of recent chats regardless of what’s in the latest one, since it would both likely be relevant and deliver the “magical” moment of memory sooner for people using it.
This brings me to the core problem with memory in general: “context rot.” We’ve all seen the charts where LLM performance degrades as context grows, but the problem only becomes apparent once you cross roughly 32k tokens.
Most chat queries likely don’t cross that threshold. Even when you account for all the context being pulled in from recent chats, relevant chats about the topic, and general information about you, it’s unlikely to exceed 32k tokens.
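If you want to sanity-check that against your own usage, a rough count of whatever is being injected is easy to get. This sketch assumes tiktoken’s cl100k_base encoding as the tokenizer, which may not match whichever model you’re actually using:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer; swap for your model's

def injected_tokens(pieces: list[str]) -> int:
    """Total token count across every blob of context being stuffed into the prompt."""
    return sum(len(enc.encode(piece)) for piece in pieces)

context = [
    "Summary of five recent chats...",
    "Saved memories about the user...",
    "Custom instructions...",
]
total = injected_tokens(context)
print(f"{total} injected tokens ({'under' if total < 32_000 else 'over'} the 32k mark)")
```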
But context rot shouldn’t just refer to going beyond 32k tokens.
Anything irrelevant to your query shouldn’t be there, and I think the best way to ensure that is to engineer all the context yourself.
Sure, if you ask for help with game design, context from every chat you’ve had about that game, or about game design in general, might seem helpful. But I see it as actively harmful, because 1) you don’t really remember what’s being included in the context, so beyond your direct query, you have no idea what else the LLM is seeing when it answers you, and 2) you have no idea what is subtly influencing it more than you realise.
If you’ve forgotten that you once told the AI you hate platformers, but the game level you’re working on calls for platformer design, you’ll never get it with memory enabled.
With memory disabled, you probably will.
If you’re always engineering the context yourself, you might catch subtle things you missed, for instance when the LLM mentions in its first response that it doesn’t know something and is guessing (say, you forgot to include your app’s pricing details when asking whether your pricing is competitive with competitors). At that point, you can easily start a new chat with that information included so the LLM no longer has to guess.
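Concretely, “engineering the context yourself” can be as simple as the sketch below, using the pricing example above. The block names and helper are illustrative, not a feature of any particular app:

```python
# Explicitly chosen context blocks -- nothing reaches the model that I didn't write down.
context_blocks = {
    "my_pricing": "Pro plan: $12/month. Team plan: $29/seat/month.",
    "competitor_pricing": "Competitor A: $15/month. Competitor B: $25/seat/month.",
}

def compose_prompt(question: str, blocks: dict[str, str]) -> str:
    """Every piece of context is visible and deliberate; if the model admits it's
    guessing, add the missing block and start a fresh chat."""
    labelled = "\n\n".join(f"## {name}\n{text}" for name, text in blocks.items())
    return f"{labelled}\n\n## Question\n{question}"

print(compose_prompt("Is my pricing competitive?", context_blocks))
```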
But this raises an interesting product question, because consumers have grown to expect their AI apps to have some kind of memory, even if including memory makes the responses worse. Even if it’s lazy, you can go down the same route as everyone else, bolt on a vector database implementation, and call it a day; that would certainly be feasible. You could argue that letting users manually engineer their context is superior, but then you’d have to deal with the UI complexity that entails and compete with other apps that promise seamless memory without the manual work.
I think the answer, for now, is somewhere in the middle. Since we can agree that manually engineering context is better, you should include that option regardless of the clutter or complexity it brings, but you should also make an effort to figure out a memory implementation suited to your specific use case. An AI app focused on legal work may need a vector DB implementation, where every case detail matters and nothing is irrelevant if the meanings match, while a code editor may need a different kind of search entirely, not embedding syntax at all but using keywords to find specific details across the codebase.
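To make the contrast concrete, here’s a rough sketch of the two retrieval styles: a (toy) meaning-based lookup for the legal app, and plain keyword matching across files for the code editor. Both functions are illustrative stand-ins rather than production implementations:

```python
from pathlib import Path

def semantic_search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Legal-style retrieval: meaning matters, so rank by (toy) word overlap.
    A real implementation would use an embedding model and a vector DB."""
    q_words = set(query.lower().split())

    def overlap(doc: str) -> float:
        return len(q_words & set(doc.lower().split())) / (len(q_words) or 1)

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def keyword_search(keyword: str, root: str = ".") -> list[str]:
    """Code-editor-style retrieval: exact identifiers matter, so just grep for them."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if keyword in line:
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```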
If you could only do one or the other, or would rather bet on future AIs getting better at memory than they currently are, I’d choose the approach that delivers the most transparency and control over one that’s objectively worse.
IV.
Now, one can argue that future AI memory will be smart enough to include only relevant context, that it will be so good at gathering context about your query that it would be stupid not to have it turned on. I can certainly imagine such a future.
But right now, every time I enable memory, I feel a severe loss of agency, a sense that I’m not getting the intelligence the model is supposed to deliver. I’m sure there’s something in the context subtly influencing the answer, and that’s not what I want.
LLMs are highly probabilistic, but they can be used somewhat deterministically, depending on whether you hold the power of being the only context engineer or whether invisible context is being added that you have no idea about.
There are several easy solutions to this, as I’ve mentioned above: maybe a chat picker where you search for and include exactly the context you want, or a way to toggle individual pieces of context that the LLM has gathered on and off. But for now, I think the best way to interact with an AI and leverage its full intelligence, without worrying about what context is drifting its response, is to add the context yourself.
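If a product team did want to ship that toggle idea, the underlying shape could be as simple as this sketch; every name here is hypothetical, it’s just the shape of the idea:

```python
from dataclasses import dataclass

@dataclass
class ContextPiece:
    label: str     # what the user sees in the picker
    text: str      # what would actually be injected into the prompt
    enabled: bool  # the user can toggle this before sending

pieces = [
    ContextPiece("Recent chat: level design notes", "Working on a puzzle platformer...", True),
    ContextPiece("Saved memory: dislikes platformers", "User once said they hate platformers.", False),
]

def visible_context(pieces: list[ContextPiece]) -> str:
    """Only explicitly enabled pieces ever reach the model -- no invisible context."""
    return "\n\n".join(f"[{p.label}]\n{p.text}" for p in pieces if p.enabled)

print(visible_context(pieces))
```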
In the future, if it’s proven that “LLMs agentically gather more relevant context more efficiently than humans” - then I’m willing to change this view.
Of course, a superintelligent LLM with a 1M-token context window whose performance doesn’t degrade as context grows might provide a better answer despite the irrelevant context it gathers, simply because it has gathered so much data for your task that it would be impossible for you to supply sufficient context manually. That future is probably not too far away.
But until then, I’ll keep my LLM memory off and engineer the context myself.



