AI Kept Forgetting My Notes. Fixing That Taught Me How It Actually Works.
A simple, experience-driven breakdown of context windows, tokens, retrieval, hallucination, and why AI behaves the way it does.
The Problem
Three weeks into learning machine learning, I ran into a problem. Not with models or math, but with my notes. I had taken the time to write things in my own words, build analogies that made sense to me, and note down questions I wanted to revisit. The problem wasn’t quality. It was structure.
My notes were scattered across different apps, formats, and styles. Some were in Notion, a few in Google Docs, and others buried in random text files. Nothing was consistent. Every time I sat down to study, I found myself spending the first twenty minutes just trying to reconstruct context. What had I already understood? Where had I left off? Which explanation had actually made sense last time? It felt like I was re-learning my own thinking before I could move forward.
At some point, I tried something that felt obvious. I opened an AI chat, pasted my notes in, and asked it to help me study. For a while, it worked better than expected. The responses were aligned with how I thought, and it felt like I finally had something that could adapt to me instead of the other way around.
For the first time, studying felt continuous instead of fragmented.
That illusion didn’t last very long.
When It Started Breaking
The problems didn’t show up all at once. At first, things felt smooth enough that I didn’t question it. The AI was using my notes, explaining things in ways that made sense, and saving me time. It felt like the system was working.
Then small inconsistencies started creeping in.
Occasionally it would explain something using an example I didn't recognize. Other times it would skip over details I was sure I had written down. It wasn't entirely wrong, just slightly off. It was easy to ignore at first, but noticeable once you paid attention.
I assumed it was just me. Maybe I hadn’t phrased something clearly. Maybe I had forgotten what I wrote.
Then one response made me stop.
I had asked it to explain a concept based on my notes, something I had already spent time understanding. It gave a clean answer, structured, confident, and easy to follow. But halfway through, it referenced a formula and attributed it to my notes.
I paused because I knew that formula wasn’t there.
I went back and checked. It wasn’t buried somewhere I had forgotten about. It simply didn’t exist in my notes.
That’s when the problem became harder to ignore. The answer wasn’t obviously wrong. In fact, it looked correct. If I hadn’t been paying attention, I probably would have accepted it without questioning it.
That’s a different kind of failure.
Not something you can spot immediately, but something that quietly shifts your understanding without you realizing it.
At that point, I stopped treating it as a minor issue. I wanted to understand why the shift was happening.
Fixing the Input First
Before trying to fix the AI, I had to fix my notes.
Up until that point, the problem felt external. The AI was inconsistent, so I assumed the issue was with how it was responding. But the more I looked at my setup, the more obvious it became that I wasn’t giving it something reliable to work with.
My notes had no consistent structure. Some were written as full paragraphs, others as bullet points. A few had analogies; some didn’t. Even when two notes covered similar topics, they were formatted completely differently. It made sense that I struggled to navigate them. Expecting an AI system to interpret them consistently was even more unrealistic.
I moved everything into Markdown. Not because it’s a powerful tool, but because it forces simplicity. Plain text, lightweight formatting, and just enough structure to make things predictable.
Each note followed the same pattern. A concept at the top, a short explanation, my own analogy, and a section for things I didn’t fully understand yet. It wasn’t perfect, but it was consistent. And that consistency mattered more than anything else.
What surprised me was how much this changed things, even before bringing AI back into the picture. The notes became easier to scan, easier to revisit, and easier to build upon. I wasn’t spending time reinterpreting my writing anymore.
I also added a few lines at the top of each file, including basic metadata like topic and difficulty. It didn’t change how I used the notes directly, but it made them easier to organize once I started treating them as a collection rather than isolated pieces.
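Roughly, a note in that shape looks like the example below. The content and the front-matter layout here are made up for illustration; the parts that actually mattered were the repeated pieces: metadata at the top, then the concept, a short explanation, an analogy, and open questions.

```markdown
---
topic: gradient-descent
difficulty: beginner
---

# Gradient Descent

**Explanation:** Adjust the model's parameters step by step in the direction
that reduces the error the most.

**My analogy:** Walking downhill in fog. You can't see the bottom, but you can
always feel which way the ground slopes.

**Still unclear:** How the learning rate is actually chosen in practice.
```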
Looking back, this was the first real shift. The system didn’t start with the AI. It started with making the input structured enough to be usable.
Moving Beyond Chat
Up to this point, I was still using the AI through a chat interface. It worked for quick interactions, but it didn’t take long to feel the limitations. Every time I wanted to ask something, I had to paste my notes again or rely on whatever context was still in the conversation.
It didn’t feel like a system. It felt like starting over each time.
I wanted something that worked more consistently, where my notes were already part of the setup instead of something I had to reintroduce every session. That’s what pushed me to move beyond chat and use the API.
In simple terms, this meant writing a small script that sends my notes and questions directly to the model and receives responses back. No chat window, no manual copy-pasting, just a structured request and a structured response.
The shift itself wasn’t as complicated as it sounds, but it changed how I thought about the interaction. Instead of treating the AI like something I “talk to,” it started to feel more like a component I could build around.
There were a couple of practical things that quickly became obvious. The API key behaves like a password, so it needs to be handled carefully. And since usage is billed per request, it's easy to underestimate how quickly costs can add up if you're not paying attention.
Once everything was set up, I took the simplest possible approach. I loaded all my notes, sent them together with each question, and let the model respond.
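A minimal sketch of that first version looks something like this. I'm using the OpenAI Python SDK purely as an example; the provider, model name, and file layout are all illustrative, and any chat-completion API follows the same shape.

```python
from pathlib import Path
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat API looks similar

# The client reads the API key from the OPENAI_API_KEY environment variable,
# which keeps the key out of the script itself.
client = OpenAI()

def load_all_notes(notes_dir="notes"):
    """Concatenate every Markdown note into one long string."""
    return "\n\n".join(p.read_text() for p in sorted(Path(notes_dir).glob("*.md")))

def ask(question):
    # The naive version: every note goes into every request.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Help me study using the notes below.\n\n" + load_all_notes()},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Explain gradient descent the way my notes explain it."))
```

Keeping the key in an environment variable is the "handle it like a password" part. The billing-per-request part is why this load-everything approach also gets expensive quickly.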
For a while, the system worked exactly the way I expected.
Then it started breaking again.
Why It Broke Again
As my notes grew, the system started behaving inconsistently again.
I would ask the same question and receive slightly different answers. Sometimes details I knew were in my notes just wouldn’t show up. It wasn’t obvious at first, but the pattern became difficult to ignore. The more notes I added, the less reliable the responses felt.
At that point, it stopped feeling like a small issue and started feeling like something fundamental.
That’s when I first encountered the concept of the context window.
The model can only process a limited amount of text at once. Everything you send, including your notes, your question, and even parts of the previous conversation, must fit within that limit. If it doesn't, some of it is simply dropped.
There’s no warning when this happens. The model doesn’t tell you it missed something. It just responds based on whatever portion it was able to read.
Once I understood that, the inconsistency made sense. The model wasn’t ignoring my notes. It literally couldn’t see all of them.
The limit itself is measured in tokens, not words. Tokens are smaller chunks of text, and they add up faster than you expect, especially with technical material. A few pages of notes can quickly turn into thousands of tokens.
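You can make that concrete with a tokenizer library. The sketch below assumes tiktoken and one of its standard encodings; other models tokenize differently, but the word-to-token ratio is in the same ballpark.

```python
import tiktoken

# cl100k_base is one of tiktoken's standard encodings; the exact tokenizer
# varies by model, but the idea is the same.
enc = tiktoken.get_encoding("cl100k_base")

note = open("notes/backpropagation.md").read()  # illustrative path
tokens = enc.encode(note)

print(f"{len(note.split())} words -> {len(tokens)} tokens")
# Technical notes, with symbols, code, and unusual terms, tend to produce
# noticeably more tokens per word than plain prose.
```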
Which means that sending all my notes with every question wasn’t just inefficient. It was eventually going to fail, no matter what.
That realization changed the problem. It was no longer about making the AI “better.” It was about working within a constraint I hadn’t understood before.
The real question became: How do I make sure the model sees the right information without trying to show it everything?
A Better Way to Think About It
Once I understood that the model couldn’t see everything at once, the problem became clearer. I didn’t need it to read all my notes every time. I just needed it to read the right parts.
Up until then, my approach had been simple: send everything and let the model figure it out. That worked when the notes were small, but it broke as soon as they grew beyond what the system could handle.
So I flipped the approach.
Instead of sending all my notes, I started by searching through them first. When I asked a question, the system would look for the most relevant sections, pull those out, and send only that smaller, focused context to the model.
That small shift made a noticeable difference.
The answers became more consistent, and more importantly, they started to sound like my notes again. The explanations reflected the way I had written things, including the analogies that had made sense to me when I first learned them.
This approach is often called retrieval-augmented generation, but the idea itself is straightforward. You retrieve the relevant information first and then generate a response based on it.
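Stripped down, the flow looks something like the sketch below. It ranks notes with a crude keyword-overlap score just to keep the example self-contained; real retrieval pipelines usually rank with embeddings, and the paths and helper names here are only illustrative.

```python
from pathlib import Path

def score(question, text):
    """Crude relevance score: how many of the question's words appear in the note."""
    return len(set(question.lower().split()) & set(text.lower().split()))

def retrieve(question, notes_dir="notes", top_k=3):
    """Rank all notes against the question and keep only the best few."""
    notes = [(p.name, p.read_text()) for p in Path(notes_dir).glob("*.md")]
    return sorted(notes, key=lambda item: score(question, item[1]), reverse=True)[:top_k]

def build_context(question):
    """Only the retrieved notes go into the prompt, not the whole collection."""
    return "\n\n".join(f"## {name}\n{text}" for name, text in retrieve(question))
```

The retrieved context then takes the place of the load-everything step in the earlier request.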
What stood out to me was that the method didn’t make the model smarter. It just made it more grounded. Instead of relying on whatever it “knew,” it was now anchored in what I had actually written.
That distinction mattered more than I expected.
The Moment It Clicked
Even after fixing retrieval, there was still something that didn’t feel completely right.
Usually, the answers were grounded in my notes. They reflected my explanations, my analogies, and the way I had built up my understanding. But every now and then, something would slip through that didn’t belong.
One response made it obvious.
I had asked the system to explain backpropagation using my notes. It started well, walking through the idea in a way that matched how I had written it. Then, halfway through, it introduced a detailed mathematical formula.
I stopped immediately.
I hadn’t written that formula. Not even close. I had deliberately avoided formal math at that stage and focused only on intuition. There was no way it could have come from my notes.
But the answer didn’t flag that. It didn’t say, “This part isn’t from your notes.” It just continued as if everything were consistent.
And that’s when it clicked.
The model wasn’t strictly using my notes. It was using my notes as a starting point and then filling in the gaps with what it already knew. It didn’t distinguish between the two. It just generated what sounded like a complete answer.
That’s what hallucination actually is.
It’s not randomness. It’s not a system failure in the way you might expect. It’s the model trying to be helpful beyond the information you’ve given it. It doesn’t know where your context ends, so it keeps going.
The problem is that these additions don’t feel wrong. They often sound perfectly reasonable, especially if you’re still learning the topic. That makes them easy to accept without questioning.
And that’s what makes it dangerous.
Setting Boundaries
Once I understood what was happening, the fix was surprisingly simple.
The model wasn’t going to respect the boundary of my notes on its own. If I wanted it to stay within that boundary, I had to make that explicit.
So I added a rule.
Before every interaction, the model was instructed to answer only using my notes. And if something wasn’t covered, it had to say so clearly instead of filling in the gap.
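In practice the rule is just a short instruction placed in front of the retrieved notes. The exact wording below is illustrative; the two parts that matter are the explicit boundary and the explicit fallback.

```python
SYSTEM_RULE = (
    "Answer using ONLY the notes provided below. "
    "If the notes do not cover something, say: "
    "\"You haven't written about this yet.\" "
    "Do not fill the gap with outside knowledge."
)

def build_messages(question, context):
    """Prepend the rule and the retrieved notes to every request."""
    return [
        {"role": "system", "content": SYSTEM_RULE + "\n\nNOTES:\n" + context},
        {"role": "user", "content": question},
    ]
```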
That small change made a bigger difference than anything else I had done so far.
The responses became more consistent, but more importantly, they became more honest. Instead of extending beyond what I had written, the model would now stop and acknowledge when something was missing.
That changed how I interacted with it.
If the system said, “You haven’t written about this yet,” that wasn’t a limitation. It was a signal. It told me exactly where my understanding was incomplete and where I needed to focus next.
Without that boundary, the model was trying to be supportive in a way that blurred the line between what I knew and what I didn’t. With the boundary in place, that line became clear again.
And that clarity made the system far more useful for learning.
One System, Two Behaviors
At this point, the system was working reliably. It stayed within my notes, surfaced the right context, and stopped when something wasn’t covered. But I started noticing something else.
I wasn’t using it for just one thing.
Sometimes I wanted clear explanations of concepts I had already written about. Other times, I wanted it to generate practice questions so I could test my understanding. These two tasks sound similar, but they call for quite different behavior.
For explanations, I wanted consistency. If I asked the same question twice, I expected roughly the same answer, grounded and predictable. For practice questions, I wanted variety. Repeating the same patterns wouldn’t help much.
That difference is controlled by a parameter called temperature.
Lower values make the output more stable and predictable. The model sticks closer to what it is most confident about. Higher values introduce more variation, allowing it to explore different ways of framing questions or combining ideas.
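In code, the difference comes down to one argument on the request. Continuing the earlier sketches (same client, build_messages, and retrieval helpers), with illustrative values:

```python
question = "Explain backpropagation from my notes."
context = build_context(question)  # from the retrieval sketch above

# Grounded explanations: low temperature keeps answers stable and repeatable.
explanation = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=build_messages(question, context),
    temperature=0.2,
)

# Practice questions: higher temperature allows more variety from run to run.
quiz = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=build_messages("Write three practice questions on this topic.", context),
    temperature=0.9,
)
```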
Adjusting that single parameter was enough to shift the behavior of the system depending on what I needed.
I didn’t have to change the model. I didn’t have to change the setup. I just had to decide what kind of output I was looking for.
What This Was Actually About
Looking back, the project wasn’t really about building a better way to study.
It started with something simple, my notes not working the way I wanted them to. But every time I tried to address that problem, I ran into a limitation I didn’t understand yet. And each of those limitations pointed to a deeper idea about how these systems actually work.
The context window explained why information seemed to disappear. Tokens explained why that limit showed up faster than expected. Retrieval changed how I thought about passing information to the model. And hallucination made it clear that the system doesn’t naturally respect the boundary of what you give it.
None of these felt like abstract concepts once I saw where they came from. They were all responses to very specific problems.
That was the shift.
I stopped thinking of the model as something unpredictable and started seeing it as a system operating under constraints. Once those constraints were clear, the behavior stopped feeling random.
What made the most significant difference wasn’t learning what each term meant in isolation. It was understanding why those terms existed at all.
AI systems aren’t magic. They’re solutions to a set of practical problems: limited memory, incomplete context, the need to retrieve the right information, and the tendency to fill gaps when something is missing.
Once you experience those problems clearly, the solutions make sense.
And once the solutions make sense, the system stops feeling like something you have to work around and starts feeling like something you can actually build with.