From Notebooks to Production: The Hard Truth About Shipping Your First AI App

Image generated by Author using AI

10 months. Two complete rewrites. Five deployment platforms. Countless late nights. One hard-won lesson: shipping is a completely different skill than building.

I’ve been working with AI models for over ten months. Training them, fine-tuning them, wrapping frameworks around them, making them do useful things. I could take a raw model, benchmark it, improve it, and deploy it into a Jupyter notebook pipeline with my eyes closed. I thought that meant I knew how to build software.

Then I tried to ship a full end-to-end application — one with a real UI, real users, real infrastructure — and the universe laughed at me.

This isn’t a polished success story. It’s the unfiltered account of every wrong turn, every platform that failed me, every architectural decision I had to undo, and the slow, hard-earned progress that finally got me to something I’m proud of.

If you’re an ML engineer, a backend developer, or anyone who’s ever stared at a Streamlit app and thought “this is fine for now” — this is the story you need to read before you find out the hard way.

Chapter 1: The Confidence Trap — “I’ll Just Use Python for Everything”

When I started, I wasn’t thinking about architecture. I was thinking about the AI. The model was the point — everything else was just scaffolding to make it accessible.

So I made the classic mistake: I chose my tools based on familiarity, not fitness for purpose.

For the frontend, I looked at Streamlit and Gradio. Both are Python-native. Both let you spin up a UI in hours without touching HTML or JavaScript. Both felt like the right call for someone who thinks in Python. I picked them because they were fast and because I’d used them before for demos and internal tools.

For the backend, I weighed Django against Flask. Django is powerful but heavy — a lot of convention, a lot of structure, a lot to configure before you’ve written a single line of actual business logic. Flask, on the other hand, is lightweight, fully Python, and easy to get running quickly. If you know Python, Flask feels almost transparent. You write functions, you add routes, and it just works.
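That "almost transparent" quality is easy to show. A minimal Flask sketch — the `/predict` route and its dummy response are illustrative, not taken from the actual app:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Illustrative placeholder where real model inference would go.
    payload = request.get_json(force=True)
    text = payload.get("text", "")
    return jsonify({"input": text, "label": "positive"})

if __name__ == "__main__":
    app.run(port=5000)
```

You write a function, you add a route, and you have an API.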

So that was my stack: Gradio/Streamlit on the frontend, Flask on the backend. I told myself it was pragmatic. Minimum viable stack for a minimum viable product.

I built the MVP in two months. And in those two months, I was genuinely proud of the progress. The AI was working. The backend was handling requests. The frontend was showing outputs. I deployed it. I sent the link to a few people.

And then I watched someone use it for the first time.

The UI looked like a developer tool. Not in a good, utilitarian way — in a way that said “a developer built this for themselves and forgot that other people would have to use it.” It wasn’t responsive. On mobile, it was nearly unusable. The layout was rigid. Colors were default. Interactions were clunky.

Worse, when I tried to fix these things, I hit walls immediately. Streamlit and Gradio abstract away the HTML and CSS by design — that’s the tradeoff. You get speed at the cost of control. Want to change the button style? Limited. Want a custom layout that adapts to screen size? Painful. Want to add animations or transitions that make the app feel alive? Good luck.

The ceiling was too low. I had built something functional but not something good. And in a world where users make judgments in seconds, functional but not good is the same as broken.

Chapter 2: The Deployment Nightmare — A Tour of Broken Promises

While I was wrestling with the UI problem, I was also wrestling with something more fundamental: getting the thing to actually run reliably for people who weren’t me.

I started this journey assuming deployment was a solved problem. You build the thing, you push it to a platform, users access it. Simple, right?

Wrong.

Railway was my first attempt at proper cloud deployment. The platform looked promising — good developer experience, reasonable pricing, easy integration with GitHub. But I ran into a wall almost immediately: payment integration issues. The configuration was more complex than expected, I couldn’t get it working properly, and with no straightforward path to resolution, I moved on.

Google Cloud seemed like the power move. Serious infrastructure, serious scalability. I spun up a Compute Engine VM and got the app running. And then I discovered the problem with running a VM: it needs to actually keep running.

The free and low-cost tiers have auto-stop behaviors. My instance kept getting shut down during low-traffic periods. Every time it stopped, I had to go back in, restart it, sometimes reconfigure things that had reset.

If a new resource charge hit and I needed to adjust my setup, the process was essentially: tear it down, redo the deployment from scratch, verify everything is working again. It was continuous operations overhead on a project where I was a team of one. I was spending more time managing infrastructure than building product.

Digital Ocean was the most functional option I found. Clean interface, predictable behavior, solid uptime. But the cost was too high for an early-stage project without revenue. When you’re bootstrapping something and every dollar matters, paying for infrastructure that exceeds what the project currently needs is hard to justify.

Render became my answer. The free tier has its limitations — instances spin down after inactivity, which means cold starts for users who hit the app after a quiet period.

But it gave me something the other platforms hadn’t: stability I could build on. I could actually focus on the application instead of the infrastructure. The ops overhead dropped dramatically, and that freed up mental space to work on the real problems.

The meta-lesson from all of this: deployment isn’t just a technical decision, it’s a product decision. The platform you choose determines how fast you can iterate, how much cognitive overhead you carry, and how much time you spend fighting infrastructure versus building features. Choose wrong, and you’re not just paying in money — you’re paying in momentum.

Chapter 3: The Resource Reckoning — When the Cloud Bill Becomes Your Product Manager

Once I’d found a stable deployment home, I had to get serious about resource allocation. Running an AI application isn’t like running a simple CRUD app — inference is expensive, and if you’re not careful, your costs will outpace your progress before you’ve built anything worth paying for.

I went through multiple cycles of adjustment before landing on a configuration that made sense:

RAM at 4GB. This was the floor below which the application became genuinely unpleasant to use — slow responses, timeouts under load, a general sense of lag that made the AI feel worse than it was. 4GB kept things moving without burning money unnecessarily.

Storage at 10GB per build. Tight, but manageable. It forced discipline in what I was bundling into each deployment — no bloated dependencies, no unnecessary assets.

Artifact registry capped at 2 entries. This meant every new build pushed out the oldest one. No hoarding of old versions. Clean, minimal.

Deployed versions retained for 2 days only. All running deployments were kept at first, but after 2 days only the 2 most recent survived; everything older was automatically deleted. This kept storage costs predictable and forced me to be deliberate about what I was deploying.

What I learned from this process is something I didn’t expect to find in resource management: constraints clarify thinking. When you can’t afford to be wasteful, you stop being wasteful.

When you have to make deliberate choices about every megabyte and every instance, you start understanding your system at a deeper level. You find the fat. You cut it. The product gets leaner and faster as a result.

The cloud bill, as it turns out, is a surprisingly good product manager. It has opinions, and it will share them with you whether you ask or not.

Chapter 4: The Two-Service Problem — When Your Architecture Fights Itself

By this point I had a working system, but it had a fundamental architectural issue: two separate services running in the cloud.

Service one: the Python/Flask backend handling all the AI inference and data processing.

Service two: the frontend. Two codebases, two deployments, two sets of environment variables to manage, two potential failure points, and the latency overhead of every frontend request having to cross the network to reach the backend.

During local development, this wasn’t obvious as a problem. Everything ran on the same machine, latency was negligible, and it was easy to restart either service independently.

But in production, the cracks showed immediately. Inter-service communication introduced delays. Keeping both services in sync during deployments added coordination overhead. And the cost of running two cloud instances was nearly double what a single well-architected service would have required.

The natural response was: merge them. Run everything as one service. It seemed obvious.

I tried it. It didn’t work.

The issue was a circular dependency problem between the Python backend and the React frontend layer. The Python backend was doing heavy AI lifting — model inference, data transformation, API calls to external services.

Trying to fold that into a React-centric architecture created dependency chains that were genuinely difficult to untangle. The two halves of the application had evolved separately, with different assumptions about how data would flow, and forcing them together created more problems than it solved.

The solution required a different framing. Instead of trying to merge the services physically, I rethought the responsibility boundaries. Python stayed as the inference engine — its job was to handle computationally expensive tasks and nothing else. React took ownership of all UI state, user interaction logic, and presentation. The two services still existed, but they had cleaner contracts between them.

Over time, as I streamlined the backend and reduced its surface area, I was able to consolidate to a single instance. The payoff was real: faster response times, lower cost, simpler deployment, and a system that was easier to reason about. Going from 2 services to 1 felt like setting down a weight I hadn’t realized I was carrying.
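One common way to reach that single instance — sketched here as an assumption about the shape, not the author's exact setup — is to have Flask serve the compiled React bundle as static files while keeping inference behind an `/api` prefix:

```python
from pathlib import Path
from flask import Flask, jsonify, request, send_from_directory

# Assumes the React production build has been copied into ./build.
app = Flask(__name__, static_folder="build", static_url_path="")

@app.route("/api/infer", methods=["POST"])
def infer():
    # Placeholder for the expensive model call.
    data = request.get_json(force=True)
    return jsonify({"result": f"processed: {data.get('prompt', '')}"})

# Catch-all: any non-API path falls through to the React app,
# so client-side routing keeps working on hard refresh.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def serve_react(path):
    full = Path(app.static_folder) / path
    if path and full.is_file():
        return send_from_directory(app.static_folder, path)
    return send_from_directory(app.static_folder, "index.html")
```

One process, one deployment, no cross-service network hop between UI and inference.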

Chapter 5: Learning React in 3–4 Weeks — The Frontend Reckoning

Let me be honest about something: I did not want to learn React.

I had spent years getting good at Python. I understood its idioms, its ecosystem, its quirks. The idea of investing significant time into a JavaScript framework — with its own ecosystem, its own idioms, its own entire philosophy about how UIs should work — felt like a detour from what I actually wanted to build.

So I looked for alternatives first. I researched everything that might let me avoid JavaScript without sacrificing UI quality.

HTMX was the most promising alternative. It’s genuinely easy to learn — you can drop it into an existing architecture with minimal disruption, embed it inside containers, and get dynamic behavior without writing much JavaScript. For small-scale applications, it’s an elegant solution. The problem is the ceiling.

HTMX is designed for targeted, incremental improvements — progressive enhancement on top of server-rendered HTML. For a large-scale application with complex state, real-time updates, and rich interactions, it runs out of runway quickly. It’s not built for what I needed.

After exhausting the alternatives, I accepted the reality: React was the answer. Not because it was comfortable, but because it was right for the problem.

I gave myself 3–4 weeks to go from zero to a working migration. This was aggressive. I was learning the framework while simultaneously porting an existing application onto it — debugging React concepts and application logic at the same time. There were days where I wasn’t sure if something was broken because I’d misunderstood React or because the underlying logic was wrong.

But it came together. The Python/Flask backend stayed in place for inference. React took over everything visible to the user. Two services, cleaner than before, with React finally giving me the design control I’d been missing since the beginning.

The UI improved dramatically. Not because I’m a great designer — I’m not. But because React gives you the tools to actually implement good design when you know what you want. Responsive layouts, smooth state transitions, components that behave consistently — all of it became possible in ways that Streamlit and Gradio had never allowed.

The hardest thing I had to accept in this entire journey is also the most important: the interface is the product. It doesn’t matter how good the model is if the UI makes it feel slow, confusing, or untrustworthy. Users don’t experience your architecture. They experience what’s on their screen. And if that experience is poor, they leave — regardless of what’s running underneath.

Chapter 6: Authentication — The Unglamorous Work That Makes Everything Real

With a stable architecture and a frontend I was finally proud of, I turned to authentication. This is the part of building that nobody finds exciting but that makes everything else real. Without auth, you don’t have a product — you have a demo.

I implemented Google OAuth sign-in. The flow is the industry standard for good reason: users already trust Google with their credentials, the redirect-based authentication pattern is familiar, and setting up the client on the backend is well-documented.

You create an OAuth client, configure your redirect URIs, and let Google handle the actual credential verification. It’s not glamorous, but it works reliably and removes a significant barrier to adoption — nobody wants to create yet another account if they can sign in with something they already have.
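The redirect step is just a URL with the right query parameters. A standard-library sketch of building it — the client ID and redirect URI below are placeholders; in the real flow Google redirects back with a code that your backend exchanges for tokens at Google's token endpoint:

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

def build_auth_url(client_id: str, redirect_uri: str, state: str) -> str:
    # Parameters Google's OAuth 2.0 web-server flow expects.
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",        # authorization-code flow
        "scope": "openid email profile",
        "state": state,                 # CSRF token, verified on the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```

You send the user to this URL; Google sends them back to your redirect URI with `?code=...&state=...`, and the backend does the credential verification server-side.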

On top of OAuth, I added two-factor authentication (2FA) as a requirement before users could fully access the application. This added friction to the onboarding process — there’s no way around that.

But for an AI application that processes user data and potentially sensitive inputs, the friction is justified. Security isn’t something you bolt on after the fact. Every week you delay proper authentication is a week where your users are exposed and your application is a liability rather than an asset.
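The article doesn’t show its 2FA internals, so here is a standard-library sketch of TOTP (RFC 6238), the scheme behind most authenticator-app codes. A production setup would more likely lean on a maintained library such as `pyotp`, but the mechanism fits in a few lines:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # HMAC-SHA1 over the big-endian counter (RFC 4226).
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the last nibble picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    # TOTP is HOTP keyed on the current 30-second time window (RFC 6238).
    at = time.time() if at is None else at
    return hotp(secret, int(at // step), digits)

def verify(secret: bytes, code: str, window: int = 1) -> bool:
    # Accept codes from adjacent windows to tolerate clock drift.
    now = time.time()
    return any(
        hmac.compare_digest(totp(secret, now + i * 30), code)
        for i in range(-window, window + 1)
    )
```

The shared secret is provisioned once (usually via a QR code); after that, client and server independently derive the same 6-digit code from the current time.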

Getting auth right also unlocked something unexpected: it made the application feel serious. When users go through a proper authentication flow with 2FA, they understand that this is a real product, not a toy. It sets expectations appropriately and establishes trust before the user has even seen what the application does.

What I’d Tell Myself at the Start

Ten months of building, rebuilding, and rebuilding again distilled into the things I genuinely wish someone had told me before I started:

Your MVP stack will not be your production stack — and that’s okay. The tools that let you move fast at the start are rarely the tools that scale. Accept this early, and you’ll make peace with rewrites instead of treating them as failures. Every rewrite I did made the product better. None of them were wasted.

Deployment is a product decision, not just an infrastructure one. The platform you choose shapes your iteration speed, your cost structure, and your sanity. Evaluate platforms the way you evaluate product decisions: what does this enable, what does it cost, and what does it make harder?

UI is not optional, even for backend engineers. Especially for AI applications. The model is the engine — but the interface is the car. Nobody cares about your engine if the car is unpleasant to drive.

Simplicity is architecture. Every service you remove is a failure mode you eliminate, a cost you reduce, and a cognitive load you no longer carry. Fight for the simplest system that does the job.

Constraints make you better. Resource limits, cost limits, time limits — they all force clarity. Some of my best technical decisions came directly from not being able to afford the easy answer.

Security is not optional. Build authentication properly from the beginning. The cost of doing it right early is far lower than the cost of retrofitting it after users have data in your system.

The Product Still Exists. That’s the Win.

After ten months, two complete architectural overhauls, five deployment platforms, a self-taught crash course in React, more failed deployments than I can count, and more late nights than I’ll admit — the application is live.

It’s fast. It’s responsive. The UI is something I’m actually proud to show people. Users can sign in securely. The infrastructure is stable and cost-efficient. The AI that started this whole journey is finally wrapped in something worthy of it.

The journey from “I’ll just use Python for everything” to a production-ready, full-stack AI application taught me that the hardest part of building AI products isn’t the AI.

The model was never the hard part. The hard part is the frontend you don’t want to learn, the deployment platform that keeps breaking, the architectural decision you have to undo three months after you made it, and the authentication system you put off because it feels boring compared to the interesting AI problems.

The hard part is all the stuff around it.

And here’s the thing about all that hard stuff: it makes you a better engineer. Not because suffering is inherently educational, but because every one of those problems forced me to understand something I’d been comfortable not understanding.

I know more about deployment, frontend architecture, resource optimization, and authentication than I ever would have learned in a purely backend or ML role. The product is better because of every painful iteration.

Ship the thing. Learn what breaks. Fix it. Repeat.

That’s the whole process. And it’s worth every painful step.

If you’re on a similar journey — building AI applications from the ground up, wrestling with deployment, or staring at a UI rewrite you’ve been putting off — I’d love to hear what’s breaking for you. Drop it in the comments.

Disclaimer: I used AI to help refine and structure my research for the content. The insights are from my direct experience and my own work.

📘 My Books
Modern AI Systems
A practical exploration of building, deploying, and scaling modern AI systems.
👉 https://www.amazon.com/dp/B0GM71ZBW3?binding=kindle_edition&ref=dbs_m_mng_rwt_sft_tkin_tpbk
👉 https://tanveer94.gumroad.com/l/pnnti

Building Reliable AI
A hands-on guide to understanding and building large language models from the ground up.
👉 https://www.amazon.com/dp/B0GJQ9HPVJ
👉 https://gum.new/gum/cmly1ii9x001b04k799x8eeje

Enjoyed this article? Read my other articles:

The 6 Optimization Algorithms: How AI Learns to Learn 10× Faster with 50% Less Memory
The 6 Learning Rate Schedules: How to Accelerate Training Without Crashing
The 4 Mixture of Experts Architectures: How to Train 100B Models at 10B Cost
The 6-Stage Journey: How Pre-Training Creates AI Intelligence from Scratch
The 5 Normalization Techniques: Why Standardizing Activations Transforms Deep Learning

Transform your career

The Complete LLM Mastery Course: From Zero to Production Hero
I Spent 6 Months Reverse-Engineering How Elite AI Engineers Think. Here’s What Separates Them From Everyone Else
The AI Knowledge Gap That’s Costing Engineers Their Career Growth

I write about AI, system design, startups, and the real lessons from building products.


From Notebooks to Production: The Hard Truth About Shipping Your First AI App was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
