How We Lost 9 Months to Invisible Architecture Decay (And Fixed It in 3)
The Problem Nobody Names
Six months into the medical CRM calendar project, I opened a PR and spent twenty minutes just figuring out which slice owned the appointment state. The code wasn’t broken. It wasn’t even obviously wrong. It had just quietly become something else.
We kept calling it “technical debt” — but that wasn’t quite right either. Technical debt implies a conscious trade-off: ship now, fix later. What we had was different. Nobody made a bad call. Every decision was defensible. But stack twelve defensible decisions on top of each other, and you end up somewhere nobody intended to go.
We started calling it Architecture Decision Degradation (ADD): the gradual erosion of architectural quality through accumulated compromises. No dramatic breaking point. Just a slow creep — invisible until sprint velocity fell off a cliff.
This is the story of how it happened on a medical CRM calendar planner, with 18 months of real metrics, real code, and a refactor that got us back.
What Is Architecture Decision Degradation?
ADD is different from “technical debt.”
Technical debt is a conscious trade-off you can point to in a commit message: ship now, fix later. ADD is unintended erosion — architecture degrading even when the team makes "correct" decisions at every step. Nobody makes a "bad" call; each step feels correct in isolation. But stack them up over six months, and the architecture becomes barely recognizable.
ADD follows a predictable lifecycle:
| Phase | Timeline | What’s happening |
|---|---|---|
| Complexity Creep | Months 1–9 | Slice proliferation, inconsistent patterns |
| Race Conditions | Months 10–14 | Real-time updates collide with optimistic state |
| Velocity Collapse | Months 15+ | 40%+ bugs from state, rewrites discussed |
In our case, we didn’t see it until Month 16. By then, we were looking at a full 3-month refactor.
We Started So Well
Building a drag-and-drop calendar for scheduling doctor appointments seemed straightforward. The team made all the “right” decisions upfront:
- Colocated state in Redux Toolkit (no prop drilling)
- createSelector for memoization
- createAsyncThunk for async handling
- Optimistic updates for drag-and-drop
Month 1 — clean architecture:
// store/slices/appointmentsSlice.ts
const appointmentsSlice = createSlice({
name: 'appointments',
initialState: [] as Appointment[],
reducers: {
addAppointment: (state, action) => {
state.push(action.payload);
},
updateAppointment: (state, action) => {
const index = state.findIndex(a => a.id === action.payload.id);
if (index !== -1) state[index] = action.payload; // guard against unknown ids
}
}
});
export const selectAppointments = (state: RootState) =>
state.appointments;
Team velocity was high. The architecture felt solid — and that feeling of “man, this is clean” was the first warning sign we completely missed.
Phase 1: Complexity Creep (Months 3–9)
New requirements arrived one by one:
- Real-time backend updates via WebSocket
- Proactive conflict detection during drag
- Filters by doctor, specialty, date range
- Team grew: 2 → 5 developers, each adding their own slices
What started as 3 slices became 10:
// Month 3: appointmentsSlice, doctorsSlice, timeSlotsSlice
// Month 9: 10 slices and counting...
// store/slices/appointmentsSlice.ts — 600+ lines
// store/slices/appointmentsCacheSlice.ts — 200 lines
// store/slices/conflictsSlice.ts — 180 lines
// store/slices/dragDropSlice.ts — 250 lines
// store/slices/filtersSlice.ts — 150 lines
// store/slices/filtersPersistenceSlice.ts — 80 lines
// store/slices/validationSlice.ts — 200 lines
// store/slices/uiSlice.ts — 180 lines
// store/slices/notificationsSlice.ts — 140 lines
// store/slices/websocketSlice.ts — 200 lines
The selector chain that followed:
export const selectFilteredAppointmentsWithConflicts = createSelector(
[
selectAppointments,
selectConflicts,
selectActiveFilters,
selectDragPreview,
selectValidationStatus
],
(appointments, conflicts, filters, preview, validation) => {
// 80+ lines of transformation logic
return appointments
.filter(apt => matchesFilters(apt, filters))
.map(apt => ({
...apt,
hasConflict: checkConflict(apt, conflicts, preview),
isValid: checkValidation(apt, validation),
isDragging: preview?.appointmentId === apt.id
}));
}
);
// Changing ONE slice breaks ALL selectors.
// No clear ownership. Re-renders cascade through the entire app.
Metrics at Month 9:
- PR review time: 30 min → 2+ hours
- Onboarding new developers: 1 day → 3–5 days
- “Where is this state updated?” asked daily
- State-related bugs: 15% → 25% of all issues
Phase 2: Race Condition Nightmare (Months 10–14)
WebSocket + Optimistic Updates + Drag-and-Drop. In isolation, each decision made sense. Together, they created something we didn’t have a name for at the time:
// Scenario: user drags appointment to new time slot
// 1. Optimistic update fires immediately
dispatch(updateAppointmentOptimistic({ id: apt.id, newTimeSlot: newSlot }));
// 2. WebSocket update arrives from backend
onWebSocketMessage((msg) => {
dispatch(updateAppointmentFromBackend(msg.data));
// Arrives BEFORE optimistic update settles → UI flickers, appointment jumps back
});
// 3. API response arrives — may already be stale
dispatch(updateAppointmentFulfilled(response));
// 4. Conflict check runs on stale state → false positives
dispatch(checkAppointmentConflict(apt));
// Symptoms:
// - Drag preview "snaps back" randomly
// - Conflicts flash then disappear
// - Appointment appears in TWO slots for 200ms
// - Click handlers fire on wrong appointment
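In isolation each update is fine; the bug is purely about arrival order. A dependency-free TypeScript sketch (illustrative names, not the project's actual reducers) shows the same three updates landing the appointment in different slots depending on the order they arrive:

```typescript
// Minimal model: an appointment is just an id plus a time slot
type Appt = { id: string; slot: string };

// Last-write-wins update, mirroring how a plain reducer applies payloads
const applyUpdate = (state: Appt, update: Appt): Appt =>
  update.id === state.id ? { ...state, slot: update.slot } : state;

const optimistic = { id: 'a1', slot: '10:00' }; // user's drag
const wsEcho     = { id: 'a1', slot: '09:00' }; // stale WebSocket echo of the OLD state
const apiResult  = { id: 'a1', slot: '10:00' }; // server confirmation

const initial: Appt = { id: 'a1', slot: '09:00' };

// Expected order: optimistic → ws echo → api — ends at 10:00
const happy = [optimistic, wsEcho, apiResult].reduce(applyUpdate, initial);

// Raced order: optimistic → api → ws echo — stale echo wins, UI snaps back to 09:00
const raced = [optimistic, apiResult, wsEcho].reduce(applyUpdate, initial);
```

Same inputs, different final state: exactly the "snaps back randomly" symptom above, with no single reducer to blame.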
The team responded with band-aids:
// "Fix" #1: Debounce to mask the race condition
const debouncedUpdate = useCallback(
debounce((apt) => dispatch(updateAppointment(apt)), 300), []
);
// "Fix" #2: Version checking for stale updates
if (response.version > state.version) {
dispatch(updateFromBackend(response));
}
// "Fix" #3: Pause WebSocket during drag
useEffect(() => {
if (isDragging) websocket.pause();
}, [isDragging]);
I remember merging the WebSocket pause on a Thursday and thinking: finally, that’s done. It wasn’t done. We’d just moved the problem somewhere less visible. That’s the trap with ADD — you’re always debugging the last thing that broke, not the thing that’s breaking everything.
The Breaking Point (Month 16)
A developer spent 3 days debugging why appointments disappeared during drag-and-drop — but only when: WebSocket was active, another user was editing the same doctor, a filter was applied, and the browser tab was in the background (throttling).
Root cause: a selector chain reading from 7 slices with nondeterministic update order due to Redux batch timing.
State of the codebase at Month 16:
- 10 slices, 3 with circular dependencies
- 20+ selectors reading from 3+ slices each
- 40% of all bugs related to state synchronization
- Velocity: 12 → 7 story points/sprint (−42%)
- New features taking 2× longer than Month 3
Two options: keep patching and slow down further, or stop everything and restructure.
The Fix: Three Architectural Principles
The refactor took 3 months and was built on three principles.
Principle 1: Consolidated Domain Slices
10 slices → 3, organized by domain responsibility:
// store/slices/entitiesSlice.ts — all data
const entitiesSlice = createSlice({
name: 'entities',
initialState: {
appointments: byId<Appointment>(),
doctors: byId<Doctor>(),
timeSlots: byId<TimeSlot>()
},
reducers: {
upsertEntity: (state, action) => {
const { entityType, id, data } = action.payload;
state[entityType][id] = { ...state[entityType][id], ...data };
}
}
});
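The byId<T>() helper in initialState isn't shown in the post; a minimal sketch of one plausible shape — an empty normalized id-to-entity map that reducers fill via upsert:

```typescript
// Assumed helper: a typed, empty id→entity map for normalized state
type ById<T> = Record<string, T>;
const byId = <T>(): ById<T> => ({});

// Mirroring the upsertEntity reducer above (illustrative entity shape)
interface Appointment { id: string; doctorId: string; timeSlot: string }
const appointments = byId<Appointment>();
appointments['a1'] = { id: 'a1', doctorId: 'd1', timeSlot: '09:00' };
```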
// store/slices/uiSlice.ts — UI-only state
const uiSlice = createSlice({
name: 'ui',
initialState: {
dragPreview: null,
activeFilters: {},
openModals: [],
validationErrors: {}
},
reducers: {
setDragPreview: (state, action) => { state.dragPreview = action.payload; }
}
});
// store/slices/sessionSlice.ts — session state
const sessionSlice = createSlice({
name: 'session',
initialState: {
currentUser: null,
websocketStatus: 'disconnected',
pendingTransactions: {}
},
reducers: {
setWebsocketStatus: (state, action) => {
state.websocketStatus = action.payload;
}
}
});
Principle 2: Transaction Middleware — Solving Race Conditions Properly
The core idea: wrap every optimistic update in an explicit transaction with a commit and a rollback. A Redux middleware intercepts actions tagged with meta.transaction, snapshots the state before the optimistic update, and discards any conflicting WebSocket updates that arrive while the transaction is still pending. If the API call fails, it rolls back to the snapshot.
Instead of masking race conditions with debounce, the team built explicit transaction boundaries:
// middleware/transactionMiddleware.ts
const transactionMiddleware: Middleware = store => next => action => {
// WebSocket update arrives during an active transaction — discard it.
// Checked first: WebSocket actions carry no transaction meta of their own,
// so this branch must run before the transaction guard below
if (action.meta?.fromWebSocket && selectHasActiveTransaction(store.getState())) {
return; // Let the optimistic update win
}
if (!action.meta?.transaction) return next(action);
const { id, phase } = action.meta.transaction;
if (phase === 'optimistic') {
store.dispatch({
type: 'transaction/begin',
// cloneDeep comes from lodash; snapshotting the whole store is simple
// but costly — snapshot only the affected slice in hot paths
payload: { id, originalState: cloneDeep(store.getState()) }
});
}
if (phase === 'commit') {
store.dispatch({ type: 'transaction/commit', payload: { id } });
}
if (phase === 'rollback') {
// The rejected action carries no payload, so read the snapshot
// saved at 'begin' back out of the pending-transaction state
const originalState = selectTransactionSnapshot(store.getState(), id);
store.dispatch({ type: 'transaction/rollback', payload: { id, originalState } });
}
return next(action);
};
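The begin/commit/rollback actions the middleware dispatches need a reducer behind them to track pending transactions (the pendingTransactions key in sessionSlice). A dependency-free sketch of what that reducer might look like; the names and shapes here are assumptions, not the project's actual code:

```typescript
// A pending transaction holds the snapshot taken before the optimistic update
type Snapshot = unknown;
interface TxState { pending: Record<string, { originalState: Snapshot }> }

type TxAction =
  | { type: 'transaction/begin'; payload: { id: string; originalState: Snapshot } }
  | { type: 'transaction/commit'; payload: { id: string } }
  | { type: 'transaction/rollback'; payload: { id: string } };

const txReducer = (state: TxState, action: TxAction): TxState => {
  switch (action.type) {
    case 'transaction/begin':
      // Save the pre-optimistic snapshot, keyed by transaction id
      return {
        pending: {
          ...state.pending,
          [action.payload.id]: { originalState: action.payload.originalState }
        }
      };
    case 'transaction/commit':
    case 'transaction/rollback': {
      // Committed or rolled back, the transaction is no longer pending.
      // (Actually restoring originalState into the store on rollback would
      // be handled by a root-level reducer; omitted here.)
      const pending = { ...state.pending };
      delete pending[action.payload.id];
      return { pending };
    }
    default:
      return state;
  }
};
```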
Usage — drag-and-drop now bulletproof:
// Note: the action creators below are assumed to hoist the meta field onto
// the action itself (e.g. via createAction's prepare callback), so the
// middleware sees it as action.meta.transaction
const handleDragEnd = useCallback(async (result) => {
const transactionId = uuid();
// 1. Optimistic update
dispatch(moveAppointmentOptimistic({
appointmentId: result.draggableId,
newTimeSlot: result.droppableId,
meta: { transaction: { id: transactionId, phase: 'optimistic' } }
}));
try {
// 2. API call
await api.moveAppointment(result.draggableId, result.droppableId);
// 3. Commit
dispatch(moveAppointmentFulfilled({
meta: { transaction: { id: transactionId, phase: 'commit' } }
}));
} catch {
// 4. Rollback on error
dispatch(moveAppointmentRejected({
meta: { transaction: { id: transactionId, phase: 'rollback' } }
}));
}
}, [dispatch]);
Principle 3: Colocated Selectors with Single-Slice Ownership
// selectors/appointments.ts — all appointment selectors in ONE file
export const selectAppointmentById = (id: string) =>
(state: RootState) => state.entities.appointments[id];
export const selectAppointmentsByDoctor = (doctorId: string) =>
createSelector(
[(state: RootState) => Object.values(state.entities.appointments)],
(appointments) => appointments.filter(apt => apt.doctorId === doctorId)
);
// Factory form: doctorId must be passed in, not read from an outer scope
export const selectAppointmentsWithConflictStatus = (doctorId: string) =>
createSelector(
[
selectAppointmentsByDoctor(doctorId),
(state: RootState) => state.entities.timeSlots
],
(appointments, timeSlots) =>
appointments.map(apt => ({
...apt,
hasConflict: checkConflictWithTimeSlots(apt, timeSlots)
}))
);
// Each selector reads from ONE slice. No cross-slice chains.
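One caveat with factory selectors like selectAppointmentsByDoctor(doctorId): each call builds a fresh createSelector, so its memo cache is discarded on every render. A common remedy is caching the factory per argument; a dependency-free sketch (illustrative names, not from the original codebase):

```typescript
// Per-argument caching: same argument returns the same selector instance,
// so downstream memoization survives re-renders
type Selector<S, R> = (state: S) => R;

const memoizeFactory = <S, R>(factory: (arg: string) => Selector<S, R>) => {
  const cache = new Map<string, Selector<S, R>>();
  return (arg: string): Selector<S, R> => {
    let selector = cache.get(arg);
    if (!selector) {
      selector = factory(arg);
      cache.set(arg, selector);
    }
    return selector;
  };
};

// Illustrative state shape and selector (assumed, not the project's)
interface DoctorsState { names: Record<string, string> }
const selectName = memoizeFactory<DoctorsState, string>(
  id => state => state.names[id]
);
```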
Results
Before (Month 16) → After (Month 20):
| Metric | Before | After | Δ |
|---|---|---|---|
| Story points/sprint | 7 | 11 | ~+55% |
| Total bugs/sprint | 15–20 | 8–10 | −50% |
| State-related bugs | 40% of total | 15% of total | −75% |
| PR review time | 2+ hours | 45 min | −62% |
| Onboarding (state) | 3–5 days | 4 hours | −80% |
| Redux slices | 10 (3 circular) | 3 (clear domains) | −70% |
| Avg selector dependencies | 3.5 slices | 1.2 slices | −66% |
Code reviews started focusing on business logic instead of state plumbing. Adding new features stopped feeling like defusing a bomb.
ADD Is Not Just a Redux Problem
Redux is where we felt it first — but it wasn’t the only place.
Around month 12, a backend developer mentioned in passing that they were up to 14 microservices for what started as 3. “We’re not sure who owns user notifications anymore,” he said. At the time I filed it away as a backend problem. It wasn’t. It was the same pattern: every service added for a good reason, ownership dissolving gradually, circular dependencies appearing only after the fact.
CI/CD does it too — config drift, duplicated deployment logic, the classic “works on staging, breaks on prod” that nobody can explain. Database schemas accumulate it in missed indexes and migrations that made sense in the moment.
The stack doesn’t matter. What matters is whether someone is actively watching the seams — because ADD doesn’t announce itself.
How to Recognize ADD Before It Kills Velocity
Watch closely — act within 2–3 sprints:
- 6+ slices with unclear domain boundaries
- Selectors regularly reading from 2+ slices
- State-related bugs exceed 25% of total
- New developers need more than 2 days to understand state flow
Act immediately:
- 9+ slices and growing
- Circular dependencies between slices
- Race conditions appearing in production
- Velocity down 30%+ vs baseline over 3+ months
Prevention template for new projects:
store/
├── slices/
│ ├── entitiesSlice.ts ← all data entities
│ ├── uiSlice.ts ← UI-only state
│ └── sessionSlice.ts ← session state
├── selectors/
│ ├── appointments.ts ← colocated with domain
│ └── conflicts.ts
├── middleware/
│ └── transactionMiddleware.ts
└── types/
└── index.ts
Rules:
1. Max 3–4 domain-driven slices
2. Each slice = single responsibility
3. Selectors read from ONE slice only
4. All async actions use transaction boundaries
5. Create a new slice only after 3+ confirmed use cases
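Rule 3 can even be nudged toward compile-time enforcement: a selector factory bound to a single slice key makes cross-slice reads a type error. A sketch under an assumed RootState shape (the helper is illustrative, not part of the original refactor):

```typescript
// Assumed RootState, trimmed from the three-slice layout above
interface RootState {
  entities: { appointments: Record<string, { id: string }> };
  ui: { dragPreview: string | null };
  session: { websocketStatus: string };
}

// A selector built through forSlice only ever sees its own slice,
// so reaching into another slice fails at compile time
const forSlice = <K extends keyof RootState>(key: K) =>
  <R>(fn: (slice: RootState[K]) => R) =>
    (state: RootState) => fn(state[key]);

const selectUi = forSlice('ui');
const selectDragPreview = selectUi(ui => ui.dragPreview);
```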
Conclusion
Architecture Decision Degradation isn’t random. In nearly every project I’ve seen, it follows the same trajectory: clean start, complexity creep, race conditions, velocity collapse.
No Redux architecture stays clean on its own. Slice proliferation is the first signal. Race conditions are the second. By the time 40% of sprint time goes to state bugs, ADD has already won.
None of the three principles we applied are revolutionary. The hard part was recognizing ADD at all — we only did after velocity had collapsed and three sprints had gone to state bugs. Consolidated domain slices, explicit transaction boundaries, and single-slice selector ownership turned a 7-point sprint into an 11-point sprint and cut state-related bugs by more than half.
We lost 9 months to gradual degradation, then spent 3 months refactoring. Total: a year of pain. But velocity came back, and so did confidence in the codebase.
Solve race conditions properly, don’t debounce them away. Your architecture will still degrade — but now you’ll see it coming.