Why I Wrote My Own Terminal Emulator (and How)

digitado ⋅ June 3, 2026

I’ve written before about the wrapper I built around Claude Code – a TUI that hot-swaps API backends underneath it. You configure your backends once and switch between them mid-session with a hotkey; the normal way to change backend is to edit a config file and restart Claude Code every time. That wrapper rendered Claude Code inside an off-the-shelf terminal stack: one crate kept the screen state, another drew it, a third fed it input. For a while, that was enough.

It stopped being enough. I wanted three things a simple TUI client couldn’t give me: maximum flexibility over every layer, a real understanding of how terminals work, and a foundation that wouldn’t fight me on the next feature. So I wrote my own terminal emulator.

This is why, and how.

What I had, and why it wasn’t enough

That stack was reasonable. alacritty_terminal maintained the screen grid, ratatui drew it, crossterm captured input, and portable-pty ran Claude Code in a pseudo-terminal underneath. Each is a good library. Together they had limits I kept running into.

ratatui is an immediate-mode TUI framework: it works in rows and columns of fixed-width character cells, redrawn every frame. It’s a good fit for dashboards. But the screen is a character grid – no pixel-level control, no variable-width fonts, no smooth scrolling. Scrolling moves a whole row at a time.

crossterm was the first concrete problem. It parses raw terminal input into structured events, then re-encodes those events into escape sequences for the child PTY – and that round-trip isn’t lossless. Option+Backspace, Ctrl+Arrow, and Alt+Arrow were dropped. I’d already written a small crate to bypass it and forward raw bytes instead. I was replacing pieces of the stack one at a time.

I had three reasons to replace the rest.

Flexibility. I’m building a custom UI around Claude Code – panel splits, top and bottom chrome, popup overlays, animations. All of those need to live in the same coordinate space as the terminal content. With a third-party renderer, the terminal is an opaque region you blit to the screen; you can’t draw a popup half over it, animate a divider through it, or share a glyph atlas between the chrome and the text. Owning the renderer makes the terminal just more geometry in the same scene.

Understanding. I wanted to know how terminals work – the escape-sequence state machine, glyph atlases, subpixel positioning, why scrolling feels different from one terminal to the next. You don’t learn that by importing a crate.

The next feature. Everything I wanted next was beyond what that stack could do: GPU rendering, variable-width fonts, momentum scrolling, image paste, custom panel layout. Each one was either impossible there or a fight.

What writing your own terminal actually means

A terminal emulator is two things.

The first is the parser and model. The bytes coming back from the child process are a stream of printable text interleaved with ANSI escape sequences – “move the cursor to row 5”, “set the foreground to red”, “erase three characters”, “switch to the alternate screen”. The emulator is a state machine that consumes those bytes and maintains a grid of cells: characters with colors and attributes, a cursor, and a scrollback buffer.

The second is the renderer: it turns that grid into pixels.

The fact that made the project tractable: I am not writing a general-purpose terminal. Mine has one job, running Claude Code, and Claude Code emits a known, bounded set of escape sequences. I don’t need sixel graphics, tmux control mode, or every legacy DEC character set. I need what Claude Code emits, and nothing more. A general terminal has to handle anything; mine has to handle one program well. That lets me cut scope nearly everywhere.

A monospace grid behind a variable-width screen

The terminal lives in two crates. term_core is the parser and the grid – the model – with zero dependencies. term_gpu is the renderer: atlas, shaders, scrolling, text shaping. (Two more crates handle panel splitting and the clipboard, and a small UI kit draws the chrome, but they sit around the terminal rather than inside it.)

The decision that shaped everything else: the model is monospace, the screen is not.

Claude Code positions its cursor by cell – “row 5, column 10, erase three characters.” That only works if column 10 is one definite place. So the grid is a fixed-cell grid: a Vec<Row>, each row a Vec<Cell>, every cell one column wide. Cursor moves, erases, inserts, and deletes all address cells, unambiguously.

But I draw with a proportional font – SF Pro – because a monospace face looks worse. So at render time the grid still says “column 10,” and the renderer puts column 10 at x = 10 * cell_width no matter how wide the glyph actually is. The model is a grid; the picture is not. Keeping those two apart is what lets a proportional font sit on top of strict cell semantics.

A cell is deliberately small, because there are a lot of them:

pub struct Cell {
    pub c: char,
    pub fg: TermColor,
    pub bg: TermColor,
    pub flags: CellFlags,              // bold / italic / underline / inverse / ...
    pub extra: Option<Box<CellExtra>>, // rare metadata, heap-indirected
}

The common cell is about 24 bytes. The rare things – combining marks, hyperlinks, prompt markers – go behind the boxed extra, so a screenful of plain text is a flat array of small structs, and the exceptions allocate only when they actually appear.
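
To make the size claim concrete, here is a minimal sketch of how such a cell can land at 24 bytes. The definitions of TermColor, CellFlags and CellExtra are not shown in this post, so the ones below are hypothetical stand-ins chosen only to illustrate the layout; the real types may differ.

```rust
use std::mem::size_of;

// Hypothetical stand-ins: the post does not show these definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TermColor {
    Default,
    Indexed(u8),
    Rgb(u8, u8, u8),
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CellFlags(pub u16); // assumed: one bit per attribute

#[derive(Clone, Debug, Default)]
pub struct CellExtra {
    pub combining: Vec<char>,      // combining marks stacked on `c`
    pub hyperlink: Option<String>, // OSC 8 target, if any
}

#[derive(Clone, Debug)]
pub struct Cell {
    pub c: char,
    pub fg: TermColor,
    pub bg: TermColor,
    pub flags: CellFlags,
    pub extra: Option<Box<CellExtra>>,
}

impl Cell {
    /// A plain cell: no heap allocation, `extra` is just a null pointer.
    pub fn plain(c: char) -> Self {
        Cell {
            c,
            fg: TermColor::Default,
            bg: TermColor::Default,
            flags: CellFlags(0),
            extra: None,
        }
    }
}

/// Size of one cell in bytes on the current target.
pub fn cell_size() -> usize {
    size_of::<Cell>()
}

/// Size of the `extra` slot: one pointer, because `None` reuses the null value.
pub fn extra_slot_size() -> usize {
    size_of::<Option<Box<CellExtra>>>()
}
```

With these stand-ins the fields add up to 22 bytes on a 64-bit target (4 + 4 + 4 + 2 + 8), which alignment should round up to 24. The boxed extra costs one pointer per cell whether or not it is used, and nothing on the heap until it is.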

Parsing the byte stream

The parser is a state machine over the escape-sequence grammar – about 770 lines of Rust. There’s a crate for it (vte), but term_core is the root of the whole pipeline and I wanted it to depend on nothing; when something breaks, there’s one place to look. The state diagram is fully documented, and following it is less work than adapting to someone else’s callback trait. The core is a match on (state, byte) that moves the state and emits actions:

enum State { Ground, Escape, CsiParam, OscString, /* DCS, SOS/PM/APC, ... */ }

fn step(&mut self, b: u8) {
    match (self.state, b) {
        (State::Ground,   0x1B)               => self.state = State::Escape,
        (State::Ground,   0x20..=0x7E)        => self.print(b),
        (State::Escape,   b'[')               => self.state = State::CsiParam,
        (State::CsiParam, b'0'..=b'9' | b';') => self.collect_param(b),
        (State::CsiParam, 0x40..=0x7E)        => self.dispatch_csi(b), // final byte
        // ... OSC string, UTF-8 continuation bytes, C0 controls ...
    }
}

Most of the work isn’t the structure; it’s knowing which sequences a real program leans on. I found these by reading what Claude Code emits and checking against a reference parser:

ECH / DCH / ICH – erase, delete, insert characters in place. Used on nearly every redraw to rewrite partial lines; miss them and the screen fills with stale text.
DA (Device Attributes). Apps send CSI c at startup and block waiting for a reply. Don’t answer and the program hangs – frozen, no output – looking like the terminal is broken. I reply \x1b[?6c (VT102).
OSC 7 / 8 / 133 – current directory, hyperlinks, prompt markers. Two of these have different lifetimes: OSC 8 hyperlinks are sticky (every printed cell carries them until a closing sequence), while OSC 133 prompt markers tag only the next cell.
DEC private modes – autowrap, origin mode, focus reporting, synchronized output.
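
The three in-place edits at the top of that list are easy to mix up, so here is a small sketch of their semantics on a single row. It is not the term_core code: the row is simplified to a slice of char (the real grid stores Cell values), and the function names are made up for the example.

```rust
/// ECH: blank `n` cells starting at the cursor. Nothing shifts.
pub fn erase_chars(row: &mut [char], col: usize, n: usize) {
    let end = col.saturating_add(n).min(row.len());
    if col < end {
        row[col..end].fill(' ');
    }
}

/// DCH: remove `n` cells at the cursor. The tail shifts left and
/// blanks enter from the right edge.
pub fn delete_chars(row: &mut [char], col: usize, n: usize) {
    if col >= row.len() {
        return;
    }
    let n = n.min(row.len() - col);
    row[col..].rotate_left(n);
    let len = row.len();
    row[len - n..].fill(' ');
}

/// ICH: open `n` blank cells at the cursor. The tail shifts right and
/// whatever is pushed past the right edge is lost.
pub fn insert_chars(row: &mut [char], col: usize, n: usize) {
    if col >= row.len() {
        return;
    }
    let n = n.min(row.len() - col);
    row[col..].rotate_right(n);
    row[col..col + n].fill(' ');
}

/// Test helper: build a row from a string.
pub fn row_of(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Test helper: read a row back as a string.
pub fn text(row: &[char]) -> String {
    row.iter().collect()
}
```

Starting from "abcdef" with the cursor on column 1 and n = 2, ECH leaves "a  def", DCH leaves "adef  ", and ICH leaves "a  bcd". The row length never changes, which is what keeps column addressing stable.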

Two bugs from this part are worth keeping, because both came from the gap between the spec and the actual byte stream.

The first showed up as the window title rendering into the grid: “Claude CodClaude Code.” Claude sets its title with ESC ] 0 ; ✳ Claude Code BEL, and my OSC parser recognized three string terminators – one of them 0x9C, the 8-bit C1 String Terminator. But ✳ (U+2733) is e2 9c b3 in UTF-8, and the middle byte is 0x9C. The parser was cutting the title in half on a byte that wasn’t a terminator at all, just the middle of a character; the tail printed into the grid as text. The fix was deleting that terminator: in a UTF-8 terminal the 8-bit C1 controls (0x80-0x9F) can never be honored, because every one of them is a valid continuation byte. Only the 7-bit ESC-prefixed forms are safe.
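
The failure is easy to reproduce in isolation. The sketch below is a hypothetical, stripped-down OSC reader, not the parser from term_core; the honor_c1_st flag exists only to show the old behavior next to the fixed one.

```rust
/// Reads one OSC sequence from the start of `input`.
/// Returns the payload (between `ESC ]` and the terminator) and the
/// number of bytes consumed, or None if the sequence is incomplete.
///
/// `honor_c1_st` re-enables the 8-bit String Terminator (0x9C) so the
/// bug can be demonstrated; a UTF-8 terminal should pass `false`.
pub fn read_osc(input: &[u8], honor_c1_st: bool) -> Option<(&[u8], usize)> {
    if !input.starts_with(b"\x1b]") {
        return None;
    }
    let body = &input[2..];
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            // BEL terminator
            0x07 => return Some((&body[..i], 2 + i + 1)),
            // 7-bit String Terminator: ESC \
            0x1B if body.get(i + 1) == Some(&b'\\') => {
                return Some((&body[..i], 2 + i + 2));
            }
            // 8-bit String Terminator: unsafe in UTF-8 input
            0x9C if honor_c1_st => return Some((&body[..i], 2 + i + 1)),
            // Everything else is payload, including UTF-8 continuation bytes
            _ => i += 1,
        }
    }
    None
}
```

Fed the title sequence above, the reader returns the whole payload "0;✳ Claude Code" with the flag off. With the flag on it stops after the bytes 30 3b e2, and everything after the 0x9C is left in the stream to be treated as ordinary text.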

The second was an underline under every line of the welcome screen. I read the SGR parser three times and fixed three unrelated bugs; the underline stayed. The fourth time I stopped reading code and captured the bytes – ran Claude under script – and the answer was in the trace: 1b 5b 3e 34 3b 32 6d, which is CSI > 4 ; 2 m, the modifyOtherKeys = 2 keyboard handshake. My dispatcher only treated ? as a private marker; for > it fell through to plain attribute dispatch and read “4; 2” as “underline.” A keyboard handshake was being drawn as an attribute on every cell. One line to reject the marker. The habit it left: when a render bug looks like a wrong attribute, read the bytes before the code.
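
A sketch of the distinction that was missing, under the assumption that a CSI sequence is split into an optional private marker, numeric parameters, and a final byte before dispatch. The Csi struct and both function names are invented for this example. The marker range 0x3C to 0x3F (the characters <, =, > and ?) is the private-use range for a leading parameter byte.

```rust
#[derive(Debug, PartialEq)]
pub struct Csi {
    pub marker: Option<u8>, // one of `<`, `=`, `>`, `?` if present
    pub params: Vec<u16>,
    pub final_byte: u8,
}

/// Parses one complete CSI sequence, e.g. `ESC [ > 4 ; 2 m`.
pub fn parse_csi(seq: &[u8]) -> Option<Csi> {
    if !seq.starts_with(b"\x1b[") {
        return None;
    }
    let mut rest = &seq[2..];
    let mut marker = None;
    if let Some(&b) = rest.first() {
        if (0x3C..=0x3F).contains(&b) {
            marker = Some(b);
            rest = &rest[1..];
        }
    }
    let mut params = Vec::new();
    let mut current: u16 = 0;
    let mut saw_param = false;
    for &b in rest {
        match b {
            b'0'..=b'9' => {
                current = current.saturating_mul(10).saturating_add(u16::from(b - b'0'));
                saw_param = true;
            }
            b';' => {
                params.push(current);
                current = 0;
                saw_param = true;
            }
            0x40..=0x7E => {
                if saw_param {
                    params.push(current);
                }
                return Some(Csi { marker, params, final_byte: b });
            }
            _ => return None, // intermediates etc. are out of scope here
        }
    }
    None
}

/// SGR 4 (underline) only counts when there is no private marker.
pub fn sets_underline(csi: &Csi) -> bool {
    csi.final_byte == b'm' && csi.marker.is_none() && csi.params.contains(&4)
}
```

The captured bytes parse to marker '>' with parameters [4, 2], so sets_underline returns false for them, while a plain ESC [ 4 m still underlines.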

The glyph atlas

Rasterizing a glyph – turning a font outline into a bitmap – is expensive, and a terminal draws the same few hundred glyphs thousands of times a second. So you rasterize each glyph once, keep the bitmap in a texture, and every cell that needs it just samples a rectangle back out. That texture is the atlas. Three decisions shaped mine.

One texture for both monochrome text and color emoji. A mono glyph is one coverage byte per pixel; an emoji is full RGBA. The cheap option is an R8 texture, but then emoji have nowhere to live. I use RGBA8 for everything and store mono glyphs in the alpha channel with zeroed RGB:

GlyphFormat::Alpha => {
    self.cpu_data[dst]     = 0;                 // R
    self.cpu_data[dst + 1] = 0;                 // G
    self.cpu_data[dst + 2] = 0;                 // B
    self.cpu_data[dst + 3] = raster.data[src];  // coverage -> alpha
}
GlyphFormat::Rgba => {
    self.cpu_data[dst..dst + 4].copy_from_slice(&raster.data[src..src + 4]);
}

The fragment shader sees zero RGB and multiplies in the text color; an emoji carries its own RGB and draws as-is. One texture, one sampler, both cases.

Packing: shelf-next-fit. Glyphs come at arbitrary sizes and have to be packed into a 2D texture without overlap. Optimal bin-packing is NP-hard; the shelf heuristic is good enough and fits in about fifty lines. It keeps three numbers – the Y of the current shelf, the tallest glyph on it, and how far right it has filled – and lays glyphs left to right until the row is full, then opens a new shelf at the next free Y, just past the tallest glyph of the old one:

pub fn pack(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
    let (w, h) = (w + GLYPH_PAD * 2, h + GLYPH_PAD * 2);
    if w > self.width { return None; }
    if self.row_extent + w > self.width {          // shelf full -> new shelf
        self.row_baseline += self.row_tallest + GLYPH_PAD;
        self.row_extent = 0;
        self.row_tallest = 0;
    }
    if self.row_baseline + h > self.height { return None; }  // layer full
    let pos = (self.row_extent + GLYPH_PAD, self.row_baseline + GLYPH_PAD);
    self.row_extent += w;
    self.row_tallest = self.row_tallest.max(h);
    Some(pos)
}
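
For anyone who wants to run it, here is the same function wrapped in a self-contained struct. The constructor, the is_empty and reset helpers, and the GLYPH_PAD value of 1 are assumptions filled in for the example; only pack is taken from the listing above.

```rust
pub const GLYPH_PAD: u32 = 1; // assumed value; the post does not state it

pub struct ShelfPacker {
    width: u32,
    height: u32,
    row_baseline: u32, // Y of the current shelf
    row_tallest: u32,  // tallest padded glyph on the current shelf
    row_extent: u32,   // how far right the current shelf is filled
}

impl ShelfPacker {
    pub fn new(width: u32, height: u32) -> Self {
        ShelfPacker { width, height, row_baseline: 0, row_tallest: 0, row_extent: 0 }
    }

    /// True if nothing has been placed since construction or the last reset.
    pub fn is_empty(&self) -> bool {
        self.row_baseline == 0 && self.row_extent == 0
    }

    /// Forget every placement so the whole area can be reused.
    pub fn reset(&mut self) {
        self.row_baseline = 0;
        self.row_tallest = 0;
        self.row_extent = 0;
    }

    /// Returns the top-left corner for a `w` x `h` glyph, or None if it cannot fit.
    pub fn pack(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
        let (w, h) = (w + GLYPH_PAD * 2, h + GLYPH_PAD * 2);
        if w > self.width {
            return None;
        }
        if self.row_extent + w > self.width {
            // Shelf full: open a new one past the tallest glyph.
            self.row_baseline += self.row_tallest + GLYPH_PAD;
            self.row_extent = 0;
            self.row_tallest = 0;
        }
        if self.row_baseline + h > self.height {
            return None; // area full
        }
        let pos = (self.row_extent + GLYPH_PAD, self.row_baseline + GLYPH_PAD);
        self.row_extent += w;
        self.row_tallest = self.row_tallest.max(h);
        Some(pos)
    }
}
```

In a 32×32 area, four 10×10 glyphs land at (1, 1), (13, 1), (1, 14) and (13, 14), and the fifth returns None. The heuristic never goes back to fill the gap left under a short glyph on a tall shelf; that wasted space is the price of keeping only three numbers of state.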

Growth and eviction – and the bug that forced both. A 1024×1024 texture holds a lot of glyphs, but not infinitely many, and “the texture is full” is a state you actually reach while scrolling through varied text. The first version just stopped inserting when the packer returned None, and the symptom was bad: after a long scroll, glyphs started rendering blank. The atlas had filled and never recovered.

The fix is two mechanisms. First, it isn’t one texture but a texture array – four 1024×1024 layers. When a layer’s packer can’t fit a glyph, packing moves up to the next layer:

let (layer, x, y) = loop {
    if let Some((x, y)) = self.packers[self.current_layer].pack(w, h) {
        break (self.current_layer, x, y);
    }
    if self.current_layer + 1 >= MAX_LAYERS { return None; } // genuinely full
    self.current_layer += 1;
};

Second, glyphs unused for a while get evicted. Every entry records the frame it was last sampled; once per frame I drop anything untouched for ten frames, and – the part that fixed the scroll bug – if every glyph in a layer is gone, I reset that whole layer and reuse its space:

self.entries.retain(|_, e| now.wrapping_sub(e.last_used_frame) <= MAX_UNUSED_FRAMES);

let mut layer_live = [false; MAX_LAYERS];
for e in self.entries.values() { layer_live[e.placed.layer as usize] = true; }
for layer in 0..MAX_LAYERS {
    if !layer_live[layer] && !self.packers[layer].is_empty() {
        self.packers[layer].reset();               // reclaim the whole layer
        self.cpu_data[layer_range(layer)].fill(0);
        self.layer_dirty[layer] = true;
    }
}

A frame counter instead of an LRU is deliberate: no intrusive linked list, no per-access bookkeeping, and “unused for ten frames” is exactly the question I want answered. Whole-layer reclaim is what actually fixed the bug – without it, scrolling fills layer after layer until the last one is full and new glyphs have nowhere to go. Only if all four layers fill within a single frame, which never happens in practice, does it clear everything and re-rasterize on the next frame.
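
The eviction pass can be tried on its own with the texture left out. In this sketch the glyph key is a bare u32, and a layer_used array stands in for "this layer's packer is not empty"; both are simplifications for the example, as are the AtlasIndex name and the insert and touch helpers.

```rust
use std::collections::HashMap;

pub const MAX_LAYERS: usize = 4;
pub const MAX_UNUSED_FRAMES: u64 = 10;

pub struct Entry {
    pub layer: usize,
    pub last_used_frame: u64,
}

pub struct AtlasIndex {
    pub entries: HashMap<u32, Entry>,   // key: hypothetical glyph id
    pub layer_used: [bool; MAX_LAYERS], // stand-in for a non-empty packer
}

impl AtlasIndex {
    pub fn new() -> Self {
        AtlasIndex { entries: HashMap::new(), layer_used: [false; MAX_LAYERS] }
    }

    /// Record a freshly rasterized glyph in `layer`.
    pub fn insert(&mut self, glyph: u32, layer: usize, now: u64) {
        self.entries.insert(glyph, Entry { layer, last_used_frame: now });
        self.layer_used[layer] = true;
    }

    /// Mark a glyph as sampled this frame.
    pub fn touch(&mut self, glyph: u32, now: u64) {
        if let Some(e) = self.entries.get_mut(&glyph) {
            e.last_used_frame = now;
        }
    }

    /// Drop stale glyphs, then reclaim every layer with no survivors.
    /// Returns the indices of the layers that were reclaimed.
    pub fn evict(&mut self, now: u64) -> Vec<usize> {
        self.entries
            .retain(|_, e| now.wrapping_sub(e.last_used_frame) <= MAX_UNUSED_FRAMES);

        let mut layer_live = [false; MAX_LAYERS];
        for e in self.entries.values() {
            layer_live[e.layer] = true;
        }

        let mut reclaimed = Vec::new();
        for layer in 0..MAX_LAYERS {
            if !layer_live[layer] && self.layer_used[layer] {
                self.layer_used[layer] = false; // the real code resets the packer here
                reclaimed.push(layer);
            }
        }
        reclaimed
    }
}
```

One consequence of this design is that a single long-lived glyph pins its whole layer: space comes back only in layer-sized pieces, never per glyph.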

The CPU keeps a mirror of each layer and a dirty flag per layer, so a frame that only touched layer 0 re-uploads layer 0 alone:

for layer in 0..MAX_LAYERS {
    if !self.layer_dirty[layer] { continue; }
    queue.write_texture(/* ... just this layer ... */);
    self.layer_dirty[layer] = false;
}

Placing glyphs

A shaped glyph knows its own advance width – how far the pen should move before the next one. For a proportional font those advances are fractional, and if you place each glyph at the running sum of advances, two things go wrong: the columns drift out of line with the monospace model, and each glyph lands on a different fractional pixel, so the rasterizer rounds each one differently and the row looks soft.

So I discard the advances. I shape a cell to get the right glyph image, then place it at column * cell_width, where cell_width is round(advance of 'M') in whole physical pixels. The shaper picks the glyph; the grid decides where it goes.
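
The difference between the two placement rules shows up in a few lines. This is an illustration only: the function names are invented, and the advances in the usage note are made-up numbers rather than real SF Pro metrics.

```rust
/// Cell width in whole physical pixels, derived from one reference advance.
pub fn cell_width(advance_of_m: f32) -> f32 {
    advance_of_m.round()
}

/// Grid placement: the column alone decides x.
pub fn cell_x(column: usize, cell_width: f32) -> f32 {
    column as f32 * cell_width
}

/// Pen placement: each glyph starts where the previous advance ended.
pub fn pen_positions(advances: &[f32]) -> Vec<f32> {
    let mut x = 0.0_f32;
    advances
        .iter()
        .map(|a| {
            let here = x;
            x += *a;
            here
        })
        .collect()
}

/// Grid positions for the first `n` columns.
pub fn grid_positions(n: usize, cell_width: f32) -> Vec<f32> {
    (0..n).map(|col| cell_x(col, cell_width)).collect()
}
```

With advances of 3.5, 9.25 and 5.0, the pen rule puts the three glyphs at 0.0, 3.5 and 12.75, while the grid rule with a 9 px cell puts them at 0, 9 and 18. Every grid position is a whole pixel, and column n is always at n * cell_width regardless of what was printed before it.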

Subpixel positioning comes for free: cosmic-text’s cache key already bins the fractional pen position into a few subpixel variants, so a glyph at x=10.25 and one at x=10.75 are rasterized as distinct images and cached separately. No hand-rolled alignment.

DPI goes in one place. Author every position and size in logical pixels, pass scale_factor in the uniform, and multiply once in the vertex shader:

struct Uniforms {
    screen_size:   vec2<f32>,  // physical px
    scroll_offset: vec2<f32>,  // logical px
    scale_factor:  f32,
    _pad0: f32, _pad1: f32, _pad2: f32, // three scalars, not a vec3: vec3 aligns to 16 in WGSL
}

@vertex
fn vs_main(inst: Instance, @builtin(vertex_index) vi: u32) -> VsOut {
    let p_logical  = inst.pos + UNIT_QUAD[vi] * inst.size - u.scroll_offset;
    let p_physical = p_logical * u.scale_factor;
    let ndc = (p_physical / u.screen_size) * 2.0 - 1.0;
    // ...
}

The _pad fields aren’t cosmetic: a vec3<f32> aligns to 16 bytes in WGSL, so a single vec3 pad would quietly grow the struct to 48 bytes and desync it from the Rust side.
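
The 32 versus 48 arithmetic can be checked mechanically. The sketch below assumes the Rust side mirrors the uniform as a #[repr(C)] struct (this post does not show that struct, so the field layout here is a guess), and adds a tiny layout calculator that applies the WGSL rule: each member is placed at the next multiple of its alignment, and the struct size is rounded up to the largest member alignment.

```rust
/// Assumed Rust-side mirror of the WGSL `Uniforms` struct.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct Uniforms {
    pub screen_size: [f32; 2],   // physical px
    pub scroll_offset: [f32; 2], // logical px
    pub scale_factor: f32,
    pub _pad: [f32; 3],          // matches the three f32 pads in WGSL
}

// Fails the build if the Rust layout ever stops being 32 bytes.
const _: () = assert!(std::mem::size_of::<Uniforms>() == 32);

fn round_up(n: u32, align: u32) -> u32 {
    (n + align - 1) / align * align
}

/// Struct size from a list of (alignment, size) pairs, in declaration order.
pub fn wgsl_struct_size(members: &[(u32, u32)]) -> u32 {
    let mut offset = 0;
    let mut struct_align = 1;
    for &(align, size) in members {
        offset = round_up(offset, align) + size;
        struct_align = struct_align.max(align);
    }
    round_up(offset, struct_align)
}

// vec2<f32> is (align 8, size 8); f32 is (4, 4); vec3<f32> is (16, 12).
pub const THREE_SCALARS: [(u32, u32); 6] = [(8, 8), (8, 8), (4, 4), (4, 4), (4, 4), (4, 4)];
pub const ONE_VEC3: [(u32, u32); 4] = [(8, 8), (8, 8), (4, 4), (16, 12)];
```

With three scalar pads the members end at byte 32. With a vec3 pad, the pad jumps from offset 20 to offset 32, the members end at byte 44, and the struct rounds up to 48.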

This caused a bug I chased for a while. Text came out slightly blurry, and I spent three rounds on subpixel theories before finding the real cause: a line I’d deleted as unused a few commits earlier, self.scale_factor = renderer.scale_factor(). Nothing read the field in that commit; the next commit added a reader, but the assignment was gone, so the field stayed at its default of 1.0. Glyphs were rasterized at logical size while the framebuffer was Retina, and the sampler stretched them 2x. Putting the one line back made the text sharp. A field that bridges two subsystems isn’t dead just because the current commit doesn’t read it.

Skipping the shaper for the common case. At 200 columns x 60 rows x 60 fps the renderer touches 720,000 cells a second, and the first version shaped every one of them through cosmic-text, building the cache key with text.to_string() – an allocation per cell, even on a cache hit:

| Grid | Cells/frame | Allocations/second @ 60 fps |
|---|---|---|
| 80×24 | 1,920 | 115,200 |
| 132×40 | 5,280 | 316,800 |
| 200×60 | 12,000 | 720,000 |

A cell is almost always a single codepoint, which needs no shaping at all. So the cache has two tiers: a fast path keyed by (char, FontId) – Copy, no allocation – and the string-keyed path only for combining clusters.

fn shape_char(&mut self, ch: char, weight: Weight, style: Style) -> Option<CharGlyph> {
    let font_id = self.face_id(weight, style)?;        // resolved once per (weight, style)
    if let Some(g) = self.char_cache.get(&(ch, font_id)) {
        return Some(*g);                                // hit: no alloc, no shaping
    }
    let face = self.font_system.get_font(font_id)?;
    let glyph_id = face.rustybuzz().glyph_index(ch)?;   // direct cmap lookup
    let g = CharGlyph { font_id, glyph_id, baseline_y };
    self.char_cache.insert((ch, font_id), g);
    Some(g)
}

The atlas key it builds is identical to the one the shaping path produces, so a glyph rasterized by either path is reused by the other. ASCII – the 99% case – reaches the atlas without touching the shaper or the allocator.
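
Stripped of the font machinery, the two-tier idea looks roughly like this. It is a sketch under simplifying assumptions: FontId is a plain u32, the resolve method is a placeholder for the real cmap lookup or shaping call, and the resolves counter exists only so the behavior can be observed.

```rust
use std::collections::HashMap;

pub type FontId = u32; // hypothetical stand-in for the real font id type

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GlyphRef {
    pub font_id: FontId,
    pub glyph_id: u16,
}

pub struct GlyphCache {
    char_cache: HashMap<(char, FontId), GlyphRef>,      // fast tier: Copy key
    cluster_cache: HashMap<(String, FontId), GlyphRef>, // slow tier: owned key
    pub resolves: usize,                                // cache misses so far
}

impl GlyphCache {
    pub fn new() -> Self {
        GlyphCache { char_cache: HashMap::new(), cluster_cache: HashMap::new(), resolves: 0 }
    }

    pub fn lookup(&mut self, text: &str, font_id: FontId) -> GlyphRef {
        let mut chars = text.chars();
        if let (Some(ch), None) = (chars.next(), chars.next()) {
            // Exactly one codepoint: no String is built on this path.
            if let Some(g) = self.char_cache.get(&(ch, font_id)) {
                return *g;
            }
            let g = self.resolve(text, font_id);
            self.char_cache.insert((ch, font_id), g);
            return g;
        }
        // Multi-codepoint cluster: the key allocation is paid only here.
        let key = (text.to_string(), font_id);
        if let Some(g) = self.cluster_cache.get(&key) {
            return *g;
        }
        let g = self.resolve(text, font_id);
        self.cluster_cache.insert(key, g);
        g
    }

    pub fn char_len(&self) -> usize {
        self.char_cache.len()
    }

    pub fn cluster_len(&self) -> usize {
        self.cluster_cache.len()
    }

    // Placeholder for the real lookup; derives a fake glyph id from the text.
    fn resolve(&mut self, text: &str, font_id: FontId) -> GlyphRef {
        self.resolves += 1;
        let glyph_id = text.chars().next().map_or(0, |c| c as u32 as u16);
        GlyphRef { font_id, glyph_id }
    }
}
```

Looking up "a" twice resolves once and touches only the fast tier; a cluster such as "e" followed by U+0301 goes to the string-keyed tier and leaves the fast tier alone.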

Scrolling by the pixel

A row-based terminal stores scroll position as a line count, so the smallest step is one row and trackpad deltas get rounded into stair-steps. I store it as pixels:

pub struct ScrollState {
    pub offset_y: f32,       // pixels from the top of the content; 0.0 = top
    pub total_size_px: f32,
    pub visible_px: f32,
}
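
A sketch of what a pixel offset buys, using the same three fields. The scroll_by, max_offset and first_visible_row methods and the row_height parameter are assumptions added for the example, not the post's API.

```rust
#[derive(Clone, Copy, Debug)]
pub struct ScrollState {
    pub offset_y: f32,      // pixels from the top of the content; 0.0 = top
    pub total_size_px: f32,
    pub visible_px: f32,
}

impl ScrollState {
    /// Largest legal offset: content height minus viewport, never negative.
    pub fn max_offset(&self) -> f32 {
        (self.total_size_px - self.visible_px).max(0.0)
    }

    /// Apply a trackpad or wheel delta in pixels, clamped to the content.
    pub fn scroll_by(&mut self, delta_px: f32) {
        self.offset_y = (self.offset_y + delta_px).clamp(0.0, self.max_offset());
    }

    /// Index of the first row that is at least partly visible, and how many
    /// pixels of that row are already scrolled off the top.
    pub fn first_visible_row(&self, row_height: f32) -> (usize, f32) {
        let row = (self.offset_y / row_height).floor() as usize;
        (row, self.offset_y - row as f32 * row_height)
    }
}
```

With 20 px rows, a 30.5 px scroll leaves row 1 on top with 10.5 px of it hidden. A line-count scroll position would have to round that to either one row or two.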

Momentum – the content continuing to move after you lift your fingers – is a velocity sample plus an exponential decay. The sampling has one subtlety: winit sometimes delivers a batch of wheel events in a single cycle, so the time between two of them can be near zero, and delta / time produces a huge bogus velocity. A floor on the time delta fixes it:

pub fn record(prev: Option<Self>, delta: Vec2, now: Instant) -> Self {
    let time_delta = prev
        .map(|v| now.duration_since(v.last_update).as_secs_f32())
        .unwrap_or(MOMENTUM_DECAY_INTERVAL)
        .max(MIN_VELOCITY_TIME_DELTA);     // 4 ms floor
    Self { velocity: delta / time_delta, last_update: now }
}

After you let go, an 8 ms loop decays that velocity until it drops below 1 px/s:

pub fn decay_velocity(velocity: Vec2, elapsed: f32) -> Vec2 {
    velocity * MOMENTUM_DECAY.powf(elapsed / MOMENTUM_DECAY_INTERVAL)
}

The feel is seven constants, tuned by hand: the decay factor 0.968, an 8 ms tick, a 50 px/s threshold to start inertia, a 1 px/s floor to stop it, a 2000 px/s clamp, the 4 ms time floor, and 40 pixels per wheel-mouse line. None of them are clever; the only hard part was knowing they exist and need tuning.
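
To get a feel for what those numbers do together, here is a one-dimensional sketch that uses the constants listed above. The function names, the scalar f32 velocity (the real code uses Vec2), and the choice to apply the 2000 px/s clamp at sampling time are assumptions made for the example.

```rust
pub const MOMENTUM_DECAY: f32 = 0.968;
pub const MOMENTUM_DECAY_INTERVAL: f32 = 0.008; // 8 ms tick
pub const MIN_VELOCITY_TIME_DELTA: f32 = 0.004; // 4 ms floor
pub const START_VELOCITY: f32 = 50.0;           // px/s needed to start inertia
pub const STOP_VELOCITY: f32 = 1.0;             // px/s at which inertia ends
pub const MAX_VELOCITY: f32 = 2000.0;           // px/s clamp

/// Velocity from one scroll delta, with the time floor and the clamp applied.
pub fn sample_velocity(delta_px: f32, time_delta_s: f32) -> f32 {
    let dt = time_delta_s.max(MIN_VELOCITY_TIME_DELTA);
    (delta_px / dt).clamp(-MAX_VELOCITY, MAX_VELOCITY)
}

/// Exponential decay, independent of how long the tick actually took.
pub fn decay_velocity(velocity: f32, elapsed_s: f32) -> f32 {
    velocity * MOMENTUM_DECAY.powf(elapsed_s / MOMENTUM_DECAY_INTERVAL)
}

/// Whether a release velocity is fast enough to coast at all.
pub fn should_start_inertia(velocity: f32) -> bool {
    velocity.abs() >= START_VELOCITY
}

/// Simulates coasting from `v0` at a fixed 8 ms tick.
/// Returns (ticks until stop, pixels travelled).
pub fn coast(v0: f32) -> (u32, f32) {
    let (mut v, mut ticks, mut distance) = (v0, 0u32, 0.0_f32);
    while v.abs() >= STOP_VELOCITY {
        distance += v * MOMENTUM_DECAY_INTERVAL;
        v = decay_velocity(v, MOMENTUM_DECAY_INTERVAL);
        ticks += 1;
    }
    (ticks, distance)
}
```

By the arithmetic, a flick released at the 2000 px/s clamp coasts for a little over 230 ticks, just under two seconds, and travels about 500 px. The time floor matters just as much: a 5 px delta that arrives with zero elapsed time is read as 1250 px/s instead of dividing by zero.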

On the GPU there is nothing more to do than the line already in the shader above: scroll_offset is subtracted from every position before projection. No relayout, no atlas change – one uniform write moves the whole screen.

Where it stands now

The wrapper runs Claude Code on this stack now – parser, grid, atlas, renderer, panels, clipboard – and the libraries I started with are gone. I set out wanting control over every layer and an understanding of how terminals work; building each of these pieces gave me both. The next thing I want – several Claude sessions side by side in split panels – is now a feature to add in my own code, not a limit in someone else’s.

If you run Claude Code, or you’ve ever wondered what’s under a terminal, it’s open source.

GitHub: github.com/arttttt/AnyClaude
