AI Agents

AI agents can use hty to interact with any terminal program the same way a human would: by reading the rendered screen and typing. Because hty exposes every operation as a plain shell command, your agent doesn’t need a special SDK. It runs programs, takes snapshots of the rendered screen, and sends keystrokes using the same primitives it would use for any other shell task.

The agent loop

Every agent-driven terminal interaction reduces to the same three questions: “is the program ready?”, “what is it showing me?”, and “what should I type next?” hty packages the first two into a single round-trip so each tool call by the agent corresponds to one observable change in the session.

The fused form of hty send and hty run is the fast path:


# One call: type input, wait for the program to be ready, return the screen.
hty send review --text "y\n" --snapshot --wait-until-idle 200 --json

That single command sends y\n, blocks until the rendered screen has been quiet for 200 ms, then returns a JSON payload with the post-action snapshot plus the matched and elapsed_ms fields. The agent now has everything it needs to decide the next action: no second tool call is required, and there is no race between “I sent input” and “I read the screen too early.”

The slower but more explicit three-step form (send → wait → snapshot) is still available for interactive exploration, and it’s what you reach for when the condition you want to wait on comes much later than the thing you typed.
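For comparison, the three-step form might be wrapped as in the sketch below. It uses only commands that appear on this page (hty send, hty wait, hty snapshot). The function name send_wait_snapshot, its argument order, and the 5000 ms default are hypothetical, and discarding the output of the first two steps is an assumption made so that stdout carries only the snapshot.

```shell
# Hypothetical wrapper: the explicit send -> wait -> snapshot sequence.
# Usage: send_wait_snapshot SESSION TEXT WAIT_TEXT [TIMEOUT_MS]
send_wait_snapshot() {
  local session="$1" input="$2" wait_text="$3" timeout_ms="${4:-5000}"

  # Step 1: type the input (--text expands \n to a newline).
  hty send "$session" --text "$input" >/dev/null || return

  # Step 2: block until the expected text is on screen, or give up.
  # On failure the function returns hty wait's own exit status.
  hty wait "$session" --text "$wait_text" --timeout "$timeout_ms" >/dev/null || return

  # Step 3: read the screen only after the wait has matched.
  hty snapshot "$session" --json
}
```

Because the wait is a separate call, the condition can be something that appears long after the input was typed, which is the case the fused form handles less naturally.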

A complete example: create-next-app

The following script drives create-next-app’s interactive questionnaire, making deliberate choices at each prompt:


# Launch the scaffolder and wait for the first question
hty run --name app --snapshot \
  --wait-until-text "What is your project named?" --timeout 30000 \
  -- npx create-next-app
 
# Answer "my-app" + Enter, wait for the next prompt
hty send app --text "my-app\n" --snapshot \
  --wait-until-text "TypeScript?" --timeout 10000
 
# Accept TypeScript (Enter) and wait for the next one
hty send app --key enter --snapshot \
  --wait-until-text "ESLint?" --timeout 5000
 
# …and so on for each prompt. Finally, wait for the scaffolder to finish.
hty send app --key enter --snapshot --wait-until-exit --timeout 120000

Each step is a single round-trip that returns the new screen state. With --json added, as in the send example earlier, the agent can inspect .snapshot.buffer (or .snapshot.lines) to verify that the prompt looks the way it expected before committing to the next answer.
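The "verify before answering" step can also be done as a standalone guard. The sketch below relies on plain hty snapshot printing the rendered screen as text and checks it with grep; the function name answer_if_showing and its arguments are hypothetical.

```shell
# Hypothetical guard: answer a prompt only if the screen currently shows it.
# Usage: answer_if_showing SESSION EXPECTED_PROMPT ANSWER_TEXT
answer_if_showing() {
  local session="$1" expected="$2" answer="$3"

  # Plain `hty snapshot` prints the rendered screen as text;
  # grep -F matches the prompt literally, not as a regex.
  if hty snapshot "$session" | grep -qF -- "$expected"; then
    hty send "$session" --text "$answer"
  else
    printf 'expected prompt not on screen: %s\n' "$expected" >&2
    return 1
  fi
}
```

A call such as answer_if_showing app "TypeScript?" "\n" sends Enter only when that question is visible, so a surprise prompt stops the script instead of receiving the wrong answer.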

See the README quickstart for the canonical git add -p walkthrough, which uses the same pattern against an even simpler program.

Reading the screen

With --json, the fused form already includes the snapshot in its response. When you want a standalone read — e.g. between user turns, or to re-inspect state after an external event — call hty snapshot --json directly:


# Full rendered buffer as a single string
hty snapshot my-session --json | jq -r '.snapshot.buffer'
 
# Individual line by index (0-based)
hty snapshot my-session --json | jq -r '.snapshot.lines[0]'
 
# Test for a specific string
hty snapshot my-session --json | jq -r '.snapshot.buffer' | grep -q "error" && echo "error found"
 
# Cursor position (1-indexed)
hty snapshot my-session --json | jq '.snapshot.cursor_row, .snapshot.cursor_col'

hty snapshot without --json prints plain text; add --json whenever you want to pipe into jq or read structured metadata such as status, cursor_row, or the per-cell cells grid.

Waiting for the right moment

Every wait condition that hty wait supports is also available on fused run / send as --wait-until-*. Pick the mode that matches what you’re waiting for:


# Wait for a specific string (e.g., a shell prompt)
hty wait session --text '$ ' --timeout 5000
 
# Wait for a POSIX extended regex
hty wait session --regex "(error|Error|ERROR)" --timeout 5000
 
# Wait for the rendered screen to stop updating
hty wait session --idle 300 --timeout 10000
 
# Wait for the program to exit
hty wait session --exit --timeout 30000

Use --text when you know the exact string that signals readiness. Use --regex when you need to match one of several possibilities (e.g. “success” or “error” lines). Use --idle when the program produces output whose exact content you don’t know in advance. Idle detection measures the rendered screen, not raw I/O, so programs that redraw continuously (like top or spinners) will never go idle; give those a --text or --regex condition instead. Use --exit to confirm the program has finished before checking results.

Handling timeouts

hty wait exits with code 3 when the timeout expires. Pass --json to get a structured result back so the agent can distinguish success from timeout without checking exit codes:


hty wait my-session --text "ready" --timeout 5000 --json
# → { "ok": true,  "matched": true,  "mode": "text", "elapsed_ms": 284 }
# or { "ok": false, "matched": false, "elapsed_ms": 5000, ... } on timeout

On a fused send --snapshot, the same fields appear at the top level of the response alongside the snapshot, so the agent sees both “did the wait match?” and “what does the screen look like now?” in one payload.
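For shell-based harnesses that would rather branch on the exit status, a small classifier might look like the sketch below. The only documented status it relies on is 3 for an expired timeout; the function name wait_status, the 5000 ms default, and the treatment of every other non-zero status as an unexpected failure are assumptions.

```shell
# Hypothetical helper: turn hty wait's exit status into a word to branch on.
# Usage: wait_status SESSION TEXT [TIMEOUT_MS]
wait_status() {
  local session="$1" text="$2" timeout_ms="${3:-5000}"
  local rc=0

  hty wait "$session" --text "$text" --timeout "$timeout_ms" >/dev/null || rc=$?

  case "$rc" in
    0) echo matched ;;
    3) echo timeout ;;        # documented: 3 means the timeout expired
    *) echo "error:$rc" ;;    # assumption: anything else is unexpected
  esac
}
```

An agent loop can then write case "$(wait_status my-session ready)" in ... esac and keep the retry policy in one place.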

Typing source code or regex: `--raw-text`

--text processes C-style escapes: \n becomes a newline, \t a tab, \\ a literal backslash. That’s usually what you want for “type a command and press Enter.”

But when an agent pastes source code, a regex, or any content that contains backslashes it does not want interpreted, use --raw-text instead — it sends the UTF-8 bytes verbatim:


# With --text, the \n inside the regex would turn into a real newline.
# Single quotes keep the shell from rewriting the backslash as well.
hty send my-session --raw-text 'const re = /\n/;'

Prefer --raw-text whenever shell quoting of backslashes starts to look hairy.
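There are two layers of escaping in play: the shell's quoting, then hty's own --text processing. The sketch below isolates the shell layer by printing the exact argument a command receives; show_arg is a hypothetical helper, not part of hty.

```shell
# Print the exact argument a command receives from the shell.
show_arg() { printf '%s\n' "$1"; }

show_arg "const re = /\\n/;"   # double quotes: the shell turns \\ into \
show_arg 'const re = /\n/;'    # single quotes: the shell changes nothing
```

Both calls print const re = /\n/; with a literal backslash followed by n. That string is what hty receives in either case: --raw-text sends it as-is, while --text would go on to convert the \n into a newline.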

Mouse input

A handful of TUIs (btop, lazygit, k9s) are far easier to drive with clicks than with keyboard navigation. hty send supports click and scroll events — but only against apps that have opted into mouse mode via the usual CSI ?1000/1002/1003 h sequences. The snapshot payload exposes the current state so the agent can check before it clicks:


# Is mouse input available in this session?
hty snapshot lazygit --json | jq '.snapshot.mouse.enabled'
 
# Click row 5, column 12 (1-indexed, same coords as the snapshot grid)
hty send lazygit --click 5 12

If the app has no mouse mode enabled, the command fails with “error: target app has not enabled mouse input” and exits non-zero. Agents should prefer keyboard navigation as the primary primitive and fall back to clicks only when the target explicitly supports it.
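Since a refused click exits non-zero, a harness can degrade to a keyboard action without parsing anything. The sketch below is one way to do that; the function name click_or_key and its arguments are hypothetical, and it assumes any non-zero exit from the click means the keyboard fallback is appropriate.

```shell
# Hypothetical helper: attempt a click, fall back to a key if it is refused.
# Usage: click_or_key SESSION ROW COL FALLBACK_KEY
click_or_key() {
  local session="$1" row="$2" col="$3" key="$4"

  # The click fails with a non-zero status when the app has not
  # enabled mouse mode; hty's error message still reaches stderr.
  if hty send "$session" --click "$row" "$col"; then
    return 0
  fi

  # Assumption: the caller knows a key that reaches the same target.
  hty send "$session" --key "$key"
}
```

For example, click_or_key lazygit 5 12 enter clicks when the app accepts mouse input and presses Enter otherwise.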

Compound input: `--seq`

For sequences that mix text, named keys, and pauses, --seq sends the whole thing in one call:


hty send my-session --seq '"git status" enter 300ms "q"'

Quoted strings are text, durations (200ms, 1s) are pauses, and bare words are key names. This is often cleaner than chaining three separate send calls.
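For reference, the long-hand that the --seq example replaces might look roughly like the sketch below: three send calls with an ordinary sleep for the pause. The function name git_status_then_quit is hypothetical, and exact timing equivalence with --seq is not guaranteed.

```shell
# Hypothetical long-hand for: --seq '"git status" enter 300ms "q"'
# Usage: git_status_then_quit SESSION
git_status_then_quit() {
  local session="$1"

  hty send "$session" --text "git status" || return   # quoted string -> text
  hty send "$session" --key enter || return           # bare word -> key name
  sleep 0.3                                           # 300ms -> pause (GNU/BSD sleep)
  hty send "$session" --text "q"                      # quoted string -> text
}
```

Each line is a separate process launch and a separate tool call for the agent, which is the overhead --seq removes.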

Programs hty works with

hty works with any program that runs in a terminal. A few common ones:

vim, neovim — modal editors; send esc with --key esc, enter insert mode with --text "i", save and quit with --text ":wq\n"
git add -p — interactive staging; respond to each hunk prompt with y, n, q, etc.
psql — interactive SQL prompt; send queries as text, wait for the =# prompt between statements
gh auth login — interactive auth flows that require reading prompts and typing responses
btop, htop — monitoring TUIs; useful for observing system state without modifying it
create-next-app and other interactive scaffolding tools — answer the setup questionnaire programmatically

Use --name to give sessions descriptive names. Your agent code is much easier to debug when sessions are named review, psql-prod, or auth-flow rather than identified by a UUID prefix.

Avoiding session leaks with `--remove`

Long-running agents often forget to hty kill sessions once they’ve finished with them. The registry then fills with dozens of zombie exited entries, each holding a log file on disk, and hty list becomes useless as a live-state view. The fix is opt-in on hty run:


# Tie session lifetime to the child. The record is auto-deleted when the
# child exits, no matter how (success, failure, signal, or hty kill).
hty run --remove --name app --snapshot --wait-until-text "Ready" \
  --timeout 30000 -- ./migrate.sh

With --remove, every short-lived tool call the agent makes cleans up after itself. Pair it with --wait-until-exit for the common “launch, wait for it to finish, read the final screen, move on” pattern — the agent gets the exit code and the last snapshot in one round-trip, and the session is gone by the time the next step runs. Keep sessions without --remove when you genuinely want the record to survive for later hty replay or hty logs inspection.
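The "launch, wait for it to finish, read the final screen, move on" pattern could be wrapped as in the sketch below. It combines flags shown on this page; the function name run_once is hypothetical, and it assumes hty run accepts --json the same way the fused hty send does.

```shell
# Hypothetical wrapper: run a command to completion and leave no session behind.
# Usage: run_once TIMEOUT_MS COMMAND [ARGS...]
run_once() {
  local timeout_ms="$1"
  shift

  # "$@" keeps each argument of the child command intact, spaces included.
  hty run --remove --snapshot --wait-until-exit \
    --timeout "$timeout_ms" --json -- "$@"
}
```

A call such as run_once 30000 ./migrate.sh returns the final payload in one round-trip, and the session record is removed once the child has exited.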

Streaming output with `run --attach`

Harnesses that support backgrounded processes (Claude Code, Cursor, and friends) stream the background process’s stdout straight to the model as it arrives — no polling, no missed frames. hty run --attach pairs naturally with that: it spawns a session, attaches to it, and writes the PTY’s live output to stdout until the child exits.


# One-shot: foreground a command in a fresh PTY, tie the session's lifetime
# to it, and block until it exits. Output streams live to the harness.
hty run --attach --remove -- npm test

This is the recommended shape for agents that want to “watch a long command run” without writing a snapshot loop. Dropping --remove leaves the session in hty list afterwards so you can hty replay it; adding it makes the whole invocation fully self-cleaning.

--attach is mutually exclusive with --snapshot / --wait-until-*; use one style or the other per call. Use fused --snapshot for “I want the final screen as JSON”; use --attach when you want the stream.