
Multi Turn Conversations

A step towards building context


In the last article we ended with a note about the messages array: the model has no memory between calls — the array is the memory.

That sentence is the whole mechanic of multi-turn conversations. This article builds it out.

Side note: the code for this article is at Inkwell revision d53d235

The problem with a single call

Inkwell's handler accepted a draft and an instruction, packed them into one user message, called Claude, and returned the result. One call, one response, done.

That's fine for a one-shot improvement. But writing rarely works that way. You ask for something shorter, read the result, decide it's now too terse, and ask for something warmer. Each instruction builds on the previous state. The model needs to see that history to respond correctly — "now make it warmer" only makes sense if the model knows what it produced on the previous turn.

The solution isn't a special API. There's no "continue this conversation" flag. It's just the array. Every call to /v1/messages is stateless; you're the one who keeps the history and sends it back each time.

What the array looks like across turns

Say the user starts with this draft:

Meeting notes from Tuesday. Covered Q3 targets. John said the numbers look fine. We'll follow up next week.

Turn 1 — user asks: "Make this more professional."

The request goes to Claude with one user message:

[user]: Meeting notes from Tuesday. Covered Q3 targets...

Make this more professional.

Claude responds:

[assistant]: Tuesday's meeting addressed Q3 performance targets.
John confirmed the figures are satisfactory. A follow-up is
scheduled for next week.

Turn 2 — user asks: "Add a bit more warmth."

Now you need Claude to understand this request in context — that it's asking to modify the previous output, not the original draft. So you send the full history:

[user]:      Meeting notes from Tuesday... Make this more professional.
[assistant]: Tuesday's meeting addressed Q3 performance targets...
[user]:      Add a bit more warmth.

The model sees the full thread. It knows what it wrote. "Warmth" is now unambiguous.

Turn 3 — user asks: "Actually, shorter." The array grows by two more elements, and the next call carries five messages: two full exchanges plus the new prompt. And so on.
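The growth pattern can be sketched in a few lines of Go. `Message` here is a simplified stand-in for the SDK's message type, not Inkwell's actual code:

```go
package main

import "fmt"

// Message is a simplified stand-in for an API message: a role plus text.
type Message struct {
	Role, Content string
}

// nextRequest builds the messages array for the next API call:
// the full history so far, plus the new user prompt at the end.
func nextRequest(history []Message, prompt string) []Message {
	req := append([]Message{}, history...)
	return append(req, Message{Role: "user", Content: prompt})
}

func main() {
	var history []Message
	prompts := []string{
		"Make this more professional.",
		"Add a bit more warmth.",
		"Actually, shorter.",
	}
	for i, p := range prompts {
		req := nextRequest(history, p)
		fmt.Printf("turn %d sends %d messages\n", i+1, len(req))
		// After the (imagined) API reply, both sides join the history.
		history = append(req, Message{Role: "assistant", Content: "..."})
	}
}
```

Each turn adds one user message and one assistant message, so turn n sends 2(n-1)+1 messages.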

How Inkwell stores it

Inkwell stores this history in the revisions table.

The revisions table is exactly a persisted version of this array. Each row is one exchange: the user's prompt and the assistant's completion.

CREATE TABLE revisions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    draft_id   INTEGER NOT NULL REFERENCES drafts(id),
    prompt     TEXT    NOT NULL,
    completion TEXT    NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

When a new revision is requested, the handler loads every prior revision for the draft, ordered by id (which is insertion order), and replays them as the conversation history before appending the new prompt.

The API now has three endpoints instead of one:

  • POST /api/drafts — saves the initial draft content; no model call

  • GET /api/drafts/{id} — returns the draft and all its revisions

  • POST /api/drafts/{id}/revisions — the interesting one

Building the messages array

This is the piece worth looking at closely. Given the stored history, we need to reconstruct what the model would have seen if it had been in an ongoing conversation the whole time:

func buildMessages(content string, history []*domain.Revision, newPrompt string) []anthropic.MessageParam {
    // First call: no prior history, one user message.
    if len(history) == 0 {
        return []anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock(content + "\n\n" + newPrompt)),
        }
    }

    messages := make([]anthropic.MessageParam, 0, len(history)*2+1)

    // First turn: original draft + first prompt, paired with first completion.
    messages = append(messages,
        anthropic.NewUserMessage(anthropic.NewTextBlock(content+"\n\n"+history[0].Prompt)),
        anthropic.NewAssistantMessage(anthropic.NewTextBlock(history[0].Completion)),
    )

    // Middle turns: just prompt → completion pairs.
    for _, rev := range history[1:] {
        messages = append(messages,
            anthropic.NewUserMessage(anthropic.NewTextBlock(rev.Prompt)),
            anthropic.NewAssistantMessage(anthropic.NewTextBlock(rev.Completion)),
        )
    }

    // Final turn: the new prompt, awaiting a reply.
    messages = append(messages, anthropic.NewUserMessage(anthropic.NewTextBlock(newPrompt)))
    return messages
}

The first user message packs the original draft content together with the first instruction. Every revision after that is a clean prompt+completion pair. The new prompt goes last, unanswered.

The full conversation sent to Claude for turn 3 in our example would be five messages: user (draft + prompt 1), assistant (completion 1), user (prompt 2), assistant (completion 2), plus the new user message for turn 3.

What this costs

The token count grows with every turn. Turn 1 sends the draft plus the instruction. Turn 2 sends the draft, instruction 1, completion 1, instruction 2. By turn 5, you're sending the draft plus four full revisions plus the new prompt. Input tokens compound.
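Back-of-envelope arithmetic makes the compounding concrete. The sizes below are illustrative assumptions, not measured numbers: a 200-token draft, 20-token instructions, 150-token completions:

```go
package main

import "fmt"

// inputTokensAtTurn estimates input tokens for turn n (1-based):
// the draft, every prior instruction and completion, and the new
// instruction. All sizes are illustrative assumptions.
func inputTokensAtTurn(n, draft, instr, completion int) int {
	// n instructions in total (n-1 answered plus the new one),
	// and n-1 completions from earlier turns.
	return draft + n*instr + (n-1)*completion
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("turn %d: ~%d input tokens\n", n, inputTokensAtTurn(n, 200, 20, 150))
	}
}
```

Per-turn input grows linearly, which means the cumulative tokens billed across a session grow quadratically with the number of turns.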

Inkwell logs this now — the revision handler records both tokens_in and tokens_out per turn:

log.Info().
    Int("draft_id", draftID).
    Int("revision_id", rows[0].ID).
    Int("turn", turn).
    Int("tokens_in", int(msg.Usage.InputTokens)).
    Int("tokens_out", int(msg.Usage.OutputTokens)).
    Msg("revision saved")

Run a five-turn session and watch tokens_in climb. It's a useful thing to see once. Later in the series we will address it with prompt caching — the first call writes the static parts of the context to a cache, and subsequent calls read from it at a fraction of the cost.

One thing to notice

Every call to Claude is still stateless. Inkwell isn't maintaining any server-side session with the API. Each request to POST /api/drafts/{id}/revisions hits /v1/messages fresh, with the full conversation reconstructed from the database. You could restart the server, and the next revision request would reconstruct the same history and continue seamlessly.

This is sometimes described as a limitation of the API, but it's also a kind of gift. Your conversation history is data you own, in a place you control. You can inspect it, edit it, branch it, summarise it, or throw away the middle turns if the context window gets too large. The API doesn't know or care; it just sees whatever array you send.

Building with AI

Part 1 of 3

In this series, I take you behind the AI feature — exploring the API patterns, integration strategies, and production tradeoffs that power real AI-assisted products ⚡️. We build Inkwell, a writing intelligence platform, as our companion app throughout 🚀

Up next

Anatomy of a message

Digging into the core concept of a message in LLMs