
Understanding System Prompts

Persistence and framing: how system prompts shape multi-turn behavior


Greetings, I'm Riyaz. I'm a software engineer based in India 🇮🇳.

My interests range from technology and programming to nature and travel. I am also interested in cycling, music and trekking.

In this article we explore why the system prompt produces a different kind of output than a user message, even when the words are similar.

User messages and system prompts are not the same thing

You could ask the model to write like an academic by putting it in the user message - "rewrite this in an academic style" works fine. So why use the system prompt at all?

Because the model treats them differently.

User messages are things a human said. The model may agree with them, push back on them, or interpret them charitably. System prompts are instructions from the operator - the person who deployed the model. They establish the frame in which the conversation happens. The model treats them with a higher degree of trust and persistence.

In practice, the difference is most visible in follow-through. Put "academic style" in a user message and the model will lean that way for a turn or two, then drift back toward its defaults if you give it a sequence of short prompts without reinforcement. Put it in the system prompt and the persona sticks - even five turns deep, even when the user's instruction says nothing about tone.
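To make the mechanics concrete, here is a minimal sketch (illustrative types, not a real SDK) of how the system prompt travels outside the message history and is re-sent on every call, which is why it doesn't drift:

```go
package main

import "fmt"

// Message is one turn of conversation history.
type Message struct {
	Role    string // "user" or "assistant"
	Content string
}

// Request mirrors the shape of a chat-completion call: the system
// prompt is a separate field, supplied fresh on every call, while
// the message history accumulates turn by turn.
type Request struct {
	System   string
	Messages []Message
}

func main() {
	system := "You are an academic writing assistant."
	var history []Message

	// Several turns later the history has grown, but the system
	// prompt is the same top-level field every time - it never gets
	// buried the way an instruction in turn 1 does.
	for i := 0; i < 5; i++ {
		history = append(history,
			Message{"user", "another instruction"},
			Message{"assistant", "revised text"})
		req := Request{System: system, Messages: history}
		fmt.Println(len(req.Messages), req.System == system)
	}
}
```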

The system prompt is also invisible to the user. It shapes the experience without being part of the conversation the user sees. That's useful for writing tools where you want the persona to be consistent without requiring the user to re-state it on every turn.

What Inkwell's old system prompt was doing

Until now, the system prompt has been:

You are a writing assistant. Help the user improve their draft.
Return only the revised text - no preamble, no explanations.

This is fine as a default. The model understands "writing assistant" and will produce reasonable revisions. But it's deliberately underspecified - the model fills in the gaps with its own defaults, which means the output reflects Claude's general-purpose sense of good writing: clear, neutral, a bit formal, reasonably accessible.

That's not wrong, but it's also not anything in particular. It doesn't write the way a journalist writes. It doesn't use the vocabulary an academic would reach for. It doesn't produce the structured, example-driven prose an engineer expects.

A stronger persona changes the output in ways that are immediately legible - not just stylistically, but structurally. The model doesn't just adopt different word choices; it makes different decisions about argument order, sentence complexity, and what counts as a useful level of detail.

The three modes

Inkwell now has three system prompts, one per mode. Here they are, with a note on what each one is actually doing:

Academic:

You are an academic writing assistant.
Revise the draft with scholarly rigour: use a formal register,
precise domain vocabulary, and a clear argument structure.
Favour complex sentences where they add clarity, not obscurity.
Return only the revised text - no preamble, no explanations.

"Precise domain vocabulary" and "clear argument structure" are doing the work here. The model already knows what academic prose looks like - this just establishes that you want that register, not a softened version of it. The note about complex sentences is deliberate: without it, many models over-simplify when you say "clear", producing prose that's clear but thin.

Journalist:

You are a seasoned copy editor at a national newspaper.
Revise the draft for maximum clarity and reader impact:
active voice, tight sentences, a strong opening that earns the reader's attention.
Cut jargon; keep every word accountable.
Return only the revised text - no preamble, no explanations.

"Keep every word accountable" is not a standard instruction, but the model responds to it in the right way - it cuts hedges and qualifiers rather than just shortening sentences mechanically. The framing as a copy editor (rather than a journalist) is also intentional: an editor's job is to improve what's already there, not to report fresh information. The persona fits the task.

Engineer:

You are a technical writing editor.
Revise the draft for precision and usability:
concrete examples over abstractions, consistent terminology,
structured prose that scans well (short paragraphs, lists where helpful).
Define any term that a competent engineer outside the domain might not know.
Return only the revised text - no preamble, no explanations.

The last sentence - "define any term that a competent engineer outside the domain might not know" - is one of those instructions where the model's background knowledge matters. It can make a reasonable call about what's domain-specific jargon without you having to enumerate the terms.

The terminal instruction

All three prompts end the same way: Return only the revised text - no preamble, no explanations.

This is non-negotiable. Claude's default behaviour - particularly in a helpful assistant framing - is to acknowledge the request, produce the output, and then add a brief note about what it did. That's fine in a chat interface. In a writing tool where the completion is rendered directly, the user gets "Here's the revised version:" as the first line of every output, which is immediately irritating.

The terminal instruction is a contract. The system prompt is where you establish output format contracts, because the model treats operator instructions with more persistence than user-level requests.
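That said, contracts do occasionally get broken. A defensive guard on the handler side - purely a sketch, not part of Inkwell's actual code - can strip a stray preamble line before rendering:

```go
package main

import (
	"fmt"
	"strings"
)

// stripPreamble removes a leading "Here's the..." style line if the
// model ignores the terminal instruction. A belt-and-braces guard,
// not a substitute for the system-prompt contract.
func stripPreamble(s string) string {
	trimmed := strings.TrimSpace(s)
	first, rest, found := strings.Cut(trimmed, "\n")
	if !found {
		return trimmed
	}
	lower := strings.ToLower(first)
	if strings.HasPrefix(lower, "here's") || strings.HasPrefix(lower, "here is") {
		return strings.TrimSpace(rest)
	}
	return trimmed
}

func main() {
	fmt.Println(stripPreamble("Here's the revised version:\nThe actual text."))
	// prints: The actual text.
}
```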

Switching modes mid-conversation

Here's something worth trying: start a draft in Academic mode, read the result, switch to Journalist, and submit another instruction.

The second revision will read like a journalist rewrote the academic version - because that's exactly what happened. The system prompt changed, but the conversation history didn't. The model sees the full thread: original draft, academic revision, new instruction - and now the system prompt tells it to approach that thread as a copy editor.

This creates an interesting affordance: the model's persona is stateless (it comes from the system prompt, which is provided fresh each call), while the content it's working on is stateful (it's reconstructed from the revisions table). The user can drag the same content through three different editorial lenses in sequence, and each lens sees everything the previous ones produced.

Whether that's a feature or a footgun depends on what you're building. For Inkwell it's a feature. For a product where the persona should be fixed per session, you'd store the mode on the draft rather than on each revision.
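The stateless/stateful split can be sketched like this (field and function names are illustrative, not Inkwell's actual schema):

```go
package main

import "fmt"

// Revision mirrors a row in the revisions table. Field names here
// are illustrative.
type Revision struct {
	Prompt     string // the user's instruction for that turn
	Completion string // the model's output for that turn
	Mode       string // which persona produced it
}

// Message is one turn sent to the model.
type Message struct {
	Role, Content string
}

// buildMessages reconstructs the stateful content from stored
// revisions. Note what is absent: the persona. That comes from the
// system prompt, resolved from the current mode on each call.
func buildMessages(draft string, revs []Revision, newPrompt string) []Message {
	prompts := make([]string, 0, len(revs)+1)
	for _, r := range revs {
		prompts = append(prompts, r.Prompt)
	}
	prompts = append(prompts, newPrompt)

	// The first user turn carries the draft plus the first
	// instruction, so user/assistant roles alternate strictly.
	msgs := []Message{{"user", "Draft:\n" + draft + "\n\n" + prompts[0]}}
	for i, r := range revs {
		msgs = append(msgs, Message{"assistant", r.Completion})
		msgs = append(msgs, Message{"user", prompts[i+1]})
	}
	return msgs
}

func main() {
	revs := []Revision{{Prompt: "Make it scholarly.", Completion: "Academic version.", Mode: "academic"}}
	msgs := buildMessages("Original draft.", revs, "Now make it punchy.")
	fmt.Println(len(msgs)) // 3: draft + first prompt, assistant reply, new prompt
}
```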

How Inkwell implements this

The handler now resolves the system prompt before making the API call:

var systemPrompts = map[string]string{
    "academic":   "You are an academic writing assistant. ...",
    "journalist": "You are a seasoned copy editor at a national newspaper. ...",
    "engineer":   "You are a technical writing editor. ...",
}

func resolveSystemPrompt(mode string) (string, string) {
    if p, ok := systemPrompts[mode]; ok {
        return mode, p
    }
    return defaultMode, systemPrompts[defaultMode]
}

resolveSystemPrompt returns both the mode name and the prompt text. The mode name is what gets stored on the revision row; the prompt text goes into the API call. An unrecognised mode falls back to the default - lenient, because a stale client shouldn't break things.
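You can see the fallback in action with a quick usage sketch; `defaultMode` is assumed here to be "academic", matching the migration's DEFAULT:

```go
package main

import "fmt"

const defaultMode = "academic" // assumed; matches the column default

var systemPrompts = map[string]string{
	"academic":   "You are an academic writing assistant. ...",
	"journalist": "You are a seasoned copy editor at a national newspaper. ...",
	"engineer":   "You are a technical writing editor. ...",
}

func resolveSystemPrompt(mode string) (string, string) {
	if p, ok := systemPrompts[mode]; ok {
		return mode, p
	}
	return defaultMode, systemPrompts[defaultMode]
}

func main() {
	mode, _ := resolveSystemPrompt("pirate") // not a real mode
	fmt.Println(mode)                        // prints: academic
}
```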

The mode is stored on each revision, not on the draft:

ALTER TABLE revisions ADD COLUMN mode TEXT NOT NULL DEFAULT 'academic';

This is the right granularity. A draft might have academic revisions from yesterday and journalist revisions from today. The thread can show which persona produced each turn - which is exactly what the UI does, displaying a small mode badge next to each revision's prompt.

The client sends the mode with each revision request:

async requestRevision(id, prompt, mode) {
    const res = await fetch(`/api/drafts/${id}/revisions`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body:   JSON.stringify({ prompt, mode }),
    });
    // ...
}

And the revision response carries the resolved mode back, so the badge reflects what the server actually used (which may differ if the client sent an unrecognised value):

return &ReviseResponse{
    RevisionID: rows[0].ID,
    Completion: completion,
    Mode:       mode,   // resolved, not raw from request
    Turn:       turn,
}, nil

What to notice when you run it

Take any paragraph and run it through all three modes with the same instruction: "Make this clearer and more authoritative."

The surface instruction is identical. The outputs will be structurally different: Academic will lengthen and add precision; Journalist will shorten and sharpen the lead; Engineer will flatten the prose and may introduce a list where there was a paragraph.

The persona isn't just dressing. It changes what the model considers a "better" version of the same text.

Building with AI

Part 1 of 4

In this series, I take you behind the AI feature - exploring the API patterns, integration strategies, and production tradeoffs that power real AI-assisted products ⚡️. We build Inkwell, a writing intelligence platform, as our companion app throughout 🚀

Up next

Multi Turn Conversations

A step toward building context