How to Reuse ChatGPT Prompts in Ollama

Published March 24, 2026 · Updated March 24, 2026 · 4 min read

Ollama exposes an OpenAI-compatible chat endpoint alongside its native API, so ChatGPT prompts mostly work in Ollama with little more than a change of model name. The exceptions are system prompt handling on older models and the fact that local models behave differently from GPT-4 even when given identical prompts.

API compatibility

Ollama's native chat endpoint (/api/chat) and its OpenAI-compatible endpoint (/v1/chat/completions) both accept the same messages array as OpenAI. For most models, a ChatGPT request body works as-is once "gpt-4o" is replaced with a local model name:

{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain tokenization briefly."}
  ]
}

For basic chat with modern models (Llama 3, Mistral, Gemma), nothing beyond the model name needs to change. Ollama converts the messages into each model's native instruction format automatically, using the prompt template bundled with the model.
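A minimal sketch of sending that request from Python with only the standard library. It assumes Ollama is running locally on its default port (11434) and targets the OpenAI-compatible endpoint; the helper names build_chat_request and chat are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port (11434); its
# OpenAI-compatible chat endpoint lives under /v1.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, messages: list[dict]) -> bytes:
    """Encode an OpenAI-style chat request body as UTF-8 JSON."""
    return json.dumps({"model": model, "messages": messages}).encode("utf-8")


def chat(model: str, messages: list[dict], url: str = OLLAMA_CHAT_URL) -> str:
    """POST the request to Ollama and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=build_chat_request(model, messages),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content.
    return payload["choices"][0]["message"]["content"]


# The same messages you would send to ChatGPT; only the model name changes.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain tokenization briefly."},
]
```

With a local server running, chat("llama3.2", messages) returns the reply text. If you already use the official openai Python client, pointing its base_url at http://localhost:11434/v1 does the same job; the client insists on an API key, but Ollama ignores its value.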

System prompt handling differences

Modern models (Llama 3.x, Mistral 0.3+, Gemma 2) handle system prompts in the messages array well. Older models (Llama 2, early Mistral) were fine-tuned with instruction formats that expect the system prompt embedded directly in the first user message:

# Older Llama 2 native format
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Explain tokenization briefly. [/INST]
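To make the layout concrete, here is a small Python function that renders a single system-plus-user turn in the Llama 2 format. It is a hypothetical helper that mirrors the example only: it omits the <s> beginning-of-sequence token that tokenizers usually add, and it ignores multi-turn history.

```python
def llama2_prompt(system: str, user: str) -> str:
    """Render one system + user turn in the Llama 2 chat layout.

    Illustration only: Ollama builds this string from the model's template,
    so you normally never construct it yourself. The <s> start token and
    multi-turn history are deliberately left out.
    """
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


print(llama2_prompt("You are a helpful assistant.", "Explain tokenization briefly."))
```

If you ever do need to send a hand-built string like this, Ollama's /api/generate endpoint accepts "raw": true to skip its own templating; otherwise it is simpler to let Ollama apply the model's template.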

Ollama handles this translation automatically through each model's prompt template, so you rarely need to write this format by hand. If a model still isn't following system instructions reliably, try putting the system content at the start of the first user message instead of sending a separate system message.
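That fallback can be applied automatically before a request is sent. A minimal Python sketch; fold_system_into_first_user is a hypothetical helper, and joining multiple system messages with blank lines is an arbitrary choice.

```python
from copy import deepcopy


def fold_system_into_first_user(messages: list[dict]) -> list[dict]:
    """Move system content into the first user message.

    Hypothetical helper: joins all system messages, prepends them to the
    first user message, and drops the separate system entries.
    """
    msgs = deepcopy(messages)  # leave the caller's list untouched
    system_text = "\n\n".join(m["content"] for m in msgs if m["role"] == "system")
    rest = [m for m in msgs if m["role"] != "system"]
    if not system_text:
        return rest
    for m in rest:
        if m["role"] == "user":
            m["content"] = f"{system_text}\n\n{m['content']}"
            return rest
    # No user message at all: send the system text as a user message instead.
    return [{"role": "user", "content": system_text}] + rest
```

The result is an ordinary messages array, so it can be sent to either Ollama endpoint unchanged.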

Quality differences vs GPT-4

The bigger practical difference is model quality. Llama 3.2 (3B parameters in Ollama's default llama3.2 tag), even with an identical prompt, produces different output than GPT-4o. Local models run through Ollama are generally competitive on straightforward tasks and noticeably weaker on complex reasoning, long-context tasks, and instruction following with many constraints.
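Because the same prompt can land very differently on different models, it helps to look at the outputs side by side before rewriting anything. A minimal Python sketch, assuming you pass in some function that sends (model, messages) to a server and returns the reply text; compare_models and the ChatFn alias are hypothetical, and a stub function is enough to test the plumbing without a server.

```python
from typing import Callable, Optional

# Any function that sends (model, messages) to a server and returns the reply
# text fits here; this type alias is an assumption, not part of Ollama.
ChatFn = Callable[[str, list], str]


def compare_models(chat_fn: ChatFn, models: list[str], prompt: str,
                   system: Optional[str] = None) -> dict[str, str]:
    """Send one identical prompt to several models and collect the replies."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    # Same messages for every model, so differences come from the models
    # (and sampling), not from the prompt.
    return {model: chat_fn(model, messages) for model in models}


def print_side_by_side(replies: dict[str, str]) -> None:
    """Print each model's reply under a header with its name."""
    for model, text in replies.items():
        print(f"=== {model} ===\n{text}\n")
```

For example, print_side_by_side(compare_models(my_chat, ["llama3.2", "mistral"], "Explain tokenization briefly.")) shows each reply under its model name, where my_chat is whatever client function you use.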

Prompts that rely on GPT-4's ability to infer intent from ambiguous instructions may need to be made more explicit for local models. What works as "summarise this" for GPT-4 may need to be "summarise this in 3 bullet points, each under 20 words" for a local model to produce consistently useful output.
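A sketch of that advice in Python: one hypothetical helper spells the constraints out in the prompt, and another checks whether a reply actually honours them, which gives a cheap way to measure how consistently a local model follows the instruction. The "- " bullet prefix and the word-count rule are assumptions chosen to match the example above.

```python
def explicit_summary_prompt(text: str, bullets: int = 3, max_words: int = 20) -> str:
    """Turn a vague 'summarise this' into an instruction with explicit limits."""
    return (
        f"Summarise the text below in exactly {bullets} bullet points. "
        f"Each bullet must start with '- ' and be under {max_words} words. "
        "Output only the bullets.\n\n"
        f"Text:\n{text}"
    )


def meets_constraints(output: str, bullets: int = 3, max_words: int = 20) -> bool:
    """Check a reply against the same limits the prompt asked for."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if len(lines) != bullets:
        return False
    for ln in lines:
        if not ln.startswith("- "):
            return False
        # "Under 20 words" means strictly fewer than max_words.
        if len(ln[2:].split()) >= max_words:
            return False
    return True
```

Running the same prompt several times and counting how often meets_constraints returns True gives a rough consistency score for a given model.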

Model selection matters: Ollama supports many models with different strengths. For code-heavy tasks, Code Llama or DeepSeek-Coder often outperform general-purpose models of the same parameter size. For instruction following, Mistral and Llama 3 are usually stronger than older alternatives. Match the model to the task before assuming the prompt is the problem.
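One way to act on this in code is a small routing table that maps each task to the models you prefer for it and falls back when none of them are pulled. A minimal Python sketch: PREFERRED, FALLBACK, and pick_model are hypothetical; installed_models reads Ollama's GET /api/tags endpoint, which lists locally pulled models, and assumes the default port 11434.

```python
import json
import urllib.request

# Hypothetical routing table: task label -> preferred Ollama model names,
# best first. Fill it with models you have actually tested on your tasks.
PREFERRED = {
    "code": ["deepseek-coder", "codellama"],
    "instructions": ["llama3.2", "mistral"],
}
FALLBACK = "llama3.2"  # not checked against the installed set


def installed_models(base_url: str = "http://localhost:11434") -> set[str]:
    """Return base names of locally pulled models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    # Names include a tag such as "llama3.2:latest"; drop it so size and
    # version tags all match the same base name.
    return {m["name"].split(":")[0] for m in data.get("models", [])}


def pick_model(task: str, installed: set[str]) -> str:
    """Return the first preferred model for the task that is installed."""
    for name in PREFERRED.get(task, []):
        if name in installed:
            return name
    return FALLBACK
```

With the server running, pick_model("code", installed_models()) returns the best installed coding model from the table.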

Benefits that justify the conversion effort

Running prompts locally through Ollama has real advantages: no API costs for development and testing, no data leaving your machine (important for sensitive content), and no rate limits, so prompt development can iterate as quickly as your hardware allows. The cost is lower quality on complex tasks and the need to tune prompts for the specific local model you're using.