How to Customize Ollama Models with Modelfiles for Apps and Automation (Part 3)


Build Your Own Ollama Model: Complete Modelfile Guide for Developers (Part 3)

In Part 1 and Part 2, we covered the mental model, installation, and API usage. Now it's time to make Ollama actually yours by building custom model variants with the Modelfile, no fine-tuning or GPU training required.

If you are building an internal support assistant, private code reviewer, document chatbot, or local AI API, Modelfiles are one of the cheapest ways to make the model behave consistently before you spend money on fine-tuning, managed inference, or bigger hardware.

[Figure: Ollama Modelfile workflow: base model to custom model to app API]

โฑ๏ธ Time to Complete

Around 15 minutes.

What you'll achieve / learn

  • What a Modelfile is and why it's the easiest way to customize a model
  • How to set a permanent system prompt (give your model a personality or role)
  • How to tune generation parameters like temperature and context size
  • How to save your custom model under its own name and run it like any other
  • Common Modelfile mistakes and how to avoid them

Prerequisites

  • Ollama installed and working (ollama --version)
  • At least one base model already pulled from the Ollama model library, e.g. ollama pull llama3.2
  • Basic terminal comfort. If you plan to connect this to an app later, keep the Ollama API docs open too.

What Is a Modelfile?

A Modelfile is a plain text file (like a Dockerfile, but for AI models) that tells Ollama: "Start from this base model, then apply these tweaks." No retraining, no GPU-hours, no dataset needed. You're customizing behavior, not weights.

This is the easiest way to:

  • Give a model a fixed personality or role ("You are a sarcastic code reviewer")
  • Lock in generation settings so every call behaves consistently
  • Package a specific prompt template for your team to reuse
  • Standardize AI behavior across CLI, API, LangChain, Open WebUI, or your own backend

[!NOTE] Ollama has an official Modelfile reference. This post focuses on the practical workflow and the parts most developers actually use first.


Step 1: Write Your First Modelfile

Create a file named Modelfile (no extension) anywhere on your machine:

FROM llama3.2

SYSTEM """
You are a senior code reviewer. You are direct, concise, and always
point out security issues first. You never apologize excessively.
"""

PARAMETER temperature 0.3

Each instruction does something specific:

| Instruction | Purpose |
| --- | --- |
| FROM | The base model to build on top of (required) |
| SYSTEM | A permanent system prompt baked into the model |
| PARAMETER | Generation settings like temperature, context length, etc. |
| TEMPLATE | (Advanced) Custom prompt formatting for the model |

Think of this as packaging your best prompt engineering into a reusable local model. Instead of pasting the same 30-line instruction into every script, API call, or automation job, you move the stable behavior into the model definition.
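If you keep several roles around, it can help to generate the Modelfile text from code rather than hand-editing copies. The sketch below is one minimal way to do that; `render_modelfile` is a hypothetical helper name, not part of Ollama, and it assumes the system prompt itself contains no triple quotes.

```python
def render_modelfile(base: str, system: str, parameters: dict) -> str:
    """Build Modelfile text from a base model, a system prompt, and parameters.

    Hypothetical helper for illustration; assumes `system` has no triple quotes.
    """
    lines = [f"FROM {base}", "", 'SYSTEM """', system.strip(), '"""', ""]
    # One PARAMETER line per setting, e.g. "PARAMETER temperature 0.3".
    lines += [f"PARAMETER {name} {value}" for name, value in parameters.items()]
    return "\n".join(lines) + "\n"


# Reproduces the code-reviewer Modelfile from this step.
print(
    render_modelfile(
        base="llama3.2",
        system=(
            "You are a senior code reviewer. You are direct, concise, and always\n"
            "point out security issues first. You never apologize excessively."
        ),
        parameters={"temperature": 0.3},
    )
)
```

Write the returned text to a file named Modelfile and the rest of the workflow stays the same.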


Step 2: Create the Custom Model

From the same directory as your Modelfile:

ollama create code-reviewer -f ./Modelfile

This bakes your system prompt and parameters into a new named model called code-reviewer. Run it like any other model:

ollama run code-reviewer "Review this function: def add(a,b): return a+b"

No need to repeat your system prompt every time: it's now the built-in default for this model.

[!TIP] Naming convention matters for teams. Use clear names like code-reviewer, support-bot, or sql-helper so anyone on your team knows what each custom model does just from ollama list.

For app work, this also makes your code cleaner. Your backend can call code-reviewer directly through the Ollama generate API instead of sending a giant system prompt on every request.
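As a rough sketch of what that backend call can look like, the snippet below posts to the generate endpoint using only the Python standard library. It assumes Ollama is listening on its default local address (`http://localhost:11434`) and that the `code-reviewer` model from this step exists; `build_request` and `review` are illustrative names.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. Adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generate request. Note: no system prompt is sent."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def review(code: str, model: str = "code-reviewer") -> str:
    """Send code to the custom model and return the generated text."""
    request = build_request(model, f"Review this function: {code}")
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["response"]


# Usage (needs a running Ollama server with the custom model created):
#   print(review("def add(a,b): return a+b"))
```

The payload carries only the model name and the prompt; the role and temperature come from the model definition.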


Step 3: Tune Parameters That Actually Matter

These are the parameters worth understanding first:

| Parameter | What it controls | Typical range |
| --- | --- | --- |
| temperature | Randomness/creativity. Lower = more focused, higher = more varied | 0.0–1.0 |
| top_p | Nucleus sampling: limits token choice to the most likely options | 0.5–0.95 |
| num_ctx | Context window size in tokens (how much prompt and conversation history the model can "remember") | 2048–8192+ |
| repeat_penalty | Discourages repeating the same phrases | 1.0–1.3 |

[Figure: Ollama Modelfile parameter tuning and behavior guide]

Example of a focused, low-randomness assistant with a larger context window:

FROM llama3.2

SYSTEM """
You are a precise technical writer. Answer only with facts you are
confident about. If unsure, say so explicitly.
"""

PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.1

[!NOTE] Higher num_ctx uses more memory. If you increase it significantly, revisit a hardware guide before buying or upgrading. Useful starting points: NVIDIA GPUs, Apple Mac hardware, and my local Ollama hardware posts on sabbirz.com.
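To get a feel for why num_ctx costs memory, here is a back-of-the-envelope estimate of the key/value cache, which grows linearly with the context window. The layer count, KV head count, and head size below are assumptions for a roughly 3B-parameter Llama-style model, and the 2 bytes per value assumes a 16-bit cache. Real usage in Ollama depends on the actual model and runtime settings, so treat the numbers as an order-of-magnitude guide only.

```python
def kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size: keys and values for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * num_ctx


# Assumed architecture for illustration, not read from any model file.
ASSUMED_SHAPE = {"n_layers": 28, "n_kv_heads": 8, "head_dim": 128}

for num_ctx in (2048, 8192):
    mib = kv_cache_bytes(num_ctx, **ASSUMED_SHAPE) / 2**20
    print(f"num_ctx={num_ctx}: about {mib:.0f} MiB of KV cache")
# num_ctx=2048: about 224 MiB of KV cache
# num_ctx=8192: about 896 MiB of KV cache
```

Under these assumptions, quadrupling num_ctx quadruples the cache, on top of the memory the model weights already need.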

Practical parameter recipes

| Use case | Suggested settings | Why |
| --- | --- | --- |
| Code review assistant | temperature 0.2, repeat_penalty 1.1 | Focused, less chatty, fewer creative guesses |
| Customer support bot | temperature 0.3, top_p 0.9 | Consistent but still natural |
| Brainstorming assistant | temperature 0.8, top_p 0.95 | More variation and idea generation |
| Documentation writer | temperature 0.2, num_ctx 8192 | More room for source material and fewer surprises |
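Before baking a recipe into a Modelfile, you can try it per request through the `options` field of the generate API. The sketch below keeps the recipes from the table in one dictionary and derives both a request payload and the matching PARAMETER lines; the recipe keys and function names are illustrative.

```python
# Recipes from the table above, keyed by illustrative names.
RECIPES = {
    "code-review": {"temperature": 0.2, "repeat_penalty": 1.1},
    "support": {"temperature": 0.3, "top_p": 0.9},
    "brainstorm": {"temperature": 0.8, "top_p": 0.95},
    "docs": {"temperature": 0.2, "num_ctx": 8192},
}


def generate_payload(model: str, prompt: str, recipe: str) -> dict:
    """Request body for /api/generate that applies a recipe for this call only."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(RECIPES[recipe]),
    }


def parameter_lines(recipe: str) -> list:
    """The same recipe expressed as Modelfile PARAMETER lines."""
    return [f"PARAMETER {name} {value}" for name, value in RECIPES[recipe].items()]


print(generate_payload("llama3.2", "Summarize this README.", "docs")["options"])
print("\n".join(parameter_lines("docs")))
```

Once a recipe behaves the way you want across a few real prompts, copy its PARAMETER lines into the Modelfile so every caller gets the same defaults.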

Step 4: Use a Custom Template (Advanced)

The TEMPLATE instruction controls exactly how your prompt and system message get formatted before being sent to the model. Most users never need to touch this, because the base model's default template is usually correct. Only override it if you know the exact prompt format your base model expects. Check the model page on Ollama's library, and if the model is based on a public family, also check the provider docs such as Meta Llama or Mistral AI. The example below shows the syntax only; llama3.2 expects its own chat format with special tokens, so treat this as an illustration rather than a drop-in template.

FROM llama3.2

TEMPLATE """{{ if .System }}System: {{ .System }}{{ end }}
User: {{ .Prompt }}
Assistant:"""

SYSTEM "You are a helpful assistant who answers in bullet points only."

[!WARNING] Getting the template wrong can silently degrade output quality without any error message. If your custom model starts behaving strangely after adding a TEMPLATE, remove it first to confirm whether that's the cause.
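To make the risk concrete, it helps to look at the literal text the example template produces. The function below is a hand-written Python mimic of that one template, not Ollama's actual template engine, and `render_example_template` is a hypothetical name.

```python
def render_example_template(prompt: str, system: str = "") -> str:
    """Mimic the example TEMPLATE: optional 'System:' line, then User/Assistant."""
    # Mirrors {{ if .System }}System: {{ .System }}{{ end }}
    system_part = f"System: {system}" if system else ""
    return f"{system_part}\nUser: {prompt}\nAssistant:"


print(
    render_example_template(
        prompt="What is a Modelfile?",
        system="You are a helpful assistant who answers in bullet points only.",
    )
)
# System: You are a helpful assistant who answers in bullet points only.
# User: What is a Modelfile?
# Assistant:
```

Everything the model sees is this plain text. If the base model was trained on a different chat layout, nothing here signals where each turn starts and ends, which is how quality can drop without any error.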


Where Modelfiles Make Business Sense

This is where Modelfiles become more than a fun local AI feature. They are useful when repeated AI behavior has business value:

[Figure: Ollama Modelfile business use cases for code review, support bots, RAG apps, and API prototypes]

| Workflow | Why a Modelfile helps | Related tooling |
| --- | --- | --- |
| Internal code reviewer | Keeps review style, security checks, and tone consistent | GitHub, GitLab, SonarQube |
| Private support assistant | Bakes in support policy and escalation rules | Zendesk, Freshdesk, Intercom |
| Local RAG app | Pairs a stable assistant role with your private documents | Qdrant, Chroma, LangChain |
| Developer API prototype | Lets your app call a named model with predictable behavior | FastAPI, Node.js, Docker |
| Cost-control experiments | Tests whether local inference can replace some cloud API calls | OpenAI API pricing, AWS Bedrock, Google Vertex AI |

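For high-volume internal tools, this matters because a system prompt repeated in every request adds payload, duplicated configuration, and operational noise. The model still reads the baked-in prompt as part of its context, so a Modelfile does not make those tokens disappear; what it removes is the need to ship and maintain the prompt in every caller. A Modelfile will not magically make local AI free, but it can make your local LLM workflow easier to package, test, and compare against cloud APIs.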


Step 5: Inspect, Update, and Remove Custom Models

Check exactly what's baked into a model:

ollama show code-reviewer --modelfile

Update it after editing your Modelfile by re-running create with the same name:

ollama create code-reviewer -f ./Modelfile

Remove it when you no longer need it:

ollama rm code-reviewer

[!IMPORTANT] Removing a custom model only removes your customization layer; the base model (llama3.2 in this example) stays downloaded and untouched.


Common Mistakes

| Mistake | Fix |
| --- | --- |
| Forgetting FROM | Every Modelfile needs a FROM instruction naming the base model; put it first by convention |
| Multi-line SYSTEM without triple quotes | Wrap multi-line system prompts in """...""" |
| Expecting PARAMETER changes to retrain the model | Parameters only change generation behavior, not knowledge |
| Overriding TEMPLATE without checking the base model's expected format | Leave TEMPLATE untouched unless you have a specific reason |
| Using one model name for every experiment | Create separate names like support-bot-v1 and support-bot-strict so you can compare behavior |
| Treating local AI as automatically secure | If other machines can reach your Ollama server, add network controls, auth, or a reverse proxy |
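A few of these mistakes can be caught before running ollama create. The sketch below is a deliberately naive check, not a real Modelfile parser: it looks for a FROM instruction, an unclosed triple-quoted block, and a TEMPLATE override. `lint_modelfile` is a hypothetical helper, and the line-based matching can be fooled by prompt text that happens to start with those words.

```python
import re


def lint_modelfile(text: str) -> list:
    """Return human-readable warnings for a few common Modelfile mistakes."""
    problems = []
    lines = text.splitlines()

    # No FROM instruction anywhere (matched case-insensitively here).
    if not any(re.match(r"\s*from\s+\S", line, re.IGNORECASE) for line in lines):
        problems.append("missing FROM instruction")

    # An odd number of triple quotes means a multi-line block was never closed.
    if text.count('"""') % 2 != 0:
        problems.append('unbalanced """ quotes')

    # TEMPLATE overrides are legal but deserve a second look.
    if any(re.match(r"\s*template\b", line, re.IGNORECASE) for line in lines):
        problems.append("TEMPLATE is overridden; confirm it matches the base model")

    return problems


broken = 'SYSTEM """\nBe brief.\nPARAMETER temperature 0.3\n'
print(lint_modelfile(broken))
# ['missing FROM instruction', 'unbalanced """ quotes']
```

A check like this fits naturally into a pre-commit hook or CI step next to the committed Modelfile.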

Quick Reference Cheatsheet

[Figure: Ollama Modelfile command cheatsheet]

| Task | Command |
| --- | --- |
| Create a custom model | ollama create <name> -f ./Modelfile |
| Run your custom model | ollama run <name> |
| View baked-in Modelfile | ollama show <name> --modelfile |
| Delete a custom model | ollama rm <name> |
| List all models (base + custom) | ollama list |

A Simple Production Checklist

Before you use a custom Ollama model in a real app, check these:

  • The SYSTEM prompt is specific enough for the job
  • Parameters are tuned for the workflow, not copied randomly
  • The Ollama API is not exposed publicly without protection
  • You can reproduce the model from a committed Modelfile
  • You compare local latency, hardware cost, and cloud API pricing honestly
  • You test the custom model with real examples, not just one happy-path prompt
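The last item on the checklist is the easiest to automate. Below is a minimal sketch of a regression check: each case pairs a prompt with a phrase the reply should mention, and the function returns the prompts that failed. The cases and the `fake_generate` stand-in are invented for illustration so the sketch runs offline; in practice you would pass a function that calls your custom model.

```python
from typing import Callable, Iterable


def run_checks(generate: Callable[[str], str], cases: Iterable[tuple]) -> list:
    """Return the prompts whose reply is missing the expected phrase."""
    failures = []
    for prompt, must_mention in cases:
        reply = generate(prompt).lower()
        if must_mention.lower() not in reply:
            failures.append(prompt)
    return failures


# Illustrative cases for a security-focused code reviewer.
CASES = [
    ("Review: query = 'SELECT * FROM users WHERE id=' + user_id", "injection"),
    ("Review: password = 'hunter2'", "hardcoded"),
]


def fake_generate(prompt: str) -> str:
    # Stand-in for a real model call; always gives the same canned answer.
    return "Security: possible SQL injection via string concatenation."


print(run_checks(fake_generate, CASES))
# ["Review: password = 'hunter2'"]
```

Substring checks are crude, but they are enough to notice when a Modelfile edit makes the model stop flagging something it used to catch.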

Final Thoughts

The Modelfile is the fastest way to turn a generic base model into something that feels purpose-built for your team (a code reviewer, a support assistant, a SQL helper, or a private automation agent), all without touching a single training script. Start with FROM + SYSTEM, get that working, then layer in PARAMETER tuning once you know what behavior you want to adjust.

Next up in this series: connecting your custom model to a real RAG pipeline so it can answer questions using your own documents.
