Ollama

Ollama runs open-weight models locally (Llama, Qwen, DeepSeek, Gemma, …). Pair it with TapPass for a fully on-prem governance loop in which no data leaves your network.

Requirements

Server-side: point OLLAMA_HOST at the Ollama daemon (the TapPass server must be able to reach it):

OLLAMA_HOST=http://ollama.internal:11434

Pull whatever models you need on the Ollama host:

ollama pull llama3.1:8b
ollama pull qwen2.5:14b
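
Before wiring up clients, it can help to confirm from the TapPass server that the daemon is reachable and the models are present. The following is a minimal sketch using only the Python standard library and Ollama's `/api/tags` model-listing endpoint. The helper names (`tags_url`, `missing_models`, `check_ollama`) are hypothetical and not part of TapPass or Ollama.

```python
import json
import urllib.request


def tags_url(host: str) -> str:
    """Build the URL of Ollama's model-listing endpoint from an OLLAMA_HOST value."""
    return host.rstrip("/") + "/api/tags"


def missing_models(tags_payload: dict, wanted: list) -> list:
    """Return the entries of `wanted` that are absent from a /api/tags response."""
    available = {m.get("name") for m in tags_payload.get("models", [])}
    return [name for name in wanted if name not in available]


def check_ollama(host: str, wanted: list, timeout: float = 5.0) -> list:
    """Query the daemon and return the models that still need `ollama pull`.

    Raises urllib.error.URLError if the daemon cannot be reached.
    """
    with urllib.request.urlopen(tags_url(host), timeout=timeout) as resp:
        return missing_models(json.load(resp), wanted)
```

Calling `check_ollama("http://ollama.internal:11434", ["llama3.1:8b", "qwen2.5:14b"])` on the TapPass server should return an empty list once both models are pulled. A connection error suggests the daemon is not reachable from that host.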

Option A — SDK

from tappass import Agent

agent = Agent("https://tappass.example.com", "tp_...")
response = agent.chat("Hello", model="ollama/llama3.1:8b")

Option B — OpenAI SDK, zero-code

export OPENAI_BASE_URL=https://tappass.example.com/v1
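export OPENAI_API_KEY=tp_...

Existing OpenAI SDK code then runs unchanged, because the client reads both variables from the environment: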

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="ollama/llama3.1:8b",
    messages=[{"role": "user", "content": "Hello"}],
)

Why this matters

Zero egress: inference stays on your hardware.
Audit-complete: every call still passes through TapPass, so you keep the full audit trail, detections, and policy decisions.
Works offline: suited to air-gapped environments when paired with the license server.

What’s supported

Chat completions (streaming + non-streaming)
Embeddings
Tool calls (on models that support them: Llama 3.1+, Qwen 2.5+, …)
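
As an illustration of the streaming case, here is a rough sketch built on the OpenAI SDK setup from Option B. It assumes TapPass relays streamed chunks in the standard OpenAI format. The helpers `collect_stream` and `stream_hello` are hypothetical names introduced for this example.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only and final chunks
            parts.append(delta)
    return "".join(parts)


def stream_hello(model: str = "ollama/llama3.1:8b") -> str:
    """Send one streamed request through TapPass (requires the `openai` package)."""
    from openai import OpenAI  # imported lazily so collect_stream has no dependencies

    client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    return collect_stream(stream)
```

In an interactive client you would normally print each delta as it arrives. Collecting the deltas into a single string keeps the sketch easy to check.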