Ollama
Ollama runs open-weight models locally (Llama, Qwen, DeepSeek, Gemma, …). Pair it with TapPass for a fully on-prem governance loop: no data leaves your network.
Requirements
Server-side (the TapPass server must be able to reach the Ollama daemon):

    OLLAMA_HOST=http://ollama.internal:11434

Pull whatever models you need on the Ollama host:

    ollama pull llama3.1:8b
    ollama pull qwen2.5:14b

Option A: SDK
    from tappass import Agent

    # TapPass server URL and API key (placeholders)
    agent = Agent("https://tappass.example.com", "tp_...")
    response = agent.chat("Hello", model="ollama/llama3.1:8b")

Option B: OpenAI SDK, zero-code
    export OPENAI_BASE_URL=https://tappass.example.com/v1
    export OPENAI_API_KEY=tp_...

    from openai import OpenAI

    # The client reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
    client = OpenAI()
    response = client.chat.completions.create(
        model="ollama/llama3.1:8b",
        messages=[{"role": "user", "content": "Hello"}],
    )

Why this matters
- Zero egress: inference stays on your hardware.
- Audit-complete: every call still hits TapPass, so you still get the full audit trail, detections, and policy decisions.
- Works offline: useful for air-gapped environments paired with the license server.
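
Because inference stays on your own hardware, every model an agent names has to be pulled on the Ollama host beforehand, which matters most in air-gapped setups. The following is a minimal pre-flight sketch, assuming the Ollama daemon's `GET /api/tags` endpoint (which lists locally available models) is reachable at the `OLLAMA_HOST` value from Requirements. The helper names `fetch_tags` and `missing_models` are hypothetical and are not part of TapPass or Ollama.

```python
import json
import urllib.request


def fetch_tags(host: str, timeout: float = 5.0) -> dict:
    """Fetch the JSON payload of Ollama's GET /api/tags from the given host."""
    url = f"{host.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required model names that the /api/tags payload does not list."""
    available = {model["name"] for model in tags.get("models", [])}
    return [name for name in required if name not in available]
```

Run from the TapPass server, `missing_models(fetch_tags("http://ollama.internal:11434"), ["llama3.1:8b", "qwen2.5:14b"])` should return an empty list once both pulls from Requirements have completed.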
What’s supported
- Chat completions (streaming + non-streaming)
- Embeddings
- Tool calls (on models that support them: Llama 3.1+, Qwen 2.5+, …)
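
The three features above can be sketched as OpenAI-style request bodies. This is an illustration only, built with the standard library so it runs without extra packages; nothing is sent over the network. It assumes TapPass serves `/chat/completions` and `/embeddings` under the same `/v1` base URL used in Option B and accepts the `tp_...` key as a Bearer token, as the OpenAI SDK would send it. The embedding model name `ollama/nomic-embed-text` and the `get_time` tool are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "https://tappass.example.com/v1"  # same base URL as Option B
API_KEY = "tp_..."  # placeholder key
EMBED_MODEL = "ollama/nomic-embed-text"  # assumption: any embedding model you have pulled


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def streaming_chat(prompt: str) -> urllib.request.Request:
    """Chat completion with streaming enabled."""
    return build_request("/chat/completions", {
        "model": "ollama/llama3.1:8b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    })


def embedding(text: str) -> urllib.request.Request:
    """Embedding request for a single input string."""
    return build_request("/embeddings", {"model": EMBED_MODEL, "input": text})


def tool_chat(prompt: str) -> urllib.request.Request:
    """Chat completion that offers one hypothetical tool to the model."""
    tools = [{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical tool
            "description": "Return the current server time.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]
    return build_request("/chat/completions", {
        "model": "ollama/llama3.1:8b",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    })
```

Passing any of these requests to `urllib.request.urlopen` would send it; with the OpenAI SDK from Option B, the same fields map to `client.chat.completions.create(..., stream=True)`, `client.embeddings.create(...)`, and the `tools=` argument.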