Ollama

Ollama runs open-weight models locally (Llama, Qwen, DeepSeek, Gemma, …). Pair it with TapPass for a fully on-prem governance loop in which no data leaves your network.

On the TapPass server, set OLLAMA_HOST to the address of the Ollama daemon (the TapPass server must be able to reach it):

OLLAMA_HOST=http://ollama.internal:11434

Pull whatever models you need on the Ollama host:

ollama pull llama3.1:8b
ollama pull qwen2.5:14b
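
Before routing traffic, it can help to confirm from the TapPass server that the daemon is reachable and that the models are actually present. The sketch below is one way to do that with the Python standard library. It assumes Ollama's model-listing endpoint, GET /api/tags, and the helper names (model_names, missing_models, list_ollama_models) are illustrative only, not part of TapPass or Ollama.

```python
import json
import urllib.request


def model_names(tags_response: dict) -> set[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return {m["name"] for m in tags_response.get("models", [])}


def missing_models(available: set[str], required: list[str]) -> list[str]:
    """Return the required models that are not in `available`, in order."""
    return [m for m in required if m not in available]


def list_ollama_models(host: str, timeout: float = 5.0) -> set[str]:
    """Ask the Ollama daemon at `host` which models it has pulled.

    Raises urllib.error.URLError if the daemon cannot be reached,
    which is itself a useful signal that OLLAMA_HOST is wrong.
    """
    url = host.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Run from the TapPass server, missing_models(list_ollama_models("http://ollama.internal:11434"), ["llama3.1:8b", "qwen2.5:14b"]) should return an empty list once both pulls have finished.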

With the TapPass SDK:

from tappass import Agent

agent = Agent("https://tappass.example.com", "tp_...")
response = agent.chat("Hello", model="ollama/llama3.1:8b")

With any OpenAI-compatible client, point the client at TapPass:

export OPENAI_BASE_URL=https://tappass.example.com/v1
export OPENAI_API_KEY=tp_...

from openai import OpenAI

client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="ollama/llama3.1:8b",
    messages=[{"role": "user", "content": "Hello"}],
)
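
The same client can also stream the reply. The following is a minimal sketch, assuming TapPass passes the OpenAI stream=True flag through to Ollama unchanged. The helper names collect_stream and stream_chat are made up for this example.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta (e.g. the last one).
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def stream_chat(prompt: str, model: str = "ollama/llama3.1:8b") -> str:
    """Send one prompt through TapPass and return the streamed reply."""
    # Imported here so collect_stream stays usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

In an interactive tool you would usually print each delta as it arrives rather than joining them at the end. Collecting them here just keeps the sketch short.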

Why run Ollama behind TapPass:

  • Zero egress: inference stays on your hardware.
  • Audit-complete: every call still passes through TapPass, so you keep the full audit trail, detections, and policy decisions.
  • Works offline: useful for air-gapped environments when paired with the license server.

Supported features:

  • Chat completions (streaming and non-streaming)
  • Embeddings
  • Tool calls (on models that support them: Llama 3.1+, Qwen 2.5+, …)
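
As a rough illustration of the tool-call path, the sketch below defines one tool in the OpenAI function-calling schema and a small dispatcher for the calls a model returns. It assumes TapPass forwards the tools parameter to Ollama as-is. The get_weather tool, the REGISTRY mapping, and run_tool_call are hypothetical names invented for this example.

```python
import json


# Hypothetical tool for illustration; not part of TapPass or Ollama.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Tool description in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call_id: str, name: str, arguments: str) -> dict:
    """Execute one tool call and build the `tool` message to send back.

    `arguments` is the JSON string the model produced for the call.
    """
    result = REGISTRY[name](**json.loads(arguments))
    return {"role": "tool", "tool_call_id": tool_call_id, "content": result}
```

To use it, pass tools=TOOLS to client.chat.completions.create(...), then for each entry tc in response.choices[0].message.tool_calls call run_tool_call(tc.id, tc.function.name, tc.function.arguments) and append the result to the conversation before the next request.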