LlamaIndex

LlamaIndex uses the OpenAI SDK for both embeddings (retrieval) and completions (synthesis). Both paths pick up OPENAI_BASE_URL, so setting OPENAI_BASE_URL and OPENAI_API_KEY governs every call, RAG included.

export OPENAI_BASE_URL=https://tappass.example.com/v1
export OPENAI_API_KEY=tp_...

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load the documents and build the index (this issues the embedding calls).
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# Query the index (retrieval embeds the question, synthesis calls the LLM).
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini"),
)
response = query_engine.query("What does the compliance report say?")

Both the embedding call (retrieval) and the completion call (synthesis) flow through TapPass.
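
Since everything here hinges on the two env vars, a small preflight check before building the index can catch a missing or mistyped value early. The following is a minimal sketch using only the standard library; the helper name `check_tappass_env` and the specific checks are illustrative assumptions, not part of TapPass or LlamaIndex.

```python
import os
from urllib.parse import urlparse


def check_tappass_env(env=None):
    """Return a list of problems with the proxy env vars (empty if none).

    Hypothetical helper: the checks below are assumptions about what a
    sensible configuration looks like, not rules enforced by any library.
    """
    env = os.environ if env is None else env
    problems = []

    base_url = env.get("OPENAI_BASE_URL", "")
    api_key = env.get("OPENAI_API_KEY", "")

    parsed = urlparse(base_url)
    if not base_url:
        problems.append("OPENAI_BASE_URL is not set")
    elif parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"OPENAI_BASE_URL is not a valid URL: {base_url!r}")
    elif parsed.netloc == "api.openai.com":
        problems.append("OPENAI_BASE_URL points at OpenAI directly, not the proxy")

    if not api_key:
        problems.append("OPENAI_API_KEY is not set")

    return problems


if __name__ == "__main__":
    # Print each problem found in the current process environment.
    for problem in check_tappass_env():
        print("config problem:", problem)
```

Calling `check_tappass_env()` at startup and refusing to continue when the list is non-empty keeps a misconfigured process from issuing any embedding or completion calls.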

If you’d rather not use env vars:

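from llama_index.llms.openai import OpenAI

# Point the client at TapPass explicitly instead of relying on env vars.
llm = OpenAI(
    model="gpt-4o-mini",
    api_base="https://tappass.example.com/v1",
    api_key="tp_...",
)

To stream the answer, build the query engine with streaming=True and iterate over response_gen:

query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("...")
for chunk in streaming_response.response_gen:
    print(chunk, end="", flush=True)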
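
The explicit configuration shown earlier covers only the LLM, so without env vars the embedding calls would need the same treatment. Below is a rough sketch of one way to do that: it passes `api_base` and `api_key` to `OpenAIEmbedding` as well and registers both models on LlamaIndex's global `Settings`. The helper names `tappass_kwargs` and `configure_llamaindex` are hypothetical, and the URL and key are the placeholders used above.

```python
# Placeholders taken from the examples above; substitute your own values.
TAPPASS_BASE_URL = "https://tappass.example.com/v1"
TAPPASS_API_KEY = "tp_..."


def tappass_kwargs(base_url=TAPPASS_BASE_URL, api_key=TAPPASS_API_KEY):
    """Keyword arguments shared by the LLM and the embedding model."""
    return {"api_base": base_url, "api_key": api_key}


def configure_llamaindex():
    """Register explicitly configured models as LlamaIndex defaults.

    Hypothetical helper. The imports live inside the function so that
    tappass_kwargs() can be used without LlamaIndex installed.
    """
    from llama_index.core import Settings
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.llms.openai import OpenAI

    kwargs = tappass_kwargs()
    Settings.llm = OpenAI(model="gpt-4o-mini", **kwargs)
    Settings.embed_model = OpenAIEmbedding(
        model="text-embedding-3-small", **kwargs
    )
```

After `configure_llamaindex()` runs, indexes and query engines created without explicit `llm` or `embed_model` arguments should fall back to these two models, so neither call path depends on the environment.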

tappass-examples/llamaindex