LlamaIndex
LlamaIndex uses the OpenAI SDK for both embeddings (retrieval) and completions (synthesis). Both paths pick up OPENAI_BASE_URL, so setting OPENAI_BASE_URL and OPENAI_API_KEY governs every call, RAG included.
    export OPENAI_BASE_URL=https://tappass.example.com/v1
    export OPENAI_API_KEY=tp_...

Query a vector index
    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    from llama_index.llms.openai import OpenAI
    from llama_index.embeddings.openai import OpenAIEmbedding

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
        embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    )

    query_engine = index.as_query_engine(
        llm=OpenAI(model="gpt-4o-mini"),
    )
    response = query_engine.query("What does the compliance report say?")

Both the embedding call (retrieval) and the completion call (synthesis) flow through TapPass.
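Because the example above relies entirely on the environment, a shell where the variables are missing leaves the OpenAI SDK on its default endpoint instead of TapPass. A minimal preflight sketch, standard library only, can catch that before any documents are embedded. The helper name `gateway_env_problems` and the `expected_host` argument are hypothetical, introduced here for illustration; which variable a given LlamaIndex release consults for the base URL is worth confirming against the installed version, and the sketch checks only the two names used on this page.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

# The two variables this page relies on.
REQUIRED_VARS = ("OPENAI_BASE_URL", "OPENAI_API_KEY")


def gateway_env_problems(env: Mapping[str, str], expected_host: str) -> list[str]:
    """Return human-readable problems; an empty list means the env looks right.

    Hypothetical helper, not part of LlamaIndex or TapPass.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    base_url = env.get("OPENAI_BASE_URL", "")
    # Compare hostnames only, so a path such as /v1 does not matter.
    if base_url and urlparse(base_url).hostname != expected_host:
        problems.append(f"OPENAI_BASE_URL points at {base_url!r}, not {expected_host}")
    return problems


if __name__ == "__main__":
    for problem in gateway_env_problems(os.environ, "tappass.example.com"):
        print("config problem:", problem)
```

Running this at startup, before `VectorStoreIndex.from_documents`, surfaces a misconfigured shell early rather than partway through indexing.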
Explicit api_base
If you’d rather not use env vars:
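    from llama_index.llms.openai import OpenAI

    llm = OpenAI(
        model="gpt-4o-mini",
        api_base="https://tappass.example.com/v1",
        api_key="tp_...",
    )

OpenAIEmbedding accepts the same api_base and api_key arguments; set them there too so that retrieval is routed through TapPass along with synthesis.

Streaming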
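response_gen is only available when the query engine is built with streaming=True:

    query_engine = index.as_query_engine(streaming=True, llm=OpenAI(model="gpt-4o-mini"))
    streaming_response = query_engine.query("...")
    for chunk in streaming_response.response_gen:
        print(chunk, end="", flush=True)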
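The streaming loop prints chunks as they arrive but does not keep them. If the full answer is also needed afterwards (for logging, say), one option is a small wrapper that prints and collects at once. This is a sketch under the assumption that `response_gen` yields plain string chunks, as the loop on this page suggests; `stream_and_collect` and `fake_response_gen` are hypothetical names, and the fake generator stands in for a live query engine so the block runs on its own.

```python
from typing import Iterable


def stream_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# Stand-in generator so the sketch runs without a live query engine.
def fake_response_gen():
    yield from ["The report ", "flags two ", "open findings."]


full_text = stream_and_collect(fake_response_gen())
```

With a real streaming query engine, the call would be `stream_and_collect(streaming_response.response_gen)`.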