LLM and embedding providers
Purpose
Scribe IQ separates the LLM provider (chat, pre-meeting summary, note generation) from the embedding provider (vector encoding of notes for RAG). Both are configured through environment variables read by the typed Settings layer, and both are surfaced in GET /health so the frontend and operators can see exactly which capabilities are configured.
Pick the provider posture that matches the deployment:
- Local demo: LLM_PROVIDER=groq with EMBEDDING_PROVIDER=openai is the simplest path and what QUICKSTART.md assumes.
- Institutional Azure tenancy: LLM_PROVIDER=azure_openai and EMBEDDING_PROVIDER=azure_openai, pointing at your Azure deployment.
- AWS-native deployment: LLM_PROVIDER=bedrock and EMBEDDING_PROVIDER=bedrock, configured against your Bedrock account.
The provider choice is per-deployment, not per-request. Switching the LLM provider does not require re-embedding; switching the embedding provider does (see Embedding rebuild workflow below).
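As a rough illustration of a per-deployment choice, the sketch below reads both variables once and rejects unknown values. The provider names come from this page; the class name, the function name, and the defaults for unset variables are assumptions, not the actual Settings layer.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider names as listed on this page.
LLM_PROVIDERS = {"groq", "azure_openai", "bedrock"}
EMBEDDING_PROVIDERS = {"openai", "azure_openai", "bedrock", "none"}


@dataclass(frozen=True)
class ProviderSettings:  # hypothetical name, for illustration only
    llm_provider: Optional[str]  # None when LLM_PROVIDER is unset
    embedding_provider: str      # "none" when EMBEDDING_PROVIDER is unset (assumption)


def load_provider_settings(env: Mapping[str, str] = os.environ) -> ProviderSettings:
    """Read both provider choices once and reject unknown values."""
    llm = env.get("LLM_PROVIDER", "").strip().lower() or None
    emb = env.get("EMBEDDING_PROVIDER", "").strip().lower() or "none"
    if llm is not None and llm not in LLM_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {llm!r}")
    if emb not in EMBEDDING_PROVIDERS:
        raise ValueError(f"Unsupported EMBEDDING_PROVIDER: {emb!r}")
    return ProviderSettings(llm_provider=llm, embedding_provider=emb)


# The local demo posture described above.
demo = load_provider_settings({"LLM_PROVIDER": "groq", "EMBEDDING_PROVIDER": "openai"})
print(demo)
```

Because the two fields are independent, changing llm_provider alone leaves the stored vectors valid, while changing embedding_provider does not.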
Health surface
GET /health returns a JSON document with provider configuration so the frontend and operators can confirm the running posture without reading the environment:
| Field | Meaning |
|---|---|
| llm_provider | One of groq, azure_openai, bedrock (or unset if no LLM provider is configured) |
| llm_configured | Boolean: whether credentials for the selected LLM provider are present |
| embedding_provider | One of openai, azure_openai, bedrock, none |
| embedding_configured | Boolean: whether credentials for the selected embedding provider are present |
| embedding_model | Resolved model or deployment used for embeddings, or null when none is configured |
| embedding_dim | Vector dimension expected by the backend and pgvector column |
When any capability field is missing or false, the dependent UI surface degrades visibly (chat 503, meeting-prep placeholder, generate-note disabled) rather than silently failing. /health does not count populated embedding rows; chat surfaces that separately when retrieval is attempted.
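A client could derive which surfaces to enable from the health document along the following lines. This is a minimal sketch: the field names are the ones in the table above, but the function name and the exact mapping from fields to surfaces (chat needing both LLM and embeddings, the other two needing only the LLM) are assumptions, and the sample payload values are illustrative.

```python
from typing import Any, Dict, Mapping


def ui_capabilities(health: Mapping[str, Any]) -> Dict[str, bool]:
    """Map /health fields to UI surfaces; a missing field counts as false."""
    llm_ok = bool(health.get("llm_configured"))
    emb_ok = bool(health.get("embedding_configured"))
    return {
        "chat": llm_ok and emb_ok,   # assumed: RAG chat needs both providers
        "meeting_prep": llm_ok,      # assumed: summary needs only the LLM
        "generate_note": llm_ok,     # assumed: note generation needs only the LLM
    }


# Illustrative payload: LLM configured, embeddings not configured.
sample_health = {
    "llm_provider": "groq",
    "llm_configured": True,
    "embedding_provider": "none",
    "embedding_configured": False,
    "embedding_model": None,
    "embedding_dim": 1536,  # illustrative value
}
print(ui_capabilities(sample_health))
```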
Groq (default demo)
Groq is the recommended LLM provider for local synthetic demos and development. It is fast, inexpensive, and produces narrative quality sufficient for the pre-meeting summary, structured note generation, and chat completion routes. It does not provide embeddings; pair Groq with OpenAI (or Azure OpenAI / Bedrock) for embeddings.
Azure OpenAI
Azure OpenAI is the natural fit for institutions with an existing Azure tenancy and a BAA-eligible Azure deployment. It is configured by deployment name, not by model name:
LLM_PROVIDER=azure_openai
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-08-01-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=<deployment-name-for-chat-model>
AZURE_OPENAI_JSON_DEPLOYMENT=<deployment-name-for-json-capable-model> # optional; falls back to chat deployment
# Embeddings (independent of LLM choice)
EMBEDDING_PROVIDER=azure_openai
AZURE_EMBEDDING_DEPLOYMENT=<deployment-name-for-embedding-model>
Legacy aliases
Earlier configurations used AZURE_OPENAI_DEPLOYMENT (treated as the chat deployment) and AZURE_OPENAI_MINI_DEPLOYMENT (treated as the JSON-capable deployment). These names are still accepted as aliases for backward compatibility; prefer AZURE_OPENAI_CHAT_DEPLOYMENT and AZURE_OPENAI_JSON_DEPLOYMENT for new configurations.
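The alias and fallback rules can be sketched as a small resolver. The variable names are the ones documented above; the helper names are hypothetical, and giving the new name precedence when both the new and the legacy variable are set is an assumption rather than confirmed behavior.

```python
from typing import Mapping, Optional, Tuple


def _first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def resolve_azure_deployments(env: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Resolve (chat, json) deployment names, honoring the legacy aliases."""
    chat = _first_set(env, "AZURE_OPENAI_CHAT_DEPLOYMENT", "AZURE_OPENAI_DEPLOYMENT")
    json_dep = _first_set(env, "AZURE_OPENAI_JSON_DEPLOYMENT", "AZURE_OPENAI_MINI_DEPLOYMENT")
    # The JSON deployment is optional and falls back to the chat deployment.
    return chat, json_dep or chat


# A legacy-only configuration still resolves both deployments.
print(resolve_azure_deployments({"AZURE_OPENAI_DEPLOYMENT": "my-chat-deployment"}))
```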
Amazon Bedrock
Bedrock is the natural fit for AWS-native deployments. Credentials follow the standard AWS resolution chain (environment variables, instance/role credentials, or a named profile).
LLM_PROVIDER=bedrock
AWS_REGION=us-west-2
AWS_BEDROCK_CHAT_MODEL_ID=us.anthropic.claude-3-5-haiku-20241022-v1:0
AWS_BEDROCK_JSON_MODEL_ID=us.anthropic.claude-3-5-haiku-20241022-v1:0 # optional; falls back to chat model
BEDROCK_PROFILE_NAME=<aws-profile> # optional; use named profile from ~/.aws/credentials
# Embeddings (independent of LLM choice)
EMBEDDING_PROVIDER=bedrock
AWS_BEDROCK_EMBEDDING_MODEL_ID=amazon.titan-embed-text-v1
JSON-mode note
Not all Bedrock models support strict JSON-mode output the way OpenAI and Azure OpenAI do. The structured note generation route (POST /notes/generate) issues a JSON-shaped prompt and parses the response defensively: if the selected Bedrock model does not return valid JSON, the route surfaces an explicit error rather than synthesizing a malformed note. Pick a Bedrock model with reliable JSON adherence (for example, a recent Claude-family model) for note generation; the chat and meeting-prep paths tolerate unstructured text.
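Defensive parsing of this kind might look like the sketch below. It is not the route's actual code: the exception and function names are hypothetical, and tolerating prose or a Markdown fence around the JSON object is an assumption about what "defensively" covers.

```python
import json
from typing import Any, Dict


class InvalidModelJSON(ValueError):
    """Raised when model output cannot be parsed as a JSON object."""


def parse_note_json(raw: str) -> Dict[str, Any]:
    """Extract and parse the outermost JSON object, or fail explicitly."""
    # Tolerate leading/trailing prose or a Markdown fence by slicing from
    # the first "{" to the last "}" (assumed leniency).
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise InvalidModelJSON("invalid JSON from model: no JSON object found")
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        # Surface an explicit error instead of building a malformed note.
        raise InvalidModelJSON(f"invalid JSON from model: {exc.msg}") from exc


# The key name "summary" is illustrative, not the real note schema.
print(parse_note_json('Here is the note: {"summary": "Follow-up visit"}'))
```

Raising a dedicated exception keeps the failure visible to the caller, which matches the behavior described above of returning an error rather than a partial note.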
Embedding rebuild workflow
Switching the embedding provider requires re-embedding all stored note vectors. Provider vector spaces are not interchangeable; mixing them silently corrupts retrieval. The supported workflow:
- Update EMBEDDING_PROVIDER (and any provider-specific credentials and model/deployment fields) in backend/.env, then restart the API so the new values are loaded.
- Confirm health. curl http://127.0.0.1:8000/health should report the new embedding_provider and embedding_configured=true.
- Clear existing embeddings. There is no dedicated loader flag today; manually run UPDATE notes SET embedding = NULL against the target database so scribe-load-corpus --embed does not skip already-embedded rows.
- Re-embed. Run scribe-load-corpus --embed against the same corpus JSONL artifact. Loader logs report rows processed and any retries.
- Verify. GET /health should still report the expected embedding_provider, embedding_model, and embedding_dim; a chat request should now return grounded answers instead of the no-embeddings 503.
Re-embedding is idempotent and safe to retry; the loader uses transactional batches.
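The health checks in the workflow above can be scripted before clearing any vectors. The following is a sketch under stated assumptions: the function names are hypothetical, the field names are those documented for GET /health, and the base URL is the local address used in the workflow.

```python
import json
import urllib.request
from typing import Any, List, Mapping


def fetch_health(base_url: str = "http://127.0.0.1:8000") -> Mapping[str, Any]:
    """Fetch the /health document from a running API."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def rebuild_blockers(health: Mapping[str, Any], expected_provider: str) -> List[str]:
    """List reasons not to clear and re-embed yet; empty means proceed."""
    problems = []
    if health.get("embedding_provider") != expected_provider:
        problems.append(
            f"embedding_provider is {health.get('embedding_provider')!r}, "
            f"expected {expected_provider!r}"
        )
    if not health.get("embedding_configured"):
        problems.append("embedding_configured is not true")
    if not health.get("embedding_model"):
        problems.append("embedding_model is not resolved")
    return problems


# Illustrative payload for a deployment that was not restarted after the switch.
stale = {"embedding_provider": "openai", "embedding_configured": True,
         "embedding_model": "some-model"}
print(rebuild_blockers(stale, "bedrock"))
```

In practice one would call rebuild_blockers(fetch_health(), "bedrock") and run the UPDATE and scribe-load-corpus --embed steps only when the returned list is empty.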
Troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| GET /health shows llm_configured=false despite credentials set | LLM_PROVIDER set to a value without all required env vars for that provider | Cross-check the provider section above; restart the API after .env edits |
| Chat returns 503 with "No embeddings in the database" | notes.embedding is empty for the current provider | Run scribe-load-corpus --embed |
| Chat returns 503 with "Embedding provider mismatch" or retrieval is empty after a provider switch | Stored embeddings were generated by a different provider | Follow the Embedding rebuild workflow (clear, then re-embed) |
| Note generation returns "invalid JSON from model" on Bedrock | Selected Bedrock model does not reliably honor JSON-mode prompts | Switch AWS_BEDROCK_CHAT_MODEL_ID / AWS_BEDROCK_JSON_MODEL_ID to a model with strong JSON adherence (recent Claude family) |
| Azure OpenAI returns 404 on the deployment | AZURE_OPENAI_CHAT_DEPLOYMENT / AZURE_EMBEDDING_DEPLOYMENT is set to a model name, not the Azure deployment name | Use the deployment name from the Azure portal, not the underlying model id |
| Bedrock returns UnauthorizedOperation | Credentials chain not resolving the intended account/role | Set BEDROCK_PROFILE_NAME to a named profile in ~/.aws/credentials, assume the role in AWS_BEDROCK_ROLE_ARN, or export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN |
| Pre-meeting summary returns a placeholder | LLM provider not configured for the selected LLM_PROVIDER | Either set credentials or leave the placeholder visible; the UI surfaces this state intentionally |
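For the first row, a quick local check of which variables are missing can save a restart cycle. This sketch covers only the variables this page documents for Azure OpenAI and Bedrock; treating exactly these as required is an assumption, the names REQUIRED_LLM_ENV and missing_llm_env are hypothetical, and Groq is omitted because its variables are not listed here.

```python
from typing import Dict, List, Mapping, Tuple

# Each entry is a group of acceptable names; any one of them satisfies the
# requirement (this lets the legacy Azure alias count). Assumed, not confirmed.
REQUIRED_LLM_ENV: Dict[str, List[Tuple[str, ...]]] = {
    "azure_openai": [
        ("AZURE_OPENAI_API_KEY",),
        ("AZURE_OPENAI_ENDPOINT",),
        ("AZURE_OPENAI_API_VERSION",),
        ("AZURE_OPENAI_CHAT_DEPLOYMENT", "AZURE_OPENAI_DEPLOYMENT"),
    ],
    "bedrock": [
        ("AWS_REGION",),
        ("AWS_BEDROCK_CHAT_MODEL_ID",),
    ],
}


def missing_llm_env(env: Mapping[str, str]) -> List[str]:
    """Return the preferred name of each unsatisfied requirement."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_LLM_ENV:
        raise ValueError(f"No documented variable list for LLM_PROVIDER={provider!r}")
    return [
        group[0]
        for group in REQUIRED_LLM_ENV[provider]
        if not any(env.get(name, "").strip() for name in group)
    ]


print(missing_llm_env({"LLM_PROVIDER": "bedrock", "AWS_REGION": "us-west-2"}))
```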