v1.74.6
Deploy this version

Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.74.6
```

Pip:

```shell
pip install litellm==1.74.6
```
Key Highlights
- Vector Stores - Support for Vertex RAG Engine, PG Vector, OpenAI & Azure OpenAI Vector Stores.
- Health Check Improvements - Separate health check app on dedicated port for better Kubernetes liveness probes.
- Control Plane + Data Plane Architecture - Enhanced proxy architecture for better scalability and separation of concerns.
- New LLM Providers - Added Moonshot API (moonshot) and v0 provider support.
MCP Gateway: Enhanced Namespacing
v1.74.6 introduces improved URL-based namespacing for MCP servers, enabling better segregation and organization of MCP tools across different environments and teams.
Key features include:
- URL-based namespacing: Better isolation between different MCP server instances
- Access group improvements: Enhanced management of MCP server access through configuration
- Tool permission management: Improved object permissions when updating/deleting keys and teams
Read more here
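The snippet below illustrates the idea of URL-based namespacing. It assumes the proxy exposes each registered MCP server under its own URL segment of the form /mcp/<server_alias>; the base URL and server aliases are placeholders, and the exact route format is described in the linked docs rather than confirmed here.

```python
# Hypothetical illustration of URL-based MCP namespacing: each MCP server
# registered with the LiteLLM proxy gets its own URL segment, so clients in
# different environments or teams can be pointed at exactly one server.
PROXY_BASE = "http://localhost:4000"  # placeholder proxy address

def mcp_url(server_alias: str) -> str:
    """Build the namespaced MCP endpoint for one registered server (assumed format)."""
    return f"{PROXY_BASE}/mcp/{server_alias}"

print(mcp_url("dev_github"))   # -> http://localhost:4000/mcp/dev_github
print(mcp_url("prod_github"))  # -> http://localhost:4000/mcp/prod_github
```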
Vector Stores API
v1.74.6 introduces OpenAI-compatible vector store endpoints, bringing powerful vector search capabilities to the LiteLLM proxy.
New Endpoints:
- /v1/vector_stores - Create and manage vector stores
- /v1/vector_stores/{vector_store_id}/search - Perform vector searches
Supported Providers:
- Vertex RAG Engine - Google's managed RAG solution
- PG Vector - PostgreSQL vector extension support
- OpenAI Vector Stores - Full OpenAI compatibility
- Azure AI Search - Microsoft's vector search service
This enables developers to easily integrate vector search capabilities into their applications while maintaining compatibility with OpenAI's vector store API.
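For example, here is a minimal sketch of exercising the new endpoints through the OpenAI Python SDK pointed at a LiteLLM proxy. The base URL, API key, and store name are placeholders; which backing provider the store uses is configured on the proxy side.

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy (placeholder address and key)
client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

# Create a vector store via POST /v1/vector_stores
store = client.vector_stores.create(name="docs")  # placeholder store name

# Search it via POST /v1/vector_stores/{vector_store_id}/search
results = client.vector_stores.search(
    vector_store_id=store.id,
    query="What is LiteLLM?",
)
print(results)
```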
Control Plane + Data Plane Architecture
v1.74.6 adds support for running the proxy in a control plane + data plane architecture, improving scalability and separation of concerns (see PR #12601 under General Proxy Improvements below).
New Models / Updated Models
Pricing / Context Window Updates
Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
---|---|---|---|---|
Azure AI | azure_ai/grok-3 | 131k | $3.30 | $16.50 |
Azure AI | azure_ai/global/grok-3 | 131k | $3.00 | $15.00 |
Azure AI | azure_ai/global/grok-3-mini | 131k | $0.25 | $1.27 |
Azure AI | azure_ai/grok-3-mini | 131k | $0.275 | $1.38 |
Azure AI | azure_ai/jais-30b-chat | 8k | $3200 | $9710 |
Groq | groq/moonshotai-kimi-k2-instruct | 131k | $1.00 | $3.00 |
AI21 | jamba-large-1.7 | 256k | $2.00 | $8.00 |
AI21 | jamba-mini-1.7 | 256k | $0.20 | $0.40 |
Together.ai | together_ai/moonshotai/Kimi-K2-Instruct | 131k | $1.00 | $3.00 |
v0 | v0/v0-1.0-md | 128k | $3.00 | $15.00 |
v0 | v0/v0-1.5-md | 128k | $3.00 | $15.00 |
v0 | v0/v0-1.5-lg | 512k | $15.00 | $75.00 |
Moonshot | moonshot/moonshot-v1-8k | 8k | $0.20 | $2.00 |
Moonshot | moonshot/moonshot-v1-32k | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k | 131k | $2.00 | $5.00 |
Moonshot | moonshot/moonshot-v1-auto | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-k2-0711-preview | 131k | $0.60 | $2.50 |
Moonshot | moonshot/moonshot-v1-32k-0430 | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k-0430 | 131k | $2.00 | $5.00 |
Moonshot | moonshot/moonshot-v1-8k-0430 | 8k | $0.20 | $2.00 |
Moonshot | moonshot/kimi-latest | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-latest-8k | 8k | $0.20 | $2.00 |
Moonshot | moonshot/kimi-latest-32k | 32k | $1.00 | $3.00 |
Moonshot | moonshot/kimi-latest-128k | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-thinking-preview | 131k | $30.00 | $30.00 |
Moonshot | moonshot/moonshot-v1-8k-vision-preview | 8k | $0.20 | $2.00 |
Moonshot | moonshot/moonshot-v1-32k-vision-preview | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k-vision-preview | 131k | $2.00 | $5.00 |
Features
- 🆕 Moonshot API (Kimi)
  - New LLM API integration for accessing Kimi models - PR #12592, Get Started (see the sketch after this list)
- 🆕 v0 Provider
  - New provider integration for v0.dev - PR #12751, Get Started
- OpenAI
  - Use OpenAI DeepResearch models with litellm.completion (/chat/completions) - PR #12627
  - Add input_fidelity parameter for OpenAI image generation - PR #12662, Get Started
- Azure OpenAI
- Anthropic
  - Tool cache control support - PR #12668
- Bedrock
  - Claude 4 /invoke route support - PR #12599, Get Started
  - Application inference profile tool choice support - PR #12599
- Gemini
- VertexAI
  - Added Vertex AI RAG Engine support (use with the OpenAI-compatible /vector_stores API) - PR #12752
- vLLM
  - Added support for using Rerank endpoints with vLLM - PR #12738, Get Started
- AI21
  - Added ai21/jamba-1.7 model family pricing - PR #12593, Get Started
- Together.ai
  - [New Model] Add together_ai/moonshotai/Kimi-K2-Instruct - PR #12645, Get Started
- Groq
  - Add groq/moonshotai-kimi-k2-instruct model configuration - PR #12648, Get Started
- GitHub Copilot
  - Change system prompts to assistant prompts for GitHub Copilot - PR #12742, Get Started
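Below is a minimal sketch of calling one of the new Moonshot (Kimi) models through litellm.completion. The model name follows the moonshot/ prefix convention from the pricing table above; the chosen model and the assumption that the API key is read from a MOONSHOT_API_KEY environment variable are illustrative, not prescribed by this release note.

```python
import os
import litellm

# Assumes a Moonshot API key is available; the env var name follows
# LiteLLM's usual <PROVIDER>_API_KEY convention (an assumption here).
os.environ.setdefault("MOONSHOT_API_KEY", "sk-...")  # placeholder key

# The moonshot/ prefix routes the request to the new Moonshot provider
response = litellm.completion(
    model="moonshot/kimi-k2-0711-preview",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```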
Bugs
- Anthropic
  - Fix streaming + response_format + tools bug - PR #12463
- XAI
  - grok-4 does not support the stop param - PR #12646
- AWS
  - Role chaining with web authentication for AWS Bedrock - PR #12607
- VertexAI
  - Add project_id to cached credentials - PR #12661
- Bedrock
  - Fix Bedrock Nova Micro and Nova Lite context window info - PR #12619
LLM API Endpoints
Features
- /chat/completions
  - Include tool calls in output of trim_messages - PR #11517 (see the sketch after this list)
- /v1/vector_stores
- /streamGenerateContent
  - Non-Gemini model support - PR #12647
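To illustrate the trim_messages change, here is a small sketch; the message contents are made up and the model name is just a placeholder for token counting.

```python
from litellm.utils import trim_messages

# A conversation containing an assistant tool call and its tool result.
# Per PR #11517, tool-call messages are now included in the trimmed output
# instead of being dropped.
messages = [
    {"role": "user", "content": "What's the weather in SF?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "SF"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "65F and sunny"},
]

trimmed = trim_messages(messages, model="gpt-4o-mini")  # placeholder model
print(trimmed)
```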
Bugs
- /vector_stores
  - Knowledge Base call returning an error when passed as tools - PR #12628
MCP Gateway
Features
- Access Groups
- Namespacing
- Gateway Features
  - Allow using MCPs with all LLM APIs (VertexAI, Gemini, Groq, etc.) when using /responses - PR #12546 (see the sketch after this list)
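As a rough sketch, the snippet below shows what this can look like through the OpenAI SDK pointed at a LiteLLM proxy, calling /responses with a non-OpenAI model. The model name, server label, and URLs are placeholders; the MCP tool schema follows the OpenAI Responses API format.

```python
from openai import OpenAI

# Placeholder proxy address and key
client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

resp = client.responses.create(
    model="gemini/gemini-2.0-flash",  # placeholder non-OpenAI model routed by the proxy
    tools=[{
        "type": "mcp",
        "server_label": "my_mcp_server",            # placeholder alias
        "server_url": "http://localhost:4000/mcp",  # placeholder MCP route
        "require_approval": "never",
    }],
    input="List the tools you can call.",
)
print(resp.output_text)
```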
Bugs
- Fix to update object permission on update/delete key/team - PR #12701
- Include /mcp in list of available routes on proxy - PR #12612
Management Endpoints / UI
Features
- Keys
  - Regenerate Key state management improvements - PR #12729
- Models
- Usage Page
  - Fix Y-axis labels overlapping on the Spend per Tag chart - PR #12754
- Teams
- Users
  - New /user/bulk_update endpoint - PR #12720
- Logs Page
  - Add end_user filter on the UI Logs Page - PR #12663
- MCP Servers
  - Copy MCP server name functionality - PR #12760
- Vector Stores
- General
  - Add copy-on-click for all IDs (Key, Team, Organization, MCP Server) - PR #12615
- SCIM
  - Add GET /ServiceProviderConfig endpoint - PR #12664
Bugs
- Teams
Logging / Guardrail Integrations
Features
- Google Cloud Model Armor
  - New guardrails integration - PR #12492
- Bedrock Guardrails
  - Allow disabling exceptions on the 'BLOCKED' action - PR #12693
- Guardrails AI
  - Support llmOutput-based guardrails as pre-call hooks - PR #12674
- DataDog LLM Observability
  - Add support for tracking the correct span type based on the LLM endpoint used - PR #12652
- Custom Logging
  - Allow reading custom logger Python scripts from an S3 or GCS bucket - PR #12623 (see the sketch after this list)
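Since custom logger scripts can now be pulled from object storage, a script of the following shape is the kind of thing the proxy would load. The class uses LiteLLM's documented CustomLogger interface; the class and instance names are placeholders, and the S3/GCS wiring itself lives in the proxy configuration.

```python
from litellm.integrations.custom_logger import CustomLogger

class BucketLoadedLogger(CustomLogger):
    """Placeholder custom logger of the kind loadable from S3/GCS (PR #12623)."""

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # Log the model and wall-clock latency of each successful call
        print(f"model={kwargs.get('model')} latency={end_time - start_time}")

# The proxy config references a module-level instance like this one
proxy_handler_instance = BucketLoadedLogger()
```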
Bugs
- General Logging
  - StandardLoggingPayload on cache hits should track the custom LLM provider - PR #12652
- S3 Buckets
  - Fix S3 v2 log uploader crashing when used with guardrails - PR #12733
Performance / Loadbalancing / Reliability improvements
Features
- Health Checks
- Caching
  - Add Azure Blob cache support - PR #12587
- Router
  - Handle ZeroDivisionError with zero completion tokens in the lowest_latency strategy - PR #12734
Bugs
- Database
- Cache
  - Fix Redis caching for embedding response models - PR #12750
Helm Chart
- DB Migration Hook: refactor to support use_prisma_migrate for the Helm hook - PR
- Add envVars and extraEnvVars support to the Helm migrations job - PR #12591
General Proxy Improvements
Features
- Control Plane + Data Plane Architecture
  - Control Plane + Data Plane support - PR #12601
- Proxy CLI
  - Add "keys import" command to the CLI - PR #12620
- Swagger Documentation
  - Add Swagger docs for LiteLLM /chat/completions, /embeddings, /responses - PR #12618
- Dependencies
  - Loosen the rich version pin from ==13.7.1 to >=13.7.1 - PR #12704
Bugs
- Fix verbose logging being enabled by default - PR #12596
- Add support for disabling callbacks in the request body - PR #12762
- Handle circular references in spend tracking metadata JSON serialization - PR #12643
New Contributors
- @AntonioKL made their first contribution in https://github.com/BerriAI/litellm/pull/12591
- @marcelodiaz558 made their first contribution in https://github.com/BerriAI/litellm/pull/12541
- @dmcaulay made their first contribution in https://github.com/BerriAI/litellm/pull/12463
- @demoray made their first contribution in https://github.com/BerriAI/litellm/pull/12587
- @staeiou made their first contribution in https://github.com/BerriAI/litellm/pull/12631
- @stefanc-ai2 made their first contribution in https://github.com/BerriAI/litellm/pull/12622
- @RichardoC made their first contribution in https://github.com/BerriAI/litellm/pull/12607
- @yeahyung made their first contribution in https://github.com/BerriAI/litellm/pull/11795
- @mnguyen96 made their first contribution in https://github.com/BerriAI/litellm/pull/12619
- @rgambee made their first contribution in https://github.com/BerriAI/litellm/pull/11517
- @jvanmelckebeke made their first contribution in https://github.com/BerriAI/litellm/pull/12725
- @jlaurendi made their first contribution in https://github.com/BerriAI/litellm/pull/12704
- @doublerr made their first contribution in https://github.com/BerriAI/litellm/pull/12661