2 changes: 1 addition & 1 deletion .github/upstream-projects.yaml
@@ -44,7 +44,7 @@ projects:

- id: toolhive
repo: stacklok/toolhive
-version: v0.31.0
+version: v0.32.0
# toolhive is a monorepo covering the CLI, the Kubernetes
# operator, and the vMCP gateway. It also introduces cross-
# cutting features that land in concepts/, integrations/,
36 changes: 36 additions & 0 deletions docs/toolhive/guides-cli/run-mcp-servers.mdx
@@ -349,6 +349,42 @@ thv run --transport streamable-http --target-port <PORT_NUMBER> <SERVER> -- http
Check your MCP server's documentation for the required transport and port
configuration.

### Restrict browser Origin headers

ToolHive validates the HTTP `Origin` header on inbound proxy requests to protect
browser-based clients against DNS-rebinding attacks, per the
[MCP 2025-11-25 specification](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports#security-warning).
Requests without an `Origin` header (such as those from IDE clients, CLI
bridges, and SDK-based MCP clients) pass through unchanged; only browser
cross-origin requests are subject to the check.

When ToolHive binds to a loopback address (the default is `127.0.0.1`, but
`localhost` and `[::1]` also count), it derives a loopback-only allowlist
automatically and no configuration is required.

When you bind to a non-loopback address with `--host` and a browser client
connects from a different origin, pass `--allowed-origins` to enable the check.
The flag is repeatable, matches each value exactly on scheme, host, and port,
and is also accepted by `thv proxy`. Without it, ToolHive logs a warning and
disables the check entirely, which isn't recommended for public binds:

```bash
thv run --transport streamable-http \
--host 0.0.0.0 \
--allowed-origins https://my-web-app.example.com \
<SERVER>
```
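
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not ToolHive's implementation: the `origin_key` and `origin_allowed` names are hypothetical, and the sketch assumes an empty allowlist stands for the non-loopback case where the check is disabled. It does not model the loopback default allowlist, whose exact contents aren't stated here.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

# (scheme, host, port); port is None when the origin does not spell one out.
OriginKey = Tuple[str, Optional[str], Optional[int]]


def origin_key(origin: str) -> OriginKey:
    """Split an origin into the three parts the allowlist compares.

    Note: urlsplit lowercases the scheme and hostname. That is a property
    of this sketch, not a documented ToolHive behavior.
    """
    parts = urlsplit(origin)
    return (parts.scheme, parts.hostname, parts.port)


def origin_allowed(origin: Optional[str], allowlist: Iterable[str]) -> bool:
    """Hypothetical helper mirroring the documented rules."""
    # Requests without an Origin header pass through unchanged.
    if origin is None:
        return True
    allowed = {origin_key(value) for value in allowlist}
    # Assumption: an empty allowlist models a non-loopback bind without
    # --allowed-origins, where the check is disabled.
    if not allowed:
        return True
    try:
        # No default-port folding: scheme, host, and port must all match.
        return origin_key(origin) in allowed
    except ValueError:
        # A malformed port in the Origin header is treated as a mismatch.
        return False
```

With `["https://my-web-app.example.com"]` as the allowlist, the same host over `http://` or on port `8443` is rejected, which is why each browser origin needs its own `--allowed-origins` value.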

:::info[Legacy SSE transport CORS]

As of v0.32.0, the legacy SSE transport no longer sends a wildcard
`Access-Control-Allow-Origin: *` header. Browser clients on a non-loopback
origin that used to rely on the wildcard must now be added to the
`--allowed-origins` allowlist, or migrated off SSE to the streamable-HTTP
transport.

:::

### Add a custom CA certificate

In corporate environments with TLS inspection or custom certificate authorities,
10 changes: 10 additions & 0 deletions docs/toolhive/guides-vmcp/authentication.mdx
@@ -513,6 +513,16 @@ for key generation steps.

:::

The issuer URL must use the `https://` scheme. By default, the only exception is
`localhost`, which can use `http://` for local development. For in-cluster
deployments where traffic between the embedded auth server and other pods stays
on a trusted network (for example, an in-cluster service mesh), you can opt in
to an `http://` issuer on a non-localhost host by setting
`insecureAllowHTTP: true`. If the flag is unset, the VirtualMCPServer controller
rejects an `http://` issuer on a non-localhost host at reconcile time with
`AuthServerConfigValidated=False`, so the misconfiguration surfaces on the
resource rather than crashing the pod at startup. Never set this flag for
issuers reachable outside the cluster.
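
As a rough illustration of these rules, the following Python sketch combines the scheme check with the URL pattern from the CRD schema in this change (`^https?://[^\s?#]+[^/\s?#]$`). It is not the operator's validation code: the `issuer_allowed` name and the example hostnames are hypothetical, and it treats only the literal host `localhost` as the local exception because the text doesn't say whether loopback IPs qualify.

```python
import re
from urllib.parse import urlsplit

# Pattern copied from the CRD schema: http(s) URL with no query, fragment,
# whitespace, or trailing slash.
ISSUER_PATTERN = re.compile(r"^https?://[^\s?#]+[^/\s?#]$")


def issuer_allowed(issuer: str, insecure_allow_http: bool = False) -> bool:
    """Hypothetical check mirroring the documented issuer rules."""
    # fullmatch avoids "$" matching just before a trailing newline.
    if not ISSUER_PATTERN.fullmatch(issuer):
        return False
    parts = urlsplit(issuer)
    if parts.scheme == "https":
        return True
    # http:// is accepted for localhost (local development).
    if parts.hostname == "localhost":
        return True
    # http:// on any other host requires the explicit opt-in.
    return insecure_allow_http
```

For example, `issuer_allowed("http://auth.toolhive-system.svc.cluster.local")` is false until `insecure_allow_http=True` is passed, which mirrors the condition the controller reports when the flag is unset.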

If the browser-facing authorization endpoint needs to be on a different host
than the issuer (for example, behind an ingress that rewrites paths), set
`authorizationEndpointBaseUrl` to override the `authorization_endpoint` in the
7 changes: 7 additions & 0 deletions docs/toolhive/guides-vmcp/local-cli.mdx
@@ -240,6 +240,13 @@ For Tier 2, ToolHive starts and stops a HuggingFace Text Embeddings Inference
(TEI) container named `thv-embedding-<hash>` automatically. Customize the model
and image with `--embedding-model` and `--embedding-image`.

For Tier 3, you can point at any HuggingFace TEI server or at an
OpenAI-compatible `/embeddings` endpoint (OpenAI, Azure OpenAI, or another
compatible gateway). Set `embeddingProvider: openai` and `embeddingModel`
alongside `embeddingService`, and supply the API key via the `OPENAI_API_KEY`
environment variable (omit it for keyless gateways). The default is `tei`, so
existing Tier 3 configs continue to work unchanged.

For the conceptual background and tuning parameters, see
[Optimize tool discovery](./optimizer.mdx) and
[Tool optimization](../concepts/tool-optimization.mdx).
105 changes: 94 additions & 11 deletions docs/toolhive/guides-vmcp/optimizer.mdx
@@ -80,9 +80,14 @@ toolset.

## EmbeddingServer resource

-The EmbeddingServer CRD manages the lifecycle of a TEI server. An empty
-`spec: {}` uses all defaults. The two most important fields you can customize
-are:
+The EmbeddingServer CRD manages the lifecycle of a managed TEI server, which is
+the default embedding backend. If you'd rather point the optimizer at an
+external OpenAI-compatible embedding service instead, see
+[Use an OpenAI-compatible embedding service](#use-an-openai-compatible-embedding-service)
+below.
+
+An empty `spec: {}` uses all defaults. The two most important fields you can
+customize are:

- **`model`**: The Hugging Face embedding model to use. The default
(`BAAI/bge-small-en-v1.5`) is the tested and recommended model. You can
@@ -115,6 +120,77 @@ spec:

:::

## Use an OpenAI-compatible embedding service

Instead of running a managed TEI EmbeddingServer, you can point the optimizer at
an external service that speaks the OpenAI `/embeddings` API, such as OpenAI
itself, Azure OpenAI, or another OpenAI-compatible gateway. Use this when you
already operate a centralized embedding service and don't want a second copy
running per vMCP, or when you need a hosted model.

Set `embeddingProvider: openai` under `spec.config.optimizer` and configure
`embeddingService` and `embeddingModel` directly. Do **not** set
`embeddingServerRef`; the operator rejects combining the two at admission.

```yaml title="VirtualMCPServer resource"
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
name: optimizer-vmcp
namespace: toolhive-system
spec:
groupRef:
name: my-group
config:
optimizer:
# highlight-start
embeddingProvider: openai
embeddingService: http://llm-gateway.default.svc.cluster.local:8080/v1
embeddingModel: text-embedding-3-small
# highlight-end
embeddingServiceTimeout: 15s
incomingAuth:
type: anonymous
```

`embeddingService` is the base URL of the OpenAI-compatible endpoint;
`/embeddings` is appended automatically. `embeddingModel` is the model name
passed in each request and is required for the `openai` provider (the `tei`
provider ignores it, because the model is fixed by the TEI container).

The API key for the embedding service is read from the `OPENAI_API_KEY`
environment variable on the vmcp container, never from the CRD spec or
ConfigMap. Inject it from a Secret via `podTemplateSpec`:

```yaml title="VirtualMCPServer resource (excerpt)"
spec:
podTemplateSpec:
spec:
containers:
- name: vmcp
env:
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: embedding-api-key
key: apiKey
```

Omit the env var entirely if your gateway is keyless (for example, an in-cluster
LLM gateway that authenticates by network position). An empty key omits the
`Authorization` header.
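
To make the request shape concrete, here is a minimal Python sketch of how a client could assemble a call to an OpenAI-compatible `/embeddings` endpoint under the rules above. It is an illustration, not the vMCP implementation: `build_embeddings_request` and the sample input text are hypothetical, stripping a trailing slash from the base URL is an assumption, and no network call is made.

```python
import json
import os
from typing import Dict, List, Tuple


def build_embeddings_request(
    base_url: str, model: str, texts: List[str], api_key: str = ""
) -> Tuple[str, Dict[str, str], bytes]:
    """Hypothetical helper: returns (url, headers, body) for a POST request."""
    # embeddingService is the base URL; "/embeddings" is appended to it.
    # Assumption: a trailing slash on the base URL is tolerated.
    url = base_url.rstrip("/") + "/embeddings"
    headers = {"Content-Type": "application/json"}
    # An empty key means a keyless gateway: no Authorization header at all.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # embeddingModel travels in every request body for the openai provider.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return url, headers, body


if __name__ == "__main__":
    url, headers, _ = build_embeddings_request(
        "http://llm-gateway.default.svc.cluster.local:8080/v1",
        "text-embedding-3-small",
        ["Fetch a web page and return its contents"],
        api_key=os.environ.get("OPENAI_API_KEY", ""),
    )
    print(url, sorted(headers))
```

Reading the key from the environment at call time, rather than from configuration, matches the guidance above to keep it out of the CRD spec and ConfigMap.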

:::warning[Inputs are not truncated]

Unlike the TEI backend, the OpenAI API does not silently truncate over-long
inputs. A tool description that exceeds the model's context window causes the
request to fail with an error rather than being truncated.

:::

When `embeddingProvider` is omitted, the optimizer defaults to `tei` and your
existing TEI-based configuration continues to work unchanged.

## Local mode (CLI)

You can enable the optimizer directly from the `thv vmcp` CLI without a
@@ -256,16 +332,23 @@ spec:
exclude={['embeddingService']}
/>

-:::info[Kubernetes: EmbeddingServer is always required]
+:::info[Kubernetes: EmbeddingServer is required for the default TEI provider]
+
+When using the Kubernetes operator with the default `tei` embedding provider,
+even if you set `hybridSearchSemanticRatio` to `"0.0"` (all keyword search), the
+optimizer still requires a configured `EmbeddingServer`. The EmbeddingServer
+won't be used at runtime when the semantic ratio is `0.0`, but the configuration
+must be present due to how the operator wires the resources internally.

-When using the Kubernetes operator, even if you set `hybridSearchSemanticRatio`
-to `"0.0"` (all keyword search), the optimizer still requires a configured
-`EmbeddingServer`. The EmbeddingServer won't be used at runtime when the
-semantic ratio is `0.0`, but the configuration must be present due to how the
-operator wires the resources internally.
+This restriction doesn't apply when you set `optimizer.embeddingService`
+directly, such as with the
+[OpenAI-compatible provider](#use-an-openai-compatible-embedding-service); the
+operator only requires `embeddingServerRef` when no manual embedding service is
+configured.

-This restriction does not apply to local CLI mode. `thv vmcp serve --optimizer`
-runs keyword-only search with no EmbeddingServer and no container.
+This restriction also does not apply to local CLI mode.
+`thv vmcp serve --optimizer` runs keyword-only search with no EmbeddingServer
+and no container.

:::

1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_proxy.md
@@ -97,6 +97,7 @@ thv proxy [flags] SERVER_NAME
### Options

```
--allowed-origins stringArray Exact-match allowlist for the HTTP Origin header (repeatable). Recommended when binding publicly; loopback binds derive a default allowlist automatically, non-loopback binds log a warning when no value is supplied. Example: https://my-mcp.example.com
-h, --help help for proxy
--host string Host for the HTTP proxy to listen on (IP or hostname) (default "127.0.0.1")
--oidc-audience string Expected audience for the token
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_run.md
@@ -112,6 +112,7 @@ thv run [flags] SERVER_OR_IMAGE_OR_PROTOCOL [-- ARGS...]

```
--allow-docker-gateway Allow outbound connections to Docker gateway addresses (host.docker.internal, gateway.docker.internal, 172.17.0.1). Only applies when --isolate-network is set. These are blocked by default even when insecure_allow_all is enabled.
--allowed-origins stringArray Exact-match allowlist for the HTTP Origin header (repeatable). Recommended when binding publicly; loopback binds derive a default allowlist automatically, non-loopback binds log a warning when no value is supplied. Example: https://my-mcp.example.com
--audit-config string Path to the audit configuration file
--authz-config string Path to the authorization configuration file
--ca-cert string Path to a custom CA certificate file to use for container builds
7 changes: 5 additions & 2 deletions docs/toolhive/tutorials/mcp-optimizer.mdx
@@ -181,8 +181,11 @@ Then apply the YAML above, which creates a new `fetch` server with the correct

## Step 2: Deploy an EmbeddingServer

-The optimizer uses semantic search to find relevant tools. This requires an
-EmbeddingServer, which runs a text embeddings inference (TEI) server.
+The optimizer uses semantic search to find relevant tools, which means it needs
+to talk to an embedding service. This tutorial deploys a managed EmbeddingServer
+that runs a HuggingFace Text Embeddings Inference (TEI) container. If you'd
+rather point at an existing OpenAI-compatible embedding service, see
+[Use an OpenAI-compatible embedding service](../guides-vmcp/optimizer.mdx#use-an-openai-compatible-embedding-service).

Create an EmbeddingServer with default settings. This deploys the
`BAAI/bge-small-en-v1.5` model. If you are running on ARM64 nodes (for example,
18 changes: 18 additions & 0 deletions static/api-specs/toolhive-api.yaml
@@ -640,6 +640,12 @@ components:
type: string
type: array
uniqueItems: false
insecure_allow_http:
description: |-
InsecureAllowHTTP permits an http:// issuer URL for non-localhost hosts.
Only set this for in-cluster Kubernetes deployments on a trusted network.
Production deployments reachable outside the cluster MUST use https://.
type: boolean
issuer:
description: |-
Issuer is the issuer identifier for this authorization server.
@@ -1309,6 +1315,18 @@ components:
blocked by default in the egress proxy even when InsecureAllowAll is set.
Only applicable to Docker deployments with network isolation enabled.
type: boolean
allowed_origins:
description: |-
AllowedOrigins is the allowlist of values accepted on the HTTP Origin header,
used for DNS-rebinding protection per MCP 2025-11-25 §"Security Warning".
When empty and Host is loopback (127.0.0.1 / localhost / [::1]), a default
loopback-only allowlist is derived at middleware-wiring time.
When empty and Host is non-loopback, the middleware is disabled — operators
exposing the proxy publicly must configure an explicit allowlist.
items:
type: string
type: array
uniqueItems: false
audit_config:
$ref: '#/components/schemas/github_com_stacklok_toolhive_pkg_audit.Config'
audit_config_path:

@@ -128,7 +128,7 @@
"description": "EmbeddedAuthServer configures an embedded OAuth2/OIDC authorization server\nOnly used when Type is \"embeddedAuthServer\"",
"properties": {
"authorizationEndpointBaseUrl": {
-"description": "AuthorizationEndpointBaseURL overrides the base URL used for the authorization_endpoint\nin the OAuth discovery document. When set, the discovery document will advertise\n`{authorizationEndpointBaseUrl}/oauth/authorize` instead of `{issuer}/oauth/authorize`.\nAll other endpoints (token, registration, JWKS) remain derived from the issuer.\nThis is useful when the browser-facing authorization endpoint needs to be on a\ndifferent host than the issuer used for backend-to-backend calls.\nMust be a valid HTTPS URL (or HTTP for localhost) without query, fragment, or trailing slash.",
+"description": "AuthorizationEndpointBaseURL overrides the base URL used for the authorization_endpoint\nin the OAuth discovery document. When set, the discovery document will advertise\n`{authorizationEndpointBaseUrl}/oauth/authorize` instead of `{issuer}/oauth/authorize`.\nAll other endpoints (token, registration, JWKS) remain derived from the issuer.\nThis is useful when the browser-facing authorization endpoint needs to be on a\ndifferent host than the issuer used for backend-to-backend calls.\nMust be a valid HTTPS URL (or HTTP for localhost, or HTTP for trusted in-cluster hosts\nwhen insecureAllowHTTP is true) without query, fragment, or trailing slash.",
"pattern": "^https?://[^\\s?#]+[^/\\s?#]$",
"type": "string"
},
@@ -195,8 +195,13 @@
"type": "array",
"x-kubernetes-list-type": "atomic"
},
+"insecureAllowHTTP": {
+"default": false,
+"description": "InsecureAllowHTTP permits an http:// issuer URL for non-localhost hosts.\nOnly set this for in-cluster Kubernetes deployments where traffic between\npods traverses a trusted network (e.g. the in-cluster service mesh).\nProduction deployments reachable outside the cluster MUST use https://.\n\nOn VirtualMCPServer: when false (the default), http:// issuers for non-localhost\nhosts are rejected at reconcile time with an AuthServerConfigValidated=False condition.\n\nOn MCPServer and MCPRemoteProxy (via MCPExternalAuthConfig): this field is\nstructurally present but enforcement is deferred to pod startup via Config.Validate();\na misconfigured issuer will cause the pod to crash at startup rather than surface\nas an operator condition.",
+"type": "boolean"
+},
"issuer": {
-"description": "Issuer is the issuer identifier for this authorization server.\nThis will be included in the \"iss\" claim of issued tokens.\nMust be a valid HTTPS URL (or HTTP for localhost) without query, fragment, or trailing slash (per RFC 8414).",
+"description": "Issuer is the issuer identifier for this authorization server.\nThis will be included in the \"iss\" claim of issued tokens.\nMust be a valid HTTPS URL (or HTTP for localhost, or HTTP for trusted in-cluster hosts when\ninsecureAllowHTTP is true) without query, fragment, or trailing slash (per RFC 8414).",
"pattern": "^https?://[^\\s?#]+[^/\\s?#]$",
"type": "string"
},