Description
Issue: AIGatewayRoute with model-based matching fails to route /v1/chat/completions requests despite /v1/models listing the model
After deploying the manifests from the InferencePool example README, the /v1/models endpoint correctly returns the list of available models (e.g., meta-llama/Llama-3.1-8B-Instruct, mistral:latest). However, when a standard OpenAI-compatible request is sent to /v1/chat/completions with a model field in the JSON body, Envoy AI Gateway returns:
No matching route found. It is likely because the model specified in your request is not configured in the Gateway.
This suggests that the gateway fails to match the model field from the request body to the routes defined in the AIGatewayRoute, even though the model is listed and the route appears correctly configured.
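To make the failure mode concrete: body-based routing implies the gateway parses the JSON body, extracts the "model" field, and compares it against the model names configured on the route, presumably with an exact string match. The sketch below is purely illustrative (the set of configured names and the function names are assumptions, not the gateway's internals), but it shows why even a one-character mismatch between the request's model value and the route's configured value would produce "No matching route found":

```python
import json
from typing import Optional

# Illustrative only: models assumed to be configured on the AIGatewayRoute.
CONFIGURED_MODELS = {"meta-llama/Llama-3.1-8B-Instruct", "mistral:latest"}

def extract_model(body: bytes) -> Optional[str]:
    """Return the "model" field from an OpenAI-style request body, if present."""
    try:
        return json.loads(body).get("model")
    except (ValueError, AttributeError):
        return None

def route_matches(body: bytes, configured=CONFIGURED_MODELS) -> bool:
    """Exact string comparison: matching is case- and whitespace-sensitive."""
    model = extract_model(body)
    return model is not None and model in configured

body = b'{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": []}'
print(route_matches(body))  # True for an exact match; False for any variation
```

In this bug, however, the model string in the request is byte-for-byte identical to the one listed by /v1/models, so an exact-match mismatch does not appear to explain the failure.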
Expected behavior:
The request should be routed to the appropriate backend (e.g., vllm-llama3-8b-instruct InferencePool) based on the model value in the JSON payload, without requiring custom headers like x-ai-eg-model.
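For reference, the route in aigwroute.yaml matches on the model name roughly along these lines. This is an illustrative sketch following the shape of the repo's examples; the specific metadata, model value, and backendRef name here are assumptions, not a copy of the actual manifest:

```yaml
# Sketch only -- field values are illustrative, not the deployed manifest.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-pool-with-aigwroute
spec:
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model   # expected to be populated by the gateway from the JSON "model" field
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - name: vllm-llama3-8b-instruct   # the InferencePool backend
```

The point of the bug is that the x-ai-eg-model header should be derived from the request body automatically; the client should not have to set it.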
Repro Steps
Deploy the example manifests:
kubectl apply -f base.yaml
kubectl apply -f aigwroute.yaml
Forward the gateway service port:
kubectl port-forward svc/envoy-default-inference-pool-with-aigwroute-d416582c 8081:80 -n envoy-gateway-system
Verify models are listed:
curl http://localhost:8081/v1/models
Send a chat completion request:
curl -X POST http://localhost:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Say this is a test"}]
}'
Observe error:
No matching route found. It is likely because the model specified in your request is not configured in the Gateway.
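The two curl calls above can also be reproduced with a small stdlib-only Python script, which makes it easy to vary the model string when testing. The gateway address comes from the port-forward above; the helper that builds the payload mirrors the curl body exactly:

```python
import json
import urllib.request

GATEWAY = "http://localhost:8081"  # from the kubectl port-forward above

def build_chat_request(model: str, prompt: str) -> bytes:
    """Build the same OpenAI-compatible payload used in the curl repro."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def post_chat(model: str, prompt: str) -> str:
    """POST a chat completion to the gateway and return the raw response body."""
    req = urllib.request.Request(
        f"{GATEWAY}/v1/chat/completions",
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# With the port-forward running, this reproduces the failing request:
#   print(post_chat("meta-llama/Llama-3.1-8B-Instruct", "Say this is a test"))
```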
root@iZ6wed7dv05mxum2042djiZ:~/ai-gateway/examples/inference-pool# kubectl apply -f base.yaml
service/mistral-upstream created
deployment.apps/mistral-upstream created
inferencepool.inference.networking.k8s.io/mistral created
inferenceobjective.inference.networking.x-k8s.io/mistral created
serviceaccount/mistral-epp created
service/mistral-epp created
deployment.apps/mistral-epp created
configmap/plugins-config created
role.rbac.authorization.k8s.io/pod-read created
rolebinding.rbac.authorization.k8s.io/pod-read-binding created
clusterrole.rbac.authorization.k8s.io/auth-reviewer created
clusterrolebinding.rbac.authorization.k8s.io/auth-reviewer-binding created
aiservicebackend.aigateway.envoyproxy.io/envoy-ai-gateway-basic-testupstream created
backend.gateway.envoyproxy.io/envoy-ai-gateway-basic-testupstream created
deployment.apps/envoy-ai-gateway-basic-testupstream created
service/envoy-ai-gateway-basic-testupstream created
root@iZ6wed7dv05mxum2042djiZ:~/ai-gateway/examples/inference-pool# kubectl apply -f aigwroute.yaml
gatewayclass.gateway.networking.k8s.io/inference-pool-with-aigwroute created
gateway.gateway.networking.k8s.io/inference-pool-with-aigwroute created
aigatewayroute.aigateway.envoyproxy.io/inference-pool-with-aigwroute created
root@iZ6wed7dv05mxum2042djiZ:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default envoy-ai-gateway-basic-testupstream-6f75dd4cf6-9cv86 1/1 Running 0 2m47s
default mistral-epp-f95446897-xvq4f 1/1 Running 0 2m48s
default mistral-upstream-9c959d4d4-b9d97 1/1 Running 0 2m48s
default mistral-upstream-9c959d4d4-dgkkp 1/1 Running 0 2m48s
default mistral-upstream-9c959d4d4-kq8jq 1/1 Running 0 2m48s
envoy-ai-gateway-system ai-gateway-controller-5558c7cf7c-r8s9t 1/1 Running 0 35m
envoy-gateway-system envoy-default-inference-pool-with-aigwroute-d416582c-58ffc9jxwm 3/3 Running 0 2m42s
envoy-gateway-system envoy-gateway-6dd8f9b8f-dxwgs 1/1 Running 0 34m
kube-system coredns-668d6bf9bc-9n5cn 1/1 Running 0 36m
kube-system coredns-668d6bf9bc-wld9q 1/1 Running 0 36m
kube-system etcd-cluster1-control-plane 1/1 Running 0 36m
kube-system kindnet-6dr8j 1/1 Running 0 35m
kube-system kindnet-cl8tz 1/1 Running 0 36m
kube-system kindnet-phj8s 1/1 Running 0 35m
kube-system kube-apiserver-cluster1-control-plane 1/1 Running 0 36m
kube-system kube-controller-manager-cluster1-control-plane 1/1 Running 0 36m
kube-system kube-proxy-b959k 1/1 Running 0 35m
kube-system kube-proxy-nctnq 1/1 Running 0 36m
kube-system kube-proxy-tq59g 1/1 Running 0 35m
kube-system kube-scheduler-cluster1-control-plane 1/1 Running 0 36m
local-path-storage local-path-provisioner-7dc846544d-4vn9n 1/1 Running 0 36m
root@iZ6wed7dv05mxum2042djiZ:~# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default envoy-ai-gateway-basic-testupstream ClusterIP 10.96.213.150 <none> 80/TCP 3m21s
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 36m
default mistral-epp ClusterIP 10.96.104.62 <none> 9002/TCP 3m22s
default mistral-upstream ClusterIP None <none> 8080/TCP 3m22s
envoy-ai-gateway-system ai-gateway-controller ClusterIP 10.96.30.73 <none> 9443/TCP,1063/TCP,9090/TCP 36m
envoy-gateway-system envoy-default-inference-pool-with-aigwroute-d416582c LoadBalancer 10.96.136.253 <pending> 80:31523/TCP 3m16s
envoy-gateway-system envoy-gateway ClusterIP 10.96.123.136 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP,9443/TCP 35m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 36m
root@iZ6wed7dv05mxum2042djiZ:~/ai-gateway/examples/inference-pool# kubectl port-forward svc/envoy-default-inference-pool-with-aigwroute-d416582c 8081:80 -n envoy-gateway-system
Forwarding from 127.0.0.1:8081 -> 10080
Forwarding from [::1]:8081 -> 10080
Handling connection for 8081
Handling connection for 8081
root@iZ6wed7dv05mxum2042djiZ:~# curl -X POST "http://localhost:8081/v1/models"
{"data":[{"id":"meta-llama/Llama-3.1-8B-Instruct","created":1762256144,"object":"model","owned_by":"Envoy AI Gateway"},{"id":"meta-llama/Llama-3.1-8B-Instruct","created":1762256144,"object":"model","owned_by":"Envoy AI Gateway"},{"id":"mistral:latest","created":1762256144,"object":"model","owned_by":"Envoy AI Gateway"},{"id":"some-cool-self-hosted-model","created":1762256144,"object":"model","owned_by":"Envoy AI Gateway"}],"object":"list"}root@iZ6wed7dv05mxum2042djiZ:~#
root@iZ6wed7dv05mxum2042djiZ:~# curl -X POST "http://localhost:8081/v1/chat/completions" -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Say this is a test"}],"model":"meta-llama/Llama-3.1-8B-Instruct"}'
No matching route found. It is likely because the model specified in your request is not configured in the Gateway.
root@iZ6wed7dv05mxum2042djiZ:~#
Environment
Envoy Gateway Helm Chart: v0.0.0-latest
Kubernetes: vX.XX (e.g., v1.28 via Kind)
OS: Ubuntu 22.04
Steps followed: InferencePool Example README
Logs:
See the kubectl and curl output above.