Description
Following the proposal specified in this doc, this issue proposes evolving BBR to manage multiple InferencePools in a scalable way.
As proposed in the doc, BBR will be changed to use one or more ConfigMaps as the source of truth for the mapping from LoRA adapter names (or base model names) to InferencePools. The ConfigMap serves as an "allow-list" of models that can be used and is completely decoupled from the LoRA adapter file system resolver in vLLM.
Then, upon receiving a new request, BBR will consult the mapping and inject a header carrying the name of the matching InferencePool.
An HTTPRoute can then be configured with one rule per pool, each rule matching on the pool name carried in that header.
This functionality should be optional: users should be able to keep running IGW as they do today, with a single pool and without the mapping.