Multi Pool support through BBR #1812

@nirrozenbaum

Following the proposal specified in this doc, this issue proposes evolving BBR to manage multiple InferencePools in a scalable way.

As proposed in the doc, BBR will use one or more ConfigMaps as the source of truth for mapping LoRA adapter names (or base model names) to InferencePools. The ConfigMaps serve as an allow-list of models that can be used and are completely decoupled from the LoRA adapter file system resolver in vLLM.
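To make the mapping concrete, here is a minimal Go sketch of turning ConfigMap-style data into a model-to-pool lookup table. The data layout is an assumption, not something the proposal specifies: each key is taken to be an InferencePool name and each value a newline-separated list of model or LoRA adapter names. The function name `buildModelToPool` and the sample pool and model names are hypothetical as well.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildModelToPool inverts ConfigMap-style data into a model -> pool table.
// Assumed (hypothetical) layout: key = InferencePool name,
// value = newline-separated model or LoRA adapter names.
func buildModelToPool(data map[string]string) (map[string]string, error) {
	// Sort pool names so that conflict errors are reproducible.
	pools := make([]string, 0, len(data))
	for pool := range data {
		pools = append(pools, pool)
	}
	sort.Strings(pools)

	modelToPool := make(map[string]string)
	for _, pool := range pools {
		for _, line := range strings.Split(data[pool], "\n") {
			model := strings.TrimSpace(line)
			if model == "" {
				continue // skip blank lines
			}
			// A model listed under two different pools is ambiguous.
			if prev, ok := modelToPool[model]; ok && prev != pool {
				return nil, fmt.Errorf("model %q is mapped to both %q and %q", model, prev, pool)
			}
			modelToPool[model] = pool
		}
	}
	return modelToPool, nil
}

func main() {
	// Sample data, shaped like the Data field of a ConfigMap.
	data := map[string]string{
		"pool-a": "meta-llama/Llama-3.1-8B-Instruct\nsql-lora\n",
		"pool-b": "summarizer-lora",
	}
	m, err := buildModelToPool(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["sql-lora"], m["summarizer-lora"])
}
```

Keying by pool name rather than model name is a deliberate choice in this sketch: ConfigMap keys cannot contain `/`, which is common in model names.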

Upon receiving a new request, BBR will consult the mapping and inject a header carrying the name of the appropriate InferencePool.
An HTTPRoute can then be configured with one rule per pool, each matching on the pool name in that header.
This functionality should be optional: users should be able to keep running IGW as they do today, with a single pool and without the mapping.
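The request-time flow described above could look roughly like the following Go sketch. It rests on several assumptions that the issue does not state: the model name is read from a `model` field in a JSON request body, the header is named `X-Inference-Pool`, a model missing from the mapping is rejected (following the allow-list wording), and an empty mapping means the feature is disabled. `resolvePool` and `poolHeader` are hypothetical names, not part of BBR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// poolHeader is a hypothetical header name; the issue does not specify one.
const poolHeader = "X-Inference-Pool"

// resolvePool returns the InferencePool name for the model named in the
// request body. It returns "" with a nil error when the mapping is empty,
// which this sketch treats as "feature disabled, single-pool behavior".
func resolvePool(body []byte, modelToPool map[string]string) (string, error) {
	if len(modelToPool) == 0 {
		return "", nil
	}
	// Assumption: the body is JSON with a top-level "model" field.
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("parsing request body: %w", err)
	}
	pool, ok := modelToPool[req.Model]
	if !ok {
		// Assumption: models outside the allow-list are rejected.
		return "", fmt.Errorf("model %q is not in the allow-list", req.Model)
	}
	return pool, nil
}

func main() {
	mapping := map[string]string{"sql-lora": "pool-a"}
	body := []byte(`{"model": "sql-lora", "prompt": "hi"}`)

	pool, err := resolvePool(body, mapping)
	if err != nil {
		panic(err)
	}
	headers := http.Header{}
	if pool != "" {
		headers.Set(poolHeader, pool) // the route rule would match on this value
	}
	fmt.Println(headers.Get(poolHeader))
}
```

Returning an empty pool name instead of an error when no mapping is configured keeps the single-pool path free of any header, so existing routes would keep working unchanged.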

Labels

triage/accepted: Indicates an issue or PR is ready to be actively worked on.
