Skip to content

[Flow Control] Validate and Harden Scale-from-Zero Behavior #1800

@LukeAVanDrie

Description

@LukeAVanDrie

What needs to be done?: The Flow Control layer is designed to gracefully handle scale-from-zero scenarios by queueing incoming requests until backend pods are ready. This behavior needs to be rigorously validated to ensure it is robust.

Validation Steps:

  • Create test scenarios where an InferencePool scales from 0 to N replicas while under load (or even 1 to 0 to N)
  • Verify that requests are correctly queued and not dropped, provided they do not exceed their own timeouts.
    Measure the end-to-end latency for the first few requests to confirm they are dispatched promptly once backends become available.
  • Ensure the system remains stable and does not enter a deadlocked state during the scale-up process.

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions