Fix: Account for CPU offloading in KV cache memory check

check_enough_kv_cache_memory() did not account for CPU offloading
capacity when validating available memory. As a result, the V1 engine
failed with a 'No available memory for cache blocks' error even when
--cpu-offload-gb was set.

This fix adds the CPU offload capacity to the effective available memory
before performing the check, so that 7B-13B models can run with CPU
offloading on 12GB GPUs.
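
For illustration only, the idea behind the check can be sketched as below. This
is not the code changed by this commit: the helper names
(effective_available_memory, check_kv_cache_fits), their parameters, and the
second error message are hypothetical, and the real
check_enough_kv_cache_memory() signature is not reproduced here. The sketch
assumes the offload budget is given in GiB and is simply added to the GPU
memory left over for the KV cache.

```python
GIB = 1024 ** 3  # bytes per GiB


def effective_available_memory(gpu_available_bytes: int, cpu_offload_gb: float) -> int:
    """Hypothetical helper: GPU bytes left for the KV cache plus the CPU offload budget."""
    return gpu_available_bytes + int(cpu_offload_gb * GIB)


def check_kv_cache_fits(needed_bytes: int,
                        gpu_available_bytes: int,
                        cpu_offload_gb: float = 0.0) -> None:
    """Hypothetical check: raise ValueError if the KV cache cannot fit.

    gpu_available_bytes may be zero or negative when the model weights alone
    fill the GPU; the offload budget is added before validating.
    """
    effective = effective_available_memory(gpu_available_bytes, cpu_offload_gb)
    if effective <= 0:
        # Message text mirrors the error quoted in this commit message.
        raise ValueError("No available memory for cache blocks")
    if needed_bytes > effective:
        raise ValueError(
            f"KV cache needs {needed_bytes / GIB:.2f} GiB, "
            f"but only {effective / GIB:.2f} GiB is available"
        )


# Without offloading, a GPU with no memory left for the cache fails the check;
# with a 4 GiB offload budget, the same 1 GiB request passes.
check_kv_cache_fits(needed_bytes=1 * GIB, gpu_available_bytes=0, cpu_offload_gb=4.0)
```

Under these assumptions, the same call with cpu_offload_gb=0.0 raises
ValueError, which corresponds to the failure described above.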

Fixes #27934