Your current environment
Testing `vllm bench serve` from main as of the opening of this bug (Oct 24, 2025). The issue is agnostic of the hardware in use.
🐛 Describe the bug
When using `vllm bench serve` with the `openai-chat` backend against a vLLM server, the inter-token latency (ITL) is calculated incorrectly whenever the chunks streamed back by the server do not map 1:1 to generated tokens. This happens regularly when reasoning parsers, tool-call parsers, or harmony models are in use, since all of these involve special tokens and parsing logic that can cause output to be temporarily buffered and/or special tokens to be stripped from the final output.
The logic in `vllm.benchmarks.lib.endpoint_request_func.async_request_openai_chat_completions` assumes every chunk contains exactly one token and computes ITL as the simple timestamp difference between the previous chunk and the current one. This is misleading and leads to reporting higher ITL values than reality, because it actually measures the latency between streamed chunks rather than the latency between generated tokens.
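
To make the effect concrete, here is a minimal, self-contained sketch (not the actual vLLM benchmark code; the timestamps and token counts are hypothetical) comparing the current per-chunk ITL calculation with a token-aware alternative that spreads each chunk's latency over the number of tokens it delivered:

```python
# Illustrative sketch only -- not the actual vLLM benchmark code.
# Timestamps and token counts below are hypothetical.

def itl_per_chunk(chunk_timestamps):
    """Current behavior: ITL = delta between consecutive chunk arrivals,
    implicitly assuming each chunk carries exactly one token."""
    return [t2 - t1 for t1, t2 in zip(chunk_timestamps, chunk_timestamps[1:])]

def itl_token_aware(chunk_timestamps, tokens_per_chunk):
    """One possible fix: spread each chunk's latency over the number of
    tokens it actually delivered (requires counting tokens per chunk)."""
    itls = []
    deltas = zip(chunk_timestamps, chunk_timestamps[1:])
    for (t1, t2), n_tokens in zip(deltas, tokens_per_chunk[1:]):
        if n_tokens > 0:
            itls.extend([(t2 - t1) / n_tokens] * n_tokens)
    return itls

# Example: a reasoning parser buffers output, so the third chunk arrives
# 100 ms after the second but carries 5 tokens at once.
timestamps = [0.00, 0.02, 0.12]   # chunk arrival times in seconds
tokens = [1, 1, 5]                # tokens delivered by each chunk

print(itl_per_chunk(timestamps))            # ~[0.02, 0.10] -> a 100 ms "ITL" is reported
print(itl_token_aware(timestamps, tokens))  # six intervals of ~0.02 s each
```

With the per-chunk calculation, the buffered chunk shows up as a single 100 ms inter-token gap even though it delivered five tokens at roughly 20 ms apiece. A token-aware calculation would need to know how many tokens each chunk actually carried, e.g. by tokenizing the chunk text or using token counts reported by the server.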
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.