
@xmfan (Member) commented on Nov 8, 2025

Stacked PRs:


Mark input tokens to routed experts as dynamic to avoid a recompile

This saves one recompile, and you can see that the input tokens are dynamic in the first compiled graph:

```python
class GraphModule(torch.nn.Module):
    def forward(...s77: "Sym(s77)", L_x_: "bf16[s77, 5120][5120, 1]cuda:0"...
```

I verified that this also fixes the AC recompile issue in #1971. But I'm keeping torch._C._dynamo.eval_frame._set_lru_cache(False), since other recompile reasons could still pop up.
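
For context, here is a minimal sketch of the pattern (not the actual torchtitan change; the expert module and function names below are hypothetical stand-ins): the number of tokens routed to an expert is data-dependent, so without a hint Dynamo specializes the expert graph on the first length it sees and recompiles when routing shifts. Marking dim 0 as dynamic makes the first compile symbolic, which is what the Sym(s77) size in the graph dump above shows.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a routed expert; the real code lives in the MoE layer.
expert = torch.compile(nn.Linear(5120, 5120, dtype=torch.bfloat16, device="cuda"))

def run_expert(routed_tokens: torch.Tensor) -> torch.Tensor:
    # routed_tokens: [s, 5120], where s (the number of tokens sent to this
    # expert) changes from step to step depending on the router's choices.
    torch._dynamo.mark_dynamic(routed_tokens, 0)  # treat the token dim as dynamic
    return expert(routed_tokens)
```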

@jquesnelle (Contributor) commented:
The fix for #1971 requires a PyTorch nightly from the last few days (_set_lru_cache was only just added), so this may be a preferable fix since it allows using PyTorch stable (2.9).
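
If the _set_lru_cache call is kept alongside this change, one option (a sketch, assuming the call stays) is to guard it so the same code still runs on stable 2.9, where the binding does not exist:

```python
import torch

# _set_lru_cache only exists in recent nightlies; skip it on stable builds.
eval_frame = torch._C._dynamo.eval_frame
if hasattr(eval_frame, "_set_lru_cache"):
    eval_frame._set_lru_cache(False)
```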
