@rmatif rmatif commented Nov 4, 2025

This PR adds support for EasyCache, a variant of TeaCache that achieves a significant speedup.

Currently tested only with CUDA and on Flux/Qwen.

Command usage:

--easycache threshold,start_percent,end_percent

Example:

--easycache 0.2,0.15,0.95

Without EasyCache (image: output2) vs. with EasyCache (image: output3): x1.85 speedup.
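For readers unfamiliar with the technique, here is a minimal sketch of the caching idea, with a toy 1-D "model". All names and the exact skip criterion are illustrative assumptions, not this PR's actual implementation: accumulate the relative change of the latent since the last real model call, and while that stays under `threshold` (inside the `[start, end]` fraction of the step range), reuse the cached model output instead of calling the model.

```python
def easycache_run(model, x, total_steps, threshold, start_percent, end_percent):
    """Toy sampling loop with an EasyCache-style skip heuristic (illustrative only)."""
    prev_x = None        # latent at the last real model call
    cached_delta = None  # model output from the last real call
    accum = 0.0          # accumulated relative change since that call
    calls = 0            # number of real (expensive) model calls
    for i in range(total_steps):
        in_window = start_percent <= i / total_steps <= end_percent
        skip = False
        if cached_delta is not None and in_window:
            # cheap proxy: relative change of the input latent
            rel = abs(x - prev_x) / (abs(prev_x) + 1e-8)
            if accum + rel < threshold:
                accum += rel
                skip = True
        if skip:
            delta = cached_delta          # cheap: reuse cached output
        else:
            delta = model(x, i)           # expensive: real model call
            calls += 1
            prev_x, cached_delta, accum = x, delta, 0.0
        x = x + delta
    return x, calls

# Toy decay model: with threshold 0 nothing is ever skipped;
# with threshold 0.2 some in-window steps reuse the cache.
decay = lambda x, i: -0.1 * x
_, calls_full = easycache_run(decay, 1.0, 40, 0.0, 0.15, 0.95)
_, calls_ec = easycache_run(decay, 1.0, 40, 0.2, 0.15, 0.95)
```

The speedup comes entirely from the skipped calls: the first/last fractions of the schedule always run the model, which is why the second and third flag values matter.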

Green-Sky commented Nov 6, 2025

Ran it with a SPARK (preview) Chroma finetune

thresh   real speedup
0        baseline
0.025    1.12x
0.1      1.31x
0.2      1.4x

(comparison images attached for each threshold)

I noticed the estimated speedup is off, by exactly 2x, so I guess CFG is not handled properly yet.

e.g. 40 steps with CFG are actually 80 model calls:

40 / (40 - 8) = 1.25 (estimated 1.25)
80 / (80 - 8) = 1.11 (measured 1.12)
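The arithmetic above can be reproduced with a small helper (hypothetical, not code from this PR): with classifier-free guidance the sampler makes two model calls per step (cond + uncond), so the estimate should divide by the real call count, not the step count.

```python
def speedup(total_calls, skipped_calls):
    """Speedup from skipping `skipped_calls` out of `total_calls` model calls."""
    return total_calls / (total_calls - skipped_calls)

est_no_cfg = speedup(40, 8)   # what the PR currently reports (steps only)
est_cfg = speedup(2 * 40, 8)  # accounting for cond + uncond calls per step
```

With the doubled call count the estimate lands at 1.11, matching the measured 1.12x.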

I included 0.025 because that is what the original used for Wan.

Anyway, good stuff. I'll take the 11% speedup.

$ result/bin/sd --diffusion-model models/SPARK.Chroma_preview-q5_k.gguf --t5xxl models/flux-extra/t5xxl_fp16.safetensors -t 8 --vae models/flux-extra/ae-f16.gguf --sampling-method dpm++2m --scheduler simple --steps 40 --cfg-scale 3.8 -n "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors, noisy, artifacts, fake, generated, overblown, over exposed" -p "Photograph of a lovely cat. Green rolling hills in the background." --clip-on-cpu --offload-to-cpu -v -W 768 -H 1024 --diffusion-fa --chroma-disable-dit-mask --easycache 0.025,0.15,0.95

@JohnLoveJoy

> Ran it with a SPARK (preview) Chroma finetune […] Anyway Good stuff, I take the 11% speedup.

Is this additional noise in the image expected? I haven't noticed it in other TeaCache tests or similar methods.

Green-Sky commented Nov 6, 2025

> Is this additional noise in the image expected? I haven't noticed it in other TeaCache tests or similar methods.

I speculate that this happens with some models (see the second pic in the OP) when too much is skipped at the end of the sampling phase. You can lower the end_percent value from 0.95 to 0.80 or lower. This results in fewer skips late in sampling, but also fewer skips overall, so it is not conclusive(!).

I observed similar behavior with beta/smoothstep schedulers, which reduces the noise in those models. The scheduler spends more time in the early and late timesteps of sampling (s-curve or gain-function behavior).
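To illustrate the s-curve point (this is my reading of why such schedules help, not something measured in this thread): warping uniformly spaced timesteps through smoothstep compresses the spacing near both ends, i.e. the sampler spends more of its steps at early and late timesteps.

```python
def smoothstep(t):
    """Classic smoothstep: derivative is zero at t=0 and t=1 (s-curve)."""
    return t * t * (3.0 - 2.0 * t)

uniform = [i / 20 for i in range(21)]
warped = [smoothstep(t) for t in uniform]

first_gap = warped[1] - warped[0]    # spacing at the start of the schedule
mid_gap = warped[11] - warped[10]    # spacing in the middle
last_gap = warped[20] - warped[19]   # spacing at the end
```

Smaller gaps at the ends mean denser sampling there, which would counteract the quality loss from skipping too much late in the schedule.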

edit: Of course this PR could be broken too. Another factor is the quant used, which seems to exacerbate the noise.

edit2: Here is what smoothstep (almost beta) with 20 steps looks like:

(image: output)

@Green-Sky

I think it would be simpler if --easycache threshold,start_percent,end_percent were split into 3 or 4 separate flags. You generally just want to enable the safe defaults OR only modify the threshold.
Or maybe make start and end optional? Not sure.
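The "make start and end optional" idea could look like this hypothetical parser (defaults 0.15/0.95 are taken from the examples in this thread; the PR's real CLI parsing may differ):

```python
def parse_easycache(arg, default_start=0.15, default_end=0.95):
    """Parse '--easycache threshold[,start_percent[,end_percent]]'.

    Omitted values fall back to the (assumed) safe defaults, so
    '--easycache 0.2' behaves like '--easycache 0.2,0.15,0.95'.
    """
    parts = arg.split(",")
    threshold = float(parts[0])
    start = float(parts[1]) if len(parts) > 1 else default_start
    end = float(parts[2]) if len(parts) > 2 else default_end
    return threshold, start, end
```

This keeps the single-flag syntax backward compatible while covering the common "only tune the threshold" case.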
