Commit f3db38c

ArXiv -> HF Papers (#12583)
* Update pipeline_skyreels_v2_i2v.py
* Update README.md
* Update torch_utils.py
* Update torch_utils.py
* Update guider_utils.py
* Update pipeline_ltx.py
* Update pipeline_bria.py
* Apply suggestion from @qgallouedec
* Update autoencoder_kl_qwenimage.py
* Update pipeline_prx.py
* Update pipeline_wan_vace.py
* Update pipeline_skyreels_v2.py
* Update pipeline_skyreels_v2_diffusion_forcing.py
* Update pipeline_bria_fibo.py
* Update pipeline_skyreels_v2_diffusion_forcing_i2v.py
* Update pipeline_ltx_condition.py
* Update pipeline_ltx_image2video.py
* Update regional_prompting_stable_diffusion.py
* make style
* style
* style
1 parent f5e5f34 commit f3db38c
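
Every link change in this commit follows one pattern: `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>[.pdf]` become `https://huggingface.co/papers/<id>`. The sketch below shows that rewrite as a regex substitution. The commit does not say how the edits were produced, so the helper name `arxiv_to_hf_papers` and the pattern are assumptions made for illustration only.

```python
import re

# Hypothetical helper, not part of the commit: matches arXiv abs/pdf links
# with a new-style identifier (e.g. 2305.08891) and an optional ".pdf" suffix.
ARXIV_URL = re.compile(r"https://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:\.pdf)?")


def arxiv_to_hf_papers(text: str) -> str:
    """Rewrite arXiv links in `text` to the matching Hugging Face Papers links."""
    return ARXIV_URL.sub(r"https://huggingface.co/papers/\1", text)


print(arxiv_to_hf_papers("# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf"))
```

Links that are not arXiv URLs, such as the GitHub link in `autoencoder_kl_qwenimage.py`, pass through unchanged. Version suffixes such as `v2` are not handled by this pattern.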

16 files changed: +67 -64 lines changed

examples/community/README.md

Lines changed: 1 addition & 1 deletion
@@ -5488,7 +5488,7 @@ Editing at Scale", many thanks to their contribution!

 This implementation of Flux Kontext allows users to pass multiple reference images. Each image is encoded separately, and the resulting latent vectors are concatenated.

-As explained in Section 3 of [the paper](https://arxiv.org/pdf/2506.15742), the model's sequence concatenation mechanism can extend its capabilities to handle multiple reference images. However, note that the current version of Flux Kontext was not trained for this use case. In practice, stacking along the first axis does not yield correct results, while stacking along the other two axes appears to work.
+As explained in Section 3 of [the paper](https://huggingface.co/papers/2506.15742), the model's sequence concatenation mechanism can extend its capabilities to handle multiple reference images. However, note that the current version of Flux Kontext was not trained for this use case. In practice, stacking along the first axis does not yield correct results, while stacking along the other two axes appears to work.

 ## Example Usage

examples/community/regional_prompting_stable_diffusion.py

Lines changed: 5 additions & 5 deletions
@@ -490,7 +490,7 @@ def hook_forwards(root_module: torch.nn.Module):
 def prepare_extra_step_kwargs(self, generator, eta):
 # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
 # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
-# eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
+# eta corresponds to η in DDIM paper: https://huggingface.co/papers/2010.02502
 # and should be between [0, 1]

 accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
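
The `accepts_eta` line in the hunk above inspects the scheduler's `step` signature so that `eta` is only forwarded to schedulers that declare it. A minimal standalone sketch of that idea follows. The two scheduler classes are toy stand-ins, and the matching check for `generator` is an assumption, since the hunk shows only the `eta` check.

```python
import inspect


class DDIMLikeScheduler:
    """Toy stand-in (hypothetical) whose step() accepts eta and generator."""

    def step(self, model_output, timestep, sample, eta=0.0, generator=None):
        return sample


class PlainScheduler:
    """Toy stand-in (hypothetical) whose step() declares neither argument."""

    def step(self, model_output, timestep, sample):
        return sample


def prepare_extra_step_kwargs(scheduler, generator, eta):
    # Only forward the keyword arguments that this scheduler's step() declares.
    params = inspect.signature(scheduler.step).parameters
    extra_step_kwargs = {}
    if "eta" in params:
        extra_step_kwargs["eta"] = eta
    if "generator" in params:  # assumption: handled the same way as eta
        extra_step_kwargs["generator"] = generator
    return extra_step_kwargs


print(prepare_extra_step_kwargs(DDIMLikeScheduler(), None, 0.5))
print(prepare_extra_step_kwargs(PlainScheduler(), None, 0.5))
```

The returned dictionary can then be unpacked into `scheduler.step(...)`, so one call site works for both kinds of scheduler.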
@@ -841,7 +841,7 @@ def stable_diffusion_call(
 num_images_per_prompt (`int`, *optional*, defaults to 1):
 The number of images to generate per prompt.
 eta (`float`, *optional*, defaults to 0.0):
-Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies
+Corresponds to parameter eta (η) from the [DDIM](https://huggingface.co/papers/2010.02502) paper. Only applies
 to the [`~schedulers.DDIMScheduler`], and is ignored in other schedulers.
 generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
 A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
@@ -872,7 +872,7 @@ def stable_diffusion_call(
 [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
 guidance_rescale (`float`, *optional*, defaults to 0.0):
 Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale factor should fix overexposure when
+Flawed](https://huggingface.co/papers/2305.08891). Guidance rescale factor should fix overexposure when
 using zero terminal SNR.
 clip_skip (`int`, *optional*):
 Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
@@ -1062,7 +1062,7 @@ def stable_diffusion_call(
 noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)

 if self.do_classifier_free_guidance and self.guidance_rescale > 0.0:
-# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
+# Based on 3.4. in https://huggingface.co/papers/2305.08891
 noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale)

 # compute the previous noisy sample x_t -> x_t-1
@@ -1668,7 +1668,7 @@ def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
 r"""
 Rescales `noise_cfg` tensor based on `guidance_rescale` to improve image quality and fix overexposure. Based on
 Section 3.4 from [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf).
+Flawed](https://huggingface.co/papers/2305.08891).

 Args:
 noise_cfg (`torch.Tensor`):

src/diffusers/guiders/guider_utils.py

Lines changed: 1 addition & 1 deletion
@@ -373,7 +373,7 @@ def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
 r"""
 Rescales `noise_cfg` tensor based on `guidance_rescale` to improve image quality and fix overexposure. Based on
 Section 3.4 from [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf).
+Flawed](https://huggingface.co/papers/2305.08891).

 Args:
 noise_cfg (`torch.Tensor`):
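
The docstring above describes rescaling the guided noise prediction per Section 3.4 of the cited paper. The diff does not show the function body, so what follows is only a sketch of the technique as that section describes it: scale the guided prediction so its standard deviation matches the text-conditioned prediction, then blend with the original by `guidance_rescale`. It works on one sample given as a flat list of floats, which is an assumption made to keep the example dependency-free; the library function takes batched `torch.Tensor` inputs.

```python
from statistics import stdev


def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    """Single-sample sketch on flat lists of floats (not the library implementation)."""
    std_text = stdev(noise_pred_text)
    std_cfg = stdev(noise_cfg)
    # Rescale the guided prediction so its std matches the text-conditioned one.
    rescaled = [x * (std_text / std_cfg) for x in noise_cfg]
    # Blend the rescaled and original predictions by the guidance_rescale factor.
    return [guidance_rescale * r + (1.0 - guidance_rescale) * x for r, x in zip(rescaled, noise_cfg)]


text = [1.0, -1.0, 0.5, -0.5]
cfg = [3.0, -3.0, 1.5, -1.5]  # same direction as `text`, three times the spread
print(rescale_noise_cfg(cfg, text, guidance_rescale=0.0))
print(rescale_noise_cfg(cfg, text, guidance_rescale=1.0))
```

With `guidance_rescale=0.0` the input is returned unchanged, which matches the documented default of no rescaling. With `1.0` the output has the same standard deviation as the text-conditioned prediction.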

src/diffusers/models/autoencoders/autoencoder_kl_qwenimage.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 # QwenImageVAE is further fine-tuned from the Wan Video VAE to achieve improved performance.
 # For more information about the Wan VAE, please refer to:
 # - GitHub: https://github.com/Wan-Video/Wan2.1
-# - arXiv: https://arxiv.org/abs/2503.20314
+# - Paper: https://huggingface.co/papers/2503.20314

 from typing import List, Optional, Tuple, Union

src/diffusers/pipelines/bria/pipeline_bria.py

Lines changed: 6 additions & 6 deletions
@@ -245,7 +245,7 @@ def guidance_scale(self):
 return self._guidance_scale

 # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
-# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
+# of the Imagen paper: https://huggingface.co/papers/2205.11487 . `guidance_scale = 1`
 # corresponds to doing no classifier free guidance.
 @property
 def do_classifier_free_guidance(self):
@@ -489,11 +489,11 @@ def __call__(
 in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
 passed will be used. Must be in descending order.
 guidance_scale (`float`, *optional*, defaults to 5.0):
-Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
-`guidance_scale` is defined as `w` of equation 2. of [Imagen
-Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
-1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
-usually at the expense of lower image quality.
+Guidance scale as defined in [Classifier-Free Diffusion
+Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
+of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
+`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
+the text `prompt`, usually at the expense of lower image quality.
 negative_prompt (`str` or `List[str]`, *optional*):
 The prompt or prompts not to guide the image generation. If not defined, one has to pass
 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
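
The comments and docstring above state that `guidance_scale = 1` corresponds to no classifier-free guidance. That follows from the combination used in other hunks of this diff, `noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)`. A small sketch of that formula on plain lists of floats makes it concrete; the function name `apply_cfg` and the list-based inputs are assumptions for illustration, not pipeline code.

```python
def apply_cfg(noise_pred_uncond, noise_pred_text, guidance_scale):
    """Combine unconditional and text-conditioned predictions (illustrative sketch)."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_pred_uncond, noise_pred_text)]


uncond = [0.0, 1.0]
text = [1.0, 3.0]
print(apply_cfg(uncond, text, guidance_scale=1.0))  # equals the text-conditioned prediction
print(apply_cfg(uncond, text, guidance_scale=5.0))  # pushed further along (text - uncond)
```

At a scale of 1 the unconditional term cancels and the result is just the text-conditioned prediction. Larger values extrapolate further along the difference between the two predictions.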

src/diffusers/pipelines/bria_fibo/pipeline_bria_fibo.py

Lines changed: 6 additions & 6 deletions
@@ -337,7 +337,7 @@ def guidance_scale(self):
 return self._guidance_scale

 # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
-# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
+# of the Imagen paper: https://huggingface.co/papers/2205.11487 . `guidance_scale = 1`
 # corresponds to doing no classifier free guidance.

 @property
@@ -498,11 +498,11 @@ def __call__(
 in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
 passed will be used. Must be in descending order.
 guidance_scale (`float`, *optional*, defaults to 5.0):
-Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
-`guidance_scale` is defined as `w` of equation 2. of [Imagen
-Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
-1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
-usually at the expense of lower image quality.
+Guidance scale as defined in [Classifier-Free Diffusion
+Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
+of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
+`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
+the text `prompt`, usually at the expense of lower image quality.
 negative_prompt (`str` or `List[str]`, *optional*):
 The prompt or prompts not to guide the image generation. If not defined, one has to pass
 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is

src/diffusers/pipelines/ltx/pipeline_ltx.py

Lines changed: 5 additions & 4 deletions
@@ -590,9 +590,10 @@ def __call__(
 the text `prompt`, usually at the expense of lower image quality.
 guidance_rescale (`float`, *optional*, defaults to 0.0):
 Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf) `guidance_scale` is defined as `φ` in equation 16. of
-[Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
-Guidance rescale factor should fix overexposure when using zero terminal SNR.
+Flawed](https://huggingface.co/papers/2305.08891) `guidance_scale` is defined as `φ` in equation 16. of
+[Common Diffusion Noise Schedules and Sample Steps are
+Flawed](https://huggingface.co/papers/2305.08891). Guidance rescale factor should fix overexposure when
+using zero terminal SNR.
 num_videos_per_prompt (`int`, *optional*, defaults to 1):
 The number of videos to generate per prompt.
 generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -777,7 +778,7 @@ def __call__(
 noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)

 if self.guidance_rescale > 0:
-# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
+# Based on 3.4. in https://huggingface.co/papers/2305.08891
 noise_pred = rescale_noise_cfg(
 noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale
 )

src/diffusers/pipelines/ltx/pipeline_ltx_condition.py

Lines changed: 5 additions & 4 deletions
@@ -927,9 +927,10 @@ def __call__(
 the text `prompt`, usually at the expense of lower image quality.
 guidance_rescale (`float`, *optional*, defaults to 0.0):
 Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf) `guidance_scale` is defined as `φ` in equation 16. of
-[Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
-Guidance rescale factor should fix overexposure when using zero terminal SNR.
+Flawed](https://huggingface.co/papers/2305.08891) `guidance_scale` is defined as `φ` in equation 16. of
+[Common Diffusion Noise Schedules and Sample Steps are
+Flawed](https://huggingface.co/papers/2305.08891). Guidance rescale factor should fix overexposure when
+using zero terminal SNR.
 num_videos_per_prompt (`int`, *optional*, defaults to 1):
 The number of videos to generate per prompt.
 generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -1194,7 +1195,7 @@ def __call__(
 timestep, _ = timestep.chunk(2)

 if self.guidance_rescale > 0:
-# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
+# Based on 3.4. in https://huggingface.co/papers/2305.08891
 noise_pred = rescale_noise_cfg(
 noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale
 )

src/diffusers/pipelines/ltx/pipeline_ltx_image2video.py

Lines changed: 5 additions & 4 deletions
@@ -654,9 +654,10 @@ def __call__(
 the text `prompt`, usually at the expense of lower image quality.
 guidance_rescale (`float`, *optional*, defaults to 0.0):
 Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are
-Flawed](https://arxiv.org/pdf/2305.08891.pdf) `guidance_scale` is defined as `φ` in equation 16. of
-[Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
-Guidance rescale factor should fix overexposure when using zero terminal SNR.
+Flawed](https://huggingface.co/papers/2305.08891) `guidance_scale` is defined as `φ` in equation 16. of
+[Common Diffusion Noise Schedules and Sample Steps are
+Flawed](https://huggingface.co/papers/2305.08891). Guidance rescale factor should fix overexposure when
+using zero terminal SNR.
 num_videos_per_prompt (`int`, *optional*, defaults to 1):
 The number of videos to generate per prompt.
 generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -851,7 +852,7 @@ def __call__(
 timestep, _ = timestep.chunk(2)

 if self.guidance_rescale > 0:
-# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
+# Based on 3.4. in https://huggingface.co/papers/2305.08891
 noise_pred = rescale_noise_cfg(
 noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale
 )

src/diffusers/pipelines/prx/pipeline_prx.py

Lines changed: 5 additions & 5 deletions
@@ -536,11 +536,11 @@ def __call__(
 in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
 passed will be used. Must be in descending order.
 guidance_scale (`float`, *optional*, defaults to 4.0):
-Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
-`guidance_scale` is defined as `w` of equation 2. of [Imagen
-Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
-1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
-usually at the expense of lower image quality.
+Guidance scale as defined in [Classifier-Free Diffusion
+Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
+of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
+`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
+the text `prompt`, usually at the expense of lower image quality.
 num_images_per_prompt (`int`, *optional*, defaults to 1):
 The number of images to generate per prompt.
 generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
