Decoding Diffusion: From the Perspective of Rectified Flow

Modern diffusion models such as Stable Diffusion 3 and FLUX have moved to a flow-based formulation. Understanding flows is crucial, yet challenging. In this article, we try to decode flow-based diffusion models with Rectified Flow, one of the most elegant solutions. Note that this article explains the concepts at a relatively high level; detailed proofs can be found in the original paper.

Problem

Given two distributions \pi_0 and \pi_1, we want to find a transport map T such that:

    \[Z_1 = T(Z_0) \sim \pi_1, \quad \text{where } Z_0 \sim \pi_0.\]

Example: \pi_0 is a Gaussian, \pi_1 is the target distribution.
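
As a concrete illustration of a transport map, here is a minimal sketch for a case where T is known in closed form. The choice \pi_0 = N(0, 1), \pi_1 = N(4, 2^2) and the name `transport` are assumptions made for this example and do not come from the original article.

```python
import random
import statistics

def transport(z0: float) -> float:
    """Toy transport map T(z) = 4 + 2*z, which pushes N(0, 1) onto N(4, 2^2)."""
    return 4.0 + 2.0 * z0

rng = random.Random(0)
z0_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # Z_0 ~ pi_0
z1_samples = [transport(z) for z in z0_samples]           # Z_1 = T(Z_0)

# The empirical mean and standard deviation should be close to 4 and 2.
z1_mean = statistics.fmean(z1_samples)
z1_std = statistics.pstdev(z1_samples)
```

For a real data distribution no such formula is available, which is why T has to be learned.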

Math

(In understandable language)

  • Vector field V(x): A function that assigns a vector, i.e. a direction and a magnitude, to each point x in \mathbb{R}^n.
  • Velocity field v(x,t): A time-dependent vector field that gives the velocity (speed and direction of motion) of a particle located at point x at time t. Here, t is a continuous time in [0,1].

Method:

Rectified Flow

We learn T implicitly by constructing an ordinary differential equation (ODE):

    \[\frac{d}{dt} Z_t = v(Z_t, t), \quad Z_0 \sim \pi_0, \quad \forall t \in [0,1].\]
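
Once a velocity field is available, the map T is obtained by integrating the ODE from t = 0 to t = 1. The following is a minimal sketch using the forward Euler method on scalars; `euler_sample` and `constant_drift` are hypothetical names chosen for illustration, and a practical sampler would operate on tensors and might use a higher-order solver.

```python
from typing import Callable

def euler_sample(v: Callable[[float, float], float], z0: float,
                 num_steps: int = 100) -> float:
    """Integrate dZ_t/dt = v(Z_t, t) from t = 0 to t = 1 with forward Euler."""
    z = z0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt              # current time on the uniform grid
        z = z + dt * v(z, t)    # one Euler step along the velocity field
    return z

def constant_drift(z: float, t: float) -> float:
    """Hypothetical velocity field: move right at speed 2, so Z_1 = Z_0 + 2."""
    return 2.0

z1 = euler_sample(constant_drift, z0=0.5, num_steps=10)
```

The number of steps is the main cost of sampling, since each step requires one evaluation of v.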

Intuitively, the best way to get Z_1 from Z_0 is to have a “straight” flow.

So we need to find an ODE to match the linear interpolation of points from \pi_0 and \pi_1.

Observe X_0 \sim \pi_0, X_1 \sim \pi_1. Let X_t for t \in [0,1] be the linear interpolation of X_1 and X_0:

    \[X_t = t X_1 + (1-t) X_0, \quad t \in [0,1].\]

Differentiating X_t with respect to t gives a trivial ODE:

    \[\frac{d}{dt} X_t = X_1 - X_0, \quad \forall t \in [0,1].\]

  • Note that this ODE is not causal or forward simulatable, as we need to know X_1 when t < 1 in order to calculate (X_1 - X_0).
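
To make the non-causality concrete, here is a small sketch with two hypothetical sample pairs whose straight lines cross. The pairs and the function names are assumptions made for this example.

```python
def interpolate(x0: float, x1: float, t: float) -> float:
    """Linear interpolation X_t = t*X_1 + (1 - t)*X_0."""
    return t * x1 + (1.0 - t) * x0

def line_velocity(x0: float, x1: float) -> float:
    """d/dt X_t = X_1 - X_0; it needs the endpoint X_1, hence is not causal."""
    return x1 - x0

# Two hypothetical pairs (X_0, X_1) whose lines meet at x = 1 when t = 0.5.
pair_a = (0.0, 2.0)
pair_b = (2.0, 0.0)
crossing_a = interpolate(*pair_a, 0.5)
crossing_b = interpolate(*pair_b, 0.5)
velocity_a = line_velocity(*pair_a)
velocity_b = line_velocity(*pair_b)
```

Both lines pass through the same point at the same time but require opposite velocities, so X_1 - X_0 cannot be written as a function of (X_t, t) alone.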

So we want our ODE \frac{d}{dt} Z_t = v(Z_t, t) to be as close as possible to the trivial ODE.

A simple way is to optimize v by minimizing:

    \[\min_v \mathbb{E}_{X_0 \sim \pi_0, X_1 \sim \pi_1} \int_0^1 \| (X_1 - X_0) - v(X_t, t) \|^2 dt, \quad \text{where } X_t = t X_1 + (1-t) X_0.\]
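
It may help to see what the minimizer looks like. Because the objective is a least-squares regression of X_1 - X_0 onto (X_t, t), its minimizer over all functions is the conditional expectation, as stated in the original paper:

    \[v^*(x, t) = \mathbb{E}[X_1 - X_0 \mid X_t = x].\]

Unlike the trivial ODE, v^* depends only on the current point x and the time t, so the resulting ODE can be simulated forward without knowing X_1.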

And v can simply be a neural network.
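
The objective can be sketched as a stochastic regression loop. To stay dependency-free, the sketch below replaces the neural network with a linear model v(x, t) = w[0] + w[1]*x + w[2]*t and uses the toy pair \pi_0 = N(0, 1), \pi_1 = N(4, 1); both choices, and all function names, are assumptions made for illustration. The integral over t is approximated by sampling t uniformly from [0, 1].

```python
import random

def sample_pair(rng):
    """Draw an independent pair: X_0 ~ N(0, 1), X_1 ~ N(4, 1) (toy choice)."""
    return rng.gauss(0.0, 1.0), rng.gauss(4.0, 1.0)

def velocity(w, x, t):
    """Linear stand-in for the network: v(x, t) = w[0] + w[1]*x + w[2]*t."""
    return w[0] + w[1] * x + w[2] * t

def mc_loss(w, rng, n=2000):
    """Monte Carlo estimate of E int_0^1 |(X_1 - X_0) - v(X_t, t)|^2 dt."""
    total = 0.0
    for _ in range(n):
        x0, x1 = sample_pair(rng)
        t = rng.random()                  # t ~ Uniform[0, 1] replaces the integral
        xt = t * x1 + (1.0 - t) * x0      # point on the straight line
        total += ((x1 - x0) - velocity(w, xt, t)) ** 2
    return total / n

def train(steps=20000, lr=0.005, seed=0):
    """Plain SGD on the squared error, one sample per step."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        x0, x1 = sample_pair(rng)
        t = rng.random()
        xt = t * x1 + (1.0 - t) * x0
        err = velocity(w, xt, t) - (x1 - x0)   # regression residual
        # The gradient of err^2 with respect to w is 2*err*[1, xt, t].
        w[0] -= lr * 2.0 * err
        w[1] -= lr * 2.0 * err * xt
        w[2] -= lr * 2.0 * err * t
    return w

weights = train()
```

With a real network, the same loop would compute the squared error on a mini-batch of (X_t, t, X_1 - X_0) triples and let an optimizer update the parameters.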

Reflow: Straight Flows

The method above gives us the flow (b). In order to get flow (c), we can do:

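    \[\min_v \int_0^1 \mathbb{E}_{X_0 \sim \pi_0,\, X_1 = \text{Flow}_1(X_0)} \| (X_1 - X_0) - v(X_t, t) \|^2 dt, \quad \text{where } X_t = t X_1 + (1-t) X_0.\]

Here \text{Flow}_1(X_0) denotes the endpoint Z_1 obtained by solving the ODE of the first rectified flow starting from Z_0 = X_0. In other words, we repeat the same training procedure, but on pairs produced by the first flow instead of independently sampled pairs.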

We call it 2-rectified flow.
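
The training pairs for reflow can be produced by simulating the first flow. The sketch below assumes the toy pair \pi_0 = N(0, 1), \pi_1 = N(4, 1), for which the minimizer \mathbb{E}[X_1 - X_0 \mid X_t = x] of the first objective has a closed form by standard Gaussian conditioning. The names `flow1_velocity`, `simulate`, and `make_reflow_pairs` are hypothetical, and in practice the velocity would be the trained network.

```python
import random

def flow1_velocity(z: float, t: float) -> float:
    """Closed-form first-flow velocity for N(0, 1) -> N(4, 1) (toy assumption)."""
    slope = (2.0 * t - 1.0) / (2.0 * t * t - 2.0 * t + 1.0)
    return 4.0 + slope * (z - 4.0 * t)

def simulate(v, z0: float, num_steps: int = 1000) -> float:
    """Forward Euler integration of dZ_t/dt = v(Z_t, t) on [0, 1]."""
    z, dt = z0, 1.0 / num_steps
    for k in range(num_steps):
        z += dt * v(z, k * dt)
    return z

def make_reflow_pairs(v, n: int, seed: int = 0):
    """Return n coupled pairs (Z_0, Z_1) with Z_1 = Flow_1(Z_0)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)            # Z_0 ~ pi_0
        pairs.append((z0, simulate(v, z0)))  # endpoint of the first flow
    return pairs

reflow_pairs = make_reflow_pairs(flow1_velocity, n=50)
```

In this toy case every pair satisfies Z_1 ≈ Z_0 + 4, so the regression target X_1 - X_0 is the same constant for all pairs and the retrained flow moves along straight lines.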

  • Note that this process can be repeated multiple times.
  • Keep in mind that Reflow speeds up the generation process, because straighter flows need fewer ODE steps, but it does not improve generation quality.
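
The speed-up can be illustrated on a toy example by comparing a curved flow with a straight one under a single Euler step. The curved field is the closed-form first-flow velocity for \pi_0 = N(0, 1), \pi_1 = N(4, 1), and the straight field is the constant velocity obtained after retraining on its coupled pairs; both fields and all names are assumptions made for this sketch.

```python
def euler(v, z0: float, num_steps: int) -> float:
    """Forward Euler integration of dZ_t/dt = v(Z_t, t) on [0, 1]."""
    z, dt = z0, 1.0 / num_steps
    for k in range(num_steps):
        z += dt * v(z, k * dt)
    return z

def curved_velocity(z: float, t: float) -> float:
    """Toy first-flow velocity: paths are curved but end at Z_0 + 4."""
    slope = (2.0 * t - 1.0) / (2.0 * t * t - 2.0 * t + 1.0)
    return 4.0 + slope * (z - 4.0 * t)

def straight_velocity(z: float, t: float) -> float:
    """Toy reflowed velocity: every path is a straight line with speed 4."""
    return 4.0

z0 = 1.5
reference = euler(curved_velocity, z0, 1000)           # fine-grained solution
one_step_curved = euler(curved_velocity, z0, 1)        # coarse, inaccurate
one_step_straight = euler(straight_velocity, z0, 1)    # coarse, still exact
```

A single step on the straight flow lands on Z_0 + 4 exactly, while a single step on the curved flow misses the fine-grained solution. Both flows reach the same endpoints when solved accurately, so the gain is in speed and not in quality.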

