Modern diffusion models such as Stable Diffusion 3 and FLUX have switched their underlying formulation to flow-based models. Understanding flows is therefore crucial, yet challenging. In this article, we try to decode flow-based diffusion models through Rectified Flow, one of the most elegant solutions. Note that this article explains the concepts at a relatively high level; detailed proofs can be found in the original paper.

Problem

Given two distributions $\pi_0$ and $\pi_1$, we want to find a transport map $T$ such that:

$Z_1 = T(Z_0) \sim \pi_1, \quad \text{where } Z_0 \sim \pi_0.$

Example: $\pi_0$ is a Gaussian, $\pi_1$ is the target distribution.

Math

(In understandable language)

Vector field $V(x)$: A function on $\mathbb{R}^n$ that assigns a vector, i.e. a direction and a magnitude, to each point $x$.
Velocity field $v(x,t)$: A time-dependent vector field that gives the direction and rate of motion of a particle located at point $x$ at time $t$. Here, $t \in [0,1]$ is the time variable.

Method:

Rectified Flow

We learn $T$ implicitly by constructing an ordinary differential equation (ODE):

$\frac{d}{dt} Z_t = v(Z_t, t), \quad Z_0 \sim \pi_0, \quad \forall t \in [0,1].$
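To make the ODE concrete, here is a minimal sketch of how a sample could be pushed from $t = 0$ to $t = 1$ once some velocity field is available. The explicit Euler solver and the two hand-written fields (`decay_field`, `constant_field`) are illustrative assumptions, not part of Rectified Flow itself; in practice $v$ would be a learned model.

```python
import math

def euler_flow(v, z0, num_steps):
    """Integrate dZ/dt = v(Z, t) from t = 0 to t = 1 with explicit Euler steps."""
    z = z0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        z = z + dt * v(z, t)
    return z

# Hypothetical field 1: v(z, t) = -z, whose exact solution is Z_1 = Z_0 * exp(-1).
def decay_field(z, t):
    return -z

# Hypothetical field 2: constant velocity, so every trajectory is a straight line.
def constant_field(z, t):
    return 2.0

z1_coarse = euler_flow(decay_field, 1.0, 10)      # few steps, visible error
z1_fine = euler_flow(decay_field, 1.0, 1000)      # many steps, small error
z1_straight = euler_flow(constant_field, 1.0, 1)  # a single step

print("curved, 10 steps:  ", z1_coarse)
print("curved, 1000 steps:", z1_fine, "exact:", math.exp(-1.0))
print("straight, 1 step:  ", z1_straight)
```

With a constant velocity field the trajectory is a straight line, so a single Euler step already lands on the exact endpoint, while the curved `decay_field` trajectory needs many steps to get close.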

Intuitively, the best way to get $Z_1$ from $Z_0$ is to have a “straight” flow.

So we need to find an ODE to match the linear interpolation of points from $\pi_0$ and $\pi_1$ .

Observe $X_0 \sim \pi_0$ , $X_1 \sim \pi_1$ . Let $X_t$ for $t \in [0,1]$ be the linear interpolation of $X_1$ and $X_0$ :

$X_t = t X_1 + (1-t) X_0, \quad t \in [0,1].$

Differentiating $X_t$ with respect to $t$ gives a trivial ODE:

$\frac{d}{dt} X_t = X_1 - X_0, \quad \forall t \in [0,1].$

Note that this ODE is not causal or forward simulatable, as we need to know $X_1$ when $t < 1$ in order to calculate $(X_1 - X_0)$ .
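A small sketch may help show why the interpolation velocity cannot be used directly as a velocity field. The two 1-D pairs below are made-up values chosen so that their interpolation paths cross; the helper names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Linear interpolation X_t = t * X_1 + (1 - t) * X_0."""
    return t * x1 + (1.0 - t) * x0

def interpolation_velocity(x0, x1):
    """Time derivative of X_t, which depends on the endpoint X_1."""
    return x1 - x0

# Two hypothetical (X_0, X_1) pairs whose paths cross at t = 0.5.
pair_a = (-1.0, 1.0)
pair_b = (1.0, -1.0)
t = 0.5

pos_a = interpolate(pair_a[0], pair_a[1], t)
pos_b = interpolate(pair_b[0], pair_b[1], t)
vel_a = interpolation_velocity(*pair_a)
vel_b = interpolation_velocity(*pair_b)

# Same point and same time, but two different velocities: the derivative
# is not a function of (x, t) alone, so it cannot be simulated forward.
print("positions:", pos_a, pos_b)
print("velocities:", vel_a, vel_b)

# If both pairs are equally likely, the least-squares best single value
# at this (x, t) is the average of the two velocities.
avg_velocity = 0.5 * (vel_a + vel_b)
print("average velocity:", avg_velocity)
```

Both paths pass through the same point at the same time with opposite velocities, so any causal field $v(x, t)$ has to settle on a single value there.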

So we want our ODE $\frac{d}{dt} Z_t = v(Z_t, t)$ to be as close as possible to the trivial ODE.

A simple way is to optimize $v$ by minimizing:

$\min_v \mathbb{E}_{X_0 \sim \pi_0, X_1 \sim \pi_1} \int_0^1 \| (X_1 - X_0) - v(X_t, t) \|^2 dt$

where $X_t = t X_1 + (1-t) X_0$.

Here $v$ can simply be parameterized by a neural network.
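The objective above is an ordinary least-squares regression of $X_1 - X_0$ onto $(X_t, t)$, so its minimizer is the conditional expectation $\mathbb{E}[X_1 - X_0 \mid X_t = x]$. The sketch below illustrates this on a toy problem under loud assumptions: the data are 1-D, $\pi_0 = N(0, 1)$, $\pi_1 = N(2, 1)$ is a made-up target, and a small polynomial model fitted with `numpy.linalg.lstsq` stands in for the neural network. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, t):
    """Polynomial features of (x, t): [1, t, t^2, t^3] and the same times x."""
    powers = np.stack([t**k for k in range(4)], axis=1)            # (n, 4)
    return np.concatenate([powers, powers * x[:, None]], axis=1)   # (n, 8)

def fit_velocity(x0, x1, rng):
    """Monte Carlo version of the objective: sample t, build X_t, regress X_1 - X_0."""
    t = rng.uniform(0.0, 1.0, size=x0.shape)
    xt = t * x1 + (1.0 - t) * x0
    target = x1 - x0
    weights, *_ = np.linalg.lstsq(features(xt, t), target, rcond=None)
    return weights

def simulate(weights, z0, num_steps=100):
    """Explicit Euler integration of dZ/dt = v(Z, t) with the fitted model."""
    z = z0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        z = z + dt * (features(z, np.full_like(z, k * dt)) @ weights)
    return z

n = 20000
x0 = rng.normal(0.0, 1.0, size=n)   # samples from pi_0 (assumed standard Gaussian)
x1 = rng.normal(2.0, 1.0, size=n)   # samples from pi_1 (assumed toy target)
weights = fit_velocity(x0, x1, rng)

# Generate new samples by simulating the learned ODE from fresh noise.
z1 = simulate(weights, rng.normal(0.0, 1.0, size=n))
print("generated mean/std:", z1.mean(), z1.std())
```

If the fit is reasonable, the generated samples should have a mean and standard deviation close to those of the toy target. A neural network trained with stochastic gradient descent on the same loss plays the role of `fit_velocity` in realistic settings.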

Reflow: Straight Flows

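The method above gives us flow (b). To get flow (c), we apply the same procedure again, this time to the pairs produced by the first flow. Let $\text{Flow}_1$ denote the map $Z_0 \mapsto Z_1$ obtained by solving the learned ODE from $t = 0$ to $t = 1$. We then solve:

$\min_v \int_0^1 \mathbb{E}_{X_0 \sim \pi_0} \| (X_1 - X_0) - v(X_t, t) \|^2 dt$

where $X_1 = \text{Flow}_1(X_0)$ and $X_t = t X_1 + (1-t) X_0$.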

We call the resulting flow the 2-rectified flow.

Note that this process can be repeated multiple times.
Keep in mind that Reflow speeds up generation, since straighter flows need fewer ODE solver steps, but it does not improve generation quality.
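As a rough illustration of Reflow, the sketch below fits a first flow on independent pairs, regenerates pairs $(Z_0, Z_1)$ by simulating it, and fits a second flow on those pairs. The setting is a toy assumption: 1-D data, $\pi_0 = N(0, 1)$, a made-up target $\pi_1 = N(2, 1)$, and a polynomial least-squares model as a stand-in for the neural network. The straightness measure is the mean squared gap between the velocity along a path and its overall displacement $Z_1 - Z_0$. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, t):
    """Polynomial features of (x, t): [1, t, t^2, t^3] and the same times x."""
    powers = np.stack([t**k for k in range(4)], axis=1)
    return np.concatenate([powers, powers * x[:, None]], axis=1)

def fit_velocity(x0, x1):
    """Least-squares fit of X_1 - X_0 on features of (X_t, t)."""
    t = rng.uniform(0.0, 1.0, size=x0.shape)
    xt = t * x1 + (1.0 - t) * x0
    weights, *_ = np.linalg.lstsq(features(xt, t), x1 - x0, rcond=None)
    return weights

def simulate(weights, z0, num_steps):
    """Euler-integrate the flow; also return a straightness score (0 = straight)."""
    z = z0.copy()
    dt = 1.0 / num_steps
    velocities = []
    for k in range(num_steps):
        vel = features(z, np.full_like(z, k * dt)) @ weights
        velocities.append(vel)
        z = z + dt * vel
    straightness = float(np.mean((np.stack(velocities) - (z - z0)) ** 2))
    return z, straightness

n = 20000
# Stage 1: rectified flow trained on independent pairs (X_0, X_1).
w1 = fit_velocity(rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

# Stage 2 (Reflow): pairs are (Z_0, Flow_1(Z_0)) produced by the first flow.
z0 = rng.normal(0.0, 1.0, n)
z1, _ = simulate(w1, z0, 100)
w2 = fit_velocity(z0, z1)

z0_test = rng.normal(0.0, 1.0, n)
_, straight_1 = simulate(w1, z0_test, 100)
_, straight_2 = simulate(w2, z0_test, 100)
one_step_1, _ = simulate(w1, z0_test, 1)   # single Euler step, first flow
one_step_2, _ = simulate(w2, z0_test, 1)   # single Euler step, after Reflow

print("straightness:", straight_1, straight_2)
print("one-step std:", one_step_1.std(), one_step_2.std())
```

In this toy setting the second flow should be noticeably straighter, and a single Euler step with it should already give samples with a spread close to the target's, whereas a single step with the first flow collapses the samples toward the target mean.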

Decoding Diffusion: On the perspective of Rectified Flow.
