This log is an extension of the least-squares problems introduced in a previous log (The optimization set: QR factorization – ToothlessOS Log). Specifically, we will look at how to apply the results to non-linear regression problems.
Theory
Least-square problems
We recall that the least-squares solution is an approximate solution \(\hat{x}\) to an overdetermined linear system \(Ax = b\), with the criterion of minimizing the residual \(r = b - Ax\):

\[\hat{x} = \arg\min_{x} \| b - Ax \|_2\]

For these problems, we can compute the approximate solution easily via the normal equations, \(\hat{x} = (A^T A)^{-1} A^T b\), or more stably via the QR factorization \(A = QR\), by solving \(R\hat{x} = Q^T b\).
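As a quick refresher on the linear case, here is a minimal NumPy sketch (outside MATLAB, with made-up line-fit data) that solves a small overdetermined system via QR and cross-checks the result against the normal equations:

```python
import numpy as np

# Overdetermined system A x = b: fit a line y = p0 + p1 * t to 5 points
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares solution via (reduced) QR: solve R x = Q^T b
Q, R = np.linalg.qr(A)                    # A = Q R, Q is 5x2, R is 2x2
x_hat = np.linalg.solve(R, Q.T @ b)

# Cross-check against the normal equations (A^T A) x = A^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat, x_ne)
```

Both routes give the same minimizer; the QR route avoids forming \(A^T A\), which squares the condition number.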
The Gauss-Newton method
Unfortunately, sometimes we cannot formulate problems in this nice linear form. Therefore, we have to make some adjustments: enter the Gauss-Newton method!
This sounds really similar to the Newton-Raphson solver mentioned in the previous log (The optimization set: Non-linear solvers – ToothlessOS Log). Indeed, the ideas behind them are quite similar in terms of the linear approximation used.
The following content is taken from Prof. Peng Sun’s slides of MATH304 at DKU.
In short: given a model \(f(x; \beta)\) with parameter vector \(\beta\), we linearize around the current estimate,

\[f(x; \beta + \delta) \approx f(x; \beta) + Z\delta,\]

where \(Z\) is the Jacobian matrix of partial derivatives, \(Z_{ij} = \partial f(x_i; \beta) / \partial \beta_j\). Minimizing the residual \(D = y - f(x; \beta)\) then reduces to the linear least-squares problem \(Z\delta \approx D\), which we solve for the parameter update \(\delta\) and iterate: \(\beta \leftarrow \beta + \delta\).
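To make the recipe concrete before the MATLAB practice, here is a minimal NumPy sketch of the full Gauss-Newton loop. The single-parameter model \(y = c^x\) and the synthetic data are my own toy choices, not from the post:

```python
import numpy as np

# Synthetic, noiseless data from y = c^x with c = 1.5
x = np.linspace(0.0, 3.0, 20)
c_true = 1.5
y = c_true ** x

c = 1.0  # initial guess
for _ in range(50):
    r = y - c ** x            # residual of the current fit (the D vector)
    J = x * c ** (x - 1)      # d/dc of c^x (the single Jacobian column Z)
    # Gauss-Newton step: solve the 1-D least-squares problem J * dc ~= r
    dc = np.dot(J, r) / np.dot(J, J)
    c += dc
    if abs(dc) < 1e-10:
        break
print(c)
```

Because the data is noiseless (a zero-residual problem) and the start is close, the iteration converges to `c = 1.5` in a handful of steps.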
Practice
In the practice section, we will implement the Gauss-Newton method on China's GDP data from scratch with MATLAB. This section is designed with reference to: Non Linear Regression Analysis | My Data Science Projects.
Step 1: Data preparation
For the dataset, we will use China's GDP data from 1960 to 2014. The dataset can be downloaded here.
% Data preparation
data = readtable("china_gdp.csv");
% Preview
data
plot(data, "Year", "Value");
% Normalization
data.Year = data.Year / max(data.Year);
data.Value = data.Value / max(data.Value);
plot(data, "Year", "Value");
(Data preview)

(Normalized)

Step 2: Choose a model
Based on the growth trend of the data, we choose two non-linear models to fit it: the exponential function and the sigmoid/logistic function.
The exponential function:

\[f(x) = a + b \cdot c^x, \quad \text{where } b \neq 0,\ c > 0, \text{ and } a \text{ is any real number}\]

Now, in order to construct the Jacobian matrix \(Z\), we want to compute the partial derivatives of \(f\) with respect to \(a\), \(b\), and \(c\):

\[\frac{\partial f}{\partial a} = 1, \qquad \frac{\partial f}{\partial b} = c^x, \qquad \frac{\partial f}{\partial c} = b\,x\,c^{x-1}\]
The sigmoid function:

\[f(x) = a + \frac{b}{1 + c^{\,x-d}}\]

Similarly, we need to find the partial derivatives:

\[\frac{\partial f}{\partial a} = 1, \qquad \frac{\partial f}{\partial b} = \frac{1}{1 + c^{x-d}}, \qquad \frac{\partial f}{\partial c} = -\frac{b\,(x-d)\,c^{x-d-1}}{(1 + c^{x-d})^2}, \qquad \frac{\partial f}{\partial d} = \frac{b\,c^{x-d}\ln c}{(1 + c^{x-d})^2}\]
Step 3: Implementation
When playing around with the implementation, I found that preprocessing of the data is very important. If the data is not aligned with the target function, we can face horrible divergence and numerical instability issues. This is especially important for exponential functions, which can explode to infinity!
Therefore, the implementation here is a bit different. We fix the parameters a = 0 and b = 1 and only update c, reducing the model to the simple form of the exponential function, \(f(x) = c^x\). At the same time, we shift and scale the original dataset so that its last point lands on (0, 1), aligning it with the exponential function, which always passes through (0, 1). After this, we are able to get better fitting results.
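The shift-and-scale step is easy to verify by hand. A small NumPy sketch (the GDP values are placeholders, only the year range matches the dataset) showing where the normalized points land:

```python
import numpy as np

# The post's shift-and-scale normalization (years 1960-2014, toy values)
year = np.arange(1960, 2015)
value = np.linspace(1.0, 100.0, year.size)   # placeholder GDP values

year_n = (year / year.max() - 1) * 200       # last year maps to exactly 0
value_n = value / value.max()                # peak value maps to exactly 1
print(year_n[-1], value_n[-1])  # 0.0 1.0
```

The last data point thus sits at (0, 1), exactly where \(c^x\) passes for any c, and the earliest year lands near \(-5.4\), which is why the plotting range below uses roughly \([-6, 0]\).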
% Fitting with exponential functions
% Data preparation
data = readtable("china_gdp.csv");
% Preview
data
plot(data, "Year", "Value");
% Normalization (scale and shift)
data_norm = data;
data_norm.Year = (data.Year / max(data.Year) - 1) * 200;
data_norm.Value = data.Value / max(data.Value);
plot(data_norm, "Year", "Value");
% Case 1: Using exponential function
% Initialize the parameters
a = 0;
b = 1;
c = 1;
% Parameter delta matrix
delta_A = zeros(3,1);
% Convergence threshold
eps = 1e-5;
% Set the max iterations here
for i=1:100
% Jacobian(Derivatives) matrix
% Vectorized implementation
Z = zeros(length(data_norm.Year),3);
% Partial derivative: a
Z(:,1) = 1;
% Partial derivative: b
Z(:,2) = c.^data_norm.Year;
% Partial derivative: c
Z(:,3) = b.*data_norm.Year.*c.^(data_norm.Year-1);
% The difference matrix
D = zeros(length(data_norm.Year),1);
D(:,1) = data_norm.Value - (a + b*c.^data_norm.Year);
% Now, we can solve for least square solution delta_A
% This tells us how to update the parameters
% We use QR factorization here
[Q, R] = qr(Z, 0); % economy-size QR
delta_A = R\(Q'*D); % solve R*delta_A = Q'*D
update = abs(delta_A(3)/c);
% a = a + delta_A(1);
% b = b + delta_A(2);
c = c + delta_A(3);
% Check convergence condition
if update < eps
disp("Convergence reached at iteration " + i)
break
end
end
disp("Update: " + update);
disp("Loss: " + norm(data_norm.Value - (a + b*c.^data_norm.Year)))
x_axis = linspace(-6,0,100);
y_axis = a + b*c.^x_axis;
plot(data_norm.Year, data_norm.Value, x_axis, y_axis);
Results: L2-Loss after normalization: 0.16822
(Visualization)

Unfortunately, we ran into divergence issues again with the sigmoid function.
% Data preparation
data = readtable("china_gdp.csv");
% Preview
data
plot(data, "Year", "Value");
% Normalization (scale and shift)
data_norm = data;
data_norm.Year = data.Year / max(data.Year);
data_norm.Value = data.Value / max(data.Value);
plot(data_norm, "Year", "Value");
% Case 2: Using sigmoid function
% Initialize the parameters
a = 0;
b = 1;
c = exp(1);
d = 0;
% Parameter delta matrix
delta_A = zeros(4,1);
% Convergence threshold
eps = 1e-5;
% Set the max iterations here
for i=1:1000
% Jacobian(Derivatives) matrix
% Vectorized implementation
Z = zeros(length(data_norm.Year),4);
% Partial derivative: a
Z(:,1) = 1;
% Partial derivative: b
Z(:,2) = 1./(1+c.^(data_norm.Year-d));
% Partial derivative: c
Z(:,3) = -(b.*(data_norm.Year-d).*c.^(data_norm.Year-d-1))./(1+c.^(data_norm.Year-d)).^2;
% Partial derivative: d
Z(:,4) = (b.*c.^(data_norm.Year-d).*log(c))./(1+c.^(data_norm.Year-d)).^2;
% The difference matrix
D = zeros(length(data_norm.Year),1);
D(:,1) = data_norm.Value - (a + b./(1+c.^(data_norm.Year-d)));
% Now, we can solve for least square solution delta_A
% This tells us how to update the parameters
% We use QR factorization here
[Q, R] = qr(Z, 0); % economy-size QR
delta_A = R\(Q'*D); % solve R*delta_A = Q'*D
% Absolute update size (a and d start at 0, so dividing by them gives Inf)
update = max(abs(delta_A));
a = a + delta_A(1);
b = b + delta_A(2);
c = c + delta_A(3);
d = d + delta_A(4);
% Check convergence condition
if update < eps
disp("Convergence reached at iteration " + i)
break
end
end
disp("Update: " + update);
disp("Loss: " + norm(data_norm.Value - (a + b./(1+c.^(data_norm.Year-d)))));
x_axis = linspace(0.97,1,100);
y_axis = a + b./(1+c.^(x_axis-d));
plot(data_norm.Year, data_norm.Value, x_axis, y_axis);
legend(["original", "fit"])
The divergence issue probably lies in the hidden assumptions of the Gauss-Newton method. For this method to work, the first-order Taylor approximation must remain accurate near the current iterate. Otherwise, it is possible that we get completely lost!
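A common remedy (not used in this post) is to damp the Gauss-Newton step, which is the idea behind the Levenberg-Marquardt algorithm. Below is a toy NumPy sketch with a two-parameter logistic model; the synthetic data, initial guess, and damping schedule are all my own choices, meant only to illustrate the idea:

```python
import numpy as np

def model(beta, x):
    k, x0 = beta
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Synthetic noiseless data: y = sigmoid(10 * (x - 0.5))
x = np.linspace(0.0, 1.0, 55)
y = model(np.array([10.0, 0.5]), x)

beta = np.array([5.0, 0.3])   # deliberately rough initial guess [k, x0]
lam = 1e-2                    # damping factor
for _ in range(200):
    s = model(beta, x)
    r = y - s
    # Jacobian of the model w.r.t. [k, x0]
    J = np.column_stack([
        s * (1 - s) * (x - beta[1]),   # d model / d k
        s * (1 - s) * (-beta[0]),      # d model / d x0
    ])
    # Damped normal equations: (J^T J + lam*I) delta = J^T r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    # Accept the step only if it reduces the loss; otherwise damp harder
    if np.linalg.norm(y - model(beta + delta, x)) < np.linalg.norm(r):
        beta = beta + delta
        lam *= 0.5
    else:
        lam *= 10.0
    if np.linalg.norm(delta) < 1e-12:
        break
print(beta)
```

When the Taylor approximation is poor, the damping term shrinks the step toward gradient descent instead of letting it fly off; when the approximation is good, the damping fades and we recover fast Gauss-Newton steps.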