Multi-Objective Optimization for Deep Learning
CVPR Tutorial
June 19, 2023
Standard Optimization in Deep Learning
- Empirical Risk Minimization
$$\underset{\theta}{\operatorname{arg\,min}} \; \frac{1}{T} \sum_t l\left(f(\mathbf{x}^t;\theta),\mathbf{y}^t\right) + \lambda\Omega(\theta)$$
- Goal: find a single optimal solution
- Differentiable loss functions and regularizers
- cross-entropy, hinge loss, MSE, $l_2$, $l_1$, etc. (a minimal training sketch follows below)
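To make the ERM setup concrete, here is a minimal sketch, assuming a small PyTorch linear classifier on synthetic data; the names (`model`, `lam`) and the choice of cross-entropy plus an $l_2$ penalty are illustrative, not taken from the tutorial.

```python
# Minimal sketch of empirical risk minimization (assumed setup,
# not the tutorial's code): linear classifier, synthetic data.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(512, 20)            # x^t: inputs
y = torch.randint(0, 3, (512,))     # y^t: labels
model = nn.Linear(20, 3)            # f(x; theta)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 1e-3                          # regularization weight lambda

for epoch in range(50):
    opt.zero_grad()
    logits = model(X)
    # (1/T) * sum_t l(f(x^t; theta), y^t): average cross-entropy loss
    risk = F.cross_entropy(logits, y)
    # lambda * Omega(theta): l2 penalty on the parameters
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = risk + lam * l2
    loss.backward()
    opt.step()
```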
Emerging Class of Optimization Problems
- Multi-Objective Optimization
$$ \underset{\theta}{\operatorname{arg min}} (f_1(\mathbf{X};\theta), f_2(\mathbf{X};\theta), \ldots, f_k(\mathbf{X};\theta))$$
- Feature: objectives may be in conflict with each other
- Goal: find multiple Pareto-optimal solutions (see the dominance-check sketch after this list)
- Loss functions: may or may not be differentiable
- Optimization variables: continuous, discrete, mixed integer, etc.
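Because the objectives conflict, no single solution minimizes all of them at once; the targets are the non-dominated (Pareto-optimal) points. The snippet below is an illustrative sketch, assuming we already have candidate solutions and their loss vectors; the function name `pareto_front` is hypothetical.

```python
# Illustrative sketch (not from the tutorial): given loss vectors
# (f_1, ..., f_k) for a set of candidate solutions, keep only the
# Pareto-optimal (non-dominated) candidates.
import numpy as np

def pareto_front(losses: np.ndarray) -> np.ndarray:
    """losses: (n_candidates, k) array; returns a boolean mask of
    candidates not dominated by any other candidate."""
    n = losses.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i if it is no worse on every objective
            # and strictly better on at least one
            if i != j and np.all(losses[j] <= losses[i]) and np.any(losses[j] < losses[i]):
                keep[i] = False
                break
    return keep

# Example: three candidates, two objectives; the second is dominated.
L = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 1.0]])
print(pareto_front(L))   # [ True False  True]
```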
The Most Common Solution to Multi-Objective Optimization
Weighted Sum Scalarization
$$
\begin{aligned}
\min & \text{ } g^{ws}(x|\lambda) = \lambda_1f_1(x) + \lambda_2f_2(x) \\
s.t. & \text{ } \lambda_1 + \lambda_2 = 1 \\
& \text{ } \lambda_1, \lambda_2 \geq 0
\end{aligned}
$$
- Solve the optimization problem separately for each choice of $\lambda$ (a sketch of this per-$\lambda$ training loop follows).
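A minimal sketch of this recipe, assuming two differentiable task losses sharing one set of parameters; the tasks, model, and grid of $\lambda$ values are illustrative placeholders, not the tutorial's example.

```python
# Weighted-sum scalarization sketch (assumed setup): one independent
# training run per choice of lambda, each minimizing g^ws = l1*f1 + l2*f2.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 10)
y1 = torch.randn(256, 1)                  # task 1 target (regression)
y2 = torch.randint(0, 2, (256,)).float()  # task 2 target (binary classification)

solutions = []
for lam1 in [0.0, 0.25, 0.5, 0.75, 1.0]:   # one run per choice of lambda
    lam2 = 1.0 - lam1                      # lambda_1 + lambda_2 = 1
    model = nn.Linear(10, 2)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for step in range(200):
        opt.zero_grad()
        out = model(X)
        f1 = F.mse_loss(out[:, 0:1], y1)                        # f_1(x)
        f2 = F.binary_cross_entropy_with_logits(out[:, 1], y2)  # f_2(x)
        g = lam1 * f1 + lam2 * f2          # g^ws(x | lambda)
        g.backward()
        opt.step()
    solutions.append((lam1, f1.item(), f2.item()))
```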
The Problem with Scalarization
$$
\begin{aligned}
\min & \text{ } g^{ws}(x|\lambda) = \lambda_1f_1(x) + \lambda_2f_2(x) \\
s.t. & \text{ } \lambda_1 + \lambda_2 = 1 \\
& \text{ } \lambda_1, \lambda_2 \geq 0
\end{aligned}
$$
[Figure: weighted-sum scalarization on a convex Pareto front vs. a concave Pareto front]
Works well for convex Pareto fronts, but fails to recover solutions on concave (non-convex) regions of the front.
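A small numerical check of this failure mode, using an assumed concave front $(f_1, f_2) = (t, \sqrt{1-t^2})$: for every choice of $\lambda$ the weighted-sum minimizer lands on an endpoint of the front, so the concave middle is never recovered.

```python
# Illustrative check (not from the tutorial): on a concave Pareto
# front, the weighted-sum minimizer is always one of the endpoints,
# no matter how lambda is chosen.
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
f1, f2 = t, np.sqrt(1.0 - t ** 2)        # concave front for minimization

for lam in [0.1, 0.3, 0.5, 0.7, 0.9]:
    g = lam * f1 + (1.0 - lam) * f2      # g^ws over points on the front
    i = np.argmin(g)
    print(f"lambda_1={lam:.1f} -> minimizer (f1, f2)=({f1[i]:.2f}, {f2[i]:.2f})")
# Every lambda selects an endpoint, (0, 1) or (1, 0); the trade-off
# solutions in the concave middle of the front are unreachable.
```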