Multi-Objective Optimization for Deep Learning

CVPR Tutorial

June 19, 2023

Standard Optimization in Deep Learning

  • Empirical Risk Minimization $$\underset{\theta}{\operatorname{arg min}} \frac{1}{T} \sum_t l\left(f(\mathbf{x}^t;\mathbf{\theta}),\mathbf{y}^t\right) + \lambda\Omega(\mathbf{\theta})$$
    • Goal: find a single optimal solution
    • Differentiable loss functions and regularizer.
      • cross-entropy, hinge loss, MSE, $l_2$, $l_1$, etc.

Emerging Class of Optimization Problems

  • Multi-Objective Optimization $$ \underset{\theta}{\operatorname{arg min}} (f_1(\mathbf{X};\theta), f_2(\mathbf{X};\theta), \ldots, f_k(\mathbf{X};\theta))$$
  • Feature: objectives may be in conflict with each other
  • Goal: find a multiple pareto-optimal solutions
  • Loss functions: may or may not be differentiable
  • Optimization variables: continuous, discrete, mixed interger, etc.

Where do multiple objectives arise in machine learning?

Application (1): Fairness: Utility-Fairness Trade-Off

    On Characterizing the Trade-off in Invariant Representation Learning, TMLR 2022

Application (2): Explainability: Accuracy-Explainability Trade-Off

Objectives: accuracy and ease of interpretability

    It's Just Not That Simple: An Empirical Study of the Accuracy-Explainability Trade-off in Machine Learning for Public Policy, FAccT 2022

Application (3): Neural Architecture Search

Objectives: accuracy and latency/FLOPS/memory usage

  • Lu et. al. Neural Architecture Transfer, TPAMI 2021

Application (4): Multi-Task Learning

Objectives: loss for each task

  • Sener and Koltun, Multi-Task Learning as Multi-Objective Optimization, NeurIPS 2018.

Application (5): Neural Networks over Encrypted Data

Objectives: accuracy and latency

  • Wei and Boddeti, AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE, Cryptology ePrint Archive 2023

Application (6): Discovering Learning Algorithms from Scratch

Objectives: error and complexity of algorithm

  • Guha et. al. MOAZ: A Multi-Objective AutoML-Zero Framework, GECCO 2023

Application (7): Learning Robotic Adaptation Policies from Scratch

  • Kelly et. al. Discovering Adaptable Symbolic Algorithms from Scratch, IROS 2023

Most Common Solution Multi Objective Optimization

Weighted Sum Scalarization

$$ \begin{aligned} \min & \text{ } g^{ws}(x|\lambda) = \lambda_1f_1(x) + \lambda_2f_2(x) \\ s.t. & \text{ } \lambda_1 + \lambda_2 = 1 \\ & \text{ } \lambda_1, \lambda_2 \geq 0 \end{aligned} $$

The problem with scalarization

$$ \begin{aligned} \min & \text{ } g^{ws}(x|\lambda) = \lambda_1f_1(x) + \lambda_2f_2(x) \\ s.t. & \text{ } \lambda_1 + \lambda_2 = 1 \\ & \text{ } \lambda_1, \lambda_2 \geq 0 \end{aligned} $$
Convex Pareto Front

Works well for convex pareto-fronts, but fails otherwise.

What is this tutorial about?

Going beyond scalarization

Adaptive Methods
Pareto MTL
Pareto Set Learning

Foundational Concepts

Non-Dominated Sorting Genetic Algorithm

... and more.

A Few Applications

Multi-Task Learning
Designing CNNs for Secure Inference
Neural Architecture Search

Hands-On Demo