Designing CNNs for Secure Inference

WCCI Tutorial

Vishnu Boddeti

June 30, 2024

Yokohama, Japan

Deep Learning as a Service (DLaaS)

[Diagram: the customer sends data over the Internet to the cloud, which runs the model and returns predictions.]

Secure DLaaS under Fully Homomorphic Encryption (FHE)

[Diagram: the customer encrypts data before sending it over the Internet; the cloud computes directly on ciphertexts and returns encrypted predictions.]

Convolutional Neural Networks (CNNs)

[Diagram: a standard CNN: N repeated blocks of Convolution → BatchNorm → ReLU, followed by AvgPool → Fully Connected → Prediction.]
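
For concreteness, here is a minimal PyTorch-style sketch of this architecture (the framework choice and layer sizes are illustrative assumptions, not from the tutorial):

    import torch.nn as nn

    def block(c_in, c_out):
        # One repeated unit of the diagram: Convolution -> BatchNorm -> ReLU
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )

    class PlainCNN(nn.Module):
        def __init__(self, width=16, n_blocks=4, n_classes=10):
            super().__init__()
            layers = [block(3, width)]
            layers += [block(width, width) for _ in range(n_blocks - 1)]
            self.features = nn.Sequential(*layers)   # [Conv-BN-ReLU] x N
            self.pool = nn.AdaptiveAvgPool2d(1)      # AvgPool
            self.fc = nn.Linear(width, n_classes)    # Fully Connected

        def forward(self, x):
            x = self.pool(self.features(x)).flatten(1)
            return self.fc(x)                        # Prediction (logits)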

CNNs under Homomorphic Encryption (HE)

HE schemes such as CKKS support only three ciphertext operations:

  • Multiplication
  • Addition
  • Rotation

Convolution, BatchNorm, pooling, and fully connected layers are arithmetic and map directly onto these operations, but ReLU is non-arithmetic:

\color{red}{\text{ReLU}(x)=\max(x, 0)}

ReLU is therefore approximated by a polynomial, which can be evaluated with only additions and multiplications:

\color{green}{\text{Polynomial}(x) = \sum_i a_i x^i}

[Diagram: in each Convolution → BatchNorm → ReLU block, the ReLU is replaced by its polynomial approximation.]
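
To see why a polynomial activation fits the HE operation set, here is a plain-Python sketch of Horner evaluation (the coefficients are illustrative, not from the tutorial):

    def poly_eval(x, coeffs):
        # Evaluate sum_i a_i x^i via Horner's rule, using only the
        # operations a ciphertext supports: multiplication and addition.
        # coeffs are ordered highest degree first: [a_d, ..., a_0].
        acc = coeffs[0]
        for a in coeffs[1:]:
            acc = acc * x + a   # one multiplication + one addition
        return acc

    # A hypothetical quadratic ReLU surrogate: 0.25x^2 + 0.5x + 0.25
    print(poly_eval(1.5, [0.25, 0.5, 0.25]))   # -> 1.5625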

Deep CNNs under Fully Homomorphic Encryption (FHE)

Level: the number of multiplications a ciphertext can still support.

A fresh ciphertext starts at level L, and each homomorphic multiplication consumes one level; at level 0 no further multiplication is possible. Bootstrapping refreshes the ciphertext, but consumes K levels itself, so it returns the ciphertext to level L − K.

[Diagram: the level falls from L toward 0 as layers are evaluated; each Bootstrapping raises it back to L − K.]

Bootstrapping dominates the cost of deep CNNs under FHE:

  • High Latency
  • High Memory Footprint

Designing deep CNNs under FHE must balance:

  • Security Requirement
  • Prediction Accuracy
  • Inference Latency
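
The level bookkeeping can be made concrete with a toy tracker (a sketch under the CKKS-style conventions above; the values are placeholders):

    class LevelTracker:
        # Fresh ciphertexts start at level L; each multiplication
        # consumes one level; bootstrapping itself costs K levels,
        # so a refreshed ciphertext returns to level L - K.
        def __init__(self, L, K):
            self.L, self.K, self.level = L, K, L
            self.bootstraps = 0

        def consume(self, depth):
            # depth = multiplicative depth of the next layer
            # (assumes depth <= L - K, so one bootstrap always suffices)
            if self.level < depth:
                self.level = self.L - self.K
                self.bootstraps += 1
            self.level -= depth

    ct = LevelTracker(L=20, K=14)
    for depth in [3, 3, 3, 3, 3, 3, 3, 3]:   # per-layer depths
        ct.consume(depth)
    print(ct.level, ct.bootstraps)           # -> 0 1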

Deep CNNs under Fully Homomorphic Encryption (FHE)

Cryptographic Parameters

  • Cyclotomic polynomial degree N
  • Level L
  • Modulus Q_\ell=\prod_{i=0}^{\ell} q_i,\; 0\leq\ell\leq L
  • Bootstrapping depth K
  • Hamming weight h

These parameters jointly govern the security requirement, prediction accuracy, and inference latency.
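
A small data-holder tying the symbols together (plain Python; the concrete values are placeholders, not recommended parameters):

    from dataclasses import dataclass
    from math import prod

    @dataclass
    class CKKSParams:
        N: int        # cyclotomic polynomial degree (ring dimension)
        q: list       # prime moduli q_0, ..., q_L
        K: int        # bootstrapping depth
        h: int        # Hamming weight of the secret key

        @property
        def L(self):
            return len(self.q) - 1          # maximum level

        def Q(self, level):
            # Q_level = prod_{i=0}^{level} q_i, for 0 <= level <= L
            return prod(self.q[: level + 1])

    params = CKKSParams(N=2**16, q=[2**60] + [2**50] * 24, K=14, h=192)
    print(params.L, params.Q(params.L).bit_length())   # -> 24 1261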

Deep CNNs under Fully Homomorphic Encryption (FHE)

Polynomial CNNs

  • Conv, BN, pooling, and FC layers: determine ciphertext packing
  • Polynomials: degree → multiplicative depth
  • Number of layers: e.g., ResNet20, ResNet32
  • Channels/kernels
  • Input image resolution

Together with the cryptographic parameters, these architectural choices determine the security requirement, prediction accuracy, and inference latency.

Hand-crafted Design of Polynomial for CNNs under FHE

  • High-degree Polynomial
  • Low-degree Polynomial

Polynomial Neural Architectures

How do we obtain all possible polynomial neural architectures?

Key Insight: optimize the end-to-end polynomial neural architecture instead of the polynomial function alone.

Optimization of End-to-End Polynomial Neural Architecture

[Diagram: a backbone of repeated Conv → BN → Poly blocks. Each activation position gets its own polynomial (Poly 1 … Poly 6), and bootstrapping operations (Boot) may be placed at many candidate positions between layers, producing a large combinatorial space of polynomial neural architectures.]

Optimization of End-to-End Polynomial Neural Architecture

Different users have different real-world requirements:

  • "I want a faster response."
  • "I can wait for a more accurate result."

AutoFHE: Automated Adaption of CNNs under FHE

[Diagram: AutoFHE adapts a non-arithmetic neural network into polynomial neural networks.]

EvoReLU: Evolutionary Mixed-Degree Polynomial Approximation of ReLU

Prior strategies for handling ReLU under encryption:

  • Pruning: DeepReDuce, SAFENet, Delphi
  • Quadratic: LoLa, CryptoNets, HEMET
  • High-degree approximation: MPCNN

Forward Propagation

\mathrm{EvoReLU}(x) = \begin{cases} x, &\;d=1\\ \alpha_2x^2 + \alpha_1 x + \alpha_0, &\;d=2\\ x\cdot \left( \mathcal{F}(x) + 0.5 \right), &\; d > 2 \\ \end{cases}

where \mathcal{F} is a high-degree composite polynomial [2] approximating 0.5\cdot\mathrm{sign}(x):

\mathcal{F}(x)=(f^{d_K}_K\circ \cdots \circ f^{d_k}_{k}\circ \cdots \circ f^{d_1}_1)(x), \quad 1\leq k\leq K

[2] Lee, Eunsang, Joon-Woo Lee, Jong-Seon No, and Young-Sik Kim. "Minimax approximation of sign function by composite polynomial for homomorphic comparison." IEEE Transactions on Dependable and Secure Computing 19, no. 6 (2021): 3711-3727.
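
A minimal NumPy sketch of this forward rule (the degrees, coefficients, and component polynomials below are placeholders; AutoFHE finds them by evolutionary search):

    import numpy as np

    def evo_relu(x, d, alphas=None, fs=None):
        # d == 1: identity; d == 2: quadratic; d > 2: x * (F(x) + 0.5),
        # with F = f_K o ... o f_1 approximating 0.5 * sign(x).
        if d == 1:
            return x
        if d == 2:
            a2, a1, a0 = alphas
            return a2 * x**2 + a1 * x + a0
        y = x
        for f in fs:                 # apply f_1 first, f_K last
            y = f(y)
        return x * (y + 0.5)

    # Placeholder composite of two odd cubics (coefficients are listed
    # in increasing-degree order for numpy Polynomial objects).
    f1 = np.polynomial.Polynomial([0.0, 1.5, 0.0, -0.5])    # 1.5x - 0.5x^3
    f2 = np.polynomial.Polynomial([0.0, 0.75, 0.0, -0.25])  # 0.75x - 0.25x^3
    x = np.linspace(-1.0, 1.0, 5)
    print(evo_relu(x, d=3, fs=[f1, f2]))   # roughly ReLU(x) on [-1, 1]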

Within each EvoReLU, the polynomial coefficients are optimized with differential evolution.

How do we optimize the end-to-end polynomial neural architecture? Multi-objective evolutionary optimization.

Joint Search for Layerwise EvoReLU and Bootstrapping Operations

[Diagram: the same Conv → BN backbone, now with a distinct layerwise EvoReLU at each activation (EvoReLU 1 … EvoReLU 6) and candidate bootstrapping (Boot) positions throughout the network; the search jointly selects every layerwise EvoReLU and where to bootstrap.]

The joint search problem is cast as multi-objective optimization, which yields:

  • Flexible Architecture
  • On-demand Bootstrapping

Multi-Objective Optimization

Two competing objectives: accuracy and latency. Since bootstrapping dominates latency, the number of bootstrapping operations serves as the latency proxy.

Single objective (scalarization of multiple objectives):

\alpha \cdot \text{Accuracy} + \beta \cdot \text{Latency}

  • Only generates a single solution
  • Hard to tune the balancing weights
  • Not Pareto optimal

Multi-objective optimization:

\min\left\{1-\text{Accuracy}, \text{\#Bootstrapping}\right\}

  • Multiple solutions on the Pareto front
  • No need to tune weights
  • Pareto optimal
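
A small sketch of the dominance test that replaces weight tuning (pure Python; the objective tuples follow the formulation above, and the numbers are illustrative only):

    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_front(solutions):
        # keep solutions not dominated by any other solution
        return [s for s in solutions
                if not any(dominates(t, s) for t in solutions if t is not s)]

    # (1 - accuracy, #bootstraps) pairs
    pop = [(0.08, 12), (0.10, 8), (0.09, 8), (0.15, 4), (0.20, 4)]
    print(pareto_front(pop))   # -> [(0.08, 12), (0.09, 8), (0.15, 4)]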

Evolutionary Multi-Objective Optimization

The search loop cycles through: population → crossover → mutation → non-dominated sorting → crowding distance sorting.

Evolutionary Multi-Objective Optimization

Population: each individual x_i encodes one layerwise EvoReLU per activation layer, so the population is a (population size) × (#Layers) matrix:

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
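
A sketch of this encoding (the per-gene representation is simplified to an EvoReLU degree; real genes also carry coefficients, so the alphabet below is hypothetical):

    import random

    DEGREES = [1, 2, 7, 15, 27]      # hypothetical EvoReLU degree choices

    def random_individual(n_layers):
        # one gene (an EvoReLU) per activation layer
        return [random.choice(DEGREES) for _ in range(n_layers)]

    def init_population(pop_size, n_layers):
        # population: (pop_size x n_layers) matrix of genes
        return [random_individual(n_layers) for _ in range(pop_size)]

    pop = init_population(pop_size=4, n_layers=19)   # ResNet20 has 19 ReLUs
    print(pop[0])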


Evolutionary Multi-Objective Optimization

Crossover: two parents exchange a subset of their layerwise genes (swapped genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_1^\prime: \text{EvoReLU}_{\color{red} 2 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{\color{red} 2 3},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{\color{red} 1 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{\color{red} 1 3},\text{EvoReLU}_{2 4},\cdots
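
A sketch of the gene swap (uniform crossover is shown; the exact operator is a modeling choice here, not specified by the slides):

    import random

    def uniform_crossover(p1, p2, p_swap=0.5):
        # children copy the parents, then swap genes position-by-position
        c1, c2 = p1[:], p2[:]
        for i in range(len(c1)):
            if random.random() < p_swap:
                c1[i], c2[i] = c2[i], c1[i]
        return c1, c2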


Evolutionary Multi-Objective Optimization

Mutation: each offspring randomly perturbs a gene, replacing one EvoReLU with a new variant (mutated genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
x_1^\prime: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},{\color{red} \text{EvoReLU}_{1 3}^\prime},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},{\color{red} \text{EvoReLU}_{2 3}^\prime},\text{EvoReLU}_{2 4},\cdots
x_3^\prime: {\color{red} \text{EvoReLU}_{3 1}^\prime},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4^\prime: \text{EvoReLU}_{4 1}, { \color{red} \text{EvoReLU}_{4 2}^\prime},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
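
And the corresponding mutation sketch, resampling a small fraction of genes (same hypothetical gene alphabet as the encoding sketch above):

    import random

    DEGREES = [1, 2, 7, 15, 27]

    def mutate(ind, p_mut=0.1):
        # each gene is replaced by a fresh random EvoReLU with prob. p_mut
        return [random.choice(DEGREES) if random.random() < p_mut else g
                for g in ind]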


Evolutionary Multi-Objective Optimization

Non-dominated sorting ranks the population by Pareto dominance. For example:

x_3~\text{dominates}~x_6, x_7, \text{and}~x_8, \\ \text{i.e.}~x_3~\text{is no worse in every objective and strictly better in at least one}
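
A compact sketch of the sorting itself, peeling Pareto fronts one at a time (O(n^2) per front; objective tuples as in the earlier sketch, values illustrative):

    def dominates(a, b):
        # a dominates b: no worse in every objective and not identical
        return all(x <= y for x, y in zip(a, b)) and a != b

    def non_dominated_sort(objs):
        # repeatedly extract the current non-dominated set (rank 0 = best)
        remaining = list(range(len(objs)))
        fronts = []
        while remaining:
            front = [i for i in remaining
                     if not any(dominates(objs[j], objs[i])
                                for j in remaining if j != i)]
            fronts.append(front)
            remaining = [i for i in remaining if i not in front]
        return fronts

    objs = [(0.08, 12), (0.10, 8), (0.09, 8), (0.15, 4), (0.20, 4)]
    print(non_dominated_sort(objs))   # -> [[0, 2, 3], [1, 4]]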


Evolutionary Multi-Objective Optimization

Within a front, crowding distance rewards diversity: with L_1 and L_2 the distances from x_3 to its two neighbors along the front,

\text{crowding distance of}~x_3~\text{is}~ \frac{L_1+L_2}{2}
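
A sketch matching the slide's definition, with boundary solutions kept unconditionally (Euclidean neighbor distances; NSGA-II's per-objective normalization is omitted for simplicity):

    import math

    def crowding_distance(front):
        # front: list of objective tuples on one Pareto front
        order = sorted(range(len(front)), key=lambda i: front[i])
        dist = [0.0] * len(front)
        dist[order[0]] = dist[order[-1]] = math.inf   # keep extremes
        for k in range(1, len(order) - 1):
            i = order[k]
            L1 = math.dist(front[i], front[order[k - 1]])
            L2 = math.dist(front[i], front[order[k + 1]])
            dist[i] = (L1 + L2) / 2   # average gap to the two neighbors
        return dist

    print(crowding_distance([(0.08, 12), (0.09, 8), (0.15, 4)]))
    # -> [inf, ~4.0, inf]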


Experimental Setup

Dataset: CIFAR10

  • 50,000 training images
  • 10,000 test images
  • 32×32 resolution, 10 classes

Hardware & Software

  • Amazon AWS r5.24xlarge (96 vCPUs, 768 GB RAM)
  • Microsoft SEAL 3.6


Latency and Accuracy Trade-offs under FHE

Approach     MPCNN        AESPA       REDsec   AutoFHE
Venue        ICML22       arXiv22     NDSS23   USENIX24
Scheme       CKKS         CKKS        TFHE     CKKS
Polynomial   high-degree  low-degree  n/a      mixed
Layerwise    no           no          n/a      yes
Strategy     approx       train       train    adapt
Arch         manual       manual      manual   search

Multiplicative Depth of Layerwise EvoReLU

[Figure: multiplicative depth of each layerwise EvoReLU across the network's layers.]

Conclusion
  • Multi-objective optimization generates Pareto-optimal solutions that meet different real-world requirements
  • Joint optimization of layerwise EvoReLU and bootstrapping operations yields optimal polynomial neural architectures
  • AutoFHE optimizes the end-to-end polynomial neural architecture