Designing CNNs for Secure Inference

WCCI Tutorial

Vishnu Boddeti

June 30, 2024

Yokohama, Japan

Deep Learning as a Service (DLaaS)

[Diagram: the customer sends data over the Internet to the cloud, which runs the model and returns predictions.]

Secure DLaaS under Fully Homomorphic Encryption (FHE)

[Diagram: the customer encrypts data before sending it over the Internet; the cloud computes directly on ciphertexts and returns encrypted predictions.]

Convolutional Neural Networks (CNNs)

[Diagram: a standard CNN: N repeated blocks of Convolution → BatchNorm → ReLU, followed by AvgPool → Fully Connected → Prediction.]
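
For concreteness, here is a minimal PyTorch-style sketch of this architecture (the framework choice and layer sizes are illustrative assumptions, not from the tutorial):

    import torch.nn as nn

    def block(c_in, c_out):
        # One repeated unit of the diagram: Convolution -> BatchNorm -> ReLU
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )

    class PlainCNN(nn.Module):
        def __init__(self, width=16, n_blocks=4, n_classes=10):
            super().__init__()
            layers = [block(3, width)]
            layers += [block(width, width) for _ in range(n_blocks - 1)]
            self.features = nn.Sequential(*layers)   # [Conv-BN-ReLU] x N
            self.pool = nn.AdaptiveAvgPool2d(1)      # AvgPool
            self.fc = nn.Linear(width, n_classes)    # Fully Connected

        def forward(self, x):
            x = self.pool(self.features(x)).flatten(1)
            return self.fc(x)                        # Prediction (logits)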

CNNs under Homomorphic Encryption (HE)

HE schemes such as CKKS support only three ciphertext operations:

  • Multiplication
  • Addition
  • Rotation

Convolution, BatchNorm, pooling, and fully connected layers are arithmetic and map directly onto these operations, but ReLU is non-arithmetic:

\color{red}{\text{ReLU}(x)=\max(x, 0)}

ReLU is therefore approximated by a polynomial, which can be evaluated with only additions and multiplications:

\color{green}{\text{Polynomial}(x) = \sum_i a_i x^i}

[Diagram: in each Convolution → BatchNorm → ReLU block, the ReLU is replaced by its polynomial approximation.]
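
To see why a polynomial activation fits the HE operation set, here is a plain-Python sketch of Horner evaluation (the coefficients are illustrative, not from the tutorial):

    def poly_eval(x, coeffs):
        # Evaluate sum_i a_i x^i via Horner's rule, using only the
        # operations a ciphertext supports: multiplication and addition.
        # coeffs are ordered highest degree first: [a_d, ..., a_0].
        acc = coeffs[0]
        for a in coeffs[1:]:
            acc = acc * x + a   # one multiplication + one addition
        return acc

    # A hypothetical quadratic ReLU surrogate: 0.25x^2 + 0.5x + 0.25
    print(poly_eval(1.5, [0.25, 0.5, 0.25]))   # -> 1.5625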

Deep CNNs under Fully Homomorphic Encryption (FHE)

Level: the number of multiplications a ciphertext can still support.

A fresh ciphertext starts at level L, and each homomorphic multiplication consumes one level; at level 0 no further multiplication is possible. Bootstrapping refreshes the ciphertext, but consumes K levels itself, so it returns the ciphertext to level L − K.

[Diagram: the level falls from L toward 0 as layers are evaluated; each Bootstrapping raises it back to L − K.]

Bootstrapping dominates the cost of deep CNNs under FHE:

  • High Latency
  • High Memory Footprint

Designing deep CNNs under FHE must balance:

  • Security Requirement
  • Prediction Accuracy
  • Inference Latency
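
The level bookkeeping can be made concrete with a toy tracker (a sketch under the CKKS-style conventions above; the values are placeholders):

    class LevelTracker:
        # Fresh ciphertexts start at level L; each multiplication
        # consumes one level; bootstrapping itself costs K levels,
        # so a refreshed ciphertext returns to level L - K.
        def __init__(self, L, K):
            self.L, self.K, self.level = L, K, L
            self.bootstraps = 0

        def consume(self, depth):
            # depth = multiplicative depth of the next layer
            # (assumes depth <= L - K, so one bootstrap always suffices)
            if self.level < depth:
                self.level = self.L - self.K
                self.bootstraps += 1
            self.level -= depth

    ct = LevelTracker(L=20, K=14)
    for depth in [3, 3, 3, 3, 3, 3, 3, 3]:   # per-layer depths
        ct.consume(depth)
    print(ct.level, ct.bootstraps)           # -> 0 1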

Deep CNNs under Fully Homomorphic Encryption (FHE)

Cryptographic Parameters

  • Cyclotomic polynomial degree N
  • Level L
  • Modulus Q_\ell=\prod_{i=0}^{\ell} q_i,\; 0\leq\ell\leq L
  • Bootstrapping depth K
  • Hamming weight h

These parameters jointly govern the security requirement, prediction accuracy, and inference latency.
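
A small data-holder tying the symbols together (plain Python; the concrete values are placeholders, not recommended parameters):

    from dataclasses import dataclass
    from math import prod

    @dataclass
    class CKKSParams:
        N: int        # cyclotomic polynomial degree (ring dimension)
        q: list       # prime moduli q_0, ..., q_L
        K: int        # bootstrapping depth
        h: int        # Hamming weight of the secret key

        @property
        def L(self):
            return len(self.q) - 1          # maximum level

        def Q(self, level):
            # Q_level = prod_{i=0}^{level} q_i, for 0 <= level <= L
            return prod(self.q[: level + 1])

    params = CKKSParams(N=2**16, q=[2**60] + [2**50] * 24, K=14, h=192)
    print(params.L, params.Q(params.L).bit_length())   # -> 24 1261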

Deep CNNs under Fully Homomorphic Encryption (FHE)

Polynomial CNNs

  • Conv, BN, pooling, and FC layers: determine ciphertext packing
  • Polynomials: degree → multiplicative depth
  • Number of layers: e.g., ResNet20, ResNet32
  • Channels/kernels
  • Input image resolution

Together with the cryptographic parameters, these architectural choices determine the security requirement, prediction accuracy, and inference latency.

Hand-crafted Design of Polynomial for CNNs under FHE

  • High-degree Polynomial
  • Low-degree Polynomial

Polynomial Neural Architectures

How do we obtain all possible polynomial neural architectures?

Key Insight: optimize the end-to-end polynomial neural architecture instead of the polynomial function alone.

Optimization of End-to-End Polynomial Neural Architecture

[Diagram: a backbone of repeated Conv → BN → Poly blocks. Each activation position gets its own polynomial (Poly 1 … Poly 6), and bootstrapping operations (Boot) may be placed at many candidate positions between layers, producing a large combinatorial space of polynomial neural architectures.]

Optimization of End-to-End Polynomial Neural Architecture

Different users have different real-world requirements:

  • "I want a faster response."
  • "I can wait for a more accurate result."

AutoFHE: Automated Adaption of CNNs under FHE

[Diagram: AutoFHE adapts a non-arithmetic neural network into polynomial neural networks.]

EvoReLU: Evolutionary Mixed-Degree Polynomial Approximation of ReLU

Prior strategies for handling ReLU under encryption:

  • Pruning: DeepReDuce, SAFENet, Delphi
  • Quadratic: LoLa, CryptoNets, HEMET
  • High-degree approximation: MPCNN

Forward Propagation

\mathrm{EvoReLU}(x) = \begin{cases} x, &\;d=1\\ \alpha_2x^2 + \alpha_1 x + \alpha_0, &\;d=2\\ x\cdot \left( \mathcal{F}(x) + 0.5 \right), &\; d > 2 \\ \end{cases}

where \mathcal{F} is a high-degree composite polynomial [2] approximating 0.5\cdot\mathrm{sign}(x):

\mathcal{F}(x)=(f^{d_K}_K\circ \cdots \circ f^{d_k}_{k}\circ \cdots \circ f^{d_1}_1)(x), \quad 1\leq k\leq K

[2] Lee, Eunsang, Joon-Woo Lee, Jong-Seon No, and Young-Sik Kim. "Minimax approximation of sign function by composite polynomial for homomorphic comparison." IEEE Transactions on Dependable and Secure Computing 19, no. 6 (2021): 3711-3727.
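
A minimal NumPy sketch of this forward rule (the degrees, coefficients, and component polynomials below are placeholders; AutoFHE finds them by evolutionary search):

    import numpy as np

    def evo_relu(x, d, alphas=None, fs=None):
        # d == 1: identity; d == 2: quadratic; d > 2: x * (F(x) + 0.5),
        # with F = f_K o ... o f_1 approximating 0.5 * sign(x).
        if d == 1:
            return x
        if d == 2:
            a2, a1, a0 = alphas
            return a2 * x**2 + a1 * x + a0
        y = x
        for f in fs:                 # apply f_1 first, f_K last
            y = f(y)
        return x * (y + 0.5)

    # Placeholder composite of two odd cubics (coefficients are listed
    # in increasing-degree order for numpy Polynomial objects).
    f1 = np.polynomial.Polynomial([0.0, 1.5, 0.0, -0.5])    # 1.5x - 0.5x^3
    f2 = np.polynomial.Polynomial([0.0, 0.75, 0.0, -0.25])  # 0.75x - 0.25x^3
    x = np.linspace(-1.0, 1.0, 5)
    print(evo_relu(x, d=3, fs=[f1, f2]))   # roughly ReLU(x) on [-1, 1]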

Within each EvoReLU, the polynomial coefficients are optimized with differential evolution.

How do we optimize the end-to-end polynomial neural architecture? Multi-objective evolutionary optimization.

Joint Search for Layerwise EvoReLU and Bootstrapping Operations

[Diagram: the same Conv → BN backbone, now with a distinct layerwise EvoReLU at each activation (EvoReLU 1 … EvoReLU 6) and candidate bootstrapping (Boot) positions throughout the network; the search jointly selects every layerwise EvoReLU and where to bootstrap.]

The joint search problem is cast as multi-objective optimization, which yields:

  • Flexible Architecture
  • On-demand Bootstrapping

Multi-Objective Optimization

Two competing objectives: accuracy and latency. Since bootstrapping dominates latency, the number of bootstrapping operations serves as the latency proxy.

Single objective (scalarization of multiple objectives):

\alpha \cdot \text{Accuracy} + \beta \cdot \text{Latency}

  • Only generates a single solution
  • Hard to tune the balancing weights
  • Not Pareto optimal

Multi-objective optimization:

\min\left\{1-\text{Accuracy}, \text{\#Bootstrapping}\right\}

  • Multiple solutions on the Pareto front
  • No need to tune weights
  • Pareto optimal
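
A small sketch of the dominance test that replaces weight tuning (pure Python; the objective tuples follow the formulation above, and the numbers are illustrative only):

    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_front(solutions):
        # keep solutions not dominated by any other solution
        return [s for s in solutions
                if not any(dominates(t, s) for t in solutions if t is not s)]

    # (1 - accuracy, #bootstraps) pairs
    pop = [(0.08, 12), (0.10, 8), (0.09, 8), (0.15, 4), (0.20, 4)]
    print(pareto_front(pop))   # -> [(0.08, 12), (0.09, 8), (0.15, 4)]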

Evolutionary Multi-Objective Optimization

The search loop cycles through: population → crossover → mutation → non-dominated sorting → crowding distance sorting.

Evolutionary Multi-Objective Optimization

Population: each individual x_i encodes one layerwise EvoReLU per activation layer, so the population is a (population size) × (#Layers) matrix:

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
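
A sketch of this encoding (the per-gene representation is simplified to an EvoReLU degree; real genes also carry coefficients, so the alphabet below is hypothetical):

    import random

    DEGREES = [1, 2, 7, 15, 27]      # hypothetical EvoReLU degree choices

    def random_individual(n_layers):
        # one gene (an EvoReLU) per activation layer
        return [random.choice(DEGREES) for _ in range(n_layers)]

    def init_population(pop_size, n_layers):
        # population: (pop_size x n_layers) matrix of genes
        return [random_individual(n_layers) for _ in range(pop_size)]

    pop = init_population(pop_size=4, n_layers=19)   # ResNet20 has 19 ReLUs
    print(pop[0])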


Evolutionary Multi-Objective Optimization

Crossover: two parents exchange a subset of their layerwise genes (swapped genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_1^\prime: \text{EvoReLU}_{\color{red} 2 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{\color{red} 2 3},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{\color{red} 1 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{\color{red} 1 3},\text{EvoReLU}_{2 4},\cdots
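
A sketch of the gene swap (uniform crossover is shown; the exact operator is a modeling choice here, not specified by the slides):

    import random

    def uniform_crossover(p1, p2, p_swap=0.5):
        # children copy the parents, then swap genes position-by-position
        c1, c2 = p1[:], p2[:]
        for i in range(len(c1)):
            if random.random() < p_swap:
                c1[i], c2[i] = c2[i], c1[i]
        return c1, c2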


Evolutionary Multi-Objective Optimization

Mutation: each offspring randomly perturbs a gene, replacing one EvoReLU with a new variant (mutated genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
x_1^\prime: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},{\color{red} \text{EvoReLU}_{1 3}^\prime},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},{\color{red} \text{EvoReLU}_{2 3}^\prime},\text{EvoReLU}_{2 4},\cdots
x_3^\prime: {\color{red} \text{EvoReLU}_{3 1}^\prime},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4^\prime: \text{EvoReLU}_{4 1}, { \color{red} \text{EvoReLU}_{4 2}^\prime},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
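
And the corresponding mutation sketch, resampling a small fraction of genes (same hypothetical gene alphabet as the encoding sketch above):

    import random

    DEGREES = [1, 2, 7, 15, 27]

    def mutate(ind, p_mut=0.1):
        # each gene is replaced by a fresh random EvoReLU with prob. p_mut
        return [random.choice(DEGREES) if random.random() < p_mut else g
                for g in ind]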


Evolutionary Multi-Objective Optimization

Non-dominated sorting ranks the population by Pareto dominance. For example:

x_3~\text{dominates}~x_6, x_7, \text{and}~x_8, \\ \text{i.e.}~x_3~\text{is no worse in every objective and strictly better in at least one}
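
A compact sketch of the sorting itself, peeling Pareto fronts one at a time (O(n^2) per front; objective tuples as in the earlier sketch, values illustrative):

    def dominates(a, b):
        # a dominates b: no worse in every objective and not identical
        return all(x <= y for x, y in zip(a, b)) and a != b

    def non_dominated_sort(objs):
        # repeatedly extract the current non-dominated set (rank 0 = best)
        remaining = list(range(len(objs)))
        fronts = []
        while remaining:
            front = [i for i in remaining
                     if not any(dominates(objs[j], objs[i])
                                for j in remaining if j != i)]
            fronts.append(front)
            remaining = [i for i in remaining if i not in front]
        return fronts

    objs = [(0.08, 12), (0.10, 8), (0.09, 8), (0.15, 4), (0.20, 4)]
    print(non_dominated_sort(objs))   # -> [[0, 2, 3], [1, 4]]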


Evolutionary Multi-Objective Optimization

Within a front, crowding distance rewards diversity: with L_1 and L_2 the distances from x_3 to its two neighbors along the front,

\text{crowding distance of}~x_3~\text{is}~ \frac{L_1+L_2}{2}
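
A sketch matching the slide's definition, with boundary solutions kept unconditionally (Euclidean neighbor distances; NSGA-II's per-objective normalization is omitted for simplicity):

    import math

    def crowding_distance(front):
        # front: list of objective tuples on one Pareto front
        order = sorted(range(len(front)), key=lambda i: front[i])
        dist = [0.0] * len(front)
        dist[order[0]] = dist[order[-1]] = math.inf   # keep extremes
        for k in range(1, len(order) - 1):
            i = order[k]
            L1 = math.dist(front[i], front[order[k - 1]])
            L2 = math.dist(front[i], front[order[k + 1]])
            dist[i] = (L1 + L2) / 2   # average gap to the two neighbors
        return dist

    print(crowding_distance([(0.08, 12), (0.09, 8), (0.15, 4)]))
    # -> [inf, ~4.0, inf]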


Experimental Setup

Dataset: CIFAR10

  • 50,000 training images
  • 10,000 test images
  • 32×32 resolution, 10 classes

Hardware & Software

  • Amazon AWS r5.24xlarge (96 vCPUs, 768 GB RAM)
  • Microsoft SEAL 3.6


Latency and Accuracy Trade-offs under FHE

Approach     MPCNN        AESPA       REDsec   AutoFHE
Venue        ICML22       arXiv22     NDSS23   USENIX24
Scheme       CKKS         CKKS        TFHE     CKKS
Polynomial   high-degree  low-degree  n/a      mixed
Layerwise    no           no          n/a      yes
Strategy     approx       train       train    adapt
Arch         manual       manual      manual   search

Multiplicative Depth of Layerwise EvoReLU

[Figure: multiplicative depth of each layerwise EvoReLU across the network's layers.]

Conclusion
  • Multi-objective optimization generates Pareto-optimal solutions that meet different real-world requirements
  • Joint optimization of layerwise EvoReLU and bootstrapping operations yields optimal polynomial neural architectures
  • AutoFHE optimizes the end-to-end polynomial neural architecture