abstract
paper
code
bibtex
Machine learning (ML) is becoming increasingly pervasive, and its wide adoption in healthcare, facial
recognition, and blockchain involves private and sensitive data. One of the most promising candidates
for inference on encrypted data, Fully Homomorphic Encryption (FHE), preserves the privacy of both the
data and the ML model. However, it slows down inference by six orders of magnitude relative to plaintext,
the root cause being the replacement of non-polynomial operators with a latency-prohibitive 27-degree
Polynomial Approximated Function (PAF). While prior research has investigated low-degree PAFs, naive
stochastic gradient descent (SGD) training fails to converge on PAFs with degrees higher than 5, leading
to limited accuracy compared to the state-of-the-art 27-degree PAF. We therefore propose four training
techniques that enable convergence of the post-approximation model with PAFs of arbitrary degree:
(1) Dynamic Scaling (DS) and Static Scaling (SS) to minimize the approximation error,
(2) Coefficient Tuning (CT) to obtain a good initial coefficient value for each PAF,
(3) Progressive Approximation (PA) to simplify the two-variable regression optimization problem into a
single-variable one for fast and easy convergence, and (4) Alternate Training (AT) to retrain the
post-replacement PAFs and the remaining linear layers in a decoupled, divide-and-conquer manner.
Combining DS/SS, CT, PA, and AT enables the exploration of the accuracy-latency space for FHE-domain
ReLU replacement. Leveraging these techniques, we propose a systematic approach (PAF-FHE) that lets
low-degree PAFs match the accuracy of SotA high-degree PAFs. We evaluated PAFs of various degrees on
different models and datasets, and PAF-FHE consistently enables low-degree PAFs to achieve higher
accuracy than SotA PAFs. Specifically, for ResNet-18 on the ImageNet-1k dataset, the optimal 12-degree
PAF identified by our search reduces latency by 56% compared to the SotA 27-degree PAF at the same
post-replacement accuracy (69.4%). For VGG-19 on the CIFAR-10 dataset, the optimal 12-degree PAF even
achieves 0.84% higher accuracy with a 72% latency saving. Our code is open-sourced at:
https://github.com/TorchFHE/PAF-FHE.
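For readers who want a concrete picture of what replacing ReLU with a trainable PAF can look like, the
sketch below is a minimal, illustrative PyTorch example. It is not the paper's implementation: the class
name TrainablePAF, the fixed input scale, the least-squares initialization range, and the tiny demo model
are assumptions made for illustration; they only mirror the roles that Static Scaling, Coefficient
Tuning, and Alternate Training play in the abstract above.

# Minimal sketch (illustrative, not the paper's exact formulation): a degree-d
# polynomial with trainable coefficients used as a drop-in replacement for ReLU.
# Assumes PyTorch; the names and constants below are hypothetical.
import torch
import torch.nn as nn


class TrainablePAF(nn.Module):
    """Polynomial Approximated Function: y = s * sum_k c_k * (x / s)^k."""

    def __init__(self, degree: int = 12, scale: float = 10.0):
        super().__init__()
        self.degree = degree
        # Static-Scaling-style normalization: a fixed scale keeps inputs in a
        # bounded range where the polynomial is fit.
        self.scale = scale
        # Coefficient-Tuning-style initialization: a least-squares fit of ReLU
        # on the normalized range supplies the starting coefficients.
        xs = torch.linspace(-1.0, 1.0, 2048, dtype=torch.float64)
        ys = torch.relu(xs * scale) / scale
        powers = torch.stack([xs ** k for k in range(degree + 1)], dim=1)
        init = torch.linalg.lstsq(powers, ys.unsqueeze(1)).solution.squeeze(1)
        self.coeffs = nn.Parameter(init.to(torch.float32))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = x / self.scale
        # Horner's rule: evaluate the polynomial with degree-many multiply-adds.
        out = torch.zeros_like(z)
        for k in range(self.degree, -1, -1):
            out = out * z + self.coeffs[k]
        return out * self.scale


if __name__ == "__main__":
    # Tiny demo model with the PAF standing in for ReLU.
    model = nn.Sequential(nn.Linear(8, 8), TrainablePAF(degree=12), nn.Linear(8, 2))
    print(model(torch.randn(4, 8)).shape)
    # Decoupled parameter groups, in the spirit of Alternate Training (AT):
    # one optimizer for the PAF coefficients, another for the linear layers.
    paf_params = [p for n, p in model.named_parameters() if n.endswith("coeffs")]
    rest_params = [p for n, p in model.named_parameters() if not n.endswith("coeffs")]
    opt_paf = torch.optim.SGD(paf_params, lr=1e-3)
    opt_rest = torch.optim.SGD(rest_params, lr=1e-3)
    print(len(paf_params), len(rest_params))

The exact PAF construction, scaling rules, and training schedule used by PAF-FHE are described in the
paper and in the open-source repository linked above.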
@misc{PPR:PPR658940,
Title = {PAF-FHE: Low-Cost Accurate Non-Polynomial Operator Polynomial Approximation in Fully Homomorphic Encryption Based ML Inference},
Author = {Dang, Jingtian and Tong, Jianming and Golder, Anupam and Raychowdhury, Arijit and Hao, Cong and Krishna, Tushar},
DOI = {10.21203/rs.3.rs-2910088/v1},
Publisher = {Research Square},
Year = {2023},
URL = {https://doi.org/10.21203/rs.3.rs-2910088/v1},
}