Jianming Tong

jianming [dot] tong [at] gatech [dot] edu

Ph.D. student at Georgia Tech since Spring 2021

Advisor: Tushar Krishna

Main Developer for CROSS, FEATHER

I'm funded by the Qualcomm Innovation Fellowship and SRC JUMP 2.0

Cryptography Acceleration Lead in Synergy Lab @ Gatech

CV Google Scholar GitHub LinkedIn ORCID

Mission: Build high-performance systems that the real world actually uses.

Research Interest

I'm a computer architect focusing on systems for AI and cryptography, e.g., enabling today's AI systems to work in a privacy-preserving manner without sacrificing performance.

Model (Software): ML Model with Privacy-preserving Capability by Design, e.g. Crypto-friendly ML Model.
SmartPAF-MLSys'24 Privatar-UsenixSecurity'25
System: Latency/Accuracy Navigation for Multi-Query Streams.
SUSHI-IEEE Micro'23 SUSHI-MLSys'23
Compilation: Convert Crypto Algorithms to be Efficient on Commodity Hardware.
CROSS-HPCA'26
Architecture (Hardware): Reconfigurable Dataflow Accelerator for ML and Crypto.
FEATHER-ISCA'24
Performance Modeling: Performance Analysis Tool for ML Accelerators.
LayoutLoop-ISCA'24 ScaleSimV3-ISPASS'25 SquareLoop-HASP'25

News

[Nov. 2025] [Paper] Our work CROSS: Enable AI Accelerator for Homomorphic Encryption has been accepted to High-Performance Computer Architecture (HPCA'26) and will be presented in Sydney.
[Nov. 2025] [Teaching] I gave a guest lecture on FEATHER in the "ECE 6120: Machine Intelligence" course at GWU, hosted by Prof. Nan Wu.
[Oct. 2025] [Talk] I presented a poster on CROSS: Enable AI Accelerator for Homomorphic Encryption at the ACE Annual Review @ Chicago.
[Oct. 2025] [DEMO] I gave a multi-university demo on FEATHER at the ACE Annual Review @ Chicago with Prof. Zhiru Zhang, Prof. Charith Mendis, and Prof. Subhasish Mitra. Big thanks to the team: Devansh Jain, Niansong Zhang, Hongzheng Chen, and Saranyu Chattopadhyay!
[Sep. 2025] [Talk] I gave a talk on CROSS: Enable AI Accelerator for Homomorphic Encryption at TechCon. See y'all in Austin!
[Sep. 2025] [Talk] I gave a talk on FEATHER at UT Austin, hosted by Prof. Mattan Erez!
[Sep. 2025] [Paper] Our work SquareLoop: Explore Optimal Authentication Block Strategy for ML has been accepted to Hardware and Architectural Support for Security and Privacy (HASP'25), co-located with MICRO'25 in Seoul!
[Aug. 2025] [Award] Our work CROSS: Enable AI Accelerator for Homomorphic Encryption won the GT NEXT Award in recognition of our commitment to research and development that has the potential to significantly contribute to societal betterment! Go Yellow Jackets!
[Jul. 2025] [Poster] Our work Privatar: Enabling Privacy-preserving Real-time Multi-user VR via Secure Outsourcing has been accepted at USENIX Security'25 as a poster! See y'all Aug 13-15 in Seattle!
[Jul. 2025] [Poster] Our work CROSS: Enable AI Accelerator for Homomorphic Encryption and Zero Knowledge Proof has been accepted at USENIX Security'25 as a poster! See y'all Aug 13-15 in Seattle!
[Jul. 2025] [Service] I co-interviewed Prof. Mengjia Yan on behalf of TcuArch and IEEE Micro. Sipping Matcha of Security: A Fireside Chat With Mengjia Yan is now online! Check out the video recording here!
[Jun. 2025] [Award] Our work CROSS: Enable AI Accelerator for Homomorphic Encryption won 2nd place at the University Demo at DAC'25! You can run encrypted digit detection serving on Google Cloud with TPUv4 for free today!
[Jun. 2025] [DEMO] I will give a demo on CROSS: Enable AI Accelerator for Homomorphic Encryption at DAC'25. See y'all in SF!
[May. 2025] [Talk] I gave a talk on CROSS: Enable AI Accelerator for Homomorphic Encryption and Zero Knowledge Proof at UMich, hosted by Prof. Todd Austin!
[Mar. 2025] [Tool] LayoutLoop from FEATHER [ISCA'24] has been integrated into NVlabs/Timeloop. Details can be found in this PR and this slide deck. Enjoy precise layout modeling!
[Mar. 2025] [Paper] Our work Constrained Dataflow Accelerator for Real-Time Multi-Task Multi-Model Machine Learning Workloads has been accepted to ISPASS'25!
[Mar. 2025] [Paper] Our work SCALE-Sim v3 has been accepted to ISPASS'25!
[Jan. 2025] [Paper] Our work Leveraging ASIC AI Chips for Homomorphic Encryption is now available online!
[Nov. 2024] [Teaching] I gave a guest lecture on FEATHER in the "Advanced Computer Architecture for Machine Learning" course hosted by Prof. Tony Geng.
[Nov. 2024] [Talk] I gave a talk on Leveraging AI ASIC for Homomorphic Encryption at NYU, hosted by Prof. Brandon Reagen and Karthik Garimella!
[Nov. 2024] [Service] I co-organized the JOBS Workshop to help new grads with job hunting - go JOBS!
[Nov. 2024] [Talk] I gave a talk on FEATHER at the WDDSA workshop, co-located with MICRO'24 in Austin!
[Oct. 2024] [Talk] I demoed FEATHER at the SRC ACE annual review in Chicago!
[Sep. 2024] [Paper] Our work Real-time Digital RF Emulation – II: A Near Memory Custom Accelerator is accepted to the IEEE Transactions on Radar Systems (TRadar'24).
[Aug. 2024] [Award] I was selected as the student for ACE Newsletter highlight by SRC!
[Aug. 2024] [Talk] I gave a talk on FEATHER at the SRC Liaison Meeting of the ACE Center!
[Aug. 2024] [Career] I joined Google as a student researcher on the Phazon team of PSS. More realistic privacy-preserving acceleration is coming, stay tuned!
[Jul. 2024] [Talk] I gave a talk on FEATHER at NVIDIA (HQ) and NVIDIA (Westford)!
[Jun. 2024] [Talk] We debuted FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching at ISCA in Buenos Aires!
[May. 2024] [Talk] I gave a talk on FEATHER at MIT.
[May. 2024] [Award] I was selected as an "ML and System Rising Star" by MLCommons. Excited to meet you all at NVIDIA HQ on Jul 15-16.
[May. 2024] [Award] Our team "CipherFlitFort" was awarded Startup Launch by CreateX at Georgia Tech, Go Jackets!
[Apr. 2024] [Award] I was selected as a DAC Young Fellow for DAC 2024.
[Mar. 2024] [Paper] Our work FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching is accepted to the International Symposium on Computer Architecture (ISCA'24).
[Feb. 2024] [Paper] Our work SmartPAF: Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption is accepted to the Seventh Conference on Machine Learning and Systems (MLSys'24).
[Feb. 2024] [Service] We started course 6.192 Constructive Computer Architecture in three schools together this year (MIT, EPFL, GaTech) - recordings available online, Go Architects!
[Jan. 2024] [Service] I served on the AEC for ISCA'24.
[Nov. 2023] [Service] I joined the Computer Architecture Student Association (CASA) steering team, from the architects for the architects.
[Oct. 2023] [Talk] I gave a talk on SUSHI and PAF-FHE at HAN Lab @ MIT.
[Sep. 2023] [Award] I won the Best Poster Award for presenting our work SUSHI at the IAP Workshop @ MIT.
[Sep. 2023] [Paper] Our work Hardware-Software co-design for real-time latency-accuracy navigation in tinyML applications was accepted to IEEE Micro.
[Sep. 2023] [Career] I joined MIT as a visiting researcher in CSAIL, hosted by Dr. Arvind.
[Aug. 2023] [Paper] Our work SNATCH: Stealing Neural Network Architecture from ML Accelerator in Intelligent Sensors is accepted to the IEEE SENSORS conference (SENSORS'23).
[Jul. 2023] [Paper] Our work On Continuing DNN Accelerator Architecture Scaling Using Tightly-coupled Compute-on-Memory 3D ICs is accepted to the IEEE Transactions on Very Large Scale Integration Systems (TVLSI'23).
[Jul. 2023] [Award] I won the 2023 Qualcomm Innovation Fellowship, thank you Qualcomm!
[Jul. 2023] [Service] I served on the AEC for ASPLOS'24.
[Jun. 2023] [Talk] I gave a talk on SUSHI and PAF-FHE at CAG Lab @ XJTU.
[May. 2023] [Paper] Our work A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching was accepted to the 3rd On-Device Intelligence Workshop (ODIW'23@MLSys'23).
[May. 2023] [Paper] Our work ReLU-FHE: Low-cost Accurate ReLU polynomial approximation in Fully Homomorphic Encryption Based ML Inference was accepted to the 3rd On-Device Intelligence Workshop (ODIW'23@MLSys'23).
[Apr. 2023] [Paper] Our work SUSHI: SubGraph Stationary Hardware-Software Inference Co-design was accepted to the Sixth Conference on Machine Learning and Systems (MLSys'23).
[Apr. 2023] [Paper] Our work FPGA-Based High-Performance Real-Time Emulation of Radar System using Direct Path Compute Model was accepted to the International Microwave Symposium (IMS'23).
[Mar. 2023] [Talk] I gave a talk on Enable Best ML Inference and Training: A Systematic Approach at EIC Lab @ Georgia Tech.
[Mar. 2023] [Paper] Our work A High Performance Computing Architecture for Real-Time Digital Emulation of RF Interactions was accepted to the IEEE Radar Conference (RadarConf'23).
[Nov. 2022] [Talk] I gave a talk on Full-Stack ML Dataflow, Mapping and SW/HW Co-Design and Search at NICS-EFC Lab @ Tsinghua University.
[Jul. 2022] [Tutorial] I gave a tutorial on MAERI 2.0: An End-to-end framework to explore architecture design space on FPGA at ICS 2022.
[Jul. 2022] [Talk] I presented our work FastSwitch: Enabling Real-time DNN Switching via Weight-Sharing at the 2nd Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop @ ISCA'22.
[Apr. 2022] [Award] I was named a Finalist in the Qualcomm Innovation Fellowship, thank you Qualcomm!
[Mar. 2022] [Award] I won 2nd place in the SCS Poster Competition at Georgia Tech, thank you SCS!
[Nov. 2021] [Paper] Our work A Configurable Architecture for Efficient Sparse FIR Computation in Real-time Radio Frequency Systems was accepted to the International Microwave Symposium (IMS'21).
[Aug. 2021] [Paper] Our work ac2SLAM: FPGA Accelerated High-Accuracy SLAM with Heapsort and Parallel Keypoint Extractor was accepted to FPT'21. code
[Mar. 2021] [Paper] Our work SMMR-explore: Submap-based multi-robot exploration system with multi-robot multi-target potential field exploration method was accepted to ICRA'21. code demo
[Mar. 2021] [Book] Our translated book On-chip Network has been publicly released [purchase translated version] [English version -- Free for University]
[Feb. 2021] [Paper] Our work PIT: Processing-In-Transmission with Fine-Grained Data Manipulation Networks was accepted to ToC'21.
[Jan. 2021] [Career] I kicked off my Ph.D. career at Georgia Tech, go Yellow Jackets!
[Dec. 2020] [Paper] Our work COCOA: Content-Oriented Configurable Architecture Based on Highly-Adaptive Data Transmission Networks was accepted to GLSVLSI'21.

Leading Publications (* Equal Contribution)

Leveraging ASIC AI Chips for Homomorphic Encryption

Jianming Tong, Tianhao Huang, Jingtian Dang, Leo De Castro, Anirudh Itagi, Anupam Golder, Asra Ali, Jevin Jiang, Jeremy Kun, Arvind, G. Edward Suh, Tushar Krishna

High-Performance Computer Architecture (HPCA), Jan 2026.

++CROSS is Deployed in Google TPU Cloud

++CROSS won 2nd place at DAC university demo

++CROSS won the GT NEXT Award

abstract paper code tutorial bibtex

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

Jianming Tong, Anirudh Itagi, Prasanth Chatarasi, Tushar Krishna

International Symposium on Computer Architecture (ISCA), Jun 2024.

++LayoutLoop is Integrated into NVLabs/Timeloop

abstract paper code tutorial slide ISCA Talk Deep Dive Talk LayoutLoop bibtex

SquareLoop: Explore Optimal Authentication Block Strategy for ML

Jan Strzeszynski*, Jianming Tong*, Kyungmi Lee*, Nathan Xiong, Angshuman Parashar, Joel S Emer, Tushar Krishna, and Mengjia Yan

Proceedings of the 14th International Workshop on Hardware and Architectural Support for Security and Privacy (HASP), Oct 2025.

abstract paper code bibtex

SCALE-Sim v3: A Modular Cycle-Accurate Systolic Accelerator Simulator for End-to-End System Analysis

Ritik Raj, Sarbartha Banerjee*, Nikhil Srinivas*, Zishen Wan*, Jianming Tong*, Ananda Samajdar, Tushar Krishna

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Sep 2025.

abstract paper code bibtex

SmartPAF: Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

Jianming Tong*, Jingtian Dang*, Anupam Golder, Callie Hao, Arijit Raychowdhury, Tushar Krishna

In Proc. of the Seventh Conference on Machine Learning and Systems (MLSys), May 2024.

abstract paper code bibtex

Hardware-Software co-design for real-time latency-accuracy navigation in tinyML applications

Payman Behnam*, Jianming Tong*, Alind Khare, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Pranav Gadikar, Tushar Krishna, and Alexey Tumanov

IEEE Micro, Sep 2023.

abstract paper bibtex

SUSHI: SUbgraph Stationary Hardware-software Inference Co-design

Payman Behnam*, Jianming Tong*, Alind Khare, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Pranav Gadikar, Tushar Krishna, and Alexey Tumanov

In Proc. of the Sixth Conference on Machine Learning and Systems (MLSys), Jun 2023.

++Qualcomm Innovation Fellowship

++Best Poster Award (IAP2023@MIT)

abstract paper bibtex

SMMR-explore: Submap-based multi-robot exploration system with multi-robot multi-target potential field exploration method

Jincheng Yu*, Jianming Tong*, Yuanfan Xu, Zhilin Xu, Haolin Dong, Tianxiang Yang, and Yu Wang.

IEEE International Conference on Robotics and Automation (ICRA), 2021. Oral

abstract paper code demo bibtex

Collaborative Publications (* Equal Contribution)

As Collaborator or Mentor

Exploring Constrained Dataflow Accelerator for Real-Time Multi-Task Multi-Model Machine Learning Workloads

Jamin Seo, Jianming Tong, Hyoukjun Kwon, Tushar Krishna and Saibal Mukhopadhyay.

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Sep 2025.

abstract paper bibtex

Real-time Digital RF Emulation – II: A Near Memory Custom Accelerator

Xiangyu Mao, Mandovi Mukherjee, Nael Mizanur Rahman, Coleman B DeLude, Joseph W. Driscoll, Sudarshan Sharma, Payman Behnam, Uday Kamal, Jongseok Woo, Daehyun Kim, Sharjeel M. Khan, Jianming Tong, Jamin Seo, Prachi Sinha, Madhavan Swaminathan, Tushar Krishna, Santosh Pande, Justin Romberg, and Saibal Mukhopadhyay.

IEEE Transactions on Radar Systems (TRadar), Sep 2024.

abstract paper bibtex

SNATCH: Stealing Neural Network Architecture from ML Accelerator in Intelligent Sensors

Sudarshan Sharma, Uday Kamal, Jianming Tong, Tushar Krishna, and Saibal Mukhopadhyay.

IEEE SENSORS Conference (SENSORS), Aug 2023.

abstract poster

On Continuing DNN Accelerator Architecture Scaling Using Tightly-coupled Compute-on-Memory 3D ICs

Gauthaman Murali, Aditya Iyer, Lingjun Zhu, Jianming Tong, Francisco Munoz Martinez, Srivatsa Rangachar Srinivasa, Tanay Karnik, Tushar Krishna, and Sung Kyu Lim

IEEE Transactions on Very Large Scale Integration Systems (TVLSI), Jul 2023.

abstract paper bibtex

FPGA-Based High-Performance Real-Time Emulation of Radar System using Direct Path Compute Model

Xiangyu Mao*, Mandovi Mukherjee*, Nael Mizanur Rahman*, Uday Kamal, Sudarshan Sharma, Payman Behnam, Jianming Tong, Jongseok Woo, Coleman B DeLude, Joseph W. Driscoll, Jamin Seo, Santosh Pande, Tushar Krishna, Justin Romberg, Madhavan Swaminathan, and Saibal Mukhopadhyay.

International Microwave Symposium (IMS), Jun 2023.

abstract paper bibtex

A High Performance Computing Architecture for Real-Time Digital Emulation of RF Interactions

Mandovi Mukherjee*, Nael Mizanur Rahman*, Coleman B. DeLude*, Joseph W. Driscoll*, Uday Kamal, Jongseok Woo, Jamin Seo, Sudarshan Sharma, Xiangyu Mao, Payman Behnam, Sharjeel M. Khan, Daehyun Kim, Jianming Tong, Prachi Sinha, Santosh Pande, Tushar Krishna, Justin Romberg, Madhavan Swaminathan, and Saibal Mukhopadhyay.

In Proc. of the IEEE Radar Conference (RadarConf), May 2023.

abstract paper bibtex

A Configurable Architecture for Efficient Sparse FIR Computation in Real-time Radio Frequency Systems

Jamin Seo, Nael Mizanur Rahman, Mandovi Mukherjee, Coleman DeLude, Jianming Tong, Justin Romberg, Tushar Krishna, and Saibal Mukhopadhyay.

International Microwave Symposium (IMS), 2021.

abstract paper bibtex

ac2SLAM: FPGA Accelerated High-Accuracy SLAM with Heapsort and Parallel Keypoint Extractor

Cheng Wang, Yinkun Liu, Kedai Zuo, Jianming Tong, Yan Ding, and Pengju Ren.

International Conference on Field-Programmable Technology (FPT), 2021. Full Paper

abstract paper code bibtex

PIT: Processing-In-Transmission with Fine-Grained Data Manipulation Networks

Pengchen Zong*, Tian Xia*, Haoran Zhao, Jianming Tong, Zehua Li, Wenzhe Zhao, Nanning Zheng, and Pengju Ren.

IEEE Transactions on Computers (TOC), 2021.

abstract paper bibtex

COCOA: Content-Oriented Configurable Architecture Based on Highly-Adaptive Data Transmission Networks

Tian Xia, Pengchen Zong, Haoran Zhao, Jianming Tong, Wenzhe Zhao, Nanning Zheng, and Pengju Ren.

Proceedings of the 2020 on Great Lakes Symposium on VLSI (GLSVLSI), 2020.

Insight: Adding a NoC between Mem-Cache-CPU to support Sorting, Ordering and Multicasting (SOM) could boost CPU performance by 25X for matrix inversion.

abstract paper bibtex

Workshops

FLOFA: Federated Once-for-All Networks

Alind Khare, Jianming Tong, Animesh Agrawal, Manas Sahni, Shreya Varshini, Alexey Tumanov, Jimeng Sun, Tushar Krishna, Vivek Sarkar, and Dawn Song.

SysML4Health: Scalable Systems for ML-driven Analytics in Healthcare Workshop, MLSys 2021

A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

Jianming Tong, Anirudh Itagi, Tushar Krishna

The 3rd On-Device Intelligence Workshop@MLSys'23

abstract poster

ReLU-FHE: Low-cost Accurate ReLU Polynomial Approximation in Fully Homomorphic Encryption Based ML Inference

Jingtian Dang*, Jianming Tong*, Anupam Golder, Callie Hao, Tushar Krishna

The 3rd On-Device Intelligence Workshop@MLSys'23

abstract paper code bibtex

Machine learning (ML) is becoming more pervasive. Wide adoption of ML in healthcare, facial recognition, and blockchain involves private and sensitive data. One of the most promising candidates for inference on encrypted data, termed Fully Homomorphic Encryption (FHE), preserves the privacy of both data and the ML model. However, it slows down plaintext inference by six orders of magnitude, with a root cause of replacing non-polynomial operators with a latency-prohibitive 27-degree Polynomial Approximated Function (PAF). While prior research has investigated low-degree PAFs, naive stochastic gradient descent (SGD) training fails to converge on PAFs with degrees higher than 5, leading to limited accuracy compared to the state-of-the-art 27-degree PAF. Therefore, we propose four training techniques to enable convergence in the post-approximation model using PAFs with an arbitrary degree, including (1) Dynamic Scaling (DS) and Static Scaling (SS) to enable minimal approximation error during approximation, (2) Coefficient Tuning (CT) to obtain a good initial coefficient value for each PAF, (3) Progressive Approximation (PA) to simplify the two-variable regression optimization problem into a single-variable one for fast and easy convergence, and (4) Alternate Training (AT) to retrain the post-replacement PAFs and other linear layers in a decoupled divide-and-conquer manner. A combination of DS/SS, CT, PA, and AT enables the exploration of the accuracy-latency space for FHE-domain ReLU replacement. Leveraging the proposed techniques, we propose a systematic approach (PAF-FHE) to enable low-degree PAFs to demonstrate the same accuracy as SotA high-degree PAFs. We evaluated PAFs with various degrees on different models and datasets, and PAF-FHE consistently enables low-degree PAFs to achieve higher accuracy than SotA PAFs.
Specifically, for ResNet-18 on the ImageNet-1k dataset, our optimal 12-degree PAF reduces latency by 56% compared to the SotA 27-degree PAF with the same post-replacement accuracy (69.4%). For VGG-19 on the CIFAR-10 dataset, the optimal 12-degree PAF achieves 0.84% higher accuracy with a 72% latency saving. Our code is open-sourced at: https://github.com/TorchFHE/PAF-FHE.
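
To make the accuracy-latency trade-off of polynomial ReLU replacement concrete, here is a minimal Python sketch. It is not the PAF-FHE training procedure and does not reproduce the 12-degree or 27-degree PAFs above. It assumes a generic composite-polynomial approach: the iteration f(t) = (3t - t^3) / 2 approximates sign(t) on [-1, 1], and ReLU(x) = x (1 + sign(x)) / 2. The function names, the input bound of 8.0, and the iteration counts are illustrative assumptions.

```python
def sign_poly(t, iterations=6):
    """Approximate sign(t) for t in [-1, 1] using only additions and multiplications.

    Each pass applies f(t) = (3t - t^3) / 2, which pushes nonzero inputs toward
    +1 or -1. Composing k passes yields a polynomial of degree 3**k.
    """
    for _ in range(iterations):
        t = (3.0 * t - t ** 3) / 2.0
    return t


def relu_poly(x, bound=8.0, iterations=6):
    """Polynomial stand-in for ReLU, valid for inputs in [-bound, bound].

    `bound` is an assumed static scale that maps the input into [-1, 1]
    before the sign approximation is applied.
    """
    t = x / bound
    return 0.5 * x * (1.0 + sign_poly(t, iterations))


if __name__ == "__main__":
    for x in (-4.0, -0.1, 0.0, 0.1, 4.0):
        exact = max(x, 0.0)
        coarse = relu_poly(x, iterations=2)
        fine = relu_poly(x, iterations=12)
        print(f"x={x:+.1f} exact={exact:.4f} coarse={coarse:.4f} fine={fine:.4f}")
```

In this sketch, more iterations mean a higher polynomial degree and therefore more multiplications under encryption, while the approximation error shrinks. The error is largest for inputs close to zero, which is where a low-degree approximation struggles most.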

@misc {PPR:PPR658940, Title = {PAF-FHE: Low-Cost Accurate Non-Polynomial Operator Polynomial Approximation in Fully Homomorphic Encryption Based ML Inference}, Author = {Dang, Jingtian and Tong, Jianming and Golder, Anupam and Raychowdhury, Arijit and Hao, Cong and Krishna, Tushar}, DOI = {10.21203/rs.3.rs-2910088/v1}, Abstract = {Machine learning (ML) is getting more pervasive. Wide adoption of ML in healthcare, facial recognition, and blockchain involves private and sensitive data. One of the most promising candidates for inference on encrypted data, termed Fully Homomorphic Encryption (FHE), preserves the privacy of both data and the ML model. However, it slows down plaintext inference by six magnitudes, with a root cause of replacing non-polynomial operators with latency-prohibitive 27-degree Polynomial Approximated Function (PAF). While prior research has investigated low-degree PAFs, naive stochastic gradient descent (SGD) training fails to converge on PAFs with degrees higher than 5, leading to limited accuracy compared to the state-of-the-art 27-degree PAF. Therefore, we propose four training techniques to enable convergence in the post-approximation model using PAFs with an arbitrary degree, including (1) Dynamic Scaling (DS) and Static Scaling (SS) to enable minimal approximation error during approximation, (2) Coefficient Tuning (CT) to obtain a good initial coefficient value for each PAF, (3) Progressive Approximation (PA) to simply the two-variable regression optimization problem into single-variable for fast and easy convergence, and (4) Alternate Training (AT) to retraining the post-replacement PAFs and other linear layers in a decoupled divide-and-conquer manner. A combination of DS/SS, CT, PA, and AT enables the exploration of accuracy-latency space for FHE-domain ReLU replacement. Leveraging the proposed techniques, we propose a systematic approach (PAF-FHE) to enable low-degree PAF to demonstrate the same accuracy as SotA high-degree PAFs. 
We evaluated PAFs with various degrees on different models and variant datasets, and PAF-FHE consistently enables low-degree PAF to achieve higher accuracy than SotA PAFs. Specifically, for ResNet-18 under the ImageNet-1k dataset, our spotted optimal 12-degree PAF reduces 56% latency compared to the SotA 27-degree PAF with the same post-replacement accuracy (69.4%). While as for VGG-19 under the CiFar-10 dataset, optimal 12-degree PAF achieves even 0.84% higher accuracy with 72% latency saving. Our code is open-sourced at: https://github.com/TorchFHE/PAF-FHE}, Publisher = {Research Square}, Year = {2023}, URL = {https://doi.org/10.21203/rs.3.rs-2910088/v1}, }

FastSwitch: Enabling Real-time DNN Switching via Weight-Sharing

Jianming Tong, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Alind Khare, Taekyung Heo, Alexey Tumanov, and Tushar Krishna

The 2nd Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop@ISCA'22

abstract paper bibtex

A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software literature focuses on reaching better latency/accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency/accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler, SushiSched, controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.
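
The temporal locality that SubGraph Stationary exploits can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the SushiAccel or SushiSched implementation: each SubNet is modeled as a set of hypothetical weight-block names, the on-chip cache holds a fixed number of blocks, and the caching policy (keep the blocks used most often in a sliding window of recent queries) is an assumption chosen for simplicity.

```python
from collections import Counter, deque


def serve_stream(queries, cache_capacity, window=4):
    """Count weight blocks fetched from off-chip memory for a stream of SubNet queries.

    queries: iterable of sets of weight-block names (one set per activated SubNet).
    cache_capacity: number of weight blocks that stay resident on chip.
    window: how many recent queries inform the caching decision (assumed policy).
    """
    cached = set()
    recent = deque(maxlen=window)
    fetched = 0
    for subnet in queries:
        blocks = set(subnet)
        # Blocks not resident on chip must be fetched from off-chip memory.
        fetched += len(blocks - cached)
        recent.append(blocks)
        # Keep the blocks shared by the most recent SubNets (ties broken by name order).
        counts = Counter(b for s in recent for b in sorted(s))
        cached = {b for b, _ in counts.most_common(cache_capacity)}
    return fetched


if __name__ == "__main__":
    stream = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c"}]
    print("no cache:", serve_stream(stream, cache_capacity=0))
    print("2-block cache:", serve_stream(stream, cache_capacity=2))
```

When consecutive queries activate SubNets that share most of their weights, keeping the shared subgraph resident removes the repeated off-chip fetches, which is the intuition behind the latency and energy savings reported above.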

Book

On-chip Network (Chinese)
Translator: Pengju Ren, Tian Xia, Jianming Tong, Pengcheng Zong, Haoran Zhao
Abstract
This book targets engineers and researchers familiar with basic computer architecture concepts who are interested in learning about on-chip networks. This work is designed to be a short synthesis of the most critical concepts in on-chip network design. It is a resource for both understanding on-chip network basics and for providing an overview of state-of-the-art research in on-chip networks.

Chinese Version
English Version

Education

Georgia Institute of Technology, USA
Ph.D. in Computer Science • Jan. 2021 to Present
Advisor: Prof. Tushar Krishna

Georgia Institute of Technology, USA
M.S. in Computer Science • Jan. 2021 to May 2024
Advisor: Prof. Tushar Krishna

Xi'an Jiaotong University, China
B.E. in Electrical Engineering • Sep. 2016 to Jun. 2020
Advisor: Prof. Pengju Ren

Experience

Google, USA
Student Researcher • Aug. 2024 to Apr. 2025
Hosts: Asra Ali, Jevin Jiang

Massachusetts Institute of Technology, USA
Research Associate • Feb. 2024 to Feb. 2025
Advisor: Prof. Tushar Krishna, Host: Prof. Arvind

Rivos Inc., Mountain View CA
Ph.D. Intern in Computer Architecture • May 2023 to Aug. 2023

Pacific Northwest National Lab (PNNL), Battelle WA
Research Intern in Computer Architecture • Jun. 2022 to Aug. 2022

Alibaba DAMO Academy, Beijing
Research Intern in Fully Homomorphic Encryption Accelerator • Jul. 2021 to Aug. 2021

Tsinghua University, Beijing
(Visiting Student) Research Assistant in Robotics • Aug. 2020 to Jan. 2021
Advisor: Prof. Yu Wang

Honors and Awards

2nd Place in University DEMO
first demonstration of superiority of AI ASICs in HE acceleration
Jun. 2025

ML and System Rising Star
in recognition of reconfigurable AI Computing
Jul. 2024

Best Poster Award
in recognition of reconfigurable AI Computing
Sep. 2023

Winner in Qualcomm Innovation Fellowship
in recognition of runtime latency-accuracy navigation
Jul. 2023

Services

Artifact Evaluation Committee for ASPLOS'24, ISCA'24

Reviewer for TOC'26, MLSys'25, IEEE Micro'25, CAL'25, TOC'25, IROS'25, AsiaCCS'25, TVLSI'24, ICRA'24

Steering Committee in Computer Architecture Student Association (CASA)
Interviewed Prof. Mengjia Yan
Interviewed Prof. Todd Austin
Co-organized JOBS workshop @ MICRO'24

Life

I love writing songs, playing piano and guitar, singing, and fitting in. My music is available on major music platforms such as Apple Music, Spotify, QQ Music, and NetEase (search my name on these platforms to find me XD). More about me can also be found here.
"Name"(Live) Youtube - Magic Mushroom Bilibili - Magic Mushroom