李士刚北京邮电大学, Homepage of Shigang Li @ BUPT

李士刚北京邮电大学 | Shigang Li @ BUPT
Professor, PhD Supervisor, Director of ParCIS Lab, Beijing University of Posts and Telecommunications
Links: [Google Scholar] [ResearchGate] [GitHub] [ORCID]

Research interests: Parallel Computing, Deep Learning Systems, High-Performance Computing, Computer Architecture, Heterogeneous Computing
Emails: shigangli.cs@gmail.com; lishigang@bupt.edu.cn

Brief Biography

Dr. Shigang Li is currently a Professor in the School of Computer Science at Beijing University of Posts and Telecommunications, where he leads the Parallel Computing and Intelligent Systems Laboratory (ParCIS Lab). His research interests include parallel and distributed deep learning systems, high-performance computing, and heterogeneous computing. He was a Postdoctoral Researcher in the SPCL Lab at ETH Zurich from Aug. 2018 to Aug. 2022. He received a Bachelor's degree in Computer Science and a Ph.D. degree in Computer Architecture from University of Science and Technology Beijing, in 2009 and 2014, respectively. He was a joint Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign from Sep. 2011 to Sep. 2013. He was an Assistant Professor in the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences from 2014 to 2018. He received Best Paper nominations (as the lead author) at SC'22, SC'21, PPoPP'20, and HPDC'13, as well as the Outstanding Paper Award of MLSys'21 and the Best Reproducibility Advancement Award of SC'22. He has served as a PC member of top conferences (SC, PPoPP, IPDPS, IEEE Cluster, ICPP, etc.) and as an invited reviewer for prestigious journals (IEEE TPDS, IEEE TSC, IEEE TBD, JPDC, etc.). He is an Associate Editor of Cluster Computing and a Youth Editor of CCF THPC, and has been Publicity Co-Chair of PPoPP'23, Publications Chair of IISWC'20, and Workshop Co-Chair of ICS'18. He is an Executive Committee Member of CCF TCHPC and of the ACM SIGHPC China Chapter. He is a senior member of IEEE, ACM, and CCF.

Position Openings

I'm leading the Parallel Computing and Intelligent Systems Laboratory (ParCIS Lab) at BUPT, and we are looking for highly self-motivated PhD and Master's students, postdocs, and higher-level talent. Let's work together on HPC+AI and make something cool! Contact me directly with your CV if you're interested.

Talks

Dec. 10, 2024 — 2024 Computing Foundation Software Scholars Committee Workshop, Beijing — Acceleration for Unstructured Sparse Matrix Operators.
Nov. 30, 2024 — CCF China Storage Conference, Forum of Large Model Training and Parallel Processing, Guangzhou — Accelerating Unstructured Sparsity on Tensor Cores for Large Models.
Oct. 28, 2024 — Seminar on AI Large Model Reasoning, Beijing — Parallel and Distributed Inference for Sparse Large Models.
Aug. 11, 2024 — CSML 2024, Shanghai — Research on Efficient and Scalable Parallel Strategies for Large Models.
June 15, 2024 — BAAI Conference 2024, AI System Forum, Beijing — Research on Efficient and Scalable Parallel Strategies for Large Models.
May 21, 2024 — STW 2024, Shenzhen — Exploring scalable and efficient parallel schemes for large models.
Nov. 25, 2023 — Semiconductor Technology Forum, 2023, Beijing — Integrating Fisher Information Matrix with Pipeline Parallelism for Large Model Training.
Aug. 26, 2023 — CCF HPC China 2023, Forum on Compiler, Runtime, and Performance Optimization for Diverse Computing Power, Qingdao — Integrating Fisher Information Matrix with Pipeline Parallelism for Large Model Training.
Mar. 9, 2023 — The Second Workshop for Advanced Computing, Beijing — Magicube: Efficient Quantized Sparse Matrix Operations on Tensor Cores.
Dec. 12, 2022 — CCF HPC China 2022, the First Forum for High-Performance Deep Learning Systems — Challenges of High-Performance Deep Learning Systems and the Countermeasures.
Dec. 12, 2022 — HPCMid-2022 — Efficient Quantized Sparse Matrix Operations on Tensor Cores.
Dec. 15, 2022 — CCF HPC China 2022, the Fourth Forum for Architectures, Algorithms, and Applications of High-Performance Sparse Matrix Computation — Efficient Quantized Sparse Matrix Operations on Tensor Cores.
Dec. 15, 2022 — CCF HPC China 2022, the Sixth Forum for HPC Performance Model — Chimera: A Bidirectional Pipeline Parallel System for Large Models.
Sept. 20, 2022 — The 18th International Symposium on Applied Reconfigurable Computing (ARC 2022), Beijing — Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
April 4, 2022 — PPoPP'22 — Near-Optimal Sparse Allreduce for Distributed Deep Learning. [Video]
Nov. 16, 2021 — SC'21 — Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. [Video]
Aug. 28, 2021 — ECM2'21 — Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. [Video]
Feb. 24, 2020 — PPoPP'20, San Diego — Taming unbalanced training workloads in deep learning with partial collective operations. [Video]
Oct. 26, 2016 — PADAL'16, Kobe — Cache-Oblivious MPI All-to-All Collectives. [Video]

Publications

[SC'2025] Shigang Li, Jingkun Dong, Jiahao Chen, Zhi Ma, Zhongzhe Hu. Hypertron: Efficiently Scaling Large Models by Exploring High-Dimensional Parallelization Space. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2025. [Paper]

[ICS'2025] Lixing Zhang, Yingxia Shao, Shigang Li. CoLa: Towards Communication-Efficient Distributed Sparse Matrix-Matrix Multiplication on GPUs. The 39th ACM International Conference on Supercomputing, 2025. [Paper]

[TACO'2025] Xueying Wang, Shigang Li*, Hao Qian, et al. OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs. ACM Transactions on Architecture and Code Optimization, 2025. [Paper]

[DAC'2025] Junyu Gu, Shunde Li, Rongqiang Cao, Jue Wang, Zijian Wang, Zhiqiang Liang, Fang Liu, Shigang Li, Chunbao Zhou, Yangang Wang, Xuebin Chi. ParGNN: A Scalable Graph Neural Network Training Framework on multi-GPUs. The 62nd ACM/IEEE Design Automation Conference, 2025. [Paper]

[CCF THPC'2025] Youxuan Xu, Tong Wu, Shigang Li*, Xueying Wang, Jingjing Wang. SparkAttention: High-Performance Multi-Head Attention for Large Models on Volta GPU Architecture. CCF Transactions on High Performance Computing, 2025. [Paper]

[PPoPP'2025] Jinliang Shi, Shigang Li*, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu. FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores. The 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025. [Paper][Code]

[TPDS'2024] Jinfan Chen, Shigang Li*, Ran Guo, Jinhui Yuan, Torsten Hoefler. AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost. IEEE Transactions on Parallel and Distributed Systems (2024). [Paper]

[CACM'2024] Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Deepak Goel, Miguel Castro, Steve Scott. HammingMesh: A Network Topology for Large-Scale Deep Learning. Communications of the ACM, Volume 67, Issue 12, Pages 97-105, 2024. [Paper]

[NSDI'2024] Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler. A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network. The 20th USENIX Symposium on Networked Systems Design and Implementation, 2024. [Paper]

[PPoPP'2024] Shunde Li, Junyu Gu, Jue Wang, Tiechui Yao, Zhiqiang Liang, Yumeng Shi, Shigang Li, Weiting Xi, Shushen Li, Chunbao Zhou, Yangang Wang, Xuebin Chi. ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters. Poster. In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024. [Paper]

[TPDS'2023] Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, Junmin Xiao. AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format. IEEE Transactions on Parallel and Distributed Systems (2023). [Paper]

[SC'2023] Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso. Co-Design Hardware and Algorithm for Vector Search. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023. [Paper]

[SC'2023] Shunde Li, Zongguo Wang, Lingkun Bu, Jue Wang, Zhikuang Xin, Shigang Li, Yangang Wang, Yangde Feng, Peng Shi, Yun Hu, Xuebin Chi. ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023. (Best Paper Finalist, Best Student Paper Finalist) [Paper]

[SC'2023] Yumeng Shi, Ningming Nie, Shunde Li, Jue Wang, Kehao Lin, Chunbao Zhou, Shigang Li, Kehan Yao, Yangde Feng, Yan Zeng, Fang Liu, Yangang Wang, Yue Gao. Large-Scale Simulation of Structural Dynamics Computing on GPU Clusters. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023. [Paper]

[MLSys'2023] Kazuki Osawa*, Shigang Li*, and Torsten Hoefler. PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices. The 6th Conference on Machine Learning and Systems, 2023. [Paper]

[IPDPS'2023] Daning Cheng, Shigang Li, Yunquan Zhang. Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner. In the 37th IEEE International Parallel and Distributed Processing Symposium, 2023. [Paper]

[PPoPP'2023] Kehao Lin, Chunbao Zhou, Yan Zeng, Ningming Nie, Jue Wang, Shigang Li, Yangde Feng, Yangang Wang, Kehan Yao, Tiechui Yao, Jilin Zhang, Jian Wan. A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations. In Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2023. [Paper]

[SC'2022] Shigang Li, Kazuki Osawa, Torsten Hoefler. Efficient Quantized Sparse Matrix Operations on Tensor Cores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2022. (Best Paper Finalist) [Paper][Talk][Slides][Code]

[SC'2022] Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. HammingMesh: A Network Topology for Large-Scale Deep Learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2022. (Best Reproducibility Advancement Award, CACM Research Highlights) [Paper]

[ICS'2022] Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, and Torsten Hoefler. A Data-Centric Optimization Framework for Machine Learning. The 36th ACM International Conference on Supercomputing, 2022. [Paper][Code]

[PPoPP'2022] Shigang Li, Torsten Hoefler. Near-Optimal Sparse Allreduce for Distributed Deep Learning. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022. [Paper][Talk][Slides][Code]

[SC'2021] Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler. Flare: Flexible In-Network Allreduce. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [Paper]

[SC'2021] Shigang Li, Torsten Hoefler. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. (Best Paper Finalist) [Paper][Talk][Slides][Code]

[NeurIPS'2021] Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh. Asynchronous Decentralized SGD with Quantized and Local Updates. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems, 2021. [Paper]

[MLSys'2021] Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Data Movement is All You Need: A Case Study on Optimizing Transformers. The 4th Conference on Machine Learning and Systems, 2021. (Outstanding Paper Award, 5/52) [Paper][Code]

[T SUSTAIN ENERG'2021] Tiechui Yao, Jue Wang, Haoyan Wu, Pei Zhang, Shigang Li, Ke Xu, Xiaoyan Liu, and Xuebin Chi. Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods. IEEE Transactions on Sustainable Energy, 2021.

[PTRSA'2021] Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, and Torsten Hoefler. Deep Learning for Post-Processing Ensemble Weather Forecasts. Philosophical Transactions of the Royal Society A. [Paper][Code]

[TPDS'2021] Daning Cheng#, Shigang Li#, Hanping Zhang, Fen Xia, and Yunquan Zhang. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms. IEEE Transactions on Parallel and Distributed Systems (2021). [Paper]

[TPDS'2021] Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, and Torsten Hoefler. Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. IEEE Transactions on Parallel and Distributed Systems. [Paper][Code]

[JPDC'2020] Daning Cheng, Shigang Li*, Yunquan Zhang. WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system. Journal of Parallel and Distributed Computing 145 (2020): 202-216. [Paper]

[PPoPP'2020] Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, and Torsten Hoefler. Taming unbalanced training workloads in deep learning with partial collective operations. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 45-61. 2020. (Acceptance rate: 23%, 28/121; Best Paper Nomination, 5/28) [Paper][Talk][Code]

[JAMES'2020] He Zhang, Minghua Zhang, ..., Shigang Li, et al. CAS‐ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) Version 2. Journal of Advances in Modeling Earth Systems (2020): e2020MS002210.

[IPDPS'2020] Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang. A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format. In 2020 IEEE International Parallel and Distributed Processing Symposium, pp. 95-104. IEEE, 2020. [Paper]

[SC'2019] Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 68. ACM, 2019. (Acceptance rate: 22.7%, 78/344)

[ICTAI'2019] Daning Cheng, Hanping Zhang, Fen Xia, Shigang Li, and Yunquan Zhang. Using Gradient based multikernel Gaussian Process and Meta-acquisition function to Accelerate SMBO. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence, pp. 440-447. IEEE, 2019.

[JSUPERCOMPUT'2019] Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20.

[ISPA'2019] Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight. In 2019 IEEE International Symposium on Parallel & Distributed Processing with Applications, pp. 511-518. IEEE, 2019.

[TPDS'2018] Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems 29, no. 3 (2018): 542-555. [Paper][Talk][Code]

[ICPP'2018] Shigang Li, Baodong Wu, Yunquan Zhang, Xianmeng Wang, Jianjiang Li, Changjun Hu, Jue Wang, Yangde Feng, and Ningming Nie. Massively scaling the metal microscopic damage simulation on sunway taihulight supercomputer. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-11. 2018. [Paper][Slides]

[ICPP'2018] Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-avoiding for dynamical core of atmospheric general circulation model. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. 2018.

[JPDC'2018] Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.

[ICPADS'2018] Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, and Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems, pp. 355-364. IEEE, 2018. (Corresponding Author) [Paper][Slides]

[PPoPP'2017] Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications on many-core architectures. Poster, ACM SIGPLAN Notices 52, no. 8 (2017): 445-446. [Paper]

[CPC'2017] Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, and He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications 211 (2017): 31-40.

[CPC'2017] Baodong Wu, Shigang Li*, Yunquan Zhang, and Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications 211 (2017): 113-123. (Corresponding Author)

[TACO'2016] Yunquan Zhang, Shigang Li*, Shengen Yan*, and Huiyang Zhou. A cross-platform spmv framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO) 13, no. 4 (2016): 1-25. (Corresponding Author) [Paper][Code]

[PIEEE'2016] Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, and Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE 104, no. 11 (2016): 2114-2136.

[SCIS'2015] Shigang Li, ChangJun Hu, JunChao Zhang, and YunQuan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences 58, no. 9 (2015): 1-14.

[HPCC'2015] Shigang Li, Yunquan Zhang, Chunyang Xiang, and Lei Shi. Fast convolution operations on many-core architectures. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, pp. 316-323. IEEE, 2015. [Paper][Slides]

[CCGrid'2015] Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 process-level shared memory: A case study with stencil computations. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Workshop, pp. 1099-1106. IEEE, 2015.

[CLUSTER COMPUT'2014] Shigang Li, Torsten Hoefler, Chungjin Hu, and Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing 17, no. 4 (2014): 1139-1155.

[HPDC'2013] Shigang Li, Torsten Hoefler, and Marc Snir. NUMA-aware shared-memory collective communication for MPI. In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 85-96. 2013. (Acceptance rate: 15%, 20/131; Best Paper Nomination, 3/20) [Paper]

[PDP'2013] Shigang Li, Jingyuan Hu, Xin Cheng, and Chongchong Zhao. Asynchronous work stealing on distributed memory systems. In 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 198-202. IEEE, 2013.

[ICA3PP'2011] Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, and Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. In International Conference on Algorithms and Architectures for Parallel Processing, Workshop, pp. 54-63. Springer, Berlin, Heidelberg, 2011.

Teaching

Lecture — Beijing University of Posts and Telecommunications, Computer Organization and Architecture, from Spring 2025
Lecture — Beijing University of Posts and Telecommunications, Computer Architecture, from Spring 2024
Lecture — Beijing University of Posts and Telecommunications, Digital Logic and Systems, from Autumn 2023
Exercise session — ETH Zurich, Parallel Programming, Spring 2022
Exercise session — ETH Zurich, Parallel Programming, Spring 2021
Exercise session — ETH Zurich, Parallel Programming, Spring 2019
Exercise session — ETH Zurich, Numerical Methods for CSE, C++/Eigen programming, Autumn 2019
Exercise session — ETH Zurich, Numerical Methods for CSE, C++/Eigen programming, Autumn 2018

Academic Services

TPC Track Chair, CCF HPC China 2024, 2023
Publicity Chair (Europe), PPoPP 2023
Publications Chair, IISWC 2020
Workshop Co-Chair, ICS 2018
Local Co-Chair, CCF SYS 2025
Program Committee Member, SC 2026, 2025, 2023, 2022, 2021
Program Committee Member, PPoPP 2026, 2022
Program Committee Member, IEEE Cluster 2025, 2024, 2022, 2021
Program Committee Member, IPDPS 2024, 2021, 2018, 2017
Program Committee Member, ICPP 2023, 2022, 2017
Program Committee Member, ICPADS 2022, 2018
Program Committee Member, APPT 2025
Program Committee Member, NPC 2025, 2024
Program Committee Member, HPC Asia 2024, 2021, 2020, 2019, 2018
Program Committee Member, CCF HPC China 2022, 2021, 2019, 2018, 2017, 2016
Program Committee Member, ChinaSys 2024
Program Committee Member, DPCS 2023, 2022
Program Committee Member, SBAC-PAD 2022, 2020, 2016 (ERC)
Program Committee Member, PMAM 2023, 2022, 2021
Program Committee Member, HP3C 2020, 2019, 2018
Program Committee Member, INFOCOMP 2022, 2021, 2020
Research Posters Member, SC 2023
Associate Editor of Cluster Computing (CLUS) - Springer
Youth Editor of CCF THPC
Reviewer of IEEE Transactions on Parallel and Distributed Systems (TPDS)
Reviewer of IEEE Transactions on Services Computing (TSC)
Reviewer of IEEE Transactions on Big Data (TBD)
Reviewer of IEEE Transactions on Network Science and Engineering
Reviewer of IEEE Transactions on Circuits and Systems II: Express Briefs
Reviewer of Journal of Parallel and Distributed Computing (JPDC) - Elsevier
Reviewer of Journal of Supercomputing - Springer
Reviewer of Concurrency and Computation: Practice and Experience
Reviewer of Mobile Networks and Applications - Springer
Program Committee Member, IEEE TPDS Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL, 2020