Publications | Talks | Teaching | Services | Position Openings


Shigang Li (李士刚)
Professor, Ph.D. Supervisor, Beijing University of Posts and Telecommunications
Links:     [Google Scholar]   [ResearchGate]   [LinkedIn]   [GitHub]   [ORCID]

Research interests:   Parallel Computing,   High-Performance Deep Learning Systems,   GPU,   MPI,   Heterogeneous Computing
Emails:   shigangli.cs@gmail.com;   lishigang@bupt.edu.cn

Brief Biography

        Dr. Shigang Li is a Professor (Ph.D. supervisor) in the School of Computer Science, Beijing University of Posts and Telecommunications, where he leads the Parallel Computing and Intelligent Systems Laboratory. His research interests include parallel and distributed deep learning systems, high-performance computing, and heterogeneous computing. He was a Postdoctoral Researcher in the SPCL Lab at ETH Zurich from Aug. 2018 to Aug. 2022. He received his Bachelor's degree in Computer Science and his Ph.D. in Computer Architecture from the University of Science and Technology Beijing in 2009 and 2014, respectively. He was a joint Ph.D. student in the Department of Computer Science at the University of Illinois at Urbana-Champaign from Sep. 2011 to Sep. 2013, and an Assistant Professor in the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences from 2014 to 2018. He received Best Paper Nominations (as the leading author) at SC'22, SC'21, PPoPP'20, and HPDC'13, the Outstanding Paper Award at MLSys'21, and the Best Reproducibility Advancement Award at SC'22. He has served as a PC member of top conferences (SC, PPoPP, IPDPS, IEEE Cluster, ICPP, etc.) and as an invited reviewer for prestigious journals (IEEE TPDS, IEEE TSC, IEEE TBD, JPDC, etc.). He is an Associate Editor of Cluster Computing and a Youth Editor of CCF THPC, and has served as Publicity Co-Chair of PPoPP'23, Publications Chair of IISWC'20, and Workshop Co-Chair of ICS'18. He is an Executive Committee Member of CCF TCHPC and of the ACM SIGHPC China Chapter, and a senior member of IEEE, ACM, and CCF.

Position Openings

        I lead the Parallel Computing and Intelligent Systems Lab at BUPT, and we are looking for highly self-motivated Ph.D./Master's students, postdocs, and senior researchers. Let's work together on HPC+AI and make something cool! If you're interested, contact me directly with your CV.

Talks

Selected Publications

  • [TPDS'2023]   Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, Junmin Xiao. AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format. IEEE Transactions on Parallel and Distributed Systems (2023). [Paper]
  • [SC'2023]   Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso. Co-Design Hardware and Algorithm for Vector Search. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023.
  • [SC'2023]   Shunde Li, Zongguo Wang, Lingkun Bu, Jue Wang, Zhikuang Xin, Shigang Li, Yangang Wang, Yangde Feng, Peng Shi, Yun Hu, Xuebin Chi. ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023. (Best Paper Finalist, Best Student Paper Finalist)
  • [SC'2023]   Yumeng Shi, Ningming Nie, Shunde Li, Jue Wang, Kehao Lin, Chunbao Zhou, Shigang Li, Kehan Yao, Yangde Feng, Yan Zeng, Fang Liu, Yangang Wang, Yue Gao. Large-Scale Simulation of Structural Dynamics Computing on GPU Clusters. The International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2023.
  • [MLSys'2023]   Kazuki Osawa, Shigang Li, and Torsten Hoefler. PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices. The 6th Conference on Machine Learning and Systems, 2023. [Paper]
  • [IPDPS'2023]   Daning Cheng, Shigang Li, Yunquan Zhang. Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner. In the 37th IEEE International Parallel and Distributed Processing Symposium, 2023.
  • [PPoPP'2023]   Kehao Lin, Chunbao Zhou, Yan Zeng, Ningming Nie, Jue Wang, Shigang Li, Yangde Feng, Yangang Wang, Kehan Yao, Tiechui Yao, Jilin Zhang, Jian Wan. A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations. In Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2023. [Paper]
  • [SC'2022]   Shigang Li, Kazuki Osawa, Torsten Hoefler. Efficient Quantized Sparse Matrix Operations on Tensor Cores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2022. (Best Paper Finalist) [Paper][Talk][Slides][Code]
  • [SC'2022]   Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. HammingMesh: A Network Topology for Large-Scale Deep Learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2022. (Best Reproducibility Advancement Award, CACM Research Highlights) [Paper]
  • [ICS'2022]   Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, and Torsten Hoefler. A Data-Centric Optimization Framework for Machine Learning. The 36th ACM International Conference on Supercomputing, 2022. [Paper][Code]
  • [PPoPP'2022]   Shigang Li, Torsten Hoefler. Near-Optimal Sparse Allreduce for Distributed Deep Learning. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022. [Paper][Talk][Slides][Code]
  • [SC'2021]   Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler. Flare: Flexible In-Network Allreduce. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [Paper]
  • [SC'2021]   Shigang Li, Torsten Hoefler. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. (Best Paper Finalist) [Paper][Talk][Slides][Code]
  • [NeurIPS'2021]   Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh. Asynchronous Decentralized SGD with Quantized and Local Updates. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems, 2021. [Paper]
  • [MLSys'2021]   Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Data Movement is All You Need: A Case Study on Optimizing Transformers. The 4th Conference on Machine Learning and Systems, 2021. (Outstanding Paper Award, 5/52) [Paper][Code]
  • [TSTE'2021]   Tiechui Yao, Jue Wang, Haoyan Wu, Pei Zhang, Shigang Li, Ke Xu, Xiaoyan Liu, and Xuebin Chi. Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods. IEEE Transactions on Sustainable Energy, 2021.
  • [PTRSA'2021]   Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, and Torsten Hoefler. Deep Learning for Post-Processing Ensemble Weather Forecasts. Philosophical Transactions of the Royal Society A (2021). [Paper][Code]
  • [TPDS'2021]   Daning Cheng#, Shigang Li#, Hanping Zhang, Fen Xia, and Yunquan Zhang. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms. IEEE Transactions on Parallel and Distributed Systems (2021). [Paper]
  • [TPDS'2021]   Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, and Torsten Hoefler. Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. IEEE Transactions on Parallel and Distributed Systems. [Paper][Code]
  • [JPDC'2020]   Daning Cheng, Shigang Li*, Yunquan Zhang. WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system. Journal of Parallel and Distributed Computing 145 (2020): 202-216. (Corresponding Author) [Paper]
  • [PPoPP'2020]   Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, and Torsten Hoefler. Taming unbalanced training workloads in deep learning with partial collective operations. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 45-61. 2020. (Acceptance rate: 23%, 28/121; Best Paper Nomination, 5/28) [Paper][Talk][Code]
  • [JAMES'2020]   He Zhang, Minghua Zhang, ..., Shigang Li, et al. CAS-ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) Version 2. Journal of Advances in Modeling Earth Systems (2020): e2020MS002210.
  • [IPDPS'2020]   Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang. A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format. In 2020 IEEE International Parallel and Distributed Processing Symposium, pp. 95-104. IEEE, 2020. [Paper]
  • [SC'2019]   Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway TaihuLight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 68. ACM, 2019. (Acceptance rate: 22.7%, 78/344)
  • [ICTAI'2019]   Daning Cheng, Hanping Zhang, Fen Xia, Shigang Li, and Yunquan Zhang. Using Gradient based multikernel Gaussian Process and Meta-acquisition function to Accelerate SMBO. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence, pp. 440-447. IEEE, 2019.
  • [JSUPERCOMPUT'2019]   Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20. (Corresponding Author)
  • [ISPA'2019]   Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway TaihuLight. In 2019 IEEE International Symposium on Parallel & Distributed Processing with Applications, pp. 511-518. IEEE, 2019.
  • [TPDS'2018]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems 29, no. 3 (2018): 542-555. [Paper][Talk][Code]
  • [ICPP'2018]   Shigang Li, Baodong Wu, Yunquan Zhang, Xianmeng Wang, Jianjiang Li, Changjun Hu, Jue Wang, Yangde Feng, and Ningming Nie. Massively scaling the metal microscopic damage simulation on Sunway TaihuLight supercomputer. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-11. 2018. [Paper][Slides]
  • [ICPP'2018]   Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-avoiding for dynamical core of atmospheric general circulation model. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. 2018.
  • [JPDC'2018]   Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.
  • [ICPADS'2018]   Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, and Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems, pp. 355-364. IEEE, 2018. (Corresponding Author) [Paper][Slides]
  • [PPoPP'2017]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications on many-core architectures. Poster, ACM SIGPLAN Notices 52, no. 8 (2017): 445-446. [Paper]
  • [CPC'2017]   Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, and He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications 211 (2017): 31-40.
  • [CPC'2017]   Baodong Wu, Shigang Li*, Yunquan Zhang, and Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications 211 (2017): 113-123. (Corresponding Author)
  • [TACO'2016]   Yunquan Zhang, Shigang Li*, Shengen Yan*, and Huiyang Zhou. A cross-platform SpMV framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO) 13, no. 4 (2016): 1-25. (Corresponding Author) [Paper][Code]
  • [PIEEE'2016]   Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, and Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE 104, no. 11 (2016): 2114-2136.
  • [SCIS'2015]   Shigang Li, ChangJun Hu, JunChao Zhang, and YunQuan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences 58, no. 9 (2015): 1-14.
  • [HPCC'2015]   Shigang Li, Yunquan Zhang, Chunyang Xiang, and Lei Shi. Fast convolution operations on many-core architectures. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, pp. 316-323. IEEE, 2015. [Paper][Slides]
  • [CCGrid'2015]   Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 process-level shared memory: A case study with stencil computations. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Workshop, pp. 1099-1106. IEEE, 2015.
  • [CLUSTER COMPUT'2014]   Shigang Li, Torsten Hoefler, Chungjin Hu, and Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing 17, no. 4 (2014): 1139-1155.
  • [HPDC'2013]   Shigang Li, Torsten Hoefler, and Marc Snir. NUMA-aware shared-memory collective communication for MPI. In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 85-96. 2013. (Acceptance rate: 15%, 20/131; Best Paper Nomination, 3/20) [Paper]
  • [PDP'2013]   Shigang Li, Jingyuan Hu, Xin Cheng, and Chongchong Zhao. Asynchronous work stealing on distributed memory systems. In 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 198-202. IEEE, 2013.
  • [ICA3PP'2011]   Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, and Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. In International Conference on Algorithms and Architectures for Parallel Processing, Workshop, pp. 54-63. Springer, Berlin, Heidelberg, 2011.

Teaching

Academic Services

  • TPC Track Chair, HPC China 2023
  • Publicity Chair (Europe), PPoPP 2023
  • Publications Chair, IISWC 2020
  • Workshop Co-Chair, ICS 2018
  • Research Posters Committee Member, SC 2023
  • Program Committee Member, SC 2023, 2022, 2021
  • Program Committee Member, PPoPP 2022
  • Program Committee Member, IEEE Cluster 2024, 2022, 2021
  • Program Committee Member, IPDPS 2024, 2021, 2018, 2017
  • Program Committee Member, ICPP 2023, 2022, 2017
  • Program Committee Member, ICPADS 2022, 2018
  • Program Committee Member, HPC Asia 2024, 2021, 2020, 2019, 2018
  • Program Committee Member, HPC China 2022, 2021, 2019, 2018, 2017, 2016
  • Program Committee Member, DPCS 2023, 2022
  • Program Committee Member, SBAC-PAD 2022, 2020, 2016 (ERC)
  • Program Committee Member, PMAM 2023, 2022, 2021
  • Program Committee Member, HP3C 2020, 2019, 2018
  • Program Committee Member, INFOCOMP 2022, 2021, 2020

  • Associate Editor of Cluster Computing (CLUS) - Springer
  • Youth Editor of CCF THPC
  • Reviewer of IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • Reviewer of IEEE Transactions on Services Computing (TSC)
  • Reviewer of IEEE Transactions on Big Data (TBD)
  • Reviewer of IEEE Transactions on Network Science and Engineering
  • Reviewer of IEEE Transactions on Circuits and Systems II: Express Briefs
  • Reviewer of Journal of Parallel and Distributed Computing (JPDC) - Elsevier
  • Reviewer of Journal of Supercomputing - Springer
  • Reviewer of Concurrency and Computation: Practice and Experience
  • Reviewer of Mobile Networks and Applications - Springer
  • Program Committee Member, IEEE TPDS Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL, 2020