
Shigang Li (李士刚)
Professor, PhD Supervisor, Beijing University of Posts and Telecommunications
Links:     [Google Scholar]   [ResearchGate]   [LinkedIn]   [GitHub]   [ORCID]

Research interests:   Parallel Computing,   High-Performance Deep Learning Systems,   GPU,   MPI,   Heterogeneous Computing
Emails:   shigangli.cs@gmail.com;   lishigang@bupt.edu.cn

Brief Biography

        Dr. Shigang Li is a Professor (PhD supervisor) in the School of Computer Science at Beijing University of Posts and Telecommunications, where he leads the Parallel Computing and Systems Laboratory. His research interests include parallel and distributed deep learning systems, high-performance computing, and heterogeneous computing. He was a Postdoctoral Researcher in the SPCL lab at ETH Zurich from Aug. 2018 to Aug. 2022. He received his Bachelor's degree in Computer Science and his Ph.D. in Computer Architecture from the University of Science and Technology Beijing in 2009 and 2014, respectively. He was a joint Ph.D. student in the Department of Computer Science at the University of Illinois at Urbana-Champaign from Sep. 2011 to Sep. 2013, and an Assistant Professor in the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences from 2014 to 2018. He received Best Paper Nominations (as the lead author) at SC'22, SC'21, PPoPP'20, and HPDC'13, the Outstanding Paper Award at MLSys'21, and the Best Reproducibility Advancement Award at SC'22. He has served as a program committee member for top conferences (SC, PPoPP, IPDPS, IEEE Cluster, ICPP, etc.) and as an invited reviewer for prestigious journals (IEEE TPDS, IEEE TSC, IEEE TBD, JPDC, etc.). He is an Associate Editor of Cluster Computing and a Youth Editor of CCF THPC, and has served as Publicity Co-Chair of PPoPP'23, Publications Chair of IISWC'20, and Workshop Co-Chair of ICS'18. He is an Executive Committee Member of CCF TCHPC and of the ACM SIGHPC China Chapter, and a Senior Member of IEEE, ACM, and CCF.

Position Openings

        I lead the Parallel Computing and Systems Laboratory at BUPT, and we are looking for highly self-motivated PhD/Master's students, postdocs, and experienced researchers. Let's work together on HPC+AI and make something cool! If you're interested, contact me directly with your CV.

Talks

Selected Publications

  • [SC'2022]   Shigang Li, Kazuki Osawa, Torsten Hoefler. Efficient Quantized Sparse Matrix Operations on Tensor Cores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2022. (Best Paper Finalist) [Paper][Talk][Slides][Code]
  • [SC'2022]   Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. HammingMesh: A Network Topology for Large-Scale Deep Learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2022. (Best Reproducibility Advancement Award) [Paper]
  • [ICS'2022]   Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, and Torsten Hoefler. A Data-Centric Optimization Framework for Machine Learning. In Proceedings of the 36th ACM International Conference on Supercomputing, 2022. [Paper][Code]
  • [PPoPP'2022]   Shigang Li, Torsten Hoefler. Near-Optimal Sparse Allreduce for Distributed Deep Learning. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022. [Paper][Talk][Slides][Code]
  • [SC'2021]   Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler. Flare: Flexible In-Network Allreduce. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [Paper]
  • [SC'2021]   Shigang Li, Torsten Hoefler. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. (Best Paper Finalist) [Paper][Talk][Slides][Code]
  • [NeurIPS'2021]   Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh. Asynchronous Decentralized SGD with Quantized and Local Updates. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems, 2021. [Paper]
  • [MLSys'2021]   Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Data Movement is All You Need: A Case Study on Optimizing Transformers. In Proceedings of the 4th Conference on Machine Learning and Systems, 2021. (Outstanding Paper Award, 5/52) [Paper][Code]
  • [T SUSTAIN ENERG'2021]   Tiechui Yao, Jue Wang, Haoyan Wu, Pei Zhang, Shigang Li, Ke Xu, Xiaoyan Liu, and Xuebin Chi. Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods. IEEE Transactions on Sustainable Energy, 2021.
  • [PTRSA'2021]   Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, and Torsten Hoefler. Deep Learning for Post-Processing Ensemble Weather Forecasts. Philosophical Transactions of the Royal Society A, 2021. [Paper][Code]
  • [TPDS'2021]   Daning Cheng#, Shigang Li#, Hanping Zhang, Fen Xia, and Yunquan Zhang. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms. IEEE Transactions on Parallel and Distributed Systems (2021). [Paper]
  • [TPDS'2021]   Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, and Torsten Hoefler. Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. IEEE Transactions on Parallel and Distributed Systems (2021). [Paper][Code]
  • [JPDC'2020]   Daning Cheng, Shigang Li*, Yunquan Zhang. WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system. Journal of Parallel and Distributed Computing 145 (2020): 202-216. [Paper]
  • [PPoPP'2020]   Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, and Torsten Hoefler. Taming unbalanced training workloads in deep learning with partial collective operations. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 45-61. 2020. (Acceptance rate: 23%, 28/121; Best Paper Nomination, 5/28) [Paper][Talk][Code]
  • [JAMES'2020]   He Zhang, Minghua Zhang, ..., Shigang Li, et al. CAS-ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) Version 2. Journal of Advances in Modeling Earth Systems (2020): e2020MS002210.
  • [IPDPS'2020]   Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang. A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format. In 2020 IEEE International Parallel and Distributed Processing Symposium, pp. 95-104. IEEE, 2020. [Paper]
  • [SC'2019]   Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 68. ACM, 2019. (Acceptance rate: 22.7%, 78/344)
  • [ICTAI'2019]   Daning Cheng, Hanping Zhang, Fen Xia, Shigang Li, and Yunquan Zhang. Using Gradient based multikernel Gaussian Process and Meta-acquisition function to Accelerate SMBO. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence, pp. 440-447. IEEE, 2019.
  • [JSUPERCOMPUT'2019]   Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20.
  • [ISPA'2019]   Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight. In 2019 IEEE International Symposium on Parallel & Distributed Processing with Applications, pp. 511-518. IEEE, 2019.
  • [TPDS'2018]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems 29, no. 3 (2018): 542-555. [Paper][Talk][Code]
  • [ICPP'2018]   Shigang Li, Baodong Wu, Yunquan Zhang, Xianmeng Wang, Jianjiang Li, Changjun Hu, Jue Wang, Yangde Feng, and Ningming Nie. Massively scaling the metal microscopic damage simulation on Sunway TaihuLight supercomputer. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-11. 2018. [Paper][Slides]
  • [ICPP'2018]   Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-avoiding for dynamical core of atmospheric general circulation model. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. 2018.
  • [JPDC'2019]   Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.
  • [ICPADS'2018]   Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, and Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems, pp. 355-364. IEEE, 2018. (Corresponding Author) [Paper][Slides]
  • [PPoPP'2017]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications on many-core architectures. Poster, ACM SIGPLAN Notices 52, no. 8 (2017): 445-446. [Paper]
  • [CPC'2017]   Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, and He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications 211 (2017): 31-40.
  • [CPC'2017]   Baodong Wu, Shigang Li*, Yunquan Zhang, and Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications 211 (2017): 113-123. (Corresponding Author)
  • [TACO'2016]   Yunquan Zhang, Shigang Li*, Shengen Yan*, and Huiyang Zhou. A cross-platform spmv framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO) 13, no. 4 (2016): 1-25. (Corresponding Author) [Paper][Code]
  • [PIEEE'2016]   Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, and Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE 104, no. 11 (2016): 2114-2136.
  • [SCIS'2015]   Shigang Li, Changjun Hu, Junchao Zhang, and Yunquan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences 58, no. 9 (2015): 1-14.
  • [HPCC'2015]   Shigang Li, Yunquan Zhang, Chunyang Xiang, and Lei Shi. Fast convolution operations on many-core architectures. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, pp. 316-323. IEEE, 2015. [Paper][Slides]
  • [CCGrid'2015]   Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 process-level shared memory: A case study with stencil computations. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Workshop, pp. 1099-1106. IEEE, 2015.
  • [CLUSTER COMPUT'2014]   Shigang Li, Torsten Hoefler, Chungjin Hu, and Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing 17, no. 4 (2014): 1139-1155.
  • [HPDC'2013]   Shigang Li, Torsten Hoefler, and Marc Snir. NUMA-aware shared-memory collective communication for MPI. In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 85-96. 2013. (Acceptance rate: 15%, 20/131; Best Paper Nomination, 3/20) [Paper]
  • [PDP'2013]   Shigang Li, Jingyuan Hu, Xin Cheng, and Chongchong Zhao. Asynchronous work stealing on distributed memory systems. In 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 198-202. IEEE, 2013.
  • [ICA3PP'2011]   Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, and Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. In International Conference on Algorithms and Architectures for Parallel Processing, Workshop, pp. 54-63. Springer, Berlin, Heidelberg, 2011.

Teaching

Academic Services

  • Publicity Chair (Europe), PPoPP 2023
  • Publications Chair, IISWC 2020
  • Workshop Co-Chair, ICS 2018
  • Program Committee Member, SC 2023, 2022, 2021
  • Program Committee Member, PPoPP 2022
  • Program Committee Member, IEEE Cluster 2022, 2021
  • Program Committee Member, IPDPS 2021, 2018, 2017
  • Program Committee Member, ICPP 2023, 2022, 2017
  • Program Committee Member, ICPADS 2022, 2018
  • Program Committee Member, HPC Asia 2021, 2020, 2019, 2018
  • Program Committee Member, HPC China 2022, 2021, 2019, 2018, 2017, 2016
  • Program Committee Member, DPCS 2022
  • Program Committee Member, SBAC-PAD 2022, 2020, 2016 (ERC)
  • Program Committee Member, PMAM 2023, 2022, 2021
  • Program Committee Member, HP3C 2020, 2019, 2018
  • Program Committee Member, INFOCOMP 2022, 2021, 2020

  • Associate Editor of Cluster Computing (CLUS) - Springer
  • Youth Editor of CCF THPC
  • Reviewer of IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • Reviewer of IEEE Transactions on Services Computing (TSC)
  • Reviewer of IEEE Transactions on Big Data (TBD)
  • Reviewer of IEEE Transactions on Network Science and Engineering
  • Reviewer of IEEE Transactions on Circuits and Systems II: Express Briefs
  • Reviewer of Journal of Parallel and Distributed Computing (JPDC) - Elsevier
  • Reviewer of Journal of Supercomputing - Springer
  • Reviewer of Concurrency and Computation: Practice and Experience
  • Reviewer of Mobile Networks and Applications - Springer
  • Program Committee Member, IEEE TPDS Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL, 2020