

Shigang Li (李士刚)
Postdoctoral Researcher, SPCL, Department of Computer Science, ETH Zurich
Links: [Google Scholar] [ResearchGate] [ORCID]

Research interests: parallel and distributed deep learning, deep learning systems, parallel and distributed computing

Brief Biography

        Dr. Shigang Li has been a Postdoctoral Researcher in the Department of Computer Science, ETH Zurich, since Aug. 2018, supervised by Prof. Torsten Hoefler. He received a Bachelor's degree in Computer Science and a Ph.D. in Computer Architecture from the University of Science and Technology Beijing in July 2009 and June 2014, respectively. From Sep. 2011 to Sep. 2013, he was a joint Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign, supervised by Prof. Marc Snir. From June 2014 to Aug. 2018, he was an Assistant Professor in the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, mentored by Prof. Yunquan Zhang. He received Best Paper Nominations at PPoPP'20 and HPDC'13 and an Outstanding Paper award at MLSys'21. He has served as a PC member of major academic conferences (e.g., SC, IPDPS, IEEE Cluster, ICPP, ICPADS, HPC China) and as an invited reviewer for well-known journals (e.g., IEEE TPDS, IEEE TSC, IEEE TBD, JPDC). He is an Associate Editor of Cluster Computing (Springer) and has served as Publications Chair of IISWC'20 and Workshop Co-chair of ICS'18. He is a member of ACM and IEEE and a senior member of CCF.

News

  • [2020.10] I'm playing a very interesting newly discovered Jigsaw Puzzle game :-) It's super cool!

Publications

  • [MLSys'2021]   Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Data Movement is All You Need: A Case Study on Optimizing Transformers. The 4th Conference on Machine Learning and Systems, 2021. (Outstanding Paper, 5/52)
  • [PTRSA'2021]   Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, and Torsten Hoefler. Deep Learning for Post-Processing Ensemble Weather Forecasts. Philosophical Transactions of the Royal Society A (2021).
  • [TPDS'2021]   Daning Cheng#, Shigang Li#, Hanping Zhang, Fen Xia, and Yunquan Zhang. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms. IEEE Transactions on Parallel and Distributed Systems (2021). [Paper]
  • [TPDS'2020]   Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, and Torsten Hoefler. Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. IEEE Transactions on Parallel and Distributed Systems (2020). [Paper]
  • [JPDC'2020]   Daning Cheng, Shigang Li*, Yunquan Zhang. WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system. Journal of Parallel and Distributed Computing 145 (2020): 202-216. [Paper]
  • [PPoPP'2020]   Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, and Torsten Hoefler. Taming unbalanced training workloads in deep learning with partial collective operations. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 45-61. 2020. (Acceptance rate: 23%, 28/121; best paper nomination, 5/28) [Paper][Slides]
  • [JAMES'2020]   He Zhang, Minghua Zhang, ..., Shigang Li, et al. CAS‐ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) Version 2. Journal of Advances in Modeling Earth Systems (2020): e2020MS002210.
  • [IPDPS'2020]   Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang. A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format. In 2020 IEEE International Parallel and Distributed Processing Symposium, pp. 95-104. IEEE, 2020. [Paper]
  • [SC'2019]   Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 68. ACM, 2019. (Acceptance rate: 22.7%, 78/344)
  • [ICTAI'2019]   Daning Cheng, Hanping Zhang, Fen Xia, Shigang Li, and Yunquan Zhang. Using Gradient based multikernel Gaussian Process and Meta-acquisition function to Accelerate SMBO. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence, pp. 440-447. IEEE, 2019.
  • [JSUPERCOMPUT'2019]   Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20.
  • [ISPA'2019]   Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight. In 2019 IEEE International Symposium on Parallel & Distributed Processing with Applications, pp. 511-518. IEEE, 2019.
  • [TPDS'2018]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems 29, no. 3 (2018): 542-555. [Paper][Slides]
  • [ICPP'2018]   Shigang Li, Baodong Wu, Yunquan Zhang, Xianmeng Wang, Jianjiang Li, Changjun Hu, Jue Wang, Yangde Feng, and Ningming Nie. Massively scaling the metal microscopic damage simulation on sunway taihulight supercomputer. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-11. 2018. [Paper][Slides]
  • [ICPP'2018]   Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-avoiding for dynamical core of atmospheric general circulation model. In Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. 2018.
  • [JPDC'2018]   Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.
  • [ICPADS'2018]   Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, and Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems, pp. 355-364. IEEE, 2018. (Corresponding Author) [Paper][Slides]
  • [PPoPP'2017]   Shigang Li, Yunquan Zhang, and Torsten Hoefler. Cache-oblivious MPI all-to-all communications on many-core architectures. Poster, ACM SIGPLAN Notices 52, no. 8 (2017): 445-446. [Paper]
  • [CPC'2017]   Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, and He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications 211 (2017): 31-40.
  • [CPC'2017]   Baodong Wu, Shigang Li*, Yunquan Zhang, and Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications 211 (2017): 113-123. (Corresponding Author)
  • [TACO'2016]   Yunquan Zhang, Shigang Li*, Shengen Yan*, and Huiyang Zhou. A cross-platform SpMV framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO) 13, no. 4 (2016): 1-25. (Corresponding Author) [Paper]
  • [PIEEE'2016]   Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, and Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE 104, no. 11 (2016): 2114-2136.
  • [SCI CHINA INFORM SCI'2015]   Shigang Li, ChangJun Hu, JunChao Zhang, and YunQuan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences 58, no. 9 (2015): 1-14.
  • [HPCC'2015]   Shigang Li, Yunquan Zhang, Chunyang Xiang, and Lei Shi. Fast convolution operations on many-core architectures. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, pp. 316-323. IEEE, 2015. [Slides]
  • [CCGrid'2015]   Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 process-level shared memory: A case study with stencil computations. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Workshop, pp. 1099-1106. IEEE, 2015.
  • [CLUSTER COMPUT'2014]   Shigang Li, Torsten Hoefler, Chungjin Hu, and Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing 17, no. 4 (2014): 1139-1155.
  • [HPDC'2013]   Shigang Li, Torsten Hoefler, and Marc Snir. NUMA-aware shared-memory collective communication for MPI. In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 85-96. 2013. (Acceptance rate: 15%, 20/131; best paper nomination, 3/20) [Paper]
  • [PDP'2013]   Shigang Li, Jingyuan Hu, Xin Cheng, and Chongchong Zhao. Asynchronous work stealing on distributed memory systems. In 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 198-202. IEEE, 2013.
  • [ICA3PP'2011]   Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, and Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. In International Conference on Algorithms and Architectures for Parallel Processing, Workshop, pp. 54-63. Springer, Berlin, Heidelberg, 2011.
  • [ICCS'2011]   Yunfeng Peng, Changjun Hu, Chongchong Zhao, Shigang Li, and Shucai Yao. Management of Non-functional Attributes of Parallel Components. Procedia Computer Science 4 (2011): 461-470.
  • [ICA3PP'2010]   Qian Cao, Changjun Hu, Haohu He, Xiang Huang, and Shigang Li. Support for OpenMP tasks on cell architecture. In International Conference on Algorithms and Architectures for Parallel Processing, Workshop, pp. 308-317. Springer, Berlin, Heidelberg, 2010.

Projects

  • Project Leader — MPI Model Extension and Performance Optimization for Many-Core Clusters, National Natural Science Foundation of China
  • Project Leader — MPI Communication Optimization for Irregular Parallel Algorithms, State Key Laboratory of Computer Architecture Foundation
  • Technical Principal — High Performance Deep Learning Library Development on CPU and GPU Architectures, IT Company Foundation
  • Technical Principal — Large-Scale Deep Learning Training System on Heterogeneous Parallel Machines, IT Company Foundation

Services

  • Associate Editor of Cluster Computing (CLUS) - Springer
  • Reviewer of IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • Reviewer of IEEE Transactions on Services Computing (TSC)
  • Reviewer of IEEE Transactions on Big Data (TBD)
  • Reviewer of Journal of Parallel and Distributed Computing (JPDC) - Elsevier
  • Reviewer of Journal of Supercomputing - Springer
  • Reviewer of Concurrency and Computation: Practice and Experience
  • Reviewer of IEEE Transactions on Circuits and Systems II: Express Briefs
  • Reviewer of Mobile Networks and Applications - Springer
  • Program Committee Member, IEEE TPDS Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL
  • Publications Chair, IISWC 2020
  • Workshop Co-chair, ICS 2018
  • Program Committee Member, SC 2021
  • Program Committee Member, IEEE Cluster 2021
  • Program Committee Member, PMAM 2021
  • Program Committee Member, ICPADS 2018
  • Program Committee Member, IPDPS 2021, 2018, 2017
  • Program Committee Member, HPC Asia 2021, 2020, 2019, 2018
  • Program Committee Member, ICPP 2017
  • Program Committee Member, HPC China 2019, 2018, 2017, 2016
  • Program Committee Member, SBAC-PAD 2020, 2016
  • Program Committee Member, HP3C 2020, 2019, 2018
  • Program Committee Member, INFOCOMP 2020