Han Zhao 赵涵

Assistant Professor

Shanghai Jiao Tong University (SJTU)

Biography

I am an assistant professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University (SJTU). I received my Master's and Ph.D. degrees from Shanghai Jiao Tong University under the supervision of Prof. Quan Chen and Prof. Minyi Guo. I continue to work closely with Prof. Quan Chen and Assistant Prof. Weihao Cui.

My previous research focused on task scheduling across various architectures, resource management in datacenters, and DNN inference system design. Currently, my research explores cloud computing and deep learning systems, including LLM inference and training systems, serverless architectures for diverse applications, and advanced resource management in datacenters.

I am now looking for prospective Ph.D. students and Master's students (enrollment dates: 2025.09 & 2026.09). If you are interested in the above areas, we should talk.

Interests
  • Cloud computing
  • LLM inference systems
  • LLM training systems
  • Resource management in datacenters
Education
  • PhD in Computer Science, 2019-2022

    Shanghai Jiao Tong University

  • MSc in Computer Science, 2016-2019

    Shanghai Jiao Tong University

  • BSc in Computer Science, 2012-2016

    Huazhong University of Science and Technology

Recent Publications

(2025). Taming Flexible Job Packing in Deep Learning Training Clusters. In TACO2025 (CCF-A).

PDF Cite

(2025). XPUTIMER: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale. In arXiv (under review).

PDF

(2025). ARACHNE: Optimizing Distributed Parallel Applications with Reduced Inter-Process Communication. In TACO2025 (CCF-A).

(2025). Improving GPU Sharing Performance through Adaptive Bubbleless Spatial-Temporal Sharing. In EuroSys2025 (CCF-A).

PDF

(2024). Potamoi: Accelerating neural rendering via a unified streaming architecture. In TACO2024 (CCF-A).

PDF Cite

(2024). Adaptive Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. In TC2024 (CCF-A).

PDF Cite

(2024). Exploiting all intra-SM parallelism to maximize the throughput while ensuring QoS. In Science China Information Sciences 2024 (CCF-A).

PDF

(2024). FaaSMem: Improving Memory Efficiency of Serverless Computing with Memory Pool Architecture. In ASPLOS2024 (CCF-A).

PDF Cite

(2023). Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo. In SoCC2023 (CCF-B) (Corresponding author).

PDF Cite

(2023). Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation. In TC2023 (CCF-A).

PDF Cite

(2022). ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-grained Resource Management. In TC2022 (CCF-A).

PDF Cite

(2022). DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs. In ATC2022 (CCF-A).

PDF Cite

(2022). Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. In HPCA2022 (CCF-A).

PDF Cite

(2021). Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction. In SC2021 (CCF-A).

PDF Cite

(2021). Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks. In ICCD2021 (CCF-B).

PDF Cite

(2020). E2bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services. In TPDS2020 (CCF-A).

PDF Cite

(2020). CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs. In ICDCS2020 (CCF-B).

PDF Cite

(2019). Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory. In TACO2019 (CCF-A).

PDF Cite

Accomplishments

CCF Computer Architecture Outstanding Doctoral Dissertation Award
SC2021 Best Implementation Award

Contact