Publications

(2024). FaaSMem: Improving Memory Efficiency of Serverless Computing with Memory Pool Architecture. In ASPLOS2024 (CCF-A).

(2023). Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo. In SoCC2023 (CCF-B) (Corresponding author).

(2023). Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation. In TC2023 (CCF-A).

(2022). ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-grained Resource Management. In TC2022 (CCF-A).

(2022). DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs. In ATC2022 (CCF-A).

(2022). Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. In HPCA2022 (CCF-A).

(2021). Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction. In SC2021 (CCF-A).

(2021). Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks. In ICCD2021 (CCF-B).

(2020). E2bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services. In TPDS2020 (CCF-A).

(2020). CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs. In ICDCS2020 (CCF-B).

(2019). Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory. In TACO2019 (CCF-A).