Improving GPU Sharing Performance through Adaptive Bubbleless Spatial-Temporal Sharing

Abstract

Data centers now allow multiple applications with lightweight workloads to share a GPU. Existing temporal or spatial sharing systems struggle to provide efficient and accurate quota assignments. We observe that the performance of multi-user systems is often underestimated because of unused GPU “bubbles”, and that it can be improved by squeezing these bubbles out. Based on this observation, we design Bless, a bubble-less spatial-temporal GPU sharing system that fine-tunes GPU resource allocation to improve multi-user performance. Bless leverages precise computing resource management and fine-grained kernel scheduling to ensure stringent quota guarantees and to reduce latency fairly for applications with varying GPU quotas. We implement and evaluate Bless with multiple applications and workloads. Our results show that Bless achieves a 21.1% to 37.3% average latency reduction over the state of the art while guaranteeing the promised quota for all applications.

Publication
In European Conference on Computer Systems