Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo


Cloud vendors now provide cloud gaming services backed by GPUs. These GPUs experience idle periods because not every frame of a game keeps the GPU busy with rendering. Previous works temporally co-locate games with best-effort (BE) applications to harvest these idle cycles. However, they ignore the spatial sharing of GPUs and thus fail to maximize the throughput improvement. The RT (ray tracing) Cores newly introduced inside GPU SMs exacerbate this situation. This paper presents Combo, which efficiently leverages two-level spatial sharing, intra-SM and inter-SM sharing, to improve throughput while guaranteeing the QoS of rendering games’ frames. Combo is novel in two ways. First, based on an investigation of the programming models for RT Cores, Combo devises a neat compilation method that converts kernels using RT Cores to enable fine-grained resource management, and it uses this fine-grained kernel management to construct spatial sharing schemes. Second, since the performance of spatial sharing varies with the actually co-located kernels, Combo proposes two efficient spatial sharing schemes, exact integrated SM sharing and relaxed intra-SM sharing. To maximize the throughput of BE applications, Combo identifies the best-fit scenario for each scheme by considering the runtime rendering load. Our evaluation shows that Combo achieves up to 38.2% (14.0% on average) throughput improvement compared with the state-of-the-art temporal-only solution.

In Proceedings of the 2023 ACM Symposium on Cloud Computing (SoCC '23)