Improving the Multi-Tenancy GPU Performance through Adaptive Bubbleless Spatial-Temporal Sharing

Abstract

In serverless computing, an idle container is not recycled directly, in order to mitigate time-consuming cold container startup. These idle containers still occupy the memory, exasperating the memory shortage of today’s data centers. By offloading their cold memory to remote memory pool could potentially resolve this problem. However, existing offloading policies either hurt the Quality of Service (QoS) or are too coarse-grained in serverless computing scenarios. We therefore propose FaaSMem, a dedicated memory offloading mechanism tailored for serverless computing with memory poor architecture. It is proposed based on our finding that the memory of a serverless container allocated in different stages has different usage patterns. Specifically, FaaSMem proposes Page Bucket (Pucket) to segregate the memory pages in different segments, and applies segment-wise offloading policies for them. FaaSMem also proposes a semi-warm period during keep-alive stage, to seek a sweet spot between the offloading effort and the remote access penalty. Experimental results show that FaaSMem reduces the average local memory footprint by 9.9% - 79.8% and improves the container deployment density to 108% - 218%, with negligible 95%-ile latency increase.

Publication
In The International Conference on Architectural Support for Programming Languages and Operating Systems