Draconis: Network-Accelerated Scheduling for Microsecond-Scale Workloads

We present Draconis, a novel scheduler for workloads whose tasks run for tens to hundreds of microseconds. Draconis challenges the popular belief that programmable switches cannot house the complex data structures, such as queues, needed to support an in-network scheduler. Using programmable switches, Draconis achieves the low scheduling tail latency and high throughput needed to support these microsecond-scale workloads on large clusters. Furthermore, Draconis supports a wide range of complex scheduling policies, including locality-aware, priority-based, and resource-based scheduling.

Draconis reduces 99th-percentile scheduling latency by 3×–200× compared to state-of-the-art software-based and network-accelerated schedulers across a range of synthetic workloads. Our evaluation also demonstrates that Draconis achieves 52× higher throughput than server-based scheduling systems.

Publications


[1] Draconis: Network-Accelerated Scheduling for Microsecond-Scale Workloads
Sreeharsha Udayashankar, Ashraf Abdel-Hadi, Ali Mashtizadeh, Samer Al-Kiswany
The ACM SIGOPS European Conference on Computer Systems (EuroSys), Apr. 2024 [pdf]