Projects

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling

Published in Proceedings of the 29th Symposium on Operating Systems Principles (SOSP 23), 2023

Artifacts available at https://github.com/siasosp23/artifacts

Recommended citation: Suhas Jayaram Subramanya, Daiyaan Arfeen, Shouxu Lin, Aurick Qiao, Zhihao Jia, and Gregory R. Ganger. 2023. Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling. In Proceedings of the 29th Symposium on Operating Systems Principles (SOSP 23). Association for Computing Machinery, New York, NY, USA, 642–657. https://doi.org/10.1145/3600006.3613175 /files/sia-sosp23.pdf

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Published in Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), 2021

This paper won the Best Paper Award at OSDI 2021. Code available at https://github.com/petuum/adaptdl/tree/osdi21-artifact.

Recommended citation: Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, Ganger GR, Xing EP. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21) 2021. /files/pollux-osdi.pdf

PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy

Published in Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020

Code available at https://github.com/thesys-lab/pacemaker-hdfs

Recommended citation: Kadekodi S, Maturana F, Subramanya SJ, Yang J, Rashmi KV, Ganger GR. PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) 2020 (pp. 369-385). /files/pacemaker-osdi20.pdf

DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Published in Neural Information Processing Systems (NeurIPS), 2019

Code available at https://github.com/microsoft/DiskANN

Recommended citation: Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Harsha Vardhan Simhadri, and Ravishankar Krishnaswamy. "DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node". In *Advances in Neural Information Processing Systems (NeurIPS) 2019*. /files/diskann_neurips19.pdf

BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?

Published in 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2019

Code available at https://github.com/Microsoft/blas-on-flash

Recommended citation: Suhas Jayaram Subramanya, et al. "BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?" 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). USENIX Association. /files/blas-on-flash-nsdi19.pdf