Projects

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling

Published in Proceedings of the 29th Symposium on Operating Systems Principles (SOSP 23), 2023

Artifacts available at https://github.com/siasosp23/artifacts

Recommended citation: Suhas Jayaram Subramanya, Daiyaan Arfeen, Shouxu Lin, Aurick Qiao, Zhihao Jia, and Gregory R. Ganger. 2023. Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling. In Proceedings of the 29th Symposium on Operating Systems Principles (SOSP 23). Association for Computing Machinery, New York, NY, USA, 642–657. https://doi.org/10.1145/3600006.3613175 /files/sia-sosp23.pdf

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Published in Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), 2021

This paper won the Best Paper Award at OSDI 2021. Code available at https://github.com/petuum/adaptdl/tree/osdi21-artifact.

Recommended citation: Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, Ganger GR, Xing EP. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21) 2021. /files/pollux-osdi.pdf

FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search

Published in arXiv, 2021

Code available at https://github.com/Microsoft/DiskANN

Recommended citation: Singh A, Subramanya SJ, Krishnaswamy R, Simhadri HV. FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search. arXiv preprint arXiv:2105.09613. 2021 May 20. /files/freshdiskann-arxiv.pdf

PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy

Published in Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020

Code available at https://github.com/thesys-lab/pacemaker-hdfs

Recommended citation: Kadekodi S, Maturana F, Subramanya SJ, Yang J, Rashmi KV, Ganger GR. PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) 2020 (pp. 369-385). /files/pacemaker-osdi20.pdf

DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Published in Neural Information Processing Systems (NeurIPS), 2019

Code available at https://github.com/microsoft/DiskANN

Recommended citation: Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Harsha Vardhan Simhadri, and Ravishankar Krishnaswamy. "DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node". In *Advances in Neural Information Processing Systems (NeurIPS) 2019* /files/diskann_neurips19.pdf

BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?

Published in 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2019

Code available at https://github.com/Microsoft/blas-on-flash

Recommended citation: Suhas Jayaram Subramanya, et al. "BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?" 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). USENIX Association. /files/blas-on-flash-nsdi19.pdf

BLAS-on-flash: an alternative for training large ML models?

Published in Systems for Machine Learning Conference, 2018

Recommended citation: Suhas Jayaram Subramanya, Srajan Garg, and Harsha Vardhan Simhadri. "BLAS-on-flash: an alternative for training large ML models?" /files/sysml18.pdf

Exploration for Multi-task Reinforcement Learning with Deep Generative Models

Published in Deep Reinforcement Learning Workshop, Neural Information Processing Systems, 2016

Recommended citation: Sai Praveen Bangaru, JS Suhas and Balaraman Ravindran. Exploration for Multi-task Reinforcement Learning with Deep Generative Models. In Neural Information Processing Systems Deep Reinforcement Learning Workshop, 2016. https://arxiv.org/abs/1611.09894