Publications

BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?

Published in 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2019

Many large scale machine learning training and inference tasks are memory-bound rather than compute-bound. That is, on large data sets, the working set of these algorithms does not fit in memory for jobs that could run overnight on a few multi-core processors. This often forces an expensive redesign of the algorithm for distributed platforms such as parameter servers and Spark. We propose an inexpensive and efficient alternative based on the observation that many ML tasks admit algorithms that can be programmed with linear algebra subroutines. A library that supports the BLAS and sparseBLAS interfaces on large SSD-resident matrices can enable multi-threaded code to scale to industrial scale datasets on a single workstation. We demonstrate that not only can such a library provide near in-memory performance for BLAS, but it can also be used to write implementations of complex algorithms such as eigensolvers that outperform in-memory (ARPACK) and distributed (Spark) counterparts. Existing multi-threaded in-memory code can link to our library with minor changes and scale to hundreds of gigabytes of training or inference data at near in-memory processing speeds. We demonstrate this with two industrial scale use cases arising in ranking and relevance pipelines: training large scale topic models and inference for extreme multi-label learning. This suggests that our approach could be an efficient alternative to expensive distributed big-data systems for scaling up structurally complex machine learning tasks.
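To give a rough feel for the underlying idea (blocked linear algebra over matrices that live on SSD rather than in RAM), here is a minimal Python sketch of an out-of-core matrix multiply over memory-mapped files. It is illustrative only: the file paths, shapes, and block size are hypothetical, and this is not the library's actual C++ BLAS/sparseBLAS API or its I/O scheduler.

```python
# Illustrative sketch: blocked GEMM over SSD-resident matrices via numpy.memmap.
# Not the BLAS-on-flash API; all names, shapes, and the block size are hypothetical.
import numpy as np

def flash_gemm(a_path, b_path, c_path, m, k, n, block=4096, dtype=np.float32):
    """Compute C = A @ B where A (m x k), B (k x n), C (m x n) reside on disk."""
    A = np.memmap(a_path, dtype=dtype, mode="r", shape=(m, k))
    B = np.memmap(b_path, dtype=dtype, mode="r", shape=(k, n))
    C = np.memmap(c_path, dtype=dtype, mode="w+", shape=(m, n))
    for i in range(0, m, block):          # stream row panels of A
        for j in range(0, n, block):      # stream column panels of B
            acc = np.zeros((min(block, m - i), min(block, n - j)), dtype=dtype)
            for p in range(0, k, block):  # accumulate over the shared dimension
                acc += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
            C[i:i+block, j:j+block] = acc  # write the finished output tile
    C.flush()
```

The point of the sketch is only that each output tile touches a bounded working set, so the computation never needs the full matrices in memory; the paper's library adds the careful prefetching and multi-threaded scheduling needed to approach in-memory speeds.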

Recommended citation: Subramanya, Suhas Jayaram, et al. "BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?." 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). USENIX Association. http://harsha-simhadri.org/pubs/nsdi19_final.pdf

BLAS-on-flash: an alternative for training large ML models?

Published in Systems for Machine Learning Conference, 2018

Many ML training tasks admit learning algorithms that can be composed with linear algebra. On large datasets, the working set of these algorithms overflows memory. For such scenarios, we propose a library that supports BLAS and sparseBLAS subroutines on large matrices resident on inexpensive non-volatile memory. We demonstrate that such libraries can achieve near in-memory performance and be used for fast implementations of complex algorithms such as eigensolvers. We believe that this approach could be a cost-effective alternative to expensive big-data compute systems.

Recommended citation: Subramanya, Suhas Jayaram, Srajan Garg, and Harsha Vardhan Simhadri. "BLAS-on-flash: an alternative for training large ML models?." Systems for Machine Learning Conference (SysML), 2018. https://www.sysml.cc/doc/207.pdf

Exploration for Multi-task Reinforcement Learning with Deep Generative Models

Published in Deep Reinforcement Learning Workshop, Neural Information Processing Systems, 2016

Exploration in multi-task reinforcement learning is critical for training agents to deduce the underlying MDP. Many existing exploration frameworks, such as E3, Rmax, and Thompson sampling, assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep generative models. We supplement our method with a low dimensional energy model to learn the underlying MDP distribution and provide a resilient and adaptive exploration signal to the agent. We evaluate our method on a new set of environments and provide an intuitive interpretation of our results.
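As a loose illustration of the general shape of an energy-based exploration signal (not the paper's architecture, which uses deep generative models with a low-dimensional energy model), the Python sketch below keeps a simple running Gaussian model of visited states and treats its energy as an intrinsic bonus, so rarely visited states receive a larger bonus. All class and parameter names here are hypothetical.

```python
# Illustrative sketch: an intrinsic exploration bonus from a simple energy model
# over visited states. This Gaussian stand-in only conveys the idea that novel
# states get higher energy; it is not the method proposed in the paper.
import numpy as np

class GaussianEnergyBonus:
    def __init__(self, dim, eps=1e-3):
        self.mean = np.zeros(dim)
        self.cov = np.eye(dim)
        self.count = 0
        self.eps = eps

    def update(self, state):
        """Running update of the visited-state distribution after each step."""
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.cov += (np.outer(delta, state - self.mean) - self.cov) / self.count

    def bonus(self, state):
        """Energy of a state (negative log-density up to constants): higher when novel."""
        prec = np.linalg.inv(self.cov + self.eps * np.eye(len(state)))
        diff = state - self.mean
        return 0.5 * float(diff @ prec @ diff)

# Hypothetical usage: add the bonus to the environment reward during training,
#   r_total = r_env + beta * bonus_model.bonus(next_state)
```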

Recommended citation: Sai Praveen Bangaru, JS Suhas and Balaraman Ravindran. Exploration for Multi-task Reinforcement Learning with Deep Generative Models. In Neural Information Processing Systems Deep Reinforcement Learning Workshop, 2016. https://arxiv.org/abs/1611.09894