Deep learning (DL) is gaining popularity across myriad domains due to the new ubiquity of unstructured data, tools such as TensorFlow, and easier access to GPUs. But building large-scale DL applications is still too resource-intensive and painful for all but the biggest tech firms. A key reason for this pain is DL’s flexibility, which leads to an expensive model selection process to get DL to work well. Alas, existing DL systems treat this process an afterthought, leading to massive resource wastage and a usability mess. To tackle these issues, we present our vision of a first-of-its-kind data platform for scalable DL, Cerebro, inspired by lessons from the database world. We elevate the DL model selection process with higher-level APIs already inherent in practice and devise a series of novel multi-query optimization techniques to substantially raise resource efficiency. We explain our system design philosophy and architecture, discuss our recent research and open research questions, present initial results, and discuss tangible paths to practical impact.
Yuhao Zhang is a PhD student in CSE at UC San Diego, advised by Prof. Arun Kumar. His research interest focuses on machine learning systems intending to make data science easier and faster. He has been working on data systems for video analytics and deep learning model selection.
Supun Nakandala is a fourth-year PhD student advised by Prof. Arun Kumar at UC San Diego. His research interest lies broadly in the intersection Systems and Machine Learning, an emerging area that is increasingly referred to as Systems for ML. He is a recipient of the 2019 SIGMOD best paper honorable mention award and a 2020 SIGMOD research highlight award. Systems and ideas based on his research have been released as part of the Apache MADlib library and open-source libraries from Microsoft.