Scalable systems for ML are largely siloed into dataflow systems for structured data and DL systems for unstructured data. This gap has left workloads that jointly analyze both forms of data with poor systems support, leading to both low system efficiency and manual grunt work. We plan to bridge this gap for a class of such workloads: feature transfer from deep NNs for analyzing unstructured data (images, text, etc.) along with structured data. Vista is a new data system that resolves these systems issues by elevating entire transfer learning workloads to a declarative level on top of Parallel Dataflow (PD) and DL systems. Vista automatically optimizes the configuration and execution of such workloads to reduce computational redundancy and improve workload reliability. In this talk, I present some of the extensions I worked on for Vista, some of my current efforts to expand Vista to support general transfer learning workloads, and the broader goal of integrating Vista with Cerebro.
Speaker bio:
Advitya Gemawat is a fourth-year undergraduate student advised by Prof. Arun Kumar at UC San Diego. His research interests lie at the intersection of machine learning and scalable systems, an emerging area increasingly referred to as Systems for ML. He is a recipient of the HDSI Undergraduate Scholarship and the Siemens SRC Scholarship, and has been nominated for the CRA Outstanding Undergraduate Research Award (sponsored by Microsoft Research). He has previously interned at Fidelity International, PwC, and HP, and most recently at VMware, and has spearheaded hyperparameter optimization and AutoML capabilities in the Apache MADlib library’s Deep Learning module, all of which are scheduled for release in the upcoming version 1.18.0.