Database Lab Research Seminar
If you are interested in database research, you are welcome to join us in our weekly database seminar. Every week a local or visiting researcher gives a talk on their research topic.
Location/Time: In Fall 2020, the seminar is held on Fridays 3:00-4:00pm PT on Zoom and hosted by Dr. Arun Kumar. The Zoom link will be emailed to the DBTalks mailing list.
Course Number: If you are a student and plan to attend, please enroll in CSE 239A for 1 credit.
Mailing List: Announcements about the seminar are sent to the DBTalks mailing list. To subscribe, please submit this Google Form.
Programmatically Building & Managing Training Data with Snorkel
Dr. Alex Ratner (UW-Seattle and Snorkel AI) •
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models require. In this talk,...
Responsible Data Management
Dr. Julia Stoyanovich (NYU) •
The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data are...
Systems for Human Data Interaction
Dr. Eugene Wu (Columbia University) •
The rapid democratization of data has placed its access and analysis in the hands of the entire population. While the advances in rapid and large-scale...
Elements of Learning Systems
Dr. Tianqi Chen (CMU and OctoML) •
Data, models, and computing are the three pillars that enable machine learning to solve real-world problems at scale. Making progress on these three domains requires...
Interpretable Data Analysis with Explanations and Causality
Dr. Sudeepa Roy (Duke University) •
In current times, data is considered synonymous with knowledge, profit, power, and entertainment, requiring development of new techniques to extract useful information and insights from...
The Socio-Technical Phenomena of Data Integration and Knowledge Graph
Dr. Juan Sequeda (Data.World) •
Data Integration has been an active area of computer science research for over two decades. A modern manifestation is as Knowledge Graphs which integrates not...
An Exabyte scale global data infrastructure for CMS@LHC
Prof. Frank Wuerthwein (UCSD Physics and HDSI) •
The science program at the Large Hadron Collider at CERN is preparing to produce, distribute, and access an Exabyte of new data per year starting...
Grouped Learning: Group-By Machine Learning Model Selection Workloads
Side Li (UCSD CSE) •
ML practitioners routinely build separate models for data subsets based on some specified attribute(s), e.g., one model per state. We call this practice “ML over...
Vista: An End-to-end Declarative Transfer Learning System for Multimodal Analytics with Deep Neural Networks
Advitya Gemawat (UCSD HDSI) •
Scalable systems for ML are largely siloed into dataflow systems for structured data and DL systems for unstructured data. This gap has left workloads that...
Cerebro: A Layered Data Platform for Scalable Deep Learning
Yuhao Zhang and Supun Nakandala (UCSD CSE) •
Deep learning (DL) is gaining popularity across myriad domains due to the new ubiquity of unstructured data, tools such as TensorFlow, and easier access to...
Panorama: A Data System for Unbounded Vocabulary Querying over Video
Yuhao Zhang •
Deep convolutional neural networks (CNNs) achieve state-of-the-art accuracy for many computer vision tasks. But using them for video monitoring applications incurs high computational cost and...
SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data
Vraj Shah •
Speech-driven querying is becoming popular in new device environments such as smartphones, tablets, and even conversational assistants. However, such querying is largely restricted to natural...
Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue
Rana Alotaibi •
Big data applications increasingly involve diverse datasets, conforming to different data models. Such datasets are routinely hosted in heterogeneous stores, each capable of handling one...
Vista: Declarative Feature Transfer from Deep CNNs at Scale
Supun Nakandala •
Scalable systems for machine learning (ML) are largely siloed into dataflow systems for structured data and deep learning systems for unstructured data. This gap has...
Knowledge Graph Use Cases at Intuit
Jay Yu (Intuit) •
Intuit, the leading financial software/service company behind TurboTax, Mint and QuickBooks, is embarking on a multi-year transformational journey into an AI-driven Expert Platform to help...
From Data to Models and Back: Experiences from Google's Production ML Pipelines
Alkis Polyzotis (Google Research) •
Building a good ML model requires good input data. Conversely, debugging a model inevitably involves data debugging and understanding. In this talk, I will present...
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Carlo Curino and Kostas Karanasos (Microsoft Jim Gray Systems Lab) •
Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader...
Analysis of data-driven workflows
Prof. Victor Vianu •
Software systems centered around databases have become pervasive in a wide variety of applications, including health-care management, e-commerce, business processes, scientific workflows, and e-government. Such...
Automatic Verification of Database-powered Workflows
Alessandro Gianola (Visiting PhD student from Free University of Bolzano) •
During the last two decades, a huge body of research has been dedicated to the challenging problem of reconciling data and process management within contemporary...