Database Lab Research Seminar
Announcements
If you are interested in database research, you are welcome to join us in our weekly database seminar. Every week a local or visiting researcher gives a talk on their research topic.
Location/Time: In Fall 2022, the seminar is held on Fridays, 1:00-2:20pm PT, as a mixed-modality meeting at 4217 CSE and sometimes on Zoom. It is organized by Arun Kumar. The Zoom link will be emailed to the DBTalks mailing list.
Course Number: If you are a student and plan to attend, you can enroll in CSE 239A for 1 credit.
Mailing List: Announcements about the seminar are sent to the DBTalks mailing list. To subscribe, please submit this Google Form.
DB Seminars
Milvus: A Cloud-Native Vector Database
Frank Liu (Zilliz) • Oct 28, 2022
The total amount of digital data generated worldwide is increasing at a rapid rate. Simultaneously, approximately 80% (and growing) of this newly generated data is...
Dream the Stream: High Velocity Event Processing with a Converged Database
Shasank Chavan (Oracle) • Oct 21, 2022
Event stream processing is a rapidly growing category of workloads including IoT, Timeseries, Clickstream, Quality Control, Security, Auditing, Metrics, and Monitoring, etc. Analysts estimate the...
Autonomics in Amazon Redshift
Dr. Chunbin Lin (Amazon Web Services) • Oct 7, 2022
Amazon Redshift is Amazon’s petabyte-scale data warehouse service. It uses machine learning techniques in multiple areas of the service, e.g., automatic workload management. In this...
SpeakQL2: A Dialect System for Improving Speech-driven Querying of Structured Data
Kyle Luoma (UC San Diego) • Sep 30, 2022
SpeakQL2 builds upon prior work done within the ADALab on a speech + touch SQL query interface designed to enable effective SQL querying against databases...
DataPrep: Accelerate Data Preparation for AI
Dr. Jiannan Wang (Simon Fraser University) • Dec 1, 2021
Data scientists have been complaining about data preparation (data collection → data understanding → data cleaning → data enrichment → data integration → feature engineering)...
Efficient and Reliable Query Processing using Machine Learning
Daniel Kang (Stanford University) • Nov 17, 2021
Given the rise of increasingly powerful models, machine learning (ML) can now be used to answer a range of queries over unstructured data (e.g., videos,...
A 3-year History of Instance Optimized DB Research at Microsoft
Dr. Umar Farooq Minhas (Microsoft Research) • Nov 3, 2021
Modern systems need to handle a variety of workloads and use cases. It is very difficult for one system architecture to cater to these use...
Hydra: A Data System for Large Multi-Model Deep Learning
Kabir Nagrecha (UC San Diego) • Oct 27, 2021
Recent advances in deep learning (DL) architectures have improved model quality in a variety of domains, but have come at the expense of a substantial...
Accelerating Analytic Queries on Oracle In-Memory Database
Dr. Weiwei Gong (Oracle) • Oct 18, 2021
Oracle Database In-Memory was first released in 2014. With its unique dual-format architecture, Oracle Database In-Memory transparently accelerates analytics queries by orders of magnitude, and...
Self-Driving Database Management Systems: Forecasting, Modeling, And Planning
Dr. Lin Ma (CMU) • Oct 6, 2021
Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer because they have many...
Programmatically Building & Managing Training Data with Snorkel
Dr. Alex Ratner (UW-Seattle and Snorkel AI) • Dec 4, 2020
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models require. In this talk,...
Responsible Data Management
Dr. Julia Stoyanovich (NYU) • Nov 20, 2020
The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data are...
Systems for Human Data Interaction
Dr. Eugene Wu (Columbia University) • Nov 13, 2020
The rapid democratization of data has placed its access and analysis in the hands of the entire population. While the advances in rapid and large-scale...
Elements of Learning Systems
Dr. Tianqi Chen (CMU and OctoML) • Nov 6, 2020
Data, models, and computing are the three pillars that enable machine learning to solve real-world problems at scale. Making progress on these three domains requires...
Interpretable Data Analysis with Explanations and Causality
Dr. Sudeepa Roy (Duke University) • Oct 30, 2020
In current times, data is considered synonymous with knowledge, profit, power, and entertainment, requiring development of new techniques to extract useful information and insights from...
The Socio-Technical Phenomena of Data Integration and Knowledge Graph
Dr. Juan Sequeda (Data.World) • Oct 23, 2020
Data Integration has been an active area of computer science research for over two decades. A modern manifestation is Knowledge Graphs, which integrate not...
An Exabyte scale global data infrastructure for CMS@LHC
Prof. Frank Wuerthwein (UCSD Physics and HDSI) • Oct 16, 2020
The science program at the Large Hadron Collider at CERN is preparing to produce, distribute, and access an Exabyte of new data per year starting...
Grouped Learning: Group-By Machine Learning Model Selection Workloads
Side Li (UCSD CSE) • Oct 9, 2020
ML practitioners routinely build separate models for data subsets based on some specified attribute(s), e.g., one model per state. We call this practice “ML over...
Vista: An End-to-end Declarative Transfer Learning System for Multimodal Analytics with Deep Neural Networks
Advitya Gemawat (UCSD HDSI) • Oct 9, 2020
Scalable systems for ML are largely siloed into dataflow systems for structured data and DL systems for unstructured data. This gap has left workloads that...
Cerebro: A Layered Data Platform for Scalable Deep Learning
Yuhao Zhang and Supun Nakandala (UCSD CSE) • Oct 2, 2020
Deep learning (DL) is gaining popularity across myriad domains due to the new ubiquity of unstructured data, tools such as TensorFlow, and easier access to...
Panorama: A Data System for Unbounded Vocabulary Querying over Video
Yuhao Zhang • Jun 3, 2020
Deep convolutional neural networks (CNNs) achieve state-of-the-art accuracy for many computer vision tasks. But using them for video monitoring applications incurs high computational cost and...
SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data
Vraj Shah • Jun 3, 2020
Speech-driven querying is becoming popular in new device environments such as smartphones, tablets, and even conversational assistants. However, such querying is largely restricted to natural...
Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue
Rana Alotaibi • May 27, 2020
Big data applications increasingly involve diverse datasets, conforming to different data models. Such datasets are routinely hosted in heterogeneous stores, each capable of handling one...
Vista: Declarative Feature Transfer from Deep CNNs at Scale
Supun Nakandala • May 20, 2020
Scalable systems for machine learning (ML) are largely siloed into dataflow systems for structured data and deep learning systems for unstructured data. This gap has...
Knowledge Graph Use Cases at Intuit
Jay Yu (Intuit) • May 20, 2020
Intuit, the leading financial software/service company behind TurboTax, Mint and Quickbooks, is embarking on a multi-year transformational journey into an AI-driven Expert Platform to help...
From Data to Models and Back: Experiences from Google's Production ML Pipelines
Alkis Polyzotis (Google Research) • May 13, 2020
Building a good ML model requires good input data. Conversely, debugging a model inevitably involves data debugging and understanding. In this talk, I will present...
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Carlo Curino and Kostas Karanasos (Microsoft Jim Gray Systems Lab) • May 6, 2020
Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader...
Analysis of data-driven workflows
Prof. Victor Vianu • Apr 15, 2020
Software systems centered around databases have become pervasive in a wide variety of applications, including health-care management, e-commerce, business processes, scientific workflows, and e-government. Such...
Automatic Verification of Database-powered Workflows
Alessandro Gianola (Visiting PhD student from Free University of Bolzano) • Feb 21, 2020
During the last two decades, a huge body of research has been dedicated to the challenging problem of reconciling data and process management within contemporary...