Database Lab Research Seminar

Announcements

If you are interested in database research, you are welcome to join us in our weekly database seminar. Every week a local or visiting researcher gives a talk on their research topic.

Location/Time: In Fall 2022, the seminar is held on Fridays, 1:00-2:20pm PT, as a mixed-modality meeting at 4217 CSE and sometimes on Zoom. It is organized by Arun Kumar. The Zoom link will be emailed to the DBTalks mailing list.

Course Number: If you are a student and plan to attend, you can enroll in CSE 239A for 1 credit.

Mailing List: Announcements about the seminar are sent to the DBTalks mailing list. To subscribe, please submit this Google Form.


DB Seminars

  • Milvus: A Cloud-Native Vector Database

    Frank Liu (Zilliz)
    The total amount of digital data generated worldwide is increasing at a rapid rate. Simultaneously, approximately 80% (and growing) of this newly generated data is...
  • Dream the Stream: High Velocity Event Processing with a Converged Database

    Shasank Chavan (Oracle)
    Event stream processing is a rapidly growing category of workloads including IoT, Timeseries, Clickstream, Quality Control, Security, Auditing, Metrics, and Monitoring, etc. Analysts estimate the...
  • Autonomics in Amazon Redshift

    Dr. Chunbin Lin (Amazon Web Services)
    Amazon Redshift is Amazon’s petabyte-scale data warehouse service. It uses machine learning techniques in multiple areas of the service, e.g., automatic workload management. In this...
  • SpeakQL2: A Dialect System for Improving Speech-driven Querying of Structured Data

    Kyle Luoma (UC San Diego)
    SpeakQL2 builds upon prior work done within the ADALab on a speech + touch SQL query interface designed to enable effective SQL querying against databases...
  • DataPrep: Accelerate Data Preparation for AI

    Dr. Jiannan Wang (Simon Fraser University)
    Data scientists have been complaining about data preparation (data collection -> data understanding -> data cleaning -> data enrichment -> data integration -> feature engineering)...
  • Efficient and Reliable Query Processing using Machine Learning

    Daniel Kang (Stanford University)
    Given the rise of increasingly powerful models, machine learning (ML) can now be used to answer a range of queries over unstructured data (e.g., videos,...
  • A 3-year History of Instance Optimized DB Research at Microsoft

    Dr. Umar Farooq Minhas (Microsoft Research)
    Modern systems need to handle a variety of workloads and use cases. It is very difficult for one system architecture to cater to these use...
  • Hydra: A Data System for Large Multi-Model Deep Learning

    Kabir Nagrecha (UC San Diego)
    Recent advances in deep learning (DL) architectures have improved model quality in a variety of domains, but have come at the expense of a substantial...
  • Accelerating Analytic Queries on Oracle In-Memory Database

    Dr. Weiwei Gong (Oracle)
    Oracle In-Memory database was first released in 2014. With its unique dual-format architecture, Oracle Database In-Memory transparently accelerates analytics queries by orders of magnitude, and...
  • Self-Driving Database Management Systems: Forecasting, Modeling, And Planning

    Dr. Lin Ma (CMU)
    Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer because they have many...
  • Programmatically Building & Managing Training Data with Snorkel

    Dr. Alex Ratner (UW-Seattle and Snorkel AI)
    One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models require. In this talk,...
  • Responsible Data Management

    Dr. Julia Stoyanovich (NYU)
    The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data are...
  • Systems for Human Data Interaction

    Dr. Eugene Wu (Columbia University)
    The rapid democratization of data has placed its access and analysis in the hands of the entire population. While the advances in rapid and large-scale...
  • Elements of Learning Systems

    Dr. Tianqi Chen (CMU and OctoML)
    Data, models, and computing are the three pillars that enable machine learning to solve real-world problems at scale. Making progress on these three domains requires...
  • Interpretable Data Analysis with Explanations and Causality

    Dr. Sudeepa Roy (Duke University)
    In current times, data is considered synonymous with knowledge, profit, power, and entertainment, requiring development of new techniques to extract useful information and insights from...
  • The Socio-Technical Phenomena of Data Integration and Knowledge Graph

    Dr. Juan Sequeda (Data.World)
    Data Integration has been an active area of computer science research for over two decades. A modern manifestation is as Knowledge Graphs which integrates not...
  • An Exabyte scale global data infrastructure for CMS@LHC

    Prof. Frank Wuerthwein (UCSD Physics and HDSI)
    The science program at the Large Hadron Collider at CERN is preparing to produce, distribute, and access an Exabyte of new data per year starting...
  • Grouped Learning: Group-By Machine Learning Model Selection Workloads

    Side Li (UCSD CSE)
    ML practitioners routinely build separate models for data subsets based on some specified attribute(s), e.g., one model per state. We call this practice “ML over...
  • Vista: An End-to-end Declarative Transfer Learning System for Multimodal Analytics with Deep Neural Networks

    Advitya Gemawat (UCSD HDSI)
    Scalable systems for ML are largely siloed into dataflow systems for structured data and DL systems for unstructured data. This gap has left workloads that...
  • Cerebro: A Layered Data Platform for Scalable Deep Learning

    Yuhao Zhang and Supun Nakandala (UCSD CSE)
    Deep learning (DL) is gaining popularity across myriad domains due to the new ubiquity of unstructured data, tools such as TensorFlow, and easier access to...
  • Panorama: A Data System for Unbounded Vocabulary Querying over Video

    Yuhao Zhang
    Deep convolutional neural networks (CNNs) achieve state-of-the-art accuracy for many computer vision tasks. But using them for video monitoring applications incurs high computational cost and...
  • SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data

    Vraj Shah
    Speech-driven querying is becoming popular in new device environments such as smartphones, tablets, and even conversational assistants. However, such querying is largely restricted to natural...
  • Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue

    Rana Alotaibi
    Big data applications increasingly involve diverse datasets, conforming to different data models. Such datasets are routinely hosted in heterogeneous stores, each capable of handling one...
  • Vista: Declarative Feature Transfer from Deep CNNs at Scale

    Supun Nakandala
    Scalable systems for machine learning (ML) are largely siloed into dataflow systems for structured data and deep learning systems for unstructured data. This gap has...
  • Knowledge Graph Use Cases at Intuit

    Jay Yu (Intuit)
    Intuit, the leading financial software/service company behind TurboTax, Mint and Quickbooks, is embarking on a multi-year transformational journey into an AI-driven Expert Platform to help...
  • From Data to Models and Back: Experiences from Google's Production ML Pipelines

    Alkis Polyzotis (Google Research)
    Building a good ML model requires good input data. Conversely, debugging a model inevitably involves data debugging and understanding. In this talk, I will present...
  • Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

    Carlo Curino and Kostas Karanasos (Microsoft Jim Gray Systems Lab)
    Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader...
  • Analysis of data-driven workflows

    Prof. Victor Vianu
    Software systems centered around databases have become pervasive in a wide variety of applications, including health-care management, e-commerce, business processes, scientific workflows, and e-government. Such...
  • Automatic Verification of Database-powered Workflows

    Alessandro Gianola (Visiting PhD student from Free University of Bolzano)
    During the last two decades, a huge body of research has been dedicated to the challenging problem of reconciling data and process management within contemporary...