The Database Lab at UC San Diego is one of the leading academic research groups in the field of
data management, spanning the major themes of theory, systems, languages, interfaces, and applications,
as well as intersections with other data-oriented fields.
Areas of particular strength include database theory, data analytics, semistructured and graph data,
data integration and preparation, machine learning systems, causal inference, responsible data science,
query processing and optimization, and data exploration.
Application areas of particular interest have included healthcare, social media, and the Internet of Things.
Our members span the Department of Computer Science and Engineering
and the Halıcıoğlu Data Science Institute.
DB Lab faculty are also affiliated with other research groups, including the
CSE Theory Group,
CSE AI Group,
HDSI AI and ML Group,
HDSI Data Infrastructure and Systems Group,
and Center for Networked Systems.
Alin's research interests include data publishing and integration, specification and verification of DB-powered business processes, and semistructured and XML data.
Arun's research interests are in data management and systems for ML/AI-based data analytics. His work focuses on designing abstractions, algorithms, and systems to make it easier and faster to analyze large and complex datasets using ML/AI.
Babak's research interests are in data management, causal inference, responsible data science, and data ethics. His work unifies techniques from DB theory, causal inference, and ML to lay the foundations for decision-making and policy evaluation from complex relational data, as well as for algorithmic fairness, explainability, and accountability.
Victor's research interests are in DB systems and theory. His current work focuses on verification of DB-driven systems, at the intersection of DB and computer-aided verification, and on automatic verification of interactive data-driven Web services and business processes. He is also interested in the theory of query languages and computational logic.
Kamalika's research interests lie in the area of ML. Much of her work is on privacy-preserving ML and unsupervised learning, but she's also broadly interested in learning theory, including confidence-rated prediction, online learning, and active learning.
Yannis' research extends the capabilities of data platforms and query processors. He has published over 100 research articles with more than 14,000 citations.
Baharan's research interests lie in fair and explainable machine learning, data debiasing, data cleaning for ML, and causal inference in relational data using techniques such as graph representation learning.
Animesh's research interests are in the areas of NLP and ML systems, with a focus on developing scalable and compute-efficient LLMs to improve training and inference with limited resources. His most recent work aims to develop improved benchmark standards for LLMs and create frameworks to analyze highly correlated text documents.
Kyle's research interests lie in the use of data science and data engineering methods within the government domain, specifically making data-based solution engineering within the Department of Defense more accessible to data analysts and researchers via novel technologies. His current work is on speech-based querying of relational databases.
Xiuwen's research interests lie in heterogeneous data management, polystore systems, query optimization, and AI in DB.
Jiongli's research interests lie in data cleaning and debiasing for machine learning applications.
Ilkay Altintas is the Director for the Center of Excellence in Workflows for Data Science at the San Diego Supercomputer Center (SDSC), UCSD.
Amarnath Gupta is a Research Scientist at the San Diego Supercomputer Center (SDSC) of the University of California San Diego.
June 2024
Kabir and Yuhao walk at the PhD commencement. Congrats and best wishes to them for their careers!
June 2024
Yuhao receives an ACM SIGMOD Distinguished PC Member award. Arun receives an ACM SIGMOD Distinguished Associate Editor award.
May 2024
Babak receives an NSF CAREER Award; link to news article.
August 2023
Yannis Papakonstantinou joins Google full time and switches to adjunct faculty at UC San Diego. Best wishes, Yannis!
April 2023
Supun receives the coveted ACM SIGMOD Jim Gray Doctoral Dissertation Award! He is the first recipient in the UCSD DB Lab's history, and this is the first time the award has gone to work in the fast-growing area of ML systems. Congrats, Supun!
How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses
Vraj Shah, Thomas Parashos, and Arun Kumar
2024 ; VLDB
Link to paper
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads
Kabir Nagrecha and Arun Kumar
2024 ; VLDB
Link to paper
OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport
Alireza Pirhadi, Mohammad Hossein Moslemi, Alexander Cloninger, Mostafa Milani, and Babak Salimi
2024 ; SIGMOD
Link to paper
Consistent Range Approximation for Fair Predictive Modeling
Jiongli Zhu, Sainyam Galhotra, Nazanin Sabri, and Babak Salimi
2023 ; VLDB
Link to paper
Causal Data Integration
Brit Youngmann, Michael Cafarella, Babak Salimi, and Anna Zeng
2023 ; VLDB
Link to paper
Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines
Yuhao Zhang and Arun Kumar
2023 ; VLDB
Link to paper
NEXUS: On Explaining Confounding Bias
Brit Youngmann, Michael Cafarella, Yuval Moskovitch, and Babak Salimi
2023 ; SIGMOD
Link to paper
On Explaining Confounding Bias
Brit Youngmann, Michael Cafarella, Babak Salimi, and Yuval Moskovitch
2023 ; ICDE
Link to paper
Causal What-If and How-To Analysis Using Hyper
Fangzhu Shen, Kayvon Heravi, Oscar Gomez, Sainyam Galhotra, Amir Gilad, Sudeepa Roy, and Babak Salimi
2023 ; ICDE
Link to paper
Database-Aware ASR Error Correction for Speech-to-SQL Parsing
Yutong Shao, Arun Kumar, and Ndapandula Nakashole
2023 ; IEEE ICASSP
Link to paper