The science program at the Large Hadron Collider (LHC) at CERN is preparing to produce, distribute, and access an exabyte of new data per year starting around 2028. The science program is fundamentally distributed, with participants from more than 200 institutions in more than 50 countries. Today, the infrastructure is built on a mix of top-down replication of data and bottom-up, application-driven transfers. It includes data accesses at varying levels of granularity, from data sets containing thousands of files down to objects within individual files. This increase of more than an order of magnitude in data volume poses a number of challenges for the globally distributed infrastructure of the scientific program. This talk discusses these challenges, putting them into perspective with the fundamental science requirements at a level of detail that computer scientists and engineers can understand. It begins with a very brief (less than ten minutes) introduction to experimental particle physics for CS majors, followed by a discussion of how these science requirements lead to a variety of challenges in data access, management, and distribution.
Speaker webpage: https://www-physics.ucsd.edu/Directory/Person/323
Public video of talk: https://www.youtube.com/watch?v=owSTTETP7KA