Dynamically Creating Big Data Processing Centers – a Large Hadron Collider Case Study

Frank Würthwein, University of California at San Diego (UCSD)

The LHC experiments developed a global single sign-on system to support distributed data analysis at unprecedented scales. Several petabytes of data are transferred each day to feed the several hundred thousand CPUs of this global infrastructure. Until recently, the large experiments, ATLAS and CMS, owned and controlled the infrastructure they ran on, and accessed data only when it was locally available on that hardware. As the experiments increase their data-taking rates from a design rate of 150 Hz to 1000 Hz in 2015, and possibly 10 kHz in 2020, this operational model will no longer suffice to meet peak processing needs. Large-scale processing centers need to be created dynamically, and a much more diverse set of data access paradigms needs to be supported. In this talk we discuss this transition from the perspective of the CMS experiment. We describe both the conceptual changes necessary and the practical experience gained from processing a 125 TB CMS dataset using the Gordon Supercomputer at the San Diego Supercomputer Center. We discuss how a large-scale processing center was created dynamically and seamlessly integrated into the CMS production system.