Testing the data framework for an AI algorithm in
preparation for high data rate X-ray facilities
Hongwei Chen1,2,4∗, Sathya R. Chitturi1,2,3∗, Rajan Plumley1,2,5∗, Lingjia Shen1,2, Nathan C. Drucker1,2,6,
Nicolas Burdet1,2, Cheng Peng2, Sougata Mardanya7, Daniel Ratner1, Aashwin Mishra1, Chun Hong Yoon1,
Sanghoon Song1, Matthieu Chollet1, Gilberto Fabbris8, Mike Dunne1, Silke Nelson1, Mingda Li9,
Aaron Lindenberg2,3, Chunjing Jia2, Youssef Nashed1, Arun Bansil4, Sugata Chowdhury7,
Adrian E. Feiguin4, Joshua J. Turner1,2, Jana B. Thayer1
1Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, USA
2Stanford Institute for Materials and Energy Sciences, Stanford University, Stanford, USA
3Department of Materials Science and Engineering, Stanford University, Stanford, USA
4Department of Physics, Northeastern University, Boston, USA
5Department of Physics, Carnegie Mellon University, Pittsburgh, USA
6School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
7Department of Physics and Astrophysics, Howard University, Washington, USA
8Advanced Photon Source, Argonne National Laboratory, Argonne, USA
9Department of Nuclear Science & Engineering, The Massachusetts Institute of Technology, Cambridge, USA
Abstract—The advent of next-generation X-ray free electron
lasers will make it possible to deliver X-rays at a repetition rate
approaching 1 MHz continuously. This will require the development
of data systems to handle experiments at these types of facilities,
especially for high throughput applications, such as femtosecond
X-ray crystallography and X-ray photon fluctuation spectroscopy.
Here, we demonstrate a framework which captures single shot
X-ray data at the LCLS and implements a machine-learning
algorithm to automatically extract the contrast parameter from
the collected data. We measure the time required to return the
results and assess the feasibility of using this framework at high
data volume. We use this experiment to determine the feasibility
of solutions for ‘live’ data analysis at the MHz repetition rate.
Index Terms—LCLS-II, X-ray, Machine Learning, Experimen-
tal Design, High Performance Computing
I. INTRODUCTION
X-ray Free Electron Laser (XFEL) light sources are scientific
user facilities with the goal of investigating atomic scale
processes at ultrafast, femtosecond (1 femtosecond = 10^-15
seconds) time-scales [1]–[6]. In a typical XFEL experiment,
researchers from around the world are given access to the
facility for a limited amount of beamtime to perform experi-
ments at one of its many specialized scientific instrument end-
stations. Awarded beamtime is highly sought-after, and it is
not unusual for users to wait months or years to carry out
their experiments. The experiments are often highly complex,
involving a number of additional capabilities, such as advanced
laser systems, cryogenics, precision motion, ultra-high vac-
uum, high magnetic or electric fields, computational support
through both controls and analysis, and highly sensitive samples.
Furthermore, the measurements involving the incident X-ray
laser radiation must be remotely controlled from outside
the local laboratory setup for the safety of the experimentalists.
With a period of beamtime access typically lasting between
12 and 60 hours, it is crucial that tools are in place so that XFEL
users are able to use their time efficiently to collect data and
perform the experiments.
*Authors Contributed Equally
Since their inception, XFELs have attracted much attention
due to the new fields of science which they enable based on
their high pulse intensity, short pulse duration, available X-ray
energies, and coherent properties [7]–[11]. A new generation
of these lasers is currently being operated, constructed or
planned at several sites around the world. With the increase
in repetition rate that these machines offer, entirely new experimental methods will
be made possible. Furthermore, increasing the XFEL repe-
tition rate will also address feasibility challenges for certain
experiments which currently take heroic efforts to perform,
such as photoemission spectroscopy [12], transient grating
spectroscopy [13], and resonant inelastic X-ray scattering [14].
One such technique is X-ray Photon Fluctuation Spec-
troscopy (XPFS), a method whereby one is able to use
temporally separated X-ray pulses to measure fluctuations of
a system by probing changes at different ultrafast timescales
[15]. Although XPFS can provide an unprecedented level of
information in condensed matter systems, the data rates for
a typical experiment at the next-generation machines will be
intractable for realistic fast feedback during an experiment.
This is because ∼10^6 images on multiple
mega-pixel detectors will be collected per second and must be analyzed
‘on the fly’, with the analysis keeping up with the input
data rate to extract the full value from the experimental time.
This volume of data is a dramatic increase over the current
XFEL data rate of hundreds or thousands of images per
second.
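To make the scale of this problem concrete, the following minimal sketch estimates the raw data rate and the time available per image if the analysis is to keep up with the source. Only the image rates (∼10^6 per second for next-generation machines, up to thousands per second today) come from the text. The detector size, the pixel depth, the worker counts, and the function names are illustrative assumptions, not values or methods from this work.

```python
# Back-of-the-envelope estimate of raw detector data rate and per-image time budget.
# Detector size and pixel depth are illustrative ASSUMPTIONS, not values from the text.


def raw_data_rate(frames_per_s, pixels_per_frame, bytes_per_pixel):
    """Return the uncompressed data rate in bytes per second."""
    return frames_per_s * pixels_per_frame * bytes_per_pixel


def per_image_budget(frames_per_s, n_workers=1):
    """Return the average time (in seconds) each worker may spend on one image
    if n_workers process images in parallel and must keep up with the input."""
    return n_workers / frames_per_s


if __name__ == "__main__":
    next_gen_rate = 1_000_000  # ~10^6 images per second (from the text)
    current_rate = 1_000       # upper end of "hundreds or thousands" (from the text)
    pixels = 2_000_000         # ASSUMPTION: a 2-megapixel detector
    depth = 2                  # ASSUMPTION: 2 bytes per pixel

    rate = raw_data_rate(next_gen_rate, pixels, depth)
    print(f"raw data rate: {rate / 1e12:.1f} TB/s")
    print(f"increase over current image rate: {next_gen_rate / current_rate:.0f}x")

    budget_single = per_image_budget(next_gen_rate)
    budget_parallel = per_image_budget(next_gen_rate, n_workers=1000)
    print(f"time per image, 1 worker: {budget_single * 1e6:.1f} microseconds")
    print(f"time per image, 1000 workers: {budget_parallel * 1e3:.1f} milliseconds")
```

Under these assumed detector parameters, the uncompressed stream is of order terabytes per second, and a single analysis process would have roughly a microsecond per image. This is why the time required to return results is the central quantity to measure.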
In this paper, we address this problem of the data rate posed
arXiv:2210.10137v1 [physics.data-an] 18 Oct 2022