milk quality parameters, such as concentrations of fat, protein, casein, and lactose. These
parameters are used for milk quality-based payment schemes, genetic and genomic selection,
and as a support tool for farmers. Spectral information generated from MIRS analysis has also
proven to be effective in predicting fine milk quality parameters, including protein fractions,
free amino acids [Bonfatti et al., 2011; McDermott et al., 2016], individual and groups of fatty
acids [Soyeurt et al., 2006; Fleming et al., 2017], milk processing traits [Ferragina et al., 2013;
Visentin et al., 2015], and animal-related characteristics [McParland et al., 2014; Shetty et al.,
2017; Ho et al., 2019], and can be used to verify the authenticity of agricultural
foods [Cozzolino, 2012]. A more extended list of applications of MIRS in the dairy science
framework can be retrieved from the reviews by De Marchi et al. [2014] and Tiplady et al.
[2020].
The two-day event “International Workshop on Spectroscopy and Chemometrics” was organised
by the Vistamilk SFI Research Centre in April 2022, following its first edition held in 2021
[Frizzarin et al., 2021a]. The workshop brought together internationally recognised researchers
to describe the main challenges and applications of near- and mid-infrared spectroscopy in food,
animal, and agricultural sciences. Moreover, participants who volunteered for a chemometric data
competition were provided with a large dataset of individual cow milk spectra, accompanied only
by information on each animal’s diet. These data presented many methodological and statistical
challenges, owing to the high dimensionality of the spectral matrices and the strong collinearity
between adjacent spectral wavelengths. The chemometric challenge therefore encouraged the
engagement of participants with different backgrounds and skills
and required the application of different statistical and machine learning strategies.
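The collinearity between adjacent spectral points mentioned above can be illustrated with a
minimal sketch. It does not use the challenge data: the spectra are simulated as a single broad
peak with random height and position plus small noise, and every name and parameter below
(simulate_spectrum, pearson, the peak width, the noise level) is a hypothetical choice made only
for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_spectrum(n_points, rng):
    """Hypothetical smooth spectrum: one broad Gaussian-shaped peak plus small noise."""
    height = rng.uniform(0.5, 1.5)
    centre = rng.uniform(0.3, 0.7) * n_points
    width = 0.5 * n_points
    return [
        height * math.exp(-(((j - centre) / width) ** 2)) + rng.gauss(0.0, 0.001)
        for j in range(n_points)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples, n_points = 200, 50
spectra = [simulate_spectrum(n_points, rng) for _ in range(n_samples)]

# Transpose so that each entry holds one spectral point across all samples.
columns = list(zip(*spectra))

# Correlation between each spectral point and its right-hand neighbour.
adjacent_r = [pearson(columns[j], columns[j + 1]) for j in range(n_points - 1)]
min_adjacent_r = min(adjacent_r)
print(f"smallest correlation between adjacent points: {min_adjacent_r:.4f}")
```

Because neighbouring points of a smooth spectrum carry almost the same information, these
correlations are close to one, which is why dimension reduction or regularisation is commonly
considered before fitting a classifier to such data.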
The purpose of the data challenge was to develop a model to predict the diet fed to dairy
cows by exploiting mid-infrared spectral information. Participants, or groups of participants,
were required to apply their developed model to a test set containing only individual milk spectra
and to submit their predictions of the animals’ diets. Although participation in the chemometric
challenge was high, only the six best contributions, in terms of
prediction accuracy and methodological innovation, were selected to present their results
both at the workshop and in the present manuscript.
2 Data description and challenge
A dataset consisting of 4,364 individual milk spectra from 120 cows was collected between May
and August in 2015, 2016 and 2017 [O’Callaghan et al., 2016]. The samples were from Holstein
Friesian cows of different parities from the Irish Dairy Research Herd at Teagasc Moorepark,
Fermoy, Co. Cork. Three dietary groups were evaluated, with 54 cows assigned to each dietary
group each year. The three diet treatments were grass (GRS), which consisted of perennial
ryegrass only; clover (CLV), which consisted of perennial ryegrass with 20% annual clover sward;
and total mixed ration (TMR), where cows were fed grass silage, maize silage and concentrates
while being maintained indoors for the full season. Milk samples were collected at the morning
(AM) and evening (PM) milking sessions; the AM and PM samples were then pooled and analysed
weekly using a Pro-FOSS FT6000 (FOSS). A total of 1,060 transmittance data points in the
region from 925 cm−1 to 5,000 cm−1 were collected.
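As a small worked example of the spectral layout just described, the sketch below builds a
wavenumber grid and converts transmittance to absorbance with the standard relation
A = log10(1/T). Two points are assumptions rather than statements from the source: the grid is
taken to be evenly spaced between the two reported end points, and the absorbance conversion is
shown only as a common preprocessing step, not as one applied by the organisers. The
transmittance values are made up.

```python
import math

# Values reported in the text.
N_POINTS = 1060
WN_MIN, WN_MAX = 925.0, 5000.0  # cm^-1

# Assumption: evenly spaced grid (the actual wavenumbers are not given here).
step = (WN_MAX - WN_MIN) / (N_POINTS - 1)
wavenumbers = [WN_MIN + i * step for i in range(N_POINTS)]


def transmittance_to_absorbance(t):
    """Convert a transmittance value in (0, 1] to absorbance, A = log10(1 / T)."""
    if t <= 0:
        raise ValueError("transmittance must be positive")
    return math.log10(1.0 / t)


# Made-up transmittance values, for illustration only.
example_transmittance = [1.0, 0.5, 0.1]
example_absorbance = [transmittance_to_absorbance(t) for t in example_transmittance]

print(f"assumed grid spacing: {step:.3f} cm^-1")
print("absorbance:", [round(a, 3) for a in example_absorbance])
```

Under the even-spacing assumption the grid step is a little under 4 cm−1, so each spectrum
samples the mid-infrared region densely compared with the width of typical absorption bands.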
The dataset was divided into training (3,275 spectra) and test (1,089 spectra) data; for the
latter only spectral information was provided, while diet information, to be used as the
classification variable, was available for the training set. The training data included 1,094
spectra for GRS, 1,120 spectra for CLV and 1,061 spectra for TMR. There were no missing values
in the training or test set. The specific wavenumbers were not shared with
the participants.
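The counts reported above can be checked for internal consistency, and they also give a simple
reference point for judging prediction accuracy. The sketch below is only arithmetic on the
reported numbers; the variable names are hypothetical, and the "majority-class baseline" is
computed on the training set because the class distribution of the test set is not stated here.

```python
# Counts reported in the text.
train_counts = {"GRS": 1094, "CLV": 1120, "TMR": 1061}
n_train, n_test, n_total = 3275, 1089, 4364

# Consistency checks: class counts add up to the training set,
# and training plus test spectra add up to the full dataset.
assert sum(train_counts.values()) == n_train
assert n_train + n_test == n_total

# Share of each diet in the training set.
class_share = {diet: count / n_train for diet, count in train_counts.items()}

# Accuracy obtained on the training set by always predicting the most frequent diet.
majority_diet = max(train_counts, key=train_counts.get)
majority_baseline = train_counts[majority_diet] / n_train

# Fraction of all spectra held out as test data.
test_fraction = n_test / n_total

for diet, share in class_share.items():
    print(f"{diet}: {share:.3f}")
print(f"majority-class baseline ({majority_diet}): {majority_baseline:.3f}")
print(f"test fraction: {test_fraction:.3f}")
```

The three classes are close to balanced, so always predicting the most frequent diet would be
correct for roughly 34% of the training spectra, and about a quarter of all spectra were held
out for testing.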
The three dietary groups were carefully selected based on their characteristics. As described
by Frizzarin et al. [2021b], pasture-based diets are easily discriminated from TMR diets, while