be ideal or provide meaningful results. For instance, in radio astronomy, the field has developed a
detailed understanding of radio galaxies and how they form, yet the same abstract classes defined
through the field’s understanding in the 1970s [4] still persist. We therefore propose that rather than
optimising predictions of ineffective class targets, in certain scenarios it may be more beneficial to
change the target classes with the aim of developing more robust, generalisable and feature rich
models. Consequently, in this work, we propose a task to derive semantic class targets.
Sec. 2 details the proposed task and its potential consequences for the physical sciences. Sec. 3
presents the proposed method. An application of the method to radio astronomy is presented in
Sec. 4 before conclusions are drawn in Sec. 5. Code and data used in this work are available at
https://github.com/mb010/Text2Tag.
2 Task
To improve target classes in labelled data sets, we propose a multi-modal task which can be phrased
as:
Given a set of documents describing labelled data samples, return a set of natural
language terms / phrases which capture the semantic features of the labelled data
set.
For any task, the derived set of class labels should be able to:
1. Map the science targets,
2. Map the semantic features of the data,
3. Use clear (non-technical) language.
The set of targets must be able to map to the previous set of classes, as otherwise a given scientific
community will not be able to translate classifications into the historical classes that they are used
to. For example, consider ‘fur length’ as the class target in the supervised task of classifying cats
and dogs: although useful, it does not suffice to classify a given image back into the cat/dog scheme.
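The requirement that derived targets map back to the historical classes can be checked mechanically. The following is a minimal sketch, not part of the proposed method: it assumes each sample is available as a pair of (set of semantic tags, historical class), and the function name `ambiguous_tag_sets` and the toy cat/dog data are hypothetical. It reports every tag combination that occurs with more than one historical class, i.e. the cases where the tags alone cannot recover the original classification.

```python
from collections import defaultdict


def ambiguous_tag_sets(samples):
    """Return tag combinations that occur with more than one historical class.

    `samples` is an iterable of (tags, historical_class) pairs, where `tags`
    is a set of plain-language terms. An empty result means the tags suffice
    to recover the historical class for every sample seen.
    """
    classes_seen = defaultdict(set)
    for tags, historical_class in samples:
        classes_seen[frozenset(tags)].add(historical_class)
    return {tags: cls for tags, cls in classes_seen.items() if len(cls) > 1}


# Hypothetical toy data: 'fur length' alone as the class target.
fur_only = [
    ({"long fur"}, "cat"),
    ({"long fur"}, "dog"),
    ({"short fur"}, "cat"),
    ({"short fur"}, "dog"),
]

# Hypothetical toy data: a richer set of semantic tags.
richer = [
    ({"long fur", "retractable claws"}, "cat"),
    ({"long fur", "floppy ears"}, "dog"),
    ({"short fur", "retractable claws"}, "cat"),
    ({"short fur", "floppy ears"}, "dog"),
]

fur_only_conflicts = ambiguous_tag_sets(fur_only)
richer_conflicts = ambiguous_tag_sets(richer)
```

In this toy setting both fur-length tags are ambiguous, whereas the richer tag set leaves no conflicts, which is the sense in which a target set does or does not map the science targets.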
Targets which map the semantic features of the data are ideal, as populations which contain semantic
feature differences may not be captured by abstract classes. For example, a classifier could be trained
to predict features of buildings (spires, column designs, materials used, etc.) rather than architectural
styles (gothic, baroque, neoclassical, brutalist etc.). This would enable the model to generalise to
architectural styles not included in the abstract target classes, and could even be used to highlight
designs which include hybrid elements.
The benefit of clear non-technical language is the ability it provides experts in a given field to
capture, communicate, and collaborate in and around their data. If the terms map the science targets
sufficiently well, they could even replace existing terms, reducing that community’s dependence on obtuse,
and sometimes inconsistent, definitions of technical terminology. It could also lower barriers to entry
for inter-disciplinary research, outreach, and citizen science projects.
3 Method
The methods we discuss here are based on annotations and science targets. We use the term
annotations to describe short documents which each describe a feature of a single data sample
using non-technical terminology. We use the term science targets to mean the traditional abstract
classifications (or engineered features) for each annotated data sample.
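To make the two terms concrete, the sketch below shows one possible in-memory representation. It is illustrative only: the `Annotation` class, the `group_by_science_target` helper, the sample identifiers, the annotation texts, and the class names are all assumptions, not the format used in the accompanying repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    sample_id: str  # identifier of the annotated data sample
    text: str       # short, non-technical description of one feature


def group_by_science_target(annotations, science_targets):
    """Collect annotation texts under the science target of their sample."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[science_targets[ann.sample_id]].append(ann.text)
    return dict(grouped)


# Hypothetical toy data; identifiers, texts, and class names are illustrative.
annotations = [
    Annotation("s1", "two bright blobs at the ends"),
    Annotation("s1", "faint bridge between the blobs"),
    Annotation("s2", "brightest in the middle"),
]
science_targets = {"s1": "class A", "s2": "class B"}

grouped = group_by_science_target(annotations, science_targets)
```

Grouping annotations by science target in this way gives, for each traditional class, the pool of plain-language descriptions from which candidate terms could be drawn.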
There are many possible approaches to address this task. Two simple approaches include manual
selection of plain English terms by a panel of experts, or using a large language model (LLM) for a
zero-shot approach. We expect, given appropriate experts, that manual selection via an expert panel
would be acceptable to a given community. However, expecting a panel of experts to agree on a
set of plain English class targets may not be realistic depending on the background of each expert
and/or their ability to distil abstract concepts into simple terms. Manual selection may also lack the
reproducibility and tractability that the physical sciences should demand. Using an LLM in a zero-shot
approach may work; however, it is not clear how prompts should be engineered in order to extract