
from each iteration are used in the design and control of the next iteration cycle, referred to as ‘phases’. To reduce and manage the complexity of the iterative methodological process, and to progress towards a better understanding of the outcome of each iteration, a variety of complementary but distinct techniques are used. For example, the article intersects methodologies from engineering and computer science to address future concerns with the autonomous processing and analysis of real-time data from edge devices.
3   Design for autonomous AI—AutoAI
The design consists of four phases. In Phase 1, automatic preparation and ingestion of raw OSINT data is synthesised for the construction of training scenarios for automated feature engineering, and to teach the AI how to categorise and use (analyse) new and emerging forms of OSINT data. In Phase 2, domain knowledge is applied to extract features from raw data. A feature is considered valid if its attributes or properties are useful, or if its characteristics are helpful, to the model. For the automation of feature engineering, two approaches are considered: (1) multi-relational decision tree learning, a supervised algorithm based on decision trees, and (2) Deep Feature Synthesis, which is available as an open-source library named Featuretools.1
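To make the Deep Feature Synthesis option concrete, the following minimal sketch shows how Featuretools can be asked to derive candidate features from two related tables of raw records; the table names, columns and primitives are illustrative assumptions rather than part of the proposed design, and the API names follow Featuretools 1.x.

```python
import pandas as pd
import featuretools as ft

# Illustrative raw tables standing in for ingested OSINT records
# (hypothetical names and columns, used only to demonstrate the DFS call).
sources = pd.DataFrame({"source_id": [1, 2], "source_type": ["forum", "social"]})
records = pd.DataFrame({
    "record_id": [10, 11, 12],
    "source_id": [1, 1, 2],
    "collected_at": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-02"]),
    "length": [120, 80, 300],
})

# Register the tables and their relationship in an EntitySet (Featuretools 1.x API)
es = ft.EntitySet(id="osint")
es.add_dataframe(dataframe_name="sources", dataframe=sources, index="source_id")
es.add_dataframe(dataframe_name="records", dataframe=records,
                 index="record_id", time_index="collected_at")
es.add_relationship("sources", "source_id", "records", "source_id")

# Deep Feature Synthesis stacks aggregation/transform primitives automatically
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="sources",
    agg_primitives=["mean", "count"],
    max_depth=2,
)
print(feature_defs)
```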
In Phase 3, selection algorithms [2] are used to identify hyperparameter values. Normal parameters are typically optimised during training, but hyperparameters are generally optimised manually, a task usually assigned to the model designer. The automation scenario design will start by building upon knowledge of biological behaviours. Particle swarm optimisation and evolutionary algorithms can be used, both of which derive from biological behaviours: particle swarm optimisation emerges from studies of interactions in biological communities at the individual and social levels, and evolutionary algorithms emerge from studies of biological evolution. Second, the scenario design can apply Bayesian optimisation, which is the most widely used method for hyperparameter optimisation [3].
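As an illustration of what automated Bayesian hyperparameter optimisation might look like in practice, the sketch below uses the Optuna library (whose default sampler is a Bayesian-style tree-structured Parzen estimator) to tune two hyperparameters of a stand-in classifier; the library, model and search ranges are assumptions for illustration only.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for prepared edge-device features (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def objective(trial):
    # Hyperparameter values proposed by the sampler on each trial
    model = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        random_state=0,
    )
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```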
The combination of these methodological approaches is considered for automatic hyperparameter optimisation, but the concern is that edge devices are characterised by a large number of data points, and new and emerging forms of data are characterised by a large configuration space and high dimensionality. In combination, these factors could make the time needed to find the optimal hyperparameters longer than is acceptable. An alternative method is combined algorithm selection and hyperparameter optimisation. The method selection includes testing for the most effective approach, starting from Bayesian optimisation, Bandit Search, Evolutionary Algorithms, Hierarchical Task Networks, Probabilistic Matrix Factorisation, Reinforcement Learning and Monte Carlo Tree Search.
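One way to realise combined algorithm selection and hyperparameter optimisation is to treat the choice of algorithm itself as a categorical hyperparameter, extending the earlier sketch; the candidate algorithms and search ranges below are hypothetical, and the design does not prescribe this particular library.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def objective(trial):
    # The algorithm choice is itself a searchable (categorical) hyperparameter
    algo = trial.suggest_categorical("algo", ["random_forest", "logistic_regression"])
    if algo == "random_forest":
        model = RandomForestClassifier(
            n_estimators=trial.suggest_int("n_estimators", 50, 300),
            max_depth=trial.suggest_int("max_depth", 2, 16),
            random_state=0,
        )
    else:
        model = LogisticRegression(
            C=trial.suggest_float("C", 1e-3, 1e2, log=True),
            max_iter=1000,
        )
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```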
In Phase 4, an automated pipeline optimisation is designed, comparable to the Tree-based Pipeline Optimization Tool (TPOT) [4], but for autonomously optimising feature pre-processors to maximise classification accuracy on an unsupervised classification task.
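For orientation, the sketch below shows the classic TPOT interface that the Phase 4 design is compared against; note that TPOT itself performs supervised pipeline search, whereas the proposed design targets an unsupervised setting, so this is only an illustration of the style of automated pipeline optimisation, using an assumed toy dataset.

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy supervised data used only to demonstrate TPOT's pipeline search
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Genetic-programming search over whole pipelines (pre-processors + model)
tpot = TPOTClassifier(generations=3, population_size=20,
                      random_state=0, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # writes the best found pipeline as Python code
```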
The above analysis of current AI algorithms is designed for low-cost devices that contain substantially more memory than IoT sensors. In other words, this design could work for a ‘Raspberry Pi’ device, but it will not work for a low-memory, low-computation-power sensor. Given this, the functionality of the proposed AutoAI needs to be put in perspective. The proposed design could be applied in the metaverse, in mobile phones, or on edge devices with some memory and processing power. The design cannot be applied to sensors used to monitor water flow under a bridge or air pollution, or to smoke-detector sensors in a forest.
3.1   Phase 1: automated data preparation
O1: develop an open-access autonomous data preparation method (for digital healthcare data) from edge devices, similar to the Oracle autonomous database,2 for autonomous ingestion of new and emerging forms of raw data, e.g., OSINT (big data). The first scientific milestone (M1) is to build a new autonomous data preparation method that can serve for training an AutoAI algorithm to: (1) become self-driving by automating data provisioning, tuning, and scaling; (2) become self-securing by automating data protection and security; and (3) become self-repairing by automating failure detection, failover, and repair. The new method design includes learning how to identify, map and ignore patterns of data pollution (e.g., direct references to results obtained from OSINT queries) and how to become more efficient at autonomously building improved algorithms. To ensure the success of the autonomous data preparation method, a new scenario is constructed to teach the algorithm how adversarial systems pollute the training data and how to discard such data from the training scenarios.
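As a deliberately simplified illustration of discarding polluted training records, the sketch below applies a rule-based filter that drops records which directly echo an OSINT query string; in the proposed design this filtering would be a learned behaviour rather than a fixed rule, so the function names and patterns here are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical pollution indicators: records that echo the OSINT query verbatim
# or carry an obvious injection marker. Real patterns would be learned, not hard-coded.
POLLUTION_PATTERNS = [
    r"results? for query:",   # record repeats the query form
    r"<!--\s*inject\s*-->",   # crude adversarial marker
]

def is_polluted(record_text: str, query: str) -> bool:
    """Flag a training record as polluted if it repeats the raw query
    or matches a known pollution pattern."""
    if query.lower() in record_text.lower():
        return True
    return any(re.search(p, record_text, re.IGNORECASE) for p in POLLUTION_PATTERNS)

def clean_training_data(records: Iterable[str], query: str) -> list[str]:
    """Keep only records that pass the pollution check."""
    return [r for r in records if not is_polluted(r, query)]

# Example usage with toy records
raw = [
    "Air quality report, sensor batch 7",
    "Results for query: site:example.org leaked data",  # echoes the query
]
print(clean_training_data(raw, query="site:example.org leaked data"))
```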
While constructing the scenario, the search for training data expands into new and emerging forms of data (NEFD), e.g., open data—Open Data Institute,3 Elgin,4 DataViva5;
1 https://www.featuretools.com/.
2 https://www.oracle.com/autonomous-database/.
3 https://theodi.org/.
4 https://www.elgintech.com/.
5 http://dataviva.info/en/.