as autonomous control of the endoscope (Gruijthuijsen et al. 2022). The knowledge
of the content area may also be useful when training and using deep learning vision
models. The regions outside of the content area contain non-informative details in the
form of noise and text overlays. Indeed, these details may even serve to bias a
model. For example, the information outside of the content area may be characteristic
of the endoscope used during the intervention. A model trained to detect the type
of intervention may learn to detect the characteristic non-content area information
present in the training data. Masking out these details could simplify the input and
remove such sources of bias. More speculatively, when training a segmentation or object
detection model, the loss function could be modified so it does not penalise predictions
made outside of the content area. In this way, the task to be learned may be simplified
as the model need not learn to correctly classify these regions. At inference time, any
activations in these regions may be discarded. Additionally, as the content area only
takes up a portion of the image, the amount of required computation can be reduced
by skipping border regions when performing inference for time-critical applications.
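A minimal sketch of the masking, loss-masking and cropping ideas above follows. It assumes that the content area has already been estimated as a circle with centre (cx, cy) and radius r in pixel coordinates. NumPy is assumed, all function names are hypothetical, and binary cross-entropy stands in for whatever per-pixel loss a real segmentation model would use. It is an illustration under those assumptions, not a prescribed implementation.

```python
import numpy as np


def circular_mask(height, width, cx, cy, r):
    """Boolean H x W mask, True inside the (assumed) circular content area.

    The circle may extend past the image edges; the mask is simply clipped
    to the image grid.
    """
    ys, xs = np.ogrid[:height, :width]
    return (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2


def mask_image(image, mask):
    """Zero out every pixel outside the content area (H x W or H x W x C)."""
    out = image.copy()
    out[~mask] = 0
    return out


def masked_bce(pred, target, mask, eps=1e-7):
    """Per-pixel binary cross-entropy averaged over content-area pixels only,
    so predictions made in the border region are never penalised."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss[mask].mean())


def discard_outside(pred, mask):
    """At inference time, suppress any activations outside the content area."""
    return np.where(mask, pred, 0.0)


def content_bbox(mask):
    """Tight half-open bounding box (y0, y1, x0, x1) of the mask, for cropping."""
    if not mask.any():
        raise ValueError("empty content area")
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1
```

For a time-critical pipeline, one would compute `y0, y1, x0, x1 = content_bbox(mask)` once per estimate and run the model on `frame[y0:y1, x0:x1]`, so that border rows and columns lying entirely outside the circle are never processed.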
For any follow-up task to be able to rely on the estimated content area, a high level
of robustness must be achieved under all expected conditions. While still important,
precision is less of a concern, as a content area found to be slightly off from the true
content area will likely have little consequence on subsequent processing. To utilise
content area estimation in a real-time setting, the estimation should ideally use minimal
computing resources and processing time, thereby leaving these resources available
for follow-up tasks.
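As a rough worked illustration of why a slightly misplaced content area matters little, the sketch below computes the intersection-over-union (IoU) between a true and an estimated circle of equal radius whose centres are d pixels apart, using the standard lens-area formula for two equal circles. The idealised circular shape, the equal radii and the example numbers are assumptions made purely for illustration.

```python
import math


def circle_iou(r: float, d: float) -> float:
    """IoU of two circles of equal radius r whose centres are d pixels apart."""
    if d >= 2 * r:
        return 0.0  # the circles do not overlap at all
    # Area of the lens-shaped intersection of two equal circles.
    overlap = 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)
    union = 2 * math.pi * r**2 - overlap
    return overlap / union
```

With a hypothetical radius of 200 pixels, a centre error of 2 pixels (`circle_iou(200, 2)`) still gives an IoU of roughly 0.99, so nearly all content-area pixels are treated identically by any downstream masking.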
As image sensor technology improves and the manufacture of sensitive, compact
image sensors becomes cheaper, it may become increasingly common to mount the
sensor on the end of the endoscope, known as chip-on-tip. This may remove the circular
border artefact that is currently most prominent with proximal cameras. Should
chip-on-tip endoscopes become the norm, estimation of the content area may
become less critical. Until such a time, it will remain important to be able to efficiently
and reliably detect the border. Moreover, several chip-on-tip endoscope manufacturers,
especially in the flexible endoscopy field, continue to opt for an endoscope design with
an incomplete content area. Finally, exploiting historical endoscopic imaging
data also requires robust content area estimation algorithms.
1.3. Challenges in content area estimation
Delineation between the border and the content area of the image is made non-trivial
by several factors. Figure 3 shows a selection of endoscopic images demonstrating some
of these difficulties. Firstly, while the border is generally a uniform black, a fair amount
of low-level noise is often observed, and imperfections in the scope's optics can result
in aberrations such as bright spots, diffuse light bleeding outside of the content area,
or imperfect circles. Secondly, the image within the content area may itself be difficult:
it can have low brightness or contain a secondary circular oculus, such
as when the tip of the endoscope is only partially inserted through a trocar. Thirdly,
while the circular image projection is generally centred around the middle of the image,
it can in fact be significantly offset from the centre, and its radius can vary over
a large range, even extending beyond the horizontal extent of the image for much of
the image height. The spatial position and size of the circular image projection may
also be surprisingly dynamic throughout an intervention, varying due to mechanical
stresses placed on the endoscope and as the operator adjusts the zoom level on