the large sizes of high-performance models. Furthermore, numerous application-specific models are
needed for a device to support a matching number of applications or to adapt to changes in time, location,
and context, or in users' preferences and behaviors.
These issues can be addressed by on-demand model downloading from an AI library in the cloud to
meet a device’s real-time needs. A practically unlimited number of models can be kept in the AI library
and managed by grouping them into different categories according to service type, environment, user
preferences and requirements, and hardware specifications. Thereby, all possible AI needs of devices
can potentially be met. The deployment of on-demand model downloading in 6G networks is still at a
nascent stage and faces two main challenges among others.
• In-time downloading. To avoid interrupting ongoing applications, downloading has to meet an
application-specific latency requirement. 6G aims to use AI to empower wide-ranging applications of
tactile communications — augmented/virtual reality (AR/VR), remote robotic controls, and auto-
navigation, to name a few [3]. Such applications demand end-to-end latency to be as low as several to
tens of milliseconds.
• Devices’ heterogeneous capacities. One aspect of heterogeneity in edge devices is their distribution
over a broad spectrum of computing-and-storage capacity, with tablets and smartphones at the high
end and passive RFID/NFC tags at the low end. Another aspect of heterogeneity is reflected in devices'
communication capacities determined by available radio resources (e.g., array size, transmission
power, bandwidth, and coding/decoding complexity), channel states [4], and potential interference
from neighboring devices [5]. The above heterogeneity requires adapting the size of the model
being downloaded to the target device's hardware and channel.
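The two challenges above are linked by simple arithmetic: the tighter the latency deadline and the larger the requested model, the higher the link rate the network must sustain. The following sketch makes this concrete; the model sizes and deadlines are illustrative assumptions, not figures from this article.

```python
# Hypothetical sketch: the link rate needed to download a model of a given
# size within a tactile-latency deadline. Numbers below are illustrative.

def required_rate_gbps(model_size_mb: float, deadline_ms: float) -> float:
    """Minimum link rate (Gbit/s) to deliver model_size_mb within deadline_ms."""
    bits = model_size_mb * 8e6       # megabytes -> bits
    seconds = deadline_ms / 1000.0   # milliseconds -> seconds
    return bits / seconds / 1e9      # bit/s -> Gbit/s

# A 20 MB model under a 10 ms downloading budget:
print(required_rate_gbps(20, 10))  # roughly 16 Gbit/s
```

Even this crude estimate shows why heterogeneity matters: a low-end device on a weak channel cannot meet such a rate, so the model itself, not just the link, must be adapted.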
Other challenges include, for example, the need for local model fine-tuning, device-server cooperation,
and predictive downloading. The use of existing 5G technologies to support model downloading is
inefficient as they are not task-oriented and hence lack the desired high level of versatility and efficiency,
the native support of heterogeneity, and a guarantee on end-to-end performance.
To answer the call for an advanced technology that tackles the above challenges, we
propose the framework of in-situ model downloading. To adapt to devices' heterogeneous hardware
constraints and link rates, we first propose three approaches, termed depth-level, parameter-level, and
bit-level in-situ model downloading, by building on existing techniques from early exiting for inference,
model pruning, and quantization (see Section 2). These approaches enable adjusting a model's number of
layers, number of parameters, and level of precision to accommodate devices' heterogeneous
requirements. It is possible to integrate these three approaches to generate a large-scale AI library,
which comprises high-granularity models, to support a versatile downloading service. Second, we
propose a 6G network architecture to implement in-situ model downloading with key features,
including a three-tier (edge, local, and central) AI library, cooperative network management by
operators and service providers, task-oriented communications, and mobile architecture for
transparent downloading (see Section 3). Third, we conduct experiments to quantify 6G connectivity
requirements for realizing in-situ model downloading (see Section 4). Last, we conclude with a
discussion on other research challenges for the new technology and potential solutions (see Section 5).
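To make the parameter-level and bit-level adaptation knobs concrete, the toy NumPy sketch below mimics them on a single weight vector by zeroing small-magnitude weights (pruning) and then uniformly quantizing the survivors to a reduced bit width. It is a minimal illustration of the underlying techniques, not the system proposed in this article, and all sizes and ratios are assumptions.

```python
import numpy as np

# Toy sketch: parameter-level (pruning) and bit-level (quantization)
# adaptation applied to one stand-in weight vector.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

def prune(w: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Parameter-level: zero out the smallest-magnitude weights."""
    k = int(len(w) * keep_ratio)
    threshold = np.sort(np.abs(w))[-k]          # k-th largest magnitude
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Bit-level: uniform quantization to the given bit width."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# A low-end device might request half the parameters at 4-bit precision:
compact = quantize(prune(weights, 0.5), 4)
print(np.count_nonzero(compact))  # at most 500 weights remain nonzero
```

Depth-level adaptation (early exiting) would act analogously on whole layers rather than individual weights, and combining the three knobs yields the high-granularity model library described above.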
2. Techniques for In-situ Model Downloading
As mentioned earlier, the large population of edge devices exhibits a high level of heterogeneity in
different dimensions. This requires flexible methods for in-situ model generation to accommodate
the heterogeneity, and channel-adaptive downloading to cope with time variations of link rates. In the