the large sizes of high-performance models. Furthermore, numerous application-specific models are
needed for a device to support a matching number of applications or to adapt to changes in time, location,
and context, or in users' preferences and behaviors.
These issues can be addressed by on-demand model downloading from an AI library in the cloud to
meet a device’s real-time needs. A practically unlimited number of models can be kept in the AI library
and managed by grouping them into different categories according to service type, environment, user
preferences and requirements, and hardware specifications. Thereby, all possible AI needs of devices
can potentially be met. The deployment of on-demand model downloading in 6G networks is still at a
nascent stage and faces two main challenges among others.
• In-time downloading. To avoid interrupting ongoing applications, downloading has to meet an
application-specific latency requirement. 6G aims to use AI to empower wide-ranging applications of
tactile communications — augmented/virtual reality (AR/VR), remote robotic controls, and auto-
navigation, to name a few [3]. Such applications demand end-to-end latency to be as low as several to
tens of milliseconds.
• Devices’ heterogeneous capacities. One aspect of heterogeneity in edge devices is their distribution
over a broad spectrum of computing-and-storage capacity, with tablets and smartphones at the high
end and passive RFID/NFC tags at the low end. Another aspect of heterogeneity is reflected in devices'
communication capacities determined by available radio resources (e.g., array size, transmission
power, bandwidth, and coding/decoding complexity), channel states [4], and potential interference
from neighboring devices [5]. The above heterogeneity requires adapting the size of the model
being downloaded to the target device's hardware and channel.
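The two challenges above are linked by simple arithmetic: the tighter the latency deadline and the larger the requested model, the higher the link rate the network must sustain. The following sketch makes this concrete; the model sizes and deadlines are illustrative assumptions, not figures from this article.

```python
# Hypothetical sketch: the link rate needed to download a model of a given
# size within a tactile-latency deadline. Numbers below are illustrative.

def required_rate_gbps(model_size_mb: float, deadline_ms: float) -> float:
    """Minimum link rate (Gbit/s) to deliver model_size_mb within deadline_ms."""
    bits = model_size_mb * 8e6       # megabytes -> bits
    seconds = deadline_ms / 1000.0   # milliseconds -> seconds
    return bits / seconds / 1e9      # bit/s -> Gbit/s

# A 20 MB model under a 10 ms downloading budget:
print(required_rate_gbps(20, 10))  # roughly 16 Gbit/s
```

Even this crude estimate shows why heterogeneity matters: a low-end device on a weak channel cannot meet such a rate, so the model itself, not just the link, must be adapted.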
Other challenges include, for example, the need for local model fine-tuning, device-server cooperation,
and predictive downloading. The use of existing 5G technologies to support model downloading is
inefficient as they are not task-oriented and hence lack the desired high level of versatility and efficiency,
the native support of heterogeneity, and a guarantee on end-to-end performance.
To answer the call for an advanced technology that tackles the above challenges, we
propose the framework of in-situ model downloading. To adapt to devices' heterogeneous hardware
constraints and link rates, we first propose three approaches, termed depth-level, parameter-level, and
bit-level in-situ model downloading, by building on existing techniques from early exiting for inference,
model pruning, and quantization (see Section 2). These approaches enable adjusting a model's number of
layers, number of parameters, and level of precision to accommodate devices' heterogeneous
requirements. It is possible to integrate these three approaches to generate a large-scale AI library,
which comprises high-granularity models, to support a versatile downloading service. Second, we
propose a 6G network architecture to implement in-situ model downloading with key features,
including a three-tier (edge, local, and central) AI library, cooperative network management by
operators and service providers, task-oriented communications, and mobile architecture for
transparent downloading (see Section 3). Third, we conduct experiments to quantify 6G connectivity
requirements for realizing in-situ model downloading (see Section 4). Last, we conclude with a
discussion on other research challenges for the new technology and potential solutions (see Section 5).
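To make the parameter-level and bit-level adaptation knobs concrete, the toy NumPy sketch below mimics them on a single weight vector by zeroing small-magnitude weights (pruning) and then uniformly quantizing the survivors to a reduced bit width. It is a minimal illustration of the underlying techniques, not the system proposed in this article, and all sizes and ratios are assumptions.

```python
import numpy as np

# Toy sketch: parameter-level (pruning) and bit-level (quantization)
# adaptation applied to one stand-in weight vector.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

def prune(w: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Parameter-level: zero out the smallest-magnitude weights."""
    k = int(len(w) * keep_ratio)
    threshold = np.sort(np.abs(w))[-k]          # k-th largest magnitude
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Bit-level: uniform quantization to the given bit width."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# A low-end device might request half the parameters at 4-bit precision:
compact = quantize(prune(weights, 0.5), 4)
print(np.count_nonzero(compact))  # at most 500 weights remain nonzero
```

Depth-level adaptation (early exiting) would act analogously on whole layers rather than individual weights, and combining the three knobs yields the high-granularity model library described above.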
2. Techniques for In-situ Model Downloading
As mentioned earlier, the large population of edge devices exhibits a high level of heterogeneity in
different dimensions. This requires flexible methods for in-situ model generation to accommodate
the heterogeneity, and channel-adaptive downloading to cope with time variations of link rates. In the