In-situ Model Downloading to Realize Versatile
Edge AI in 6G Mobile Networks
Kaibin Huang, Hai Wu, Zhiyan Liu, and Xiaojuan Qi1
Abstract
The sixth-generation (6G) mobile networks are expected to feature the ubiquitous deployment of
machine learning and AI algorithms at the network edge. With rapid advancements in edge AI, the time
has come to realize intelligence downloading onto edge devices (e.g., smartphones and sensors). To
materialize this vision, we propose a novel technology in this article, called in-situ model downloading,
that aims to achieve transparent and real-time replacement of on-device AI models by downloading
from an AI library in the network. Its distinctive feature is the adaptation of downloading to time-
varying situations (e.g., application, location, and time), devices' heterogeneous storage-and-computing
capacities, and channel states. A key component of the presented framework is a set of techniques that
dynamically compress a downloaded model at the depth-level, parameter-level, or bit-level to support
adaptive model downloading. We further propose a virtualized 6G network architecture customized for
deploying in-situ model downloading with the key feature of a three-tier (edge, local, and central) AI
library. Furthermore, experiments are conducted to quantify 6G connectivity requirements, and research opportunities pertaining to the proposed technology are discussed.
1. Introduction
In the 1999 Hollywood blockbuster "The Matrix", the protagonist Neo acquires superhuman capabilities (e.g., becoming a kung fu master or dodging bullets) for his avatar in a virtual world by downloading programs from servers to his brain over a wired link. Though realizing such intelligence
downloading to humans does not seem possible in the near future, the time for artificial intelligence (AI)
downloading to edge devices (e.g., smartphones and sensors) has come. In fact, AI model downloading
is one use case of edge AI being discussed for the standard of the sixth-generation (6G) mobile networks
[1]. In this article, we propose a novel technology, called in-situ model downloading, that aims to achieve
transparent and real-time replacement of on-device AI models by downloading from an AI library in the
network. Compared with a traditional approach, its distinctive feature is the adaptation of downloading
to time-varying situations (e.g., application, location, and time), devices' heterogeneous storage-and-
computing capacities, and channel states.
Being AI native, 6G is expected to feature the ubiquitous deployment of machine learning and AI
algorithms at the network edge, which are collectively known as edge AI [2]. AI models with relatively
small sizes can be completely downloaded onto devices to enjoy the advantages of better data security,
faster decision time, context and location awareness, and a lighter burden on uplinks. This is supported
by the latest generation of mobile chips designed by leading semiconductor companies, for example,
Qualcomm, Apple, Samsung, Huawei, and NVIDIA. They share the common feature of comprising
powerful graphics processing units (GPUs) or other AI acceleration hardware to support the training
and execution of AI models. This hardware will endow edge devices with capabilities in natural language processing, image recognition, and video content analysis, thereby providing a platform for implementing intelligent IoT applications. Nevertheless, the sole reliance on mobile hardware to implement on-device
AI is confronted by the conflicts between the hardware’s limited storage and computation resources and
1 The authors are with Dept. of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong.
Contact: K. Huang (email: huangkb@eee.hku.hk).
the large sizes of high-performance models. Furthermore, numerous application-specific models are
needed for a device to support a matching number of applications or to adapt to changes in time, location, context, or users' preferences and behaviors.
These issues can be addressed by on-demand model downloading from an AI library in the cloud to
meet a device’s real-time needs. A practically unlimited number of models can be kept in the AI library
and managed by grouping them into different categories according to service types, environment, user
preferences and requirements, and hardware specifications. Thereby, all possible AI needs of devices
can potentially be met. The deployment of on-demand model downloading in 6G networks is still at a
nascent stage and faces two main challenges among others.
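The category-based organization of the AI library described above can be sketched as a keyed index. The following is a toy illustration of our own; the model names, category values, and lookup interface are all hypothetical, not part of the proposed system:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelKey:
    service: str      # service type, e.g. "image_recognition"
    environment: str  # deployment context, e.g. "outdoor"
    hw_tier: str      # device hardware class, e.g. "smartphone"

# A toy library mapping category keys to stored model files.
library = {
    ModelKey("image_recognition", "outdoor", "smartphone"): "resnet50_fp16.bin",
    ModelKey("image_recognition", "outdoor", "sensor"): "mobilenet_int8.bin",
}

def lookup(service: str, environment: str, hw_tier: str) -> Optional[str]:
    """Return the stored model matching a device's category, if any."""
    return library.get(ModelKey(service, environment, hw_tier))

print(lookup("image_recognition", "outdoor", "sensor"))  # -> mobilenet_int8.bin
```

In practice, a miss on an exact key would trigger on-demand model generation rather than a failure, which is precisely where the downloading challenges below arise.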
In-time downloading. To avoid interrupting ongoing applications, downloading has to meet an
application-specific latency requirement. 6G aims to use AI to empower wide-ranging applications such as tactile communications, augmented/virtual reality (AR/VR), remote robotic control, and auto-navigation, to name a few [3]. Such applications demand end-to-end latencies as low as several to
tens of milliseconds.
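A back-of-envelope calculation (our own sketch; the model size and latency budget below are hypothetical) shows what such budgets imply for the downlink rate:

```python
# Minimum downlink rate needed to download a model of a given size
# within an application's end-to-end latency budget.

def required_rate_gbps(model_size_mb: float, latency_budget_ms: float) -> float:
    """Downlink rate (Gbit/s) that finishes the download within the budget."""
    bits = model_size_mb * 8e6          # model size in bits
    seconds = latency_budget_ms * 1e-3  # budget in seconds
    return bits / seconds / 1e9

# Even a modest 10 MB compressed mobile model under a 10 ms budget:
print(f"{required_rate_gbps(10, 10):.1f} Gbit/s")  # -> 8.0 Gbit/s
```

This simple arithmetic already places in-situ model downloading in the multi-Gbit/s regime targeted by 6G, before accounting for protocol overhead and retransmissions.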
Devices’ heterogeneous capacities. One aspect of heterogeneity in edge devices is their distribution
over a broad spectrum of computing-and-storage capacity with tablets and smartphones at the high-
end and passive RFID/NFC tags at the low-end. Another aspect of heterogeneity is reflected in devices’
communication capacities determined by available radio resources (e.g., array size, transmission
power, bandwidth, and coding/decoding complexity), channel states [4], and potential interference
from neighboring devices [5]. The above heterogeneity requires adaptation of the size of a model
being downloaded to the targeted device's hardware and channel.
Other challenges include, for example, the need for local model fine-tuning, device-server cooperation,
and predictive downloading. The use of existing 5G technologies to support model downloading is
inefficient as they are not task-oriented and hence lack the desired high level of versatility and efficiency,
the native support of heterogeneity, and a guarantee on end-to-end performance.
To answer the call for developing an advanced technology to tackle the mentioned challenges, we
propose the framework of in-situ model downloading. To adapt to devices’ heterogeneous hardware
constraints and link rates, we first propose three approaches, termed depth-level, parameter-level, and
bit-level in-situ model downloading, by building on existing techniques from early exiting for inference,
model pruning, and quantization (see Section 2). The approaches enable adjustments of the number of
layers and parameters and the level of precision of a model to accommodate devices’ heterogeneous
requirements. It is possible to integrate these three approaches to generate a large-scale AI library,
which comprises high-granularity models, to support a versatile downloading service. Second, we
propose a 6G network architecture to implement in-situ model downloading with key features,
including a three-tier (edge, local, and central) AI library, cooperative network management by
operators and service providers, task-oriented communications, and mobile architecture for
transparent downloading (see Section 3). Third, we conduct experiments to quantify 6G connectivity
requirements for realizing in-situ model downloading (see Section 4). Last, we conclude with a
discussion on other research challenges for the new technology and potential solutions (see Section 5).
2. Techniques for In-situ Model Downloading
As mentioned earlier, the large population of edge devices exhibits a high level of heterogeneity in
different dimensions. This requires flexible methods for in-situ model generation to accommodate
the heterogeneity and channel adaptive downloading to cope with time variations of link rates. In the
following sub-sections, we build on diversified techniques from the areas of split inference, early exiting,
and model compression to propose three techniques, namely depth-level, parameter-level, and bit-level
in-situ downloading. Though in different ways, they all enable a mobile model to be generated in real-
time (or retrieved from a pre-generated AI library) and downloaded onto a device based on its feedback
of device situation information (DSI), which specifies its capacities, hardware types, channel states,
location, and accuracy-and-latency requirements. It is even possible to integrate the three techniques to
achieve high granularity in model generation and downloading, as illustrated in Fig. 1. By adopting three-level compression, the proposed in-situ model downloading promises to meet the real-time
requirements of edge intelligence applications on inference speed, energy consumption, prediction
accuracy, etc.
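A DSI-driven selection of the three compression knobs can be sketched as follows. This is purely illustrative and not the article's algorithm; all thresholds, default depths, and the mapping itself are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class DSI:
    storage_mb: float     # free on-device storage
    downlink_mbps: float  # current link rate
    latency_ms: float     # application latency budget

def select_config(dsi: DSI, full_depth: int = 50) -> dict:
    """Map DSI feedback to depth (layers kept), pruning ratio, and bit width."""
    # Downloadable size is capped by both storage and what the link can
    # deliver within the latency budget (Mbit/s * ms / 8000 -> MB).
    budget_mb = min(dsi.storage_mb,
                    dsi.downlink_mbps * dsi.latency_ms / 8_000)
    if budget_mb >= 100:
        return {"depth": full_depth, "keep_ratio": 1.0, "bits": 16}
    if budget_mb >= 20:
        return {"depth": full_depth // 2, "keep_ratio": 0.5, "bits": 8}
    return {"depth": full_depth // 4, "keep_ratio": 0.2, "bits": 4}

# A smartphone with ample storage but a tight 100 ms budget on a 400 Mbit/s link:
print(select_config(DSI(storage_mb=64, downlink_mbps=400, latency_ms=100)))
```

The point of the sketch is that the binding constraint shifts with the situation: here the channel, not storage, forces the most aggressive compression.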
2.1 Depth-Level In-situ Model Downloading
Split inference refers to the class of techniques that divides a root deep neural network model into a first and a second half for deployment at a device and a server, respectively [3]. The device-side half, called a local model, outputs features that are uploaded and fed into the server model to generate the inference result. This leverages server computation resources to alleviate the device's
computation load while preserving its data privacy. The splitting point can be adjusted for the purpose
of load balancing [6]. In the context of in-situ model downloading as illustrated in Fig. 1, we propose to
adapt the splitting point, or equivalently the depth of the mobile model being downloaded, to the
device’s DSI feedback for real-time model generation. Alternatively, sweeping the splitting point across
a root model generates a set of mobile models with complementary server models in the AI library. The
advantages of the depth-level method are threefold. First, its support of device-server cooperation
implies that even when the device’s hardware or radio resources are limited such that the downloaded
model is small, a high inference accuracy can be achieved with server assistance. Second, the depth-level
model downloading makes it easy to implement channel adaptive transmission, where the number of
downloaded layers is adapted to the downlink rate, or layer-by-layer progressive transmission. Third,
the uploading of features offers benefits for protecting data privacy at a level that increases as the
number of downloaded layers grows.
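The channel-adaptive choice of splitting point can be sketched as a simple greedy rule: download the deepest prefix of layers that fits the budget, which also naturally supports layer-by-layer progressive transmission. The per-layer sizes below are hypothetical, for illustration only:

```python
def choose_split(layer_sizes_mb: list, budget_mb: float) -> int:
    """Return the number of leading layers to download onto the device."""
    total, depth = 0.0, 0
    for size in layer_sizes_mb:
        if total + size > budget_mb:
            break  # next layer would exceed the download budget
        total += size
        depth += 1
    return depth

# A toy 6-layer root model under a 5 MB budget: the first three layers fit,
# and the remaining layers stay on the server as the complementary model.
sizes = [1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
print(choose_split(sizes, budget_mb=5.0))  # -> 3
```

Note that a zero result is still serviceable under this scheme: the device then uploads raw features (or data) and relies entirely on the server model, consistent with the device-server cooperation advantage above.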
There exists a downlink-uplink tradeoff for the proposed depth-level technique. This results from the
well-known fact that for many popular models (e.g., auto-encoder, VGGNet, MobileNet, and the latest
Fig. 1. The integration of the techniques of bit-level, parameter-level, depth-level in-situ model downloading to enable high-
granularity real-time sub-model generation or AI library construction.