One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning

Zacharaya Shabka (a,*), Michael Enrico (b), Nick Parsons (b), Georgios Zervas (a)
(a) University College London, Roberts Building, Torrington Place, London, WC1E 7JE, United Kingdom
(b) Huber+Suhner Polatis, 332 Cambridge Science Park, Milton Road, Cambridge, CB4 0WN, United Kingdom
Abstract
Proportional-integral-derivative (PID) control underlies more than 97% of
automated industrial processes. Controlling these processes effectively with
respect to some specified set of performance goals requires finding an optimal
set of PID parameters to moderate the PID loop. Tuning these parameters
is a long and exhaustive process. A method (patent pending) based on deep
reinforcement learning is presented that learns a relationship between generic
system properties (e.g. resonance frequency), a multi-objective performance
goal and optimal PID parameter values. Performance is demonstrated in
the context of a real optical switching product of the foremost manufacturer
of such devices globally. Switching is handled by piezoelectric actuators
where switching time and optical loss are derived from the speed and stability of actuator-control processes respectively. The method achieves a 5× improvement in the number of actuators that fall within the most challenging target switching speed, a 20% improvement in mean switching speed at the same optical loss and a 75% reduction in performance inconsistency
when temperature varies between 5 °C and 73 °C. Furthermore, once trained (which takes O(hours)), the model generates actuator-unique PID parameters in a one-shot inference process that takes O(ms), in comparison to up to O(week) required for conventional tuning methods, therefore accomplishing these performance improvements whilst achieving up to a 10^6× speed-up. After training, the method can be applied entirely offline, incurring effectively zero optimisation overhead in production.

*Corresponding author. Email address: uceezs0@ucl.ac.uk (Zacharaya Shabka)

Preprint. arXiv:2210.13906v1 [eess.SY] 25 Oct 2022
Keywords: deep reinforcement learning, PID tuning, optimal control,
actuator, manufacturing, optimisation
1. Introduction
Proportional-integral-derivative (PID) control remains one of the most
widely used and reliable means of implementing online system control. It
is used extensively in many industries from oil refinement to paper produc-
tion and accounts for approximately 97% of control processes in industry
[1, 2]. PID has advantages with respect to both optimisation/tuning (it
only has 3 parameters to optimise) and computation (each control iteration
only involves a few simple operations which can easily be implemented on
low-cost/high-frequency hardware such as FPGAs or ASICs). However, the
difficulty of optimally tuning PID parameters still undermines its application. In general, tuning faces a compromise between long and exhaustive
but highly optimal methods and fast and efficient but sub-optimal ones.
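To make the computational claim concrete, the following is a minimal sketch of one discrete PID iteration. The class name, gain values and sample period are illustrative assumptions, not the controller implementation used in the product described here.

```python
# Minimal sketch of one discrete PID iteration with a fixed sample period dt.
# kp, ki and kd are the three parameters that tuning must optimise.
# (Illustrative only; not the paper's actual controller.)

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        # Each iteration is only a handful of multiply/add operations,
        # which is why PID maps cheaply onto FPGAs and ASICs.
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.0, ki=0.1, kd=0.05, dt=0.001)
u = pid.step(setpoint=1.0, measurement=0.0)  # control signal for this tick
```

Because the per-tick cost is constant and tiny, the whole tuning burden falls on choosing (kp, ki, kd) well, which is the problem this paper addresses.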
Furthermore, in real world commercial scenarios, products/devices us-
ing control loops are manufactured at scale, where the performance of the
product/devices can be at least partially if not dominantly dependent on the
performance of the closed-loop control process used. For example, the work
presented here is done in the context of piezoelectric-actuator based optical
switching devices. Current models of switches can be built with up to 768
actuators (384 ports on both the input and output plane), where each is
controlled by 2 distinct PID loops (1 per axis). This means that to optimally
control each actuator in a single switch of this size, 1536 distinct sets of PID
parameters need to be determined. These switches benefit from fast and
stable reconfiguration times and low optical loss. Both of these properties
depend strongly on the control process underlying these switching processes.
Optimising PID parameters for a large number of non-identical devices
faces three primary difficulties. Firstly, since no manufacturing process is
perfect, no two manufactured devices will be identical. The subtle but defi-
nite differences can (as will be seen in this work) have significant impact on
how well they can be controlled by the same set of PID parameters, motivat-
ing a means of having unique PID parameters for each device rather than a
single generic set. An efficient optimisation method would be able to exploit
this device-level information, and use it effectively to generate parameters
that are suitable for that device.
Secondly, since the number of devices manufactured can be arbitrarily
large, it is desirable to minimise the amount of time it takes to generate these
parameters to avoid significant production overhead due to optimisation.
Devices can potentially undergo a large number of possible control processes
in their lifetime. For example, in a 384 ×384 all-to-all switch where each
actuator in each plane can move from pointing towards any position in the
opposite plane, to any of the remaining 383 positions, each actuator has
147,456 possible movements it can make per axis - almost 300,000 total per
actuator. Since each actuator switches on the order of O(10 ms), checking
each of these movements for a single set of PID parameters would take at least
50 minutes. When a large number of parameter combinations is being explored
in a search process, it is easy to see how this can incur days of overhead.
An ideal optimisation routine would not require explicit exposure to each of
these movements in order to evaluate if a set of parameters are suitable.
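The cost estimate above can be checked with simple arithmetic, under the stated assumptions (384 positions per plane, each actuator retargetable to any of the remaining 383 positions, roughly 10 ms per switching event):

```python
# Back-of-envelope check of the exhaustive-evaluation cost described above.
# All figures are the assumptions stated in the text, not measurements.

n_positions = 384
moves_per_axis = n_positions * (n_positions - 1)  # ordered (from, to) pairs
moves_per_actuator = 2 * moves_per_axis           # two axes per actuator

seconds_per_move = 0.010                          # ~10 ms per switching event
total_minutes = moves_per_actuator * seconds_per_move / 60

# Close to the ~147k per-axis / ~300k per-actuator / ~50 minute
# figures quoted in the text.
print(moves_per_axis, moves_per_actuator, total_minutes)
```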
Thirdly, dynamic PID loops, where PID parameters are constantly ad-
justed over the lifetime of a control process based on the closed loop response
of the system, are not suitable in the case of low-cost/high-speed electronics
like FPGAs, since they incur additional in-loop computation requirements
in order to re-calculate PID parameters. As such, it is desirable to find a
single set of parameters per device that achieves good control outcomes over
its lifetime and with respect to potentially multiple different performance
metrics.
Direct-search based methods have dominated tuning methodologies for
many decades [3, 4]. However, these methods must be repeated each time
a set of optimal parameters is to be found (i.e. for a new device), and often
require long monitoring cycles to iterate over various parameter combinations
and evaluate performance against some set criteria. They are also
difficult to implement in the context of multiple simultaneous (and possi-
bly contending) performance goals since they are typically designed for a
particular control outcome.
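A well-known member of this classical family is the Ziegler-Nichols ultimate-gain rule, sketched below. Note that measuring the ultimate gain Ku and oscillation period Tu already requires driving the closed loop to sustained oscillation, which is exactly the kind of long monitoring cycle described above. The function name and example values are illustrative.

```python
# Sketch of the classical Ziegler-Nichols ultimate-gain PID rule.
# Ku (ultimate gain) and Tu (oscillation period) must first be measured
# by experiment on the closed loop.

def ziegler_nichols_pid(ku, tu):
    """Map measured (ku, tu) to (kp, ki, kd) via the classic PID table:
    Kp = 0.6*Ku, Ti = Tu/2, Td = Tu/8."""
    kp = 0.6 * ku
    ki = 1.2 * ku / tu      # kp / (tu / 2)
    kd = 0.075 * ku * tu    # kp * (tu / 8)
    return kp, ki, kd

kp, ki, kd = ziegler_nichols_pid(ku=10.0, tu=0.5)
print(kp, ki, kd)
```

Such rules target a single generic control outcome (quarter-amplitude damping), which illustrates why extending them to multiple contending objectives is awkward.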
To summarise, the ideal method of PID parameter optimisation should be
able to generate PID parameters such that: 1. each device has a unique set of
parameters that are optimal for that device specifically; 2. these parameters
can be generated in a timely manner, not requiring long tuning times or
extensive closed-loop operation of the device to do so; 3. these parameters are
determined once for each device with respect to a flexible and multi-faceted
performance requirement and are consistent in the face of environmental and
operational variability.
This paper presents a method (patent pending) based on deep reinforcement learning (DRL) that implements one-shot, offline and near-instantaneous
optimisation of PID parameters. The method is trained (O(hours)) on a
set of devices, where it learns a relationship between device information (e.g.
resonance per axis), a multi-objective performance criterion and PID parameter values. After training, the method can be applied to previously
unseen devices in a one-shot and offline inference procedure (O(ms)) where
some previously measured information (e.g. during post-manufacturing char-
acterisation processes) about the device can be used to directly generate PID
parameters that are performant for that device specifically. In this way the
method incurs effectively zero optimisation overhead as optimisation time for
large numbers of devices is trivial and can be done in parallel to some other
process once device information has been measured.
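The inference step described above can be sketched as a single forward pass through a small network that maps measured device properties to PID gains. Everything here is a hypothetical stand-in: the architecture, feature names, sizes and randomly initialised weights are illustrative assumptions, not the paper's trained model.

```python
# Hypothetical sketch of one-shot PID inference: device characterisation
# features in, (kp, ki, kd) out, in a single O(ms) forward pass.
# Weights below are random stand-ins for a trained model.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

def infer_pid(device_features):
    """One forward pass: measured device properties -> (kp, ki, kd)."""
    h = np.tanh(W1 @ device_features + b1)
    return np.exp(W2 @ h + b2)  # exp keeps the gains positive

# e.g. [resonance_x, resonance_y, damping_estimate, goal_weight]
# (hypothetical feature layout)
gains = infer_pid(np.array([1.2, 1.1, 0.3, 0.5]))
print(gains.shape)
```

Because inference needs no closed-loop operation of the device, it can run offline and in parallel for arbitrarily many actuators once their characterisation data exists.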
Compared to a direct-search based tuning method implemented in the
production setting of a world leading optical switch manufacturer, our method
ensures that 5× more switching events are at or below the most
challenging target switching time, whilst improving average switching time
by 23%. The standard deviation of switching times also improves by 45%,
allowing for more consistent switching performance as well as better perfor-
mance on average. Moreover, the method is also able to achieve 3.5× greater
thermal stability across temperatures ranging from 5 °C to 73 °C. In addition,
the proposed method takes O(hours) to train and ≤ O(ms) to generate new unique parameters for previously unseen actuators. By contrast,
manual (direct search) tuning takes O(week) to calculate a single set of con-
trol parameters for a given actuator and must be re-run if it is to be used on a
per-device basis; otherwise using generic parameters leads to (as seen in sec-
tion 6) much more inconsistent performance. The proposed method is able
to achieve a 10^6× speed-up when generating device-specific PID parameters
that achieve better all-around multi-objective control performance.
2. Related Work
Classical PID tuning methods have historically been based on a cost-
function driven search process [3, 5, 6]. These processes are capable of pro-
ducing high quality parameters. However, such methods often rely on having
reliable system models, which is often not possible. They are also slow, re-
quiring a large number of iterations before they find good parameters. This
may be acceptable for one-off optimisation processes, but it is prohibitively
slow and costly when large numbers of systems have to be individually opti-
mised, where longer production-time per-system incurs additional cost. Fi-
nally, these methods are often designed to handle only single performance
objectives and their application becomes more complex when multiple, pos-
sibly conflicting, objectives are to be simultaneously handled.
More recent optimisation techniques that can handle multi-objective cri-
teria automatically without requiring exhaustive search have been presented
as promising PID auto-tuning techniques. Evolutionary/swarm optimisation
techniques such as particle swarm or genetic algorithms have been applied to
various formulations of the PID tuning problem [7, 8, 9], since they are com-
putationally more efficient than direct search, have good convergence prop-
erties and can handle multi-objective optimisation criteria flexibly. However,
one fundamental issue with such meta-heuristic algorithms is that the full
optimisation process has to be implemented every time a set of parameters
is to be found, meaning it is not appropriate when minimising optimisation
time is desirable and a large number of distinct devices/systems are to be
optimised.
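The re-run cost can be seen in even a minimal particle-swarm sketch, shown below on a toy quadratic objective. The objective, hyperparameters and structure are illustrative assumptions; in practice each cost evaluation would be a physical closed-loop measurement on the device.

```python
# Minimal particle-swarm sketch (toy objective) illustrating the scaling
# problem: the entire loop must be re-run from scratch for every new
# device, with each cost() call standing in for a real measurement.
import numpy as np

def cost(params, device_target):
    # Stand-in for a measured control-performance score on one device.
    return np.sum((params - device_target) ** 2)

def pso(device_target, n_particles=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, 3))  # (kp, ki, kd) guesses
    v = np.zeros_like(x)
    pbest = x.copy()
    gbest = min(x, key=lambda p: cost(p, device_target))
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        improved = np.array([cost(a, device_target) < cost(b, device_target)
                             for a, b in zip(x, pbest)])
        pbest[improved] = x[improved]
        gbest = min(np.vstack([pbest, [gbest]]),
                    key=lambda p: cost(p, device_target))
    return gbest

# A new device means a fresh, full optimisation run.
best = pso(np.array([1.0, 0.2, 0.05]))
print(best)
```

Nothing learned from one device's run transfers to the next, which is precisely the generalisation gap the following paragraph describes.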
Another general shortcoming that applies to all of the methods mentioned above is the lack of generalisability in the optimisation process. Consider the case where PID parameters need to be found for many devices
which are similar (e.g. the devices are the same model of device but are
distinguished by inevitable manufacturing imperfections). In this case, it
can reasonably be expected that the optimal PID parameters would be similar, and that
sufficient exposure to a large number of such devices should be able to be
exploited in order to find parameters for new devices more efficiently. This
premise is not accounted for in the above methods, which instead must be
re-run for each application.
DRL has emerged as another promising means of system control. It
has been demonstrated to be able to learn very complex control/operational
policies that can yield superior results compared to top human performers in
considerably complex and uncertain environments [10, 11]. While DRL can in
principle be used to control a system directly, replacing PID loops altogether,
the decision making process involves a forward pass through a neural network
- a process that generally requires expensive and power-hungry hardware such
as GPUs - rendering it often inappropriate for the kind of mass-production