One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning

Zacharaya Shabka (a,*), Michael Enrico (b), Nick Parsons (b), Georgios Zervas (a)
(a) University College London, Roberts Building, Torrington Place, London, WC1E 7JE, United Kingdom
(b) Huber+Suhner Polatis, 332 Cambridge Science Park, Milton Road, Cambridge, CB4 0WN, United Kingdom
Abstract
Proportional-integral-derivative (PID) control underlies more than 97% of
automated industrial processes. Controlling these processes effectively with
respect to some specified set of performance goals requires finding an optimal
set of PID parameters to moderate the PID loop. Tuning these parameters
is a long and exhaustive process. A method (patent pending) based on deep
reinforcement learning is presented that learns a relationship between generic
system properties (e.g. resonance frequency), a multi-objective performance
goal and optimal PID parameter values. Performance is demonstrated in
the context of a real optical switching product of the foremost manufacturer
of such devices globally. Switching is handled by piezoelectric actuators
where switching time and optical loss are derived from the speed and stability of actuator-control processes respectively. The method achieves a 5× improvement in the number of actuators that fall within the most challenging target switching speed, a 20% improvement in mean switching speed at the same optical loss and a 75% reduction in performance inconsistency
when temperature varies between 5 °C and 73 °C. Furthermore, once trained (which takes O(hours)), the model generates actuator-unique PID parameters in a one-shot inference process that takes O(ms), in comparison to up to O(week) required for conventional tuning methods, therefore accomplishing these performance improvements whilst achieving up to a 10^6× speed-up. After training, the method can be applied entirely offline, incurring effectively zero optimisation overhead in production.

*Corresponding author. Email address: uceezs0@ucl.ac.uk (Zacharaya Shabka)

Preprint. arXiv:2210.13906v1 [eess.SY] 25 Oct 2022
Keywords: deep reinforcement learning, PID tuning, optimal control,
actuator, manufacturing, optimisation
1. Introduction
Proportional-integral-derivative (PID) control remains one of the most
widely used and reliable means of implementing online system control. It
is used extensively in many industries from oil refinement to paper produc-
tion and accounts for approximately 97% of control processes in industry
[1, 2]. PID has advantages with respect to both optimisation/tuning (it
only has 3 parameters to optimise) and computation (each control iteration
only involves a few simple operations which can easily be implemented on
low-cost/high-frequency hardware such as FPGAs or ASICs). However, the
difficulty of optimally tuning PID parameters still undermines its application. In general, tuning faces a compromise between long and exhaustive
but highly optimal methods and fast and efficient but sub-optimal ones.
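To make the computational claim concrete, the following is a minimal sketch of one discrete PID iteration. The class name, gain values and sample period are illustrative assumptions, not the controller implementation used in the product described here.

```python
# Minimal sketch of one discrete PID iteration with a fixed sample period dt.
# kp, ki and kd are the three parameters that tuning must optimise.
# (Illustrative only; not the paper's actual controller.)

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        # Each iteration is only a handful of multiply/add operations,
        # which is why PID maps cheaply onto FPGAs and ASICs.
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.0, ki=0.1, kd=0.05, dt=0.001)
u = pid.step(setpoint=1.0, measurement=0.0)  # control signal for this tick
```

Because the per-tick cost is constant and tiny, the whole tuning burden falls on choosing (kp, ki, kd) well, which is the problem this paper addresses.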
Furthermore, in real world commercial scenarios, products/devices us-
ing control loops are manufactured at scale, where the performance of the
product/devices can be at least partially if not dominantly dependent on the
performance of the closed-loop control process used. For example, the work
presented here is done in the context of piezoelectric-actuator based optical
switching devices. Current models of switches can be built with up to 768
actuators (384 ports on both the input and output plane), where each is
controlled by 2 distinct PID loops (1 per axis). This means that to optimally
control each actuator in a single switch of this size, 1536 distinct sets of PID
parameters need to be determined. These switches benefit from fast and
stable reconfiguration times and low optical loss. Both of these properties
depend strongly on the control process underlying these switching processes.
Optimising PID parameters for a large number of non-identical devices
faces three primary difficulties. Firstly, since no manufacturing process is
perfect, no two manufactured devices will be identical. The subtle but defi-
nite differences can (as will be seen in this work) have significant impact on
how well they can be controlled by the same set of PID parameters, motivat-
ing a means of having unique PID parameters for each device rather than a
single generic set. An efficient optimisation method would be able to exploit
this device-level information, and use it effectively to generate parameters
that are suitable for that device.
Secondly, since the number of devices manufactured can be arbitrarily
large, it is desirable to minimise the amount of time it takes to generate these
parameters to avoid significant production overhead due to optimisation.
Devices can potentially undergo a large number of possible control processes
in their lifetime. For example, in a 384 ×384 all-to-all switch where each
actuator in each plane can move from pointing towards any position in the
opposite plane, to any of the remaining 383 positions, each actuator has
147,456 possible movements it can make per axis - almost 300,000 total per
actuator. Since each actuator switches on the order of O(10 ms), checking
each of these movements for a single set of PID parameters would take at least
50 minutes. When a large number of parameter combinations is being explored
in a search process, it is easy to see how this can incur days of overhead.
An ideal optimisation routine would not require explicit exposure to each of
these movements in order to evaluate if a set of parameters are suitable.
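The cost estimate above can be checked with simple arithmetic, under the stated assumptions (384 positions per plane, each actuator retargetable to any of the remaining 383 positions, roughly 10 ms per switching event):

```python
# Back-of-envelope check of the exhaustive-evaluation cost described above.
# All figures are the assumptions stated in the text, not measurements.

n_positions = 384
moves_per_axis = n_positions * (n_positions - 1)  # ordered (from, to) pairs
moves_per_actuator = 2 * moves_per_axis           # two axes per actuator

seconds_per_move = 0.010                          # ~10 ms per switching event
total_minutes = moves_per_actuator * seconds_per_move / 60

# Close to the ~147k per-axis / ~300k per-actuator / ~50 minute
# figures quoted in the text.
print(moves_per_axis, moves_per_actuator, total_minutes)
```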
Thirdly, dynamic PID loops, where PID parameters are constantly ad-
justed over the lifetime of a control process based on the closed loop response
of the system, are not suitable in the case of low-cost/high-speed electronics
like FPGAs, since they incur additional in-loop computation requirements
in order to re-calculate PID parameters. As such, it is desirable to find a
single set of parameters per device that achieves good control outcomes over
its lifetime and with respect to potentially multiple different performance
metrics.
Direct-search based methods have dominated tuning methodologies for
many decades [3, 4]. However, these methods must be repeated each time
a set of optimal parameters is to be found (i.e. for a new device), and often
require long monitoring cycles to iterate over various parameter combinations
and evaluate performance against some set criteria. They are also
difficult to implement in the context of multiple simultaneous (and possi-
bly contending) performance goals since they are typically designed for a
particular control outcome.
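A well-known member of this classical family is the Ziegler-Nichols ultimate-gain rule, sketched below. Note that measuring the ultimate gain Ku and oscillation period Tu already requires driving the closed loop to sustained oscillation, which is exactly the kind of long monitoring cycle described above. The function name and example values are illustrative.

```python
# Sketch of the classical Ziegler-Nichols ultimate-gain PID rule.
# Ku (ultimate gain) and Tu (oscillation period) must first be measured
# by experiment on the closed loop.

def ziegler_nichols_pid(ku, tu):
    """Map measured (ku, tu) to (kp, ki, kd) via the classic PID table:
    Kp = 0.6*Ku, Ti = Tu/2, Td = Tu/8."""
    kp = 0.6 * ku
    ki = 1.2 * ku / tu      # kp / (tu / 2)
    kd = 0.075 * ku * tu    # kp * (tu / 8)
    return kp, ki, kd

kp, ki, kd = ziegler_nichols_pid(ku=10.0, tu=0.5)
print(kp, ki, kd)
```

Such rules target a single generic control outcome (quarter-amplitude damping), which illustrates why extending them to multiple contending objectives is awkward.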
To summarise, the ideal method of PID parameter optimisation should be
able to generate PID parameters such that: 1. each device has a unique set of
parameters that are optimal for that device specifically; 2. these parameters
can be generated in a timely manner, not requiring long tuning times or
extensive closed-loop operation of the device to do so; 3. these parameters are
determined once for each device with respect to a flexible and multi-faceted
performance requirement and are consistent in the face of environmental and
operational variability.
This paper presents a method (patent pending) based on deep reinforcement learning (DRL) that implements one-shot, offline and near-instantaneous
optimisation of PID parameters. The method is trained (O(hours)) on a
set of devices, where it learns a relationship between device information (e.g.
resonance per axis), a multi-objective performance criterion and PID parameter values. After training, the method can be applied to previously
unseen devices in a one-shot and offline inference procedure (O(ms)) where
some previously measured information (e.g. during post-manufacturing char-
acterisation processes) about the device can be used to directly generate PID
parameters that are performant for that device specifically. In this way the
method incurs effectively zero optimisation overhead as optimisation time for
large numbers of devices is trivial and can be done in parallel to some other
process once device information has been measured.
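The inference step described above can be sketched as a single forward pass through a small network that maps measured device properties to PID gains. Everything here is a hypothetical stand-in: the architecture, feature names, sizes and randomly initialised weights are illustrative assumptions, not the paper's trained model.

```python
# Hypothetical sketch of one-shot PID inference: device characterisation
# features in, (kp, ki, kd) out, in a single O(ms) forward pass.
# Weights below are random stand-ins for a trained model.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

def infer_pid(device_features):
    """One forward pass: measured device properties -> (kp, ki, kd)."""
    h = np.tanh(W1 @ device_features + b1)
    return np.exp(W2 @ h + b2)  # exp keeps the gains positive

# e.g. [resonance_x, resonance_y, damping_estimate, goal_weight]
# (hypothetical feature layout)
gains = infer_pid(np.array([1.2, 1.1, 0.3, 0.5]))
print(gains.shape)
```

Because inference needs no closed-loop operation of the device, it can run offline and in parallel for arbitrarily many actuators once their characterisation data exists.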
Compared to a direct-search based tuning method implemented in the
production setting of a world leading optical switch manufacturer, our method
ensures that 5× more switching events are at or below the most
challenging target switching time, whilst improving average switching time
by 23%. The standard deviation of switching times also improves by 45%,
allowing for more consistent switching performance as well as better perfor-
mance on average. Moreover, the method is also able to achieve 3.5× greater
thermal stability across temperatures ranging from 5 °C to 73 °C. In addition,
the proposed method takes O(hours) to train and ≤ O(ms) to generate new unique parameters for previously unseen actuators. By contrast,
manual (direct search) tuning takes O(week) to calculate a single set of con-
trol parameters for a given actuator and must be re-run if it is to be used on a
per-device basis; otherwise using generic parameters leads to (as seen in sec-
tion 6) much more inconsistent performance. The proposed method is able
to achieve a 10^6× speed-up when generating device-specific PID parameters
that achieve better all-around multi-objective control performance.
2. Related Work
Classical PID tuning methods have historically been based on a cost-
function driven search process [3, 5, 6]. These processes are capable of pro-
ducing high quality parameters. However, such methods often rely on having
reliable system models, which is often not possible. They are also slow, re-
quiring a large number of iterations before they find good parameters. This
may be acceptable for one-off optimisation processes, but it is prohibitively
slow and costly when large numbers of systems have to be individually opti-
mised, where longer production-time per-system incurs additional cost. Fi-
nally, these methods are often designed to handle only single performance
objectives and their application becomes more complex when multiple, pos-
sibly conflicting, objectives are to be simultaneously handled.
More recent optimisation techniques that can handle multi-objective cri-
teria automatically without requiring exhaustive search have been presented
as promising PID auto-tuning techniques. Evolutionary/swarm optimisation
techniques such as particle swarm or genetic algorithms have been applied to
various formulations of the PID tuning problem [7, 8, 9], since they are com-
putationally more efficient than direct search, have good convergence prop-
erties and can handle multi-objective optimisation criteria flexibly. However,
one fundamental issue with such meta-heuristic algorithms is that the full
optimisation process has to be implemented every time a set of parameters
is to be found, meaning it is not appropriate when minimising optimisation
time is desirable and a large number of distinct devices/systems are to be
optimised.
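The re-run cost can be seen in even a minimal particle-swarm sketch, shown below on a toy quadratic objective. The objective, hyperparameters and structure are illustrative assumptions; in practice each cost evaluation would be a physical closed-loop measurement on the device.

```python
# Minimal particle-swarm sketch (toy objective) illustrating the scaling
# problem: the entire loop must be re-run from scratch for every new
# device, with each cost() call standing in for a real measurement.
import numpy as np

def cost(params, device_target):
    # Stand-in for a measured control-performance score on one device.
    return np.sum((params - device_target) ** 2)

def pso(device_target, n_particles=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, 3))  # (kp, ki, kd) guesses
    v = np.zeros_like(x)
    pbest = x.copy()
    gbest = min(x, key=lambda p: cost(p, device_target))
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        improved = np.array([cost(a, device_target) < cost(b, device_target)
                             for a, b in zip(x, pbest)])
        pbest[improved] = x[improved]
        gbest = min(np.vstack([pbest, [gbest]]),
                    key=lambda p: cost(p, device_target))
    return gbest

# A new device means a fresh, full optimisation run.
best = pso(np.array([1.0, 0.2, 0.05]))
print(best)
```

Nothing learned from one device's run transfers to the next, which is precisely the generalisation gap the following paragraph describes.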
Another general shortcoming that applies to all of the methods mentioned above is the lack of generalisability in the optimisation process. Consider the case where PID parameters need to be found for many devices
which are similar (e.g. the devices are the same model of device but are
distinguished by inevitable manufacturing imperfections). In this case, it
can reasonably be expected that the optimal PID parameters would be similar, and that
sufficient exposure to a large number of such devices should be able to be
exploited in order to find parameters for new devices more efficiently. This
premise is not accounted for in the above methods, which instead must be
re-run for each application.
DRL has emerged as another promising means of system control. It
has been demonstrated to be able to learn very complex control/operational
policies that can yield superior results compared to top human performers in
considerably complex and uncertain environments [10, 11]. While DRL can in
principle be used to control a system directly, replacing PID loops altogether,
the decision making process involves a forward pass through a neural network
- a process that generally requires expensive and power-hungry hardware such
as GPUs - rendering it often inappropriate for the kind of mass-production