Evolution of adaptive learning for nonlinear dynamic systems: a systematic survey

Mouhcine Harib; Hicham Chaoui; Suruz Miah

doi:10.20517/ir.2021.19

Intelligence & Robotics, 2022, Vol. 2, Issue 1: 37-71. DOI: 10.20517/ir.2021.19

Review

Abstract

Robotic systems are highly nonlinear, which makes control design difficult. Adaptive control has been considered for robotic manipulation since the 1970s. However, in the presence of bounded disturbances, the limitations of adaptive control become considerably more pronounced, which led researchers to introduce a number of “algorithm modifications”. Unfortunately, these modifications often require a priori knowledge of bounds on the parameters, the perturbations, and the noise. In the 1990s, Artificial Neural Networks were investigated extensively, both in general and for the control of dynamical systems in particular. Several types of Neural Networks (NNs) appear to be promising candidates for control system applications. In robotics, the problem ultimately comes down to making the actuator perform the desired action. While purely control-based robots use the system model to define their input-output relations, Artificial Intelligence (AI)-based robots may or may not use the system model; instead, they manipulate the robot based on experience gained with the system during training, and may also refine that experience in real time. In this paper, after discussing the drawbacks of adaptive control under bounded disturbances and the modifications proposed to overcome them, we present work that applied AI to nonlinear dynamical systems, particularly in robotics. We cite work that addressed the inverted pendulum control problem using NNs. Finally, we review previous research on control based on reinforcement learning (RL) and deep RL and its implementation in robotic manipulation, highlighting some of the major drawbacks of these methods in the field.

Keywords

Adaptive control / deep reinforcement learning / manipulators / neural networks / reinforcement learning / robotics
