Intelligent and efficient fiber allocation strategy based on the dueling-double-deep Q-network

Yong ZHANG , Zhipeng YUAN , Jia DING , Feng GUO , Junyang JIN

Front. Eng ›› 2025, Vol. 12 ›› Issue (4): 721–735. DOI: 10.1007/s42524-025-4170-7
Industrial Engineering and Intelligent Manufacturing
RESEARCH ARTICLE


Abstract

Fiber allocation in optical cable production is critical for optimizing production efficiency, product quality, and inventory management. However, factors such as fiber length and storage time complicate this process, rendering heuristic optimization algorithms inadequate. To tackle these challenges, this paper proposes a new framework: the dueling-double-deep Q-network with twin state-value and action-advantage functions (D3QNTF). First, dual action-advantage and state-value functions are used to prevent overestimation of action values. Second, a method for randomly initializing feasible solutions improves sample quality early in the optimization. Finally, a strict penalty for errors is added to the reward mechanism, making the agent more sensitive to illegal actions and better at avoiding them, which reduces decision errors. Experimental results show that the proposed method outperforms established baselines, including greedy algorithms, genetic algorithms, deep Q-networks, double deep Q-networks, and standard dueling-double-deep Q-networks. The findings highlight the potential of the D3QNTF framework for fiber allocation in optical cable production.
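The two core mechanisms named in the abstract, the dueling decomposition of Q-values into state-value and action-advantage streams, and the double-DQN target that separates action selection from action evaluation, can be sketched as follows. This is a minimal NumPy illustration under assumed linear value/advantage heads; the function names, shapes, and weights are illustrative, not the authors' implementation.

```python
import numpy as np

def dueling_q(features, w_value, w_adv):
    """Combine a state-value stream and an action-advantage stream into
    Q-values: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage keeps the decomposition identifiable."""
    v = features @ w_value            # scalar state value V(s)
    a = features @ w_adv              # advantage vector A(s, .), shape (n_actions,)
    return v + a - a.mean()

def double_dqn_target(reward, next_features, online, target, gamma=0.99):
    """Double-DQN target: the online network *selects* the next action,
    while the target network *evaluates* it, which curbs the
    overestimation bias of vanilla Q-learning."""
    best = int(np.argmax(dueling_q(next_features, *online)))
    return reward + gamma * dueling_q(next_features, *target)[best]

# Toy usage with random linear weights (4 features, 3 actions).
rng = np.random.default_rng(0)
f = rng.normal(size=4)
wv = rng.normal(size=4)
wa = rng.normal(size=(4, 3))
q = dueling_q(f, wv, wa)
t = double_dqn_target(1.0, f, (wv, wa), (wv, wa))
```

A useful sanity check on the decomposition: because the mean advantage is subtracted, the mean of the Q-values over actions always equals the state value V(s).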

Keywords

optical fiber allocation / deep reinforcement learning / dueling-double-deep Q-network / dual action-advantage and state-value functions / feasible solutions

Cite this article

Yong ZHANG, Zhipeng YUAN, Jia DING, Feng GUO, Junyang JIN. Intelligent and efficient fiber allocation strategy based on the dueling-double-deep Q-network. Front. Eng, 2025, 12(4): 721-735 DOI:10.1007/s42524-025-4170-7



RIGHTS & PERMISSIONS

Higher Education Press
