Optical Communication Technology, 2023, 47(5): 0012. Published online: 2024-02-02

Research on elastic optical network resource allocation based on improved DQN reinforcement learning algorithm
Author affiliation: Henan Branch, National Computer Network and Information Security Administration Center, Zhengzhou 450000, China
Abstract
To address the low utilization of spectrum resources in optical network resource allocation, an improved deep Q-network (DQN) reinforcement learning algorithm is proposed. Building on the ε-greedy strategy, the algorithm defines its loss function from the difference between the action-value function and the state-value function, and continually adjusts the value of ε to vary the agent's exploration rate. In this way it converges to the optimal action-value function and effectively solves the routing and spectrum assignment problem. In addition, a different experience-pool sampling method is adopted to speed up the convergence of iterative training. Simulation results show that the improved DQN reinforcement learning algorithm not only makes the elastic optical network training model converge quickly, but also, at a traffic load of 300 Erlang, improves spectrum resource utilization by 10.09%, lowers the blocking rate by 12.41%, and reduces the average access delay by 1.27 ms compared with the standard DQN algorithm.
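The abstract describes two of the algorithm's ingredients only at a high level: an ε-greedy policy whose ε is adjusted over time, and a loss built from the difference between the action-value and state-value functions. The exact ε schedule and loss formulation are not given here, so the following is a minimal sketch under assumed choices: a multiplicative ε decay with a floor, and a squared TD error whose bootstrap target uses the greedy (max-Q) estimate as the state value. All names (`EpsilonGreedyAgent`, `advantage_loss`) and parameter values are illustrative, not the authors' implementation.

```python
import random


class EpsilonGreedyAgent:
    """Illustrative agent: epsilon-greedy action selection with a
    decaying exploration rate, as the abstract's adjustment of ε."""

    def __init__(self, n_actions, eps_start=1.0, eps_min=0.05, eps_decay=0.995):
        self.n_actions = n_actions
        self.eps = eps_start        # current exploration probability
        self.eps_min = eps_min      # floor so exploration never fully stops
        self.eps_decay = eps_decay  # multiplicative decay per decision

    def select_action(self, q_values):
        # Explore with probability eps, otherwise act greedily on Q.
        if random.random() < self.eps:
            action = random.randrange(self.n_actions)
        else:
            action = max(range(self.n_actions), key=lambda a: q_values[a])
        # Shrink the exploration rate after every decision.
        self.eps = max(self.eps_min, self.eps * self.eps_decay)
        return action


def advantage_loss(q_sa, q_next, reward, gamma=0.99):
    """Squared TD error with a state-value bootstrap target, one plausible
    reading of a loss built from the Q-value / state-value difference."""
    v_next = max(q_next)               # state value as the greedy action value
    target = reward + gamma * v_next   # bootstrapped target
    return (q_sa - target) ** 2
```

In a routing-and-spectrum-assignment setting, each action would index a candidate path/spectrum-block pair and `q_values` would come from the network's forward pass; this sketch keeps the selection and loss logic framework-free.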
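The abstract also credits faster training convergence to "different experience pool sampling methods" without naming them. A common non-uniform alternative to plain uniform replay is to sample transitions in proportion to their TD error; the buffer below is a hypothetical sketch of that idea, not the paper's design, and every identifier in it is assumed.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative experience pool that replays transitions with larger
    TD error more often, instead of sampling uniformly."""

    def __init__(self, capacity=10000, eps=1e-3):
        self.capacity = capacity
        self.eps = eps            # keeps every priority strictly positive
        self.buffer = []          # stored transitions, oldest first
        self.priorities = []      # sampling weight per transition

    def push(self, transition, td_error):
        # Evict the oldest transition once the pool is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, batch_size):
        # Draw a batch weighted by priority (with replacement).
        return random.choices(self.buffer, weights=self.priorities,
                              k=batch_size)
```

Whether this matches the authors' sampling scheme cannot be determined from the abstract; it simply shows how a sampling rule other than uniform draw can steer training toward the most informative transitions.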

SHANG Xiaokai, HAN Longlong, ZHAI Huipeng. Research on elastic optical network resource allocation based on improved DQN reinforcement learning algorithm[J]. Optical Communication Technology, 2023, 47(5): 0012.
