Convolutional Neural Network-Based Deep Q-Network (CNN-DQN) Resource Management in Cloud Radio Access Network


Amjad Iqbal, Mau-Luen Tham, Yoong Choon Chang

Department of Electrical and Electronic Engineering, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman (UTAR), Malaysia

*The corresponding author, email: thamml@utar.edu.my

Abstract: The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of the fifth-generation (5G) mobile networks. Cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet the above requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance the energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build up a 3-layer CNN to capture the environment features as an input state space. We then use DQN to turn on/off the RRHs dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraints and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct simulations to compare our proposed scheme with nature DQN and the traditional approach.

Keywords: energy efficiency (EE); Markov decision process (MDP); convolutional neural network (CNN); cloud RAN; deep Q-network (DQN)

I. Introduction

The past two decades have witnessed the exponential growth of mobile subscribers and user data traffic. The Cisco annual report of 2020 expects mobile subscribers to reach 5.7 billion with a monthly data traffic of 110 exabytes (EB) in 2023 [1]. To fulfill the above requirements, a large number of base stations (BSs) need to be installed within the coverage area. However, installing more BSs leads to higher infrastructure costs as well as higher energy and power consumption. Approximately 60-75% of the total energy consumption in a cellular network is due to the BSs [2]. Therefore, it is essential to dynamically turn off BSs when the user demand is low, in order to reduce the energy consumption.

In the existing radio access network (RAN) framework, the capacity is limited by the isolated resource management among BSs. One way to improve the capacity of the existing RAN framework is network densification. However, such a process increases the capital and operational expenditure (CAPEX and OPEX), and thus existing RAN frameworks cannot support the ever-increasing user demand and number of mobile subscribers [3].

Cloud radio access network (CRAN) is a prominent architecture to overcome the above difficulties by providing reliable and fast real-time communication for the next-generation network [4]. The main idea of CRAN is to decouple the BS functionality into distributed low-cost, low-power remote radio heads (RRHs) and a centralized baseband unit (BBU). The transmission and reception of radio signals from/to end-users are performed by the RRHs, while the BBU is responsible for the baseband signal processing functions. Owing to the centralized processing, CRAN can allocate the overall radio resources to the RRHs according to the user demand and mobility. Although CRAN is a key enabling technology for the upcoming generation, adaptive resource allocation (RA) is still a topic worthy of investigation.

Many researchers have investigated the RA problem in CRAN from different perspectives, e.g., throughput, resource allocation, and joint cell activation [5-7]. However, these problems are formulated with traditional model-based approaches under a static network environment. Such approaches become impractical, especially when user mobility changes the network state at each time step t. Therefore, in this work, we consider a model-free approach to optimize the RA problem over the entire operational period in real time.

Reinforcement learning (RL) is a machine learning (ML) approach in which a learning agent interacts continuously with an unknown environment to tackle complex decision-making problems based on the current state [8]. The learning agent chooses a possible action in each state and trains its model based on the available data to make a decision at each time step t. Recently, deep learning (DL) has been applied successfully in many applications, e.g., image processing, computer vision (CV), natural language processing (NLP), and speech recognition. Similarly, DL has also been used in wireless communication to learn sequential control tasks and help RL algorithms end-to-end [9]. The convolutional neural network (CNN) advances the DL method to extract more complex dynamic features in mobility scenarios [10]. The works in [11] and [12] use CNNs to train a neural network (NN) to maximize the energy efficiency and throughput of multi-cell heterogeneous networks. However, they considered neither the setting of CRAN nor RL, which captures the interactions with the environment. Furthermore, most of the existing work defines the state of the wireless network only in terms of user demand and RRHs and neglects the relationship between them [13, 14]. The main drawback of these works is that users must report their information to the respective RRHs, which increases the signaling overhead caused by feedback. Secondly, the above works [13, 14] usually exploit fully connected layers to train the NN, which increases the complexity of the training parameters [15]. Keeping the above drawbacks in mind, we consider the relationship between users and RRHs as a raw observation at the input state and propose a three-layer relational CNN-based deep Q-network (CNN-DQN) that captures the environment state features. We combine the CNN and DQN schemes to extract the raw observation between users and RRHs from the network: the CNN phase is responsible for extracting features and reducing the network parameter complexity, while the DQN phase is responsible for dynamically turning the RRHs on/off. To address the RA problem more efficiently, we first devise the Markov decision process (MDP) framework for EE by defining the state, action, reward, and next state. We then propose the CNN-DQN method to dynamically switch the RRHs on/off to maximize the energy efficiency (EE) and satisfy the user quality of service (QoS) demand. Finally, we solve the RA problem based on the user constraints and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. The key contributions of this paper are as follows:

1. We propose a DRL-based autonomous RA decision-making approach that guarantees user satisfaction and maximizes the EE while minimizing the power consumption in downlink CRAN.

2. The RA problem is formulated as an MDP by defining the RL components, i.e., the state space, action space, and reward function. We build a three-layer CNN framework that captures the raw observation as the input state. The CNN output is fed to the DQN input to make the RRH on/off switching decisions based on the user requirements.

3. We divide our algorithm into two phases, i.e., CNN and DQN. The CNN phase extracts the raw observation features from the environment, and the DQN phase determines the best possible action in a particular state.

In the end, we conduct a comprehensive simulation to validate the effectiveness of the proposed algorithm. The results show that the proposed solution performs better in terms of maximizing the energy efficiency, saving power, and satisfying the user QoS requirements, compared with nature DQN and the traditional approach.

The rest of this paper is organized as follows. We present work closely related to this research in Section II. The network model, along with the power consumption model and problem formulation, is described in Section III. The proposed scheme, followed by the 3-layer CNN phase, is presented in Section IV. Simulation results and conclusions are discussed in Section V and Section VI, respectively.

II. Related Work

The increasing popularity of smartphone applications has accelerated the development of wireless networks. Among the significant challenges of a wireless network are handling the power consumption, maximizing the EE, and satisfying the user QoS requirements. As such, many scholars have shown interest in proposing lasting solutions to these problems.

In [16], the EE maximization problem is studied based on a mode-selection algorithm that uses the transmission rate as the QoS requirement; it is concluded that, given the device's various factors, the EE is successfully maximized for each content delivery. In [17], a joint power and RRH selection problem is formulated to improve the EE of green CRAN, and comprehensive simulations show that the proposed method can obtain a near-optimal EE solution. In [18], the authors of [17] extend their work by jointly selecting RRHs through a mixed-integer non-linear programming formulation to reduce the computational complexity. In [19], two transmission strategies, i.e., data sharing and data compression, are formulated to minimize the total power in the wireless network. A radio resource management framework is proposed for heterogeneous CRAN to maximize the EE performance in [20]. Similarly, a load-aware approach is proposed to solve the EE maximization problem in dense small-cell networks [21]. In [22], a soft fractional frequency reuse method is proposed to formulate a joint resource block and power allocation optimization problem that maximizes the EE performance in heterogeneous CRANs. In [23], the user association problem is investigated to improve the EE performance in small-cell heterogeneous networks. All the above works [16-23] apply model-based optimization approaches to solve the RA management problem in the wireless network. These methods solve the utility function by assuming a static environment. However, such approaches are not practical, as the channel conditions of a typical mobile radio network change dynamically.

Recently, the deep learning (DL) branch of ML has been applied successfully to handle computation-heavy, massive-data problems. DL-based approaches significantly reduce the high data complexity and have been adopted for wireless network problems, e.g., RA [24] and physical layer communication [25]. Going one step further, Mnih et al. [26] introduced an advancement of DL known as deep reinforcement learning (DRL), which can solve human-level complicated control problems. DRL provides a promising solution for tackling RA problems in wireless networks [27, 28]. A DRL technique has been applied to solve the power management and overall resource distribution problem in cloud computing systems [29]. A DRL-based algorithm has been applied in multi-relay cooperative networks to maximize the EE performance and overall data rate [30]. Furthermore, different DRL algorithms have been used to solve the power management problem in CRAN [13, 14]. However, these works solve the RA problem with handcrafted features and do not explicitly describe the relationship between users and RRHs in the network state. If this relationship is captured in the network state, then the RRHs can record all the relevant information and the users do not need to feed any such information back, which reduces the signaling burden in the network. Secondly, the above works utilize fully connected layers to train the NN, which significantly increases the number of training parameters [15]. This motivates us to combine CNN with DQN. The CNN phase is responsible for extracting the features of the input state, which contains the user demand, the RRH on/off states, and the relationship between users and RRHs. The DQN phase, in turn, speeds up the learning process of the algorithm and achieves better network performance.

Figure 1. DRL-based dynamic RA in CRAN.

III. Network Model

As depicted in Figure 1, we consider a typical downlink CRAN framework containing a set of RRHs R = {1, 2, ..., R}, a set of UEs U = {1, 2, ..., U}, and a single BBU pool. We also consider a time period T = {1, 2, ..., T}. The UEs change their positions randomly and report their data rate demands D_u ∈ [D_min, D_max] and channel state information (CSI) to the BBU pool, which acts as the RL agent. The major notations are summarized in Table 1. Without loss of generality, we assume that the users and RRHs are each equipped with a single antenna.

Furthermore, we consider that each user can access all the RRHs and that all RRHs are connected to the BBU pool, so all information is shared in a centralized manner. The path-loss model follows [19] and depends on the distance d_{r,u} between RRH r and user u. The channel fading model, also taken from [19], combines the antenna gain ζ_{r,u}, the small-scale fading ρ_{r,u}, and the shadowing coefficient ω_{r,u}. According to [19], the signal-to-interference-plus-noise ratio received by UE u at time t, δ_u(t), can be represented as:

$$\delta_u(t)=\frac{\left|\mathbf{w}_u^{H}(t)\,\mathbf{h}_u(t)\right|^{2}}{\sum_{u'\neq u}\left|\mathbf{w}_{u'}^{H}(t)\,\mathbf{h}_u(t)\right|^{2}+\sigma^{2}}$$

where σ² denotes the background noise, h_u(t) = [h_{1u}(t), h_{2u}(t), ..., h_{Ru}(t)]^T is the channel gain vector between the RRHs and user u at time t, and w_u(t) = [w_{1u}(t), w_{2u}(t), ..., w_{Ru}(t)]^T is the corresponding beamforming weight vector. Thus, the achievable data rate of user u at time step t is given as [19]:

$$r_u(t)=W\log_{2}\!\left(1+\frac{\delta_u(t)}{J_m}\right)$$

where W and J_m denote the channel bandwidth and the SNR gap, respectively. The SNR gap depends on the modulation scheme; we assume J_m = 1 according to [14].
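As a concrete illustration of the expressions above, the following NumPy sketch computes the per-user SINR δ_u(t) and achievable rate r_u(t) for one channel/beamforming snapshot. The code and its numerical values (noise power, bandwidth) are illustrative placeholders and not the simulation settings of this paper.

```python
import numpy as np

def sinr_and_rate(H, W_bf, sigma2=1e-13, bandwidth=10e6, snr_gap=1.0):
    """Per-user SINR and achievable rate for one downlink C-RAN snapshot.

    H    : (U, R) complex channel matrix, H[u, r] = channel from RRH r to user u.
    W_bf : (R, U) complex beamforming matrix, W_bf[:, u] serves user u.
    """
    rx = H @ W_bf                       # rx[u, j] = user j's stream as seen by user u
    signal = np.abs(np.diag(rx)) ** 2   # desired-signal power per user
    interference = np.sum(np.abs(rx) ** 2, axis=1) - signal
    sinr = signal / (interference + sigma2)
    rate = bandwidth * np.log2(1.0 + sinr / snr_gap)   # bits/s, J_m = snr_gap
    return sinr, rate

# Toy example: R = 3 RRHs, U = 2 users with random channels and beamformers.
rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))) / np.sqrt(2)
W_bf = (rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))) * 1e-3
print(sinr_and_rate(H, W_bf))
```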

3.1 Power Consumption Model

According to [31], the relationship between the BS power consumption and the transmit power can be approximated linearly. Therefore, a linear power model is applied to each RRH: an active RRH r consumes a static power p_{r,A}(t) even when it is not transmitting any signal, plus its transmit power scaled by 1/τ, where τ is the power amplifier drain efficiency and is treated as a constant; an RRH in sleep mode consumes p_{r,S}(t) when there is no need for transmission. A and S denote the sets of RRHs in active and sleep mode, respectively, so that A ∪ S = R.

Most works in the literature, e.g., [17], [19], and [32], ignore the transition power, i.e., the power consumed when an RRH changes its operating mode, when calculating the total power consumption. In this paper, we also take this transition power, denoted as p_{r,G}(t), into account. Therefore, the total power consumption P_total(t) of all RRHs at time step t can be expressed as:

$$P_{total}(t)=\sum_{r\in\mathcal{A}}\Big(p_{r,A}(t)+\frac{1}{\tau}\sum_{u\in\mathcal{U}}|w_{ru}(t)|^{2}\Big)+\sum_{r\in\mathcal{S}}p_{r,S}(t)+\sum_{r\in\mathcal{R}}p_{r,G}(t) \tag{6}$$

where p_{r,G}(t) is zero for any RRH that does not change its mode at time t.
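The following sketch illustrates the power model described above. The numerical constants are placeholders rather than the values used in our simulations, and charging a fixed transition cost to every RRH that changes mode is a simplifying assumption for illustration.

```python
import numpy as np

def total_power(active, prev_active, p_tx, p_active=6.8, p_sleep=4.3,
                p_transition=2.0, tau=0.25):
    """Total RRH power at one time step under the linear power model.

    active, prev_active : boolean arrays of length R (True = active, False = sleep).
    p_tx                : per-RRH transmit powers in W (only counted when active).
    All constants are illustrative placeholders, not values from the paper.
    """
    p_state = np.where(active, p_active + p_tx / tau, p_sleep)
    p_switch = p_transition * (active != prev_active)   # mode-change (transition) power
    return p_state.sum() + p_switch.sum()

# RRHs 0 and 2 active, RRH 1 put to sleep since the previous step.
active = np.array([True, False, True])
prev_active = np.array([True, True, True])
p_tx = np.array([1.0, 0.0, 0.5])
print(total_power(active, prev_active, p_tx))
```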

3.2 Problem Formulation

This work aims to maximize the long-term EE performance by adjusting the per-RRH transmit power and the user data rate. According to [33], the EE (Mbits/J) is defined as the ratio between the sum throughput and the total power consumption at time t:

$$EE(t)=\frac{\sum_{u\in\mathcal{U}} r_u(t)}{P_{total}(t)} \tag{7}$$

Thus, the optimization problem of EE is to maximize EE(t) subject to the following constraints.

Constraint (8b) indicates that each user's target data rate must be less than or equal to its achievable data rate, whereas constraint (8c) specifies that the transmit power must be less than or equal to the maximum transmit power.
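For concreteness, one way to write problem (8) in the notation above is the following sketch; interpreting constraint (8c) as a per-RRH limit on the beamforming power is an assumption of ours rather than a reproduction of the original display.

$$\begin{aligned}
\max_{\{v_r(t)\},\{\mathbf{w}_u(t)\}}\quad & EE(t) &&\\
\text{s.t.}\quad & D_u(t)\le r_u(t), && \forall u\in\mathcal{U}, \quad\text{(8b)}\\
& \sum_{u\in\mathcal{U}}|w_{ru}(t)|^{2}\le P_{r,T}, && \forall r\in\mathcal{A}. \quad\text{(8c)}
\end{aligned}$$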

IV. Proposed Scheme

In this paper, we present a DRL approach to maximize the long-term EE performance and satisfy the user QoS requirement in downlink CRAN. In this section, we first introduce the basic concept of RL for better readability, followed by a detailed description of the proposed scheme.

4.1 RL Concept

RL is a powerful artificial intelligence (AI) technique in which an agent interacts with an unknown environment, monitors the current state, and maps situations to actions so as to maximize the reward value. Basically, RL follows the MDP framework for modeling complex decision-making problems. The MDP can be defined as a tuple N = (S, A, K(s,a), P(s′,k|s,a)), where S and A represent the discrete state and action spaces, respectively, K(s,a) denotes the reward function for a particular state-action pair, and P(s′,k|s,a) is the transition probability of moving from state s ∈ S under action a ∈ A to the next state s′ ∈ S. The agent observes the current network state s_t at each time step t and executes an action a_t, as shown in Figure 2. After executing the action, feedback is obtained from the environment in the form of a scalar reward. The goal of the agent is to learn a near-optimal control policy a = π(s) that maximizes the reward from a long-term perspective. A state-value function V^π(s) is introduced to calculate the average accumulative reward; it satisfies the recursive relationship given by the Bellman equation [26]:

$$V^{\pi}(s)=\mathbb{E}_{\pi}\!\left[K(s,a)+\mu\,V^{\pi}(s')\right] \tag{9}$$

Figure 2. RL basic form and components.

where μ is the discount factor, which specifies the importance of future rewards relative to the current reward. According to [26], two basic approaches are used to solve the MDP framework in (9): dynamic programming (DP) and Q-learning. DP is mostly used in the model-based setting, where the state transition probability is already known. However, in a complex 5G networking environment the state transition probability changes at every time step t, so such an approach is not feasible for complex network applications. In this work, we therefore solve the MDP with unknown state transition probabilities.

4.2 Q-Learning

Q-learning is a basic algorithm for dealing with unknown state transitions, based on the temporal-difference method. Before explaining Q-learning, we first introduce the Q-value function, also known as the state-action value function Q(s,a). The optimal Q-function can be represented as Q*(s,a) = max_π Q^π(s,a). The Bellman equation [26] for the optimal Q-function is then written as:

$$Q^{*}(s,a)=\mathbb{E}\!\left[K(s,a)+\mu\max_{a'}Q^{*}(s',a')\right] \tag{10}$$

The action selection in Q-learning relies on ε-greedy exploration, where the agent chooses a random action with probability ε and the greedy action with probability 1−ε. The Q-value is initialized for the given state-action pairs and updated iteratively as actions are selected. The updated Q-value can be written as:

$$Q(s,a)\leftarrow Q(s,a)+\gamma\Big[K(s,a)+\mu\max_{a'}Q(s',a')-Q(s,a)\Big] \tag{11}$$

where γ is known as the learning rate. From (11), every state-action value is stored in a Q-table, which works well for small state-action dimensions. However, in real-time applications the number of state-action pairs grows exponentially, which makes it problematic for Q-learning to store all the values in a lookup Q-table.
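For completeness, a minimal tabular sketch of the ε-greedy selection and the update in (11) is given below; the state/action sizes and hyperparameter values are illustrative only.

```python
import numpy as np

def q_learning_step(Q, s, a, reward, s_next, lr=0.1, mu=0.9):
    """One temporal-difference update of a tabular Q-function, as in Eq. (11)."""
    td_target = reward + mu * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])

def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

# Toy usage with 5 states and 2 actions (all values illustrative).
rng = np.random.default_rng(0)
Q = np.zeros((5, 2))
a = epsilon_greedy(Q, s=0, epsilon=0.1, rng=rng)
q_learning_step(Q, s=0, a=a, reward=1.0, s_next=3)
```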

4.3 DQN-Learning

To avoid this dimensionality problem, a linear function approximation method was first proposed to approximate the Q-value function. However, such a method cannot estimate the Q-value function accurately. This problem is addressed by deep reinforcement learning (DRL), which approximates the Q-value function with a non-linear function, namely a deep neural network (DNN). The deep Q-network (DQN) is the most widely used DRL algorithm and has been proposed for many applications [34]. In DQN, a separate target network and an experience replay memory D are added on top of the DNN to reduce the correlation between training samples and make the convergence more stable. The learning agent collects experience and uses it to train the policy offline in the background, so that decisions can be made efficiently and in a timely manner based on the already learned policy. In DQN, the state-action value function Q(s,a) is represented via the Bellman equation as K + μ max_{a'} Q*(s′,a′). The loss function is then calculated as:

$$L(\theta)=\mathbb{E}\!\left[\left(y_t-Q(s,a;\theta)\right)^{2}\right] \tag{12}$$

where

$$y_t=K+\mu\max_{a'}Q(s',a';\theta'),$$

and θ and θ′ indicate the weights of the evaluation and target networks, respectively. We optimize these weights using the stochastic gradient descent algorithm [35].
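The following sketch illustrates how the TD target and the loss in (12) are computed from a mini-batch drawn from the replay memory D. The batch layout and the callable q_target_net are assumptions made for illustration.

```python
import numpy as np

def dqn_targets(batch, q_target_net, mu=0.9):
    """Compute TD targets y_t = K + mu * max_a' Q(s', a'; theta') for a replay batch.

    batch        : dict of arrays with keys 'reward', 'next_state', 'done'.
    q_target_net : callable mapping a batch of states to Q-values of shape
                   (batch_size, num_actions), evaluated with the frozen target weights theta'.
    """
    q_next = q_target_net(batch["next_state"])                     # Q(s', ., theta')
    return batch["reward"] + mu * (1.0 - batch["done"]) * q_next.max(axis=1)

def dqn_loss(y, q_eval):
    """Mean-squared TD error, as in Eq. (12): E[(y_t - Q(s, a; theta))^2]."""
    return np.mean((y - q_eval) ** 2)
```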

4.4 Proposed Convolutional Neural Network (CNN) Scheme

Due to random user movement at each time step t, the state-space dimension grows exponentially. We propose a relational CNN-DQN algorithm that significantly alleviates this state-space dimensionality issue and achieves an optimal control policy for the RRH on/off switching. Considering the dynamic characteristics of the network state space, we use three hidden convolutional layers with 32, 32, and 64 convolution filters, respectively, operating on an M×M input matrix. The input matrix consists of the user demand, the RRH on/off states, and the CSI features. We use the Xavier normal initializer [36] to initialize each convolutional filter. The output size of a convolutional filter can be calculated as:

$$O=\frac{I-K+2P}{S}+1,$$

where O is the convolutional filter output size, and I, K, P, and S represent the input size, kernel (filter) size, number of paddings, and stride, respectively. In this work, the kernel size of all hidden layers is 2×2, the padding is 0, and the stride is 1. Furthermore, we employ the rectified linear unit (ReLU) as the activation function for all hidden layers [35]. The proposed CNN-DQN architecture consists of convolutional layers and pooling layers followed by a flatten layer and fully connected layers, as shown in Figure 3. The convolutional layers extract the environment state-space features, while the pooling layers down-sample the extracted features; we apply max pooling, which outputs the maximum value of each region. To prevent the NN from over-fitting, we apply dropout with probability β = 0.25 to the output of the last max-pooling layer. The output of the last max-pooling layer is then flattened into a one-dimensional vector, which is connected to a 100×1 fully connected (FC) layer. The training process is then executed by the DQN algorithm, as shown in Figure 3; the state features extracted by the CNN are fed to the DQN to make the RRH on/off switching decisions. We define the state space s(t), action space a(t), and reward function K(t) for our problem as follows.
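A possible realization of the above architecture is sketched below using tf.keras. Stacking the three input features (user demand, RRH on/off state, CSI) as channels of the M×M input, and the placement of the single max-pooling layer, are our assumptions; the layer widths, kernel size, dropout rate, and Xavier-normal initialization follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_dqn(m, num_actions):
    """3-layer CNN feature extractor followed by a DQN head (illustrative sketch).

    m           : side length of the M x M input matrix.
    num_actions : size of the RRH on/off action space (an assumption here).
    """
    inputs = layers.Input(shape=(m, m, 3))
    x = layers.Conv2D(32, (2, 2), strides=1, padding="valid", activation="relu",
                      kernel_initializer="glorot_normal")(inputs)   # Xavier normal init
    x = layers.Conv2D(32, (2, 2), strides=1, padding="valid", activation="relu",
                      kernel_initializer="glorot_normal")(x)
    x = layers.Conv2D(64, (2, 2), strides=1, padding="valid", activation="relu",
                      kernel_initializer="glorot_normal")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)     # down-sample extracted features
    x = layers.Dropout(0.25)(x)                      # beta = 0.25, as in the text
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="relu")(x)      # 100 x 1 fully connected layer
    q_values = layers.Dense(num_actions, activation="linear")(x)    # Q(s, a) outputs
    return models.Model(inputs, q_values)

model = build_cnn_dqn(m=12, num_actions=8)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3), loss="mse")
```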

Algorithm 1. CNN-based DQN framework.
Input: user data rate demand D_u(t), RRH on/off state v_r(t), and channel gain H(t).
Output: energy efficiency EE(t).
1: Initialize the experience memory D with a given capacity
2: Initialize the weights and biases θ and θ′ of the main and target networks
3: for each episode do
4:   Observe the initial state s_t
5:   Extract the CSI feature φ_t using the CNN
6:   Feed the extracted CSI feature φ_t to the DRL agent
7:   for each time step t do
8:     Choose a probability ρ
9:     if ε ≥ ρ then
10:      Select a random action a_t
11:    else
12:      Select the greedy action a_t = argmax_a Q*(φ_t, a; θ)
13:    end if
14:    Solve (19) to obtain the optimal beamforming solution based on the active set of RRHs
15:    Calculate the reward K_t and the successor state s′_t
16:    Store the transition (s_t, a_t, K_t, s′_t) in D
17:    Randomly sample a mini-batch of transitions (s_t, a_t, K_t, s′_t) from D
18:    Set the target
         y_t = K_t, if the episode terminates;
         y_t = K_t + μ max_{a′} Q(φ′, a′; θ′), otherwise   (15)
19:    Train the network to minimize the loss function in (12)
20:    Perform a stochastic gradient descent step on (y_t − Q(φ_t, a_t; θ))²
21:  end for
22: end for

Figure 3. Proposed CNN-based DQN framework.

4.4.1 State Space

At each time step t, we capture the state features, which contain the user data rate demand D_u(t), the RRH on/off state v_r(t), and a relational matrix between the users U and the RRHs R. The relational matrix is constructed as H ∈ ℝ^{R×U}, whose entry h_{r,u} indicates the CSI feature between RRH r and user u. We then concatenate these three features into a single vector, so that the state space becomes

$$s(t)=\big[D_1(t),\ldots,D_U(t),\,v_1(t),\ldots,v_R(t),\,\mathrm{vec}(H(t))\big].$$
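As an illustration, the state vector can be assembled as follows; the normalization is an assumption made for the sketch and is not specified in the text.

```python
import numpy as np

def build_state(demand, rrh_on, H):
    """Concatenate the three state features into a single vector.

    demand : (U,) user data-rate demands D_u(t).
    rrh_on : (R,) binary RRH on/off indicators v_r(t).
    H      : (R, U) relational CSI matrix between RRHs and users.
    The scaling of the demands is an illustrative choice.
    """
    return np.concatenate([
        demand / demand.max(),        # scale demands to [0, 1]
        rrh_on.astype(float),
        np.abs(H).ravel(),            # flatten the relational CSI matrix
    ])

state = build_state(np.array([20e6, 40e6]),
                    np.array([1, 0, 1]),
                    np.random.default_rng(0).standard_normal((3, 2)))
```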

4.4.2 Action Space

At each time step t, the action is defined over the RRH on/off states and represented as a_r(t) ∈ {0, 1}. However, we restrict the RL agent to decide the action based on the active set of RRHs A.
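One possible encoding of this action space (an assumption for illustration; the text only states that a_r(t) ∈ {0, 1} and that decisions are restricted to the active set) is to enumerate the binary on/off patterns of the RRHs:

```python
import numpy as np

def action_to_rrh_states(action_index, num_rrh):
    """Decode an integer action index into a binary on/off vector a_r(t) in {0, 1}.

    Enumerating all 2^R on/off patterns is an illustrative choice; the paper does
    not spell out the exact action encoding.
    """
    bits = (action_index >> np.arange(num_rrh)) & 1
    return bits.astype(int)

print(action_to_rrh_states(action_index=5, num_rrh=4))   # -> [1 0 1 0]
```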

4.4.3 Reward

The reward indicates whether an action should be punished or encouraged. In our proposed framework, the reward is the objective function defined in (8), which reflects the improvement of the EE:

$$K(t)=EE(t)=\frac{\sum_{u\in\mathcal{U}} r_u(t)}{P_{total}(t)}.$$

4.5 Resource Allocation Optimization

Recall (6): we consider three essential power components. The state powers (p_{r,A}, p_{r,S}) and the transition power (p_{r,G}) are composed of constant values and can be easily calculated, since they depend only on the current state and action. It therefore remains to minimize the transmit power in order to minimize the total power consumption P_total(t) at each time step t. The allocation at each time step t thus depends on the beamforming weights of the active set of RRHs, and we express the optimization problem as:

$$\begin{aligned}
\min_{\{\mathbf{w}_u(t)\}}\quad & \sum_{r\in\mathcal{A}}\sum_{u\in\mathcal{U}}|w_{ru}(t)|^{2} &&\\
\text{s.t.}\quad & r_u(t)\ge C_u(t), && \forall u\in\mathcal{U}, \quad\text{(19a)}\\
& \sum_{u\in\mathcal{U}}|w_{ru}(t)|^{2}\le P_{r,T}, && \forall r\in\mathcal{A}. \quad\text{(19b)}
\end{aligned}$$

The objective is to achieve the minimum transmit power for the given states of the RRHs, where C_u(t) denotes the user demand and P_{r,T} denotes the maximum transmit power of each RRH. Constraint (19a) requires that every user demand is met, whereas constraint (19b) enforces the transmit power limitation of each RRH. Problem (19) is a convex optimization problem and can be recast as a second-order cone optimization problem [37], which can be solved with standard iterative approaches [38]. It is worth noting that the optimization may have no feasible solution when there are insufficient active RRHs; in this case, a large negative reward is assigned to the DQN agent, and more RRHs are activated to restore feasibility and satisfy the user demand. The pseudo-code of the proposed framework is summarized in Algorithm 1.
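As an illustration of how (19) can be solved as a second-order cone program, the sketch below uses CVXPY. The SOC reformulation of the rate constraint, the mapping from the rate demand C_u(t) to a SINR target (with J_m = 1), and the per-RRH power cap are standard modeling choices rather than the exact solver configuration used in our simulations.

```python
import numpy as np
import cvxpy as cp

def min_power_beamforming(H, rate_demand, bandwidth, sigma2, p_max):
    """Minimum transmit-power beamforming over the active RRHs (problem (19), sketch).

    H           : (U, R_active) complex channels from the active RRHs to the users.
    rate_demand : (U,) target rates C_u(t) in bits/s.
    The SINR targets follow r_u = W log2(1 + SINR_u), i.e. SINR_u >= 2^(C_u/W) - 1.
    """
    U, R = H.shape
    gamma = 2.0 ** (np.asarray(rate_demand) / bandwidth) - 1.0   # per-user SINR targets
    W_bf = cp.Variable((R, U), complex=True)

    constraints = []
    for u in range(U):
        hu = H[u]
        desired = hu.conj() @ W_bf[:, u]
        others = [hu.conj() @ W_bf[:, j] for j in range(U) if j != u]
        # Rotate the desired term to be real (w.l.o.g.) and impose the SOC form
        # Re(h_u^H w_u) >= sqrt(gamma_u) * ||[interference terms, sigma]||.
        constraints += [cp.imag(desired) == 0,
                        cp.real(desired) >= np.sqrt(gamma[u]) *
                        cp.norm(cp.hstack(others + [np.sqrt(sigma2) + 0j]))]
    for r in range(R):
        constraints += [cp.norm(W_bf[r, :]) <= np.sqrt(p_max)]   # per-RRH power cap (19b)

    prob = cp.Problem(cp.Minimize(cp.sum_squares(W_bf)), constraints)
    prob.solve()
    return prob.status, prob.value, W_bf.value
```

If the solver reports an infeasible status, this corresponds to the case described above in which too few RRHs are active, and the agent receives a large negative reward.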

4.6 Computational Complexity

The computational complexity of the proposed CNN-based DQN (CNN-DQN) algorithm is derived from (19). Since (19) can be recast as a second-order cone program (SOCP), it can be solved in polynomial time by a standard interior-point method, e.g., [39]. The total number of variables in (19) is R + U and the total number of constraints is 2R + 2U + 1. Thus, the worst-case computational complexity per episode is O((RU)^{3.5}). The overall computational complexity of Algorithm 1 is therefore O(R^{3.5}U^{3.5}K + Ψ·Ω + D + |G_θ|), where K is the number of episodes required for Algorithm 1 to converge, and Ψ·Ω, D, and G_θ denote the size of the extracted channel gain feature, the number of experience samples drawn from the replay buffer, and the number of hidden layers, respectively. Similarly, the computational complexity of [14] is O(R^{3.5}U^{3.5}K + D + |G_θ|). The complexity of the proposed algorithm is higher than that of [14] because of the additional Ψ·Ω term introduced by extracting the channel gain feature at the input of the network state. However, the signaling overhead of the proposed algorithm is much lower than that of [14], because users do not have to report their information to the respective RRHs; instead, the RRHs record all the information between RRHs and users, which reduces the signaling burden of the network.

Table 2. Simulation parameters.

V. Simulation Results

In this section, we describe the simulation settings and illustrate the performance of our proposed CNN-DQN algorithm. We compare the proposed algorithm with nature DQN and the traditional approach. Without loss of generality, we take the traditional approach to be full coordinate association, denoted as FA, in which all RRHs are always turned on and the convex optimization problem (19) is then solved. This is different from the DRL-based approaches, where the agent learns and switches certain RRHs on/off based on the user demands and channel state information. The performance evaluation covers energy efficiency, total power consumption, and user QoS satisfaction. We fix the user demand in the range of [10, 60] Mbps with a step size of 10 Mbps. Furthermore, we consider two different scenarios to verify the effectiveness of increasing the number of RRHs when the RL agent cannot find a feasible solution that satisfies the user QoS demand. All simulations are performed in Python 3.7. First, we run 1000 training episodes for the DRL agent to learn the environment behavior, and then the performance is measured over 100 testing episodes. All other simulation parameters used in this work are summarized in Table 2.
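For reference, the settings stated in this section (together with the learning-rate choice of Section 5.2) can be summarized as below; parameters not given in the text are deliberately omitted.

```python
# Recap of the simulation settings stated in the text (illustrative; see Table 2 for the full list).
SIM_CONFIG = {
    "num_rrh_scenarios": [6, 8],                    # R = 6 and R = 8
    "num_users": 4,                                  # U = 4
    "user_demand_mbps": list(range(10, 61, 10)),     # 10-60 Mbps, step 10 Mbps
    "training_episodes": 1000,
    "testing_episodes": 100,
    "learning_rate": 1e-3,                           # gamma = 0.001, decay parameter d = 1
    "dropout": 0.25,
    "baselines": ["nature DQN", "FA (all RRHs always on)"],
}
```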

5.1 Convergence Analysis

Figure 4. Convergence of the algorithms.

We first compare and analyze the convergence of the algorithms for R = 8 RRHs and U = 4 users. It can be seen from Figure 4 that both algorithms converge. The proposed CNN-DQN converges to its optimum at around 790 episodes, and the optimal value it reaches is clearly higher than that of the DQN solution. At the start of training, the convergence rate of the proposed solution is similar to that of the DQN solution. The DQN algorithm converges at around 900 episodes; by that time, the weighted energy efficiency achieved by the proposed solution is already better than that of DQN. Therefore, when the proposed solution converges, the optimal energy efficiency of the system is superior to that of the DQN solution.

5.2 Effect of Learning Rate

The learning rate γ is an important hyperparameter in machine learning (ML): it tunes how quickly the NN adapts and therefore must be chosen carefully. A larger γ speeds up the learning of the NN but increases the chance of over-fitting the model, whereas a smaller γ makes over-fitting easier to avoid but requires far more computation to train the NN. For epoch i, the learning rate γ_i is decayed from an initial value γ_init at a speed controlled by a positive decay parameter d; γ remains constant over all epochs when d = 0. In this work, we assume d ∈ {0.1, 0.2, 0.3, ..., 1.0}. As shown in Figure 5, γ_i decreases more sharply as d increases. Therefore, to avoid over-fitting the NN, we use γ_init = 0.001 = 10^{-3} and d = 1.
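A decay schedule consistent with the behaviour described above (constant when d = 0, sharper decay for larger d) is the inverse-time rule sketched below; this particular functional form is an assumption for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def decayed_lr(epoch, lr_init=1e-3, d=1.0):
    """Inverse-time learning-rate decay: gamma_i = gamma_init / (1 + d * i).

    The functional form is an assumption; it is constant for d = 0 and decays
    more sharply for larger d, as described in the text.
    """
    return lr_init / (1.0 + d * epoch)

epochs = np.arange(0, 50)
for d in (0.0, 0.1, 0.5, 1.0):
    plt.plot(epochs, [decayed_lr(i, d=d) for i in epochs], label=f"d = {d}")
plt.xlabel("epoch"); plt.ylabel("learning rate"); plt.legend(); plt.show()
```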

Figure 5. Effect of the learning rate for different decay values d over the epochs.

5.3 Power Minimization

This section demonstrates the power consumption performance of the proposed CNN-DQN algorithm for different values of the user demand. We compare the proposed algorithm with nature DQN and FA, as shown in Figure 6. We first consider R = 6 and U = 4. It can be observed from Figure 6 that the power consumption increases with every increase in user demand for all three approaches. The proposed solution saves 5-10% more power at every level of user demand. It can also be noted that the proposed approach and the DQN-based approach consistently outperform the FA-based approach. The reason lies in learning the environment: at each time step t, the learning agent takes the best possible action from the action space, whereas the FA approach does not learn anything from the environment. However, all three approaches become infeasible in satisfying the user QoS demand once it exceeds 50 Mbps, because an insufficient number of active RRHs are available to satisfy the user requirements. To avoid this problem, we increase the number of RRHs to R = 8 with the same number of users U = 4, as also shown in Figure 6. It can be concluded that the proposed solution still saves significantly more power and also satisfies the user QoS requirement. However, increasing the number of RRHs affects the power consumption: when the user demand is 50 Mbps, the system consumes 48.73 W for R = 6, whereas it consumes 55 W for R = 8 at the same point.

Figure 6. Comparison of the proposed algorithm with other algorithms for power saving under different user demands.

Figure 7. Comparison of the proposed algorithm with other algorithms for EE maximization under different user demands.

5.4 Energy Efficiency Maximization

Figure 8. Comparison of the proposed solution for EE maximization versus power consumption for R=6 and U=4.

Figure 9. Comparison of the proposed solution for EE maximization versus power consumption for R=8 and U=4.

In Figure 7, we plot the EE performance against different user demands. The EE increases approximately linearly with increasing user demand. It can be noticed from Figure 7 that the DRL-based methods outperform the FA approach. In the FA approach, the EE performance depends on the immediate network state, and decisions are made only over the current action space. The DQN-based approach improves the EE at every level of user demand compared with the FA-based approach; however, it contains a large number of state-action pairs, which increases the computational complexity and degrades the system performance. Even so, the DQN-based approach still achieves 4-8% better performance than the FA-based approach. From Figure 7, we can observe that our proposed approach reduces the training parameters and outperforms the other two approaches as the user demand increases, obtaining 5-12% better performance at every level of user demand in both scenarios. These results support the use of a CNN-DQN approach in high-mobility scenarios. In Figure 8, we plot the EE against the power consumption for R = 6 and U = 4. It can be seen that, at the start, the EE increases slightly with a small increase in power for all approaches. However, after reaching its maximum, the EE starts to decline for all three approaches, because high transmit power is required to satisfy the user QoS demand. From Figure 8, the proposed approach achieves a maximum EE of 4.10 Mbits/J at a power consumption of 48.73 W, while DQN and FA achieve 3.92 Mbits/J at 51.95 W and 3.61 Mbits/J at 54.07 W, respectively. A similar trend holds in Figure 9, where the number of RRHs is increased to R = 8 with the same number of users U = 4. As shown in Figure 9, the proposed approach achieves an EE of 3.95 Mbits/J at a power consumption of 61.15 W, while the DQN-based and FA-based approaches achieve lower EE values (down to 3.55 Mbits/J) at power consumptions of 63 W and 65 W, respectively. These figures show the effectiveness of the proposed method in achieving higher EE with less power consumption.

Figure 10. Average EE performance vs. transmit power.

5.5 Transmit Power Selection

Figure 10 demonstrates the average EE performance for different values of the transmit power. We can observe that our proposed approach always outperforms the DQN and FA approaches in terms of EE. At the start, when the transmit power of the RRHs is very low, all three approaches achieve almost the same average EE. As the transmit power increases, the average EE increases approximately linearly, and the proposed solution achieves a higher average EE at every level of transmit power, which shows the effectiveness of the proposed approach across different transmit power values.

VI. Conclusion

In this paper, we proposed a CNN-based DQN (CNN-DQN) approach for downlink CRAN that simultaneously balances the EE performance and satisfies the user QoS demand. First, we combined the CNN approach with DQN, where the CNN phase is responsible for extracting the input state information. The features extracted by the CNN were fed to the input of the DQN, which dynamically performs the RRH switching decisions based on the energy consumption and the user demand. Then, the RA optimization scheme was formulated based on the user constraints and the transmit power to balance the EE performance and satisfy the user QoS requirements. Finally, comprehensive simulation results showed that the proposed solution achieves 10-15% higher efficiency than the baseline solutions and best balances the EE performance and user QoS satisfaction in different scenarios.

ACKNOWLEDGEMENT

This work is supported by the Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
