Deep Q-Network

수학노트

Notes

Wikidata

Corpus

  1. We applied DQN to learn to play games on the Atari 2600 console.[1]
  2. In our Nature paper we trained separate DQN agents for 50 different Atari games, without any prior knowledge of the game rules.[1]
  3. The DQN has been able to complete different games with the same algorithm.[2]
  4. A DQN starts by exploring a game and gradually learns its mechanics; the more the agent plays, the more it learns and the higher the scores it can achieve.[2]
  5. The DQN algorithm that connects all these parts together is shown in Fig.[2]
  6. To improve the performance of the DQN, an auxiliary model is used alongside.[2]
  7. The DQN neural network model is a regression model, which typically will output values for each of our possible actions.[3]
  8. For demonstration's sake, I will continue to use our blob environment for a basic DQN example, but where our Q-Learning algorithm could learn something in minutes, it will take our DQN hours.[3]
  9. In this paper, over 30 sub-algorithms were surveyed that influence the performance of DQN variants.[4]
  10. Multi Deep Q-Network (MDQN) was developed as a generalization of the popular Double Deep Q-Network (DDQN) algorithm.[4]
  11. A deep Q-network (DQN) is a neural network used to learn a Q-function.[5]
  12. As most reinforcement learning is associated with complex (typically visual) inputs, the initial layers of a DQN are normally convolutional.[5]
  13. Double deep Q network is a variant that has sometimes been erroneously but understandably confused with the target network paradigm explained above.[5]
  14. A duelling deep Q network recalls actor-critic architectures in that two separate estimations are made based on the environment and then combined to inform what to do.[5]
  15. The resulting algorithm is called Deep Q-Network (DQN).[6]
  16. These two facts make DQN extremely slow to learn: millions of transitions are needed to obtain a satisfying policy.[6]
  17. DQN was initially applied to solve various Atari 2600 games.[6]
  18. Architecture of the CNN used in the original DQN paper.[6]
  19. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network.[7]
  20. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process.[7]
  21. However, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples.[7]
  22. In this study, we proposed Constrained DQN that uses the difference between the outputs of the Q function and the target network as a constraint on the target value.[7]
  23. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives.[8]
  24. Specifically, we focus on a slight simplification of DQN that fully captures its key features.[8]
  25. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by DQN.[8]
  26. As a byproduct, our analysis provides justifications for the techniques of experience replay and target network, which are crucial to the empirical success of DQN.[8]
  27. The Deep Q-Network (DQN) surpasses the level of professional human players in most of the challenging Atari 2600 games.[9]
  28. In order to alleviate this problem, in this paper, we introduce two approaches that integrate global and local attention mechanisms respectively into the DQN model.[9]
  29. Deep Q-learning (DQN), as published in "Playing Atari with Deep Reinforcement Learning" (Mnih et al., 2013), leverages advances in deep learning to learn policies from high-dimensional sensory input.[10]
  30. Well, we know about the deep Q-network architecture, and we have also been introduced to replay memory.[11]
  31. We're now going to see exactly how the training process works for a DQN by utilizing this replay memory.[11]
  32. Now, with deep Q-learning, our network will make use of the Bellman equation to estimate the Q-values to find the optimal Q-function.[12]
  33. Alright, we should now have a general idea about what deep Q-learning is and what, at a high level, the deep Q-network is doing.[12]
  34. A Double Deep Q-Network, or Double DQN utilises Double Q-learning to reduce overestimation by decomposing the max operation in the target into action selection and action evaluation.[13]
  35. To avoid computing the full expectation in the DQN loss, we can minimize it using stochastic gradient descent.[14]
  36. The Atari DQN work introduced a technique called Experience Replay to make the network updates more stable.[14]
  37. Next, take a look at the tutorial for training a DQN agent on the Cartpole environment using TF-Agents.[14]
  38. In deep Q-learning, we use a neural network to approximate the Q-value function.[15]
  39. OpenAI Gym provides several environments for using DQN on Atari games.[15]
  40. There are some more advanced deep RL techniques, such as Double DQN, Dueling DQN, and Prioritized Experience Replay, which can further improve the learning process.[15]
  41. In 2015, DQN beat human experts in many Atari games.[16]
  42. Let’s restart our journey back to the Deep Q-Network (DQN).[16]
  43. In the Seaquest game below, DQN learns how to read scores, shoot the enemy, and rescue divers from the raw images, all by itself.[16]
  44. To address that, we switch to a deep Q-network (DQN) to approximate Q(s, a).[16]
  45. In the initialization part, we create our environment with all required wrappers applied, the main DQN neural network that we are going to train, and our target network with the same architecture.[17]
  46. This is the second of three posts devoted to presenting the basics of Deep Q-Network (DQN); here we present the algorithm in detail.[17]
  47. The experience replay introduced in DQN samples all experiences uniformly.[18]
  48. The deep Q-network (DQN) combined with Q-learning has demonstrated excellent results for several Atari 2600 games.[19]
  49. We focus on a profit sharing (PS) method that is an XoL method and combine a DQN and PS.[19]
  50. The proposed method, DQNwithPS, is compared to a DQN on the Atari 2600 game Pong.[19]
  51. We demonstrate that the proposed DQNwithPS method can learn stably with fewer trial-and-error searches than using a DQN alone.[19]
  52. We’ll be using experience replay memory for training our DQN.[20]
  53. It has been shown that this greatly stabilizes and improves the DQN training procedure.[20]
  54. Since then, several improved versions have appeared, including double DQN, dueling DQN, and rainbow DQN.[21]
  55. To improve it further, Double DQN, Prioritized replay, Dueling DQN, and others are discussed.[22]
  56. Note that these algorithms adopt the same classical DQN network structure, which has little influence on their compared results.[23]
  57. (referred to as DQN with single agent).[23]
  58. From Figure 7 and Figure 8, we observe that DDQN with a dual-agent algorithm outperforms the binary action algorithm and the DQN algorithm by a large margin.[23]
  59. Thus, the DQN algorithm suffers from the traffic flow of left turns.[23]
  60. In this paper, we present a PSR model-based DQN approach which combines the strengths of the PSR model and DQN planning.[24]
  61. Also, SL-RNN + RL-DQN does not consider the effect of the action value when calculating the state representation, which may lead to inaccurate representations of the underlying states.[24]
  62. In DQN, the last four frames of the observations are directly input to the CNN as the first layer of DQN to compute the current state information.[24]
  63. We compared the performance of RPSR-DQN with the model-free methods including the DQN-1frame and DRQN.[24]
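
Several entries above mention experience replay memory and note that DQN samples stored experiences uniformly. A minimal sketch of such a buffer is given here for illustration; the names `Transition` and `ReplayMemory`, the field names, and the capacity are hypothetical and are not taken from any of the cited sources.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition record; the field names are illustrative only.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-capacity buffer that samples stored transitions uniformly."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions, each equally likely."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A training loop would typically push one transition per environment step and call `sample` only once the buffer holds at least `batch_size` transitions.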

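The entries on the target network and on Double DQN can be made concrete with a small sketch of the two target values. It assumes the usual one-step target r + γ·max Q for DQN, and for Double DQN it splits the max into action selection (online network) and action evaluation (target network), as entry 34 describes. The function names, the plain-list representation of Q-values, and the default `gamma` are assumptions made for this illustration.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: r + gamma * max_a Q_target(s', a); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks the action,
    the target network supplies that action's value."""
    if done:
        return reward
    # Action selection with the online network's Q-values for s'.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation with the target network's Q-values for s'.
    return reward + gamma * next_q_target[best_action]


# Example with two actions: the target network overrates action 0,
# but the online network prefers action 1.
y_dqn = dqn_target(1.0, [3.0, 0.5], gamma=0.5)
y_ddqn = double_dqn_target(1.0, [1.0, 2.0], [3.0, 0.5], gamma=0.5)
print(y_dqn, y_ddqn)  # 2.5 1.25
```

In this example the Double DQN target is lower because the action with the inflated target-network value is not the one the online network selects, which is the overestimation effect the decomposition is meant to reduce.
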
Sources

Metadata

Wikidata