2021년 2월 17일 (수) 08:27에 Pythagoras0님의 편집

2021-02-17T08:27:03Z

Pythagoras0: /* 메타데이터 */ 새 문단

2020-12-26T12:26:46Z

메타데이터: 새 문단

Pythagoras0: /* 노트 */ 새 문단

2020-12-21T06:05:45Z

노트: 새 문단

새 문서

== 노트 ==

===위키데이터===
* ID : [https://www.wikidata.org/wiki/Q42259287 Q42259287]
===말뭉치===
# AlphaGo Zero was able to defeat its predecessor in only three days time with lesser processing power than AlphaGo.<ref name="ref_6ff9d2c2">[https://hub.packtpub.com/deepmind-alphago-zero-game-changer-for-ai-research/ Why DeepMind AlphaGo Zero is a game changer for AI research]</ref>
# All these facts beg the questions: what makes AlphaGo Zero so exceptional?<ref name="ref_6ff9d2c2" />
# Simply put, AlphaGo Zero is the strongest Go program in the world (with the exception of AlphaZero).<ref name="ref_6ff9d2c2" />
# But AlphaGo Zero didn’t use any human data whatsoever.<ref name="ref_6ff9d2c2" />
# In October 2017, the DeepMind team published details of a new Go-playing system, AlphaGo Zero, that studied no human games at all.<ref name="ref_3b604e16">[https://www.quantamagazine.org/why-alphazeros-artificial-intelligence-has-trouble-with-the-real-world-20180221/ Quanta Magazine]</ref>
# At the end of these scrimmages, AlphaGo Zero went head to head with the already superhuman version of AlphaGo that had beaten Lee Sedol.<ref name="ref_3b604e16" />
# But perhaps even more significant than these victories is how AlphaGo Zero became so dominant.<ref name="ref_4c6a78a1">[https://www.scientificamerican.com/article/ai-versus-ai-self-taught-alphago-zero-vanquishes-its-predecessor/ AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor]</ref>
# AlphaGo Zero even devised its own unconventional strategies.<ref name="ref_4c6a78a1" />
# This tutorial walks through a synchronous single-thread single-GPU (read malnourished) game-agnostic implementation of the recent AlphaGo Zero paper by DeepMind.<ref name="ref_40b8489a">[https://web.stanford.edu/~surag/posts/alphazero.html Simple Alpha Zero]</ref>
# The methods are fairly simple compared to previous papers by DeepMind, and AlphaGo Zero ends up beating AlphaGo (trained using data from expert games and beat the best human Go players) convincingly.<ref name="ref_40b8489a" />
# Recently, DeepMind published a preprint of Alpha Zero on arXiv that extends AlphaGo Zero methods to Chess and Shogi.<ref name="ref_40b8489a" />
# The aim of this post is to distil out the key ideas from the AlphaGo Zero paper and understand them concretely through code.<ref name="ref_40b8489a" />
# It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher.<ref name="ref_49f7785b">[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Starting from scratch]</ref>
# This updated neural network is then recombined with the search algorithm to create a new, stronger version of AlphaGo Zero, and the process begins again.<ref name="ref_49f7785b" />
# Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and Shōgi in addition to Go.<ref name="ref_2ba2291a">[https://en.wikipedia.org/wiki/AlphaGo_Zero AlphaGo Zero]</ref>
# In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale.<ref name="ref_2ba2291a" />
# AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers.<ref name="ref_2ba2291a" />
# AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo.<ref name="ref_2ba2291a" />
# The new AlphaGo Zero beat the previous version by 100 games to 0, and learned Go completely on its own.<ref name="ref_96d48964">[https://www.kdnuggets.com/2017/10/alphago-zero-biggest-ai-advance.html AlphaGo Zero: The Most Significant Research Advance in AI]</ref>
# In October, DeepMind published a paper describing a new version of AlphaGo, called AlphaGo Zero.<ref name="ref_b16c50c1">[https://www.yuzeh.com/data/agz-cost.html How much did AlphaGo Zero cost?]</ref>
# Not only that, AlphaGo Zero learned to play Go without any prior knowledge of the game (in other words, tabula rasa).<ref name="ref_b16c50c1" />
# AlphaGo Zero learns to play Go by simulating matches against itself in a procedure referred to as self-play.<ref name="ref_b16c50c1" />
# This is not to say that AlphaGo Zero is not an amazing achievement (it is!).<ref name="ref_b16c50c1" />
# Remarkably, during this self-teaching process AlphaGo Zero discovered many of the tricks and techniques that human Go players have developed over the past several thousand years.<ref name="ref_5ed98bba">[https://www.technologyreview.com/2017/10/18/148511/alphago-zero-shows-machines-can-become-superhuman-without-any-help/ AlphaGo Zero Shows Machines Can Become Superhuman Without Any Help]</ref>
# AlphaGo Zero simply played against itself, randomly at first.<ref name="ref_5ed98bba" />
# This is changed when DeepMind released AlphaGo Zero in late 2017.<ref name="ref_eb307863">[https://jonathan-hui.medium.com/alphago-zero-a-game-changer-14ef6e45eba5 AlphaGo Zero — a game changer. (How it works?)]</ref>
# We may expect AlphaGo Zero is more complicated and harder to train.<ref name="ref_eb307863" />
# In AlphaGo Zero, we train a single network f using self-play games.<ref name="ref_eb307863" />
# After three days of self-play using hugely powerful computer systems that let it play games at superhuman speeds, AlphaGo Zero was able to defeat its predecessor 100 games to nil.<ref name="ref_c363e036">[https://www.theverge.com/2019/11/27/20985260/ai-go-alphago-lee-se-dol-retired-deepmind-defeat Former Go champion beaten by DeepMind retires after declaring AI invincible]</ref>
# To address this challenge, we start by taking steps towards developing a formal understanding of AGZ.<ref name="ref_921f15d8">[https://openreview.net/forum?id=rkxtl3C5YX Understanding & Generalizing AlphaGo Zero]</ref>
# They had to make a newer and better version called AlphaGo Zero.<ref name="ref_89498c98">[https://towardsdatascience.com/enjoy-the-go-with-alpha-go-e1172afdb322 Enjoy the GO with Alpha Go]</ref>
# AlphaGo Zero uses only reinforcement learning.<ref name="ref_89498c98" />
# AlphaGo Zero uses only 1 neural network.<ref name="ref_89498c98" />
# AlphaGo Zero has some very similar features to AlphaGo Lee, but its distinct differences are what makes the new version so dominant.<ref name="ref_89498c98" />
# Among their accomplishments, particular focus will be placed upon the recent success of AlphaGo Zero which made waves in the machine learning and artificial intelligence communities.<ref name="ref_85f6aa5b">[https://dl.acm.org/doi/10.1145/3206157.3206174 Overview on DeepMind and Its AlphaGo Zero AI]</ref>
# The result, AlphaGo Zero, detailed in a paper published in October, 2017, was so called because it had zero knowledge of Go beyond the rules.<ref name="ref_cdc1512a">[https://www.newyorker.com/science/elements/how-the-artificial-intelligence-program-alphazero-mastered-its-games How the Artificial Intelligence Program AlphaZero Mastered Its Games]</ref>
# When the AlphaGo Zero and AlphaZero papers were published, a small army of enthusiasts began describing the systems in blog posts and YouTube videos and building their own copycat versions.<ref name="ref_cdc1512a" />
# The distributed LeelaZero community has had their system play more than ten million games against itself—a little more than AlphaGo Zero.<ref name="ref_cdc1512a" />
# The latest updates present in AlphaGo Zero left researchers in awe.<ref name="ref_a23e3470">[https://www.axcelerate.com.au/post/alphago-master-vs-alphago-zero-the-power-of-reinforcement-learning AlphaGo Master vs AlphaGo Zero - The Power of Reinforcement Learning]</ref>
# With AlphaGo Zero, DeepMind pushed RL’s independence from data further by starting with 100% randomness.<ref name="ref_a23e3470" />
# AlphaGo Zero is much less demanding than old Alphago, but running the same setup would still take 1700 GPU-years with ordinary hardware.<ref name="ref_6d138ff8">[https://rossum.ai/blog/building-our-own-version-of-alphago-zero/ Building Our Own Version of AlphaGo Zero]</ref>
# This is the setup where Nochi was the first AlphaGo Zero replication that achieved the level of the GNU Go baseline.<ref name="ref_6d138ff8" />
# Several other efforts to replicate the success of AlphaGo Zero are now underway – e.g. Leela Zero and Odin Zero.<ref name="ref_6d138ff8" />
# There were many advances in Deep Learning and AI in 2017, but few generated as much publicity and interest as DeepMind’s AlphaGo Zero.<ref name="ref_8fcef0d5">[https://hackernoon.com/the-3-tricks-that-made-alphago-zero-work-f3d47b6686ef dThe 3 Tricks That Made AlphaGo Zero Work]</ref>
# In this essay, I’ll try to give an intuitive idea of the techniques AlphaGo Zero used, what made them work, and what the implications for future AI research are.<ref name="ref_8fcef0d5" />
# This data, generated purely via lookahead and self-play, is what DeepMind used to train AlphaGo Zero.<ref name="ref_8fcef0d5" />
# AlphaGo Zero’s was its neural network architecture, a “two-headed” architecture.<ref name="ref_8fcef0d5" />
# A simplified, highly flexible, commented and (hopefully) easy to understand implementation of self-play based reinforcement learning based on the AlphaGo Zero paper (Silver et al).<ref name="ref_743bb4be">[https://github.com/suragnair/alpha-zero-general suragnair/alpha-zero-general: A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more]</ref>
# If you follow artificial intelligence research, you probably saw last week's Nature article on DeepMind's AlphaGo Zero.<ref name="ref_d7b1fb7e">[https://blog.benchsci.com/alphago-zero-artificial-intelligence-drug-discovery What AlphaGo Zero Means for Artificial Intelligence Drug Discovery]</ref>
# This is the promise that AlphaGo Zero represents.<ref name="ref_d7b1fb7e" />
# While previous versions learned from thousands of human amateur and professional games, AlphaGo Zero taught itself to play by receiving only the goal (win), the rules, and feedback on its success.<ref name="ref_d7b1fb7e" />
# Even so, he notes, AlphaGo Zero is an important advance.<ref name="ref_d7b1fb7e" />
# AI research arm, DeepMind, announced in an article in the journal Nature today that it has built a final version of its prolific digital Go master: AlphaGo Zero.<ref name="ref_4f3d7f61">[https://qz.com/1105509/deepminds-new-alphago-zero-artificial-intelligence-is-ready-for-more-than-board-games/ DeepMind has a bigger plan for its newest Go-playing AI]</ref>
# The most important new modification is how AlphaGo Zero learned to master the game.<ref name="ref_4f3d7f61" />
# AlphaGo Zero doesn’t have hints from humans that previous systems had, like which pieces are whose or how to interpret the board.<ref name="ref_4f3d7f61" />
# AlphaGo Zero could beat the version of AlphaGo that faced Lee Sedol after training for just 36 hours and earned its 100-o score after 72 hours.<ref name="ref_4f3d7f61" />
# AlphaGo Zero used a Deep Reinforcement Learning approach and a tree-based search strategy.<ref name="ref_6033617b">[https://analyticsindiamag.com/deepminds-alphago-zero-really-scientific-breakthrough-step-towards-general-intelligence/ Is DeepMind’s AlphaGo Zero Really A Scientific Breakthrough?]</ref>
# Besides the human-like ingenuity displayed in learning, the reason why AlphaGo Zero is a big step forward is because the system got rid of the supervision and feature engineering.<ref name="ref_6033617b" />
# At NIPS 2017, Deep Reinforcement Learning was the most popular topic and DeepMind has delivered great results with AlphaGo Zero which plays at superhuman level.<ref name="ref_6033617b" />
# A new and much more powerful version of the program called AlphaGo Zero unveiled Wednesday is even more capable of surprises.<ref name="ref_bc19db6f">[https://www.wired.com/story/this-more-powerful-version-of-alphago-learns-on-its-own/ This More Powerful Version of AlphaGo Learns On Its Own]</ref>
# AlphaGo Zero showcases an approach to teaching machines new tricks that makes them less reliant on humans.<ref name="ref_bc19db6f" />
# DeepMind CEO Demis Hassabis said in a press briefing Monday that the guts of AlphaGo Zero should be adaptable to scientific problems such as drug discovery, or understanding protein folding.<ref name="ref_bc19db6f" />
# AlphaGo Zero is so-named because it doesn’t need human knowledge to get started, relying solely on that self-play mechanism.<ref name="ref_bc19db6f" />
# AlphaGo Zero was built on an improved reinforcement-learning system, and it trained itself from scratch without any input from human games.<ref name="ref_b802908c">[https://livebook.manning.com/book/deep-learning-and-the-game-of-go/chapter-14/ Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning · Deep Learning and the Game of Go]</ref>
# AlphaGo Zero AI program just became the Go champion of the world without human data or guidance.<ref name="ref_ed46dbf7">[https://futureoflife.org/2017/10/18/deepminds-alphago-zero-becomes-go-champion-without-human-assistance/ DeepMind’s AlphaGo Zero Becomes Go Champion Without Human Input]</ref>
# AlphaGo Zero was not shown a single human game of Go from which to learn.<ref name="ref_ed46dbf7" />
# AlphaGo Zero learned entirely from playing against itself, with no prior knowledge of the game.<ref name="ref_ed46dbf7" />
# After just three days of playing against itself (4.9 million times), AlphaGo Zero beat AlphaGo by 100 games to 0.<ref name="ref_ed46dbf7" />
# DeepMind has shaken the world of Reinforcement Learning and Go with its creation AlphaGo, and later AlphaGo Zero.<ref name="ref_08505231">[https://dylandjian.github.io/alphago-zero/ AlphaGo Zero demystified]</ref>
# AlphaGo Zero pipeline is divided into three main components (just like the previous article on World Models), each in a different process that runs the code asynchronously.<ref name="ref_08505231" />
# This is also why AlphaGo Zero is sometimes called the two headed beast : a body, which is the feature extractor, and two heads : policy and value.<ref name="ref_08505231" />
# I want to thank my school’s AI association for letting me use the server to try to train this implementation of AlphaGo Zero.<ref name="ref_08505231" />
# Using less computing power and only three days of training time, AlphaGo Zero beat the original AlphaGo in a 100-game match by 100 to 0.<ref name="ref_02da8cfa">[https://qbi.uq.edu.au/blog/2017/10/google-alphago-zero-masters-game-three-days Google AlphaGo Zero masters the game in three days]</ref>
# By contrast, AlphaGo Zero never saw humans play.<ref name="ref_02da8cfa" />
# AlphaGo Zero achieved this feat by approaching the problem differently from the original AlphaGo.<ref name="ref_02da8cfa" />
# In the pure reinforcement learning approach of AlphaGo Zero, the only information available to learn policies and values was for it to predict who might ultimately win.<ref name="ref_02da8cfa" />
# That Go-playing virtual intelligence was called AlphaGo Zero, and it managed to rediscover over 3,000 years of human knowledge around the game in just 72 hours.<ref name="ref_00ad3329">[https://techcrunch.com/2017/11/02/deepmind-has-yet-to-find-out-how-smart-its-alphago-zero-ai-could-be/ DeepMind has yet to find out how smart its AlphaGo Zero AI could be – TechCrunch]</ref>
# Perhaps the most interesting thing about AlphaGo Zero, though, isn’t how fast it was able to do what it did, or with such efficacy, but also that it ultimately didn’t even achieve its full potential.<ref name="ref_00ad3329" />
# Named AlphaGo Zero, the AI program has been hailed as a major advance because it mastered the ancient Chinese board game from scratch, and with no human help beyond being told the rules.<ref name="ref_683fd64d">[https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own 'It's able to create knowledge itself': Google unveils AI that learns on its own]</ref>
# Twitter Pinterest David Silver describes how the Go playing AI program, AlphaGo Zero, discovers new knowledge from scratch.<ref name="ref_683fd64d" />
# When AlphaGo Zero plays a good move, it is more likely to be rewarded with a win.<ref name="ref_683fd64d" />
# Though far better than previous versions, AlphaGo Zero is a simpler program and mastered the game faster despite training on less data and running on a smaller computer.<ref name="ref_683fd64d" />
# In this way, AlphaGo Zero begins as a clean slate and learns from itself.<ref name="ref_26a5d85a">[https://www.technologynetworks.com/neuroscience/articles/deepminds-alphago-zero-learns-from-scratch-without-any-human-input-293412 New AI Learns From Scratch]</ref>
# Revealed in October 2017, AlphaGo Zero was the first computer program that learns to play simply by playing games against itself, starting from completely random play.<ref name="ref_daefbd94">[http://www.xinhuanet.com/english/2018-04/09/c_137097436.htm Feature: One man's Go program looks to remake AlphaGo Zero - and beyond - Xinhua]</ref>
# Point being: AlphaGo Zero (which we’ll go ahead and shorten to AG0) is arguably the most impressive and definitely the most praised recent AI accomplishment.<ref name="ref_6d4649f9">[https://www.skynettoday.com/editorials/is-alphago-zero-overrated AlphaGo Zero Is Not A Sign of Imminent Human-Level AI]</ref>
# Roughly speaking, AG0 is just a Deep Neural Network that takes the current state of a Go board as input, and outputs a Go move.<ref name="ref_6d4649f9" />
# With those positive things having been said, some perspective: AG0 is not really a testament to the usefulness of such techniques for solving the hard problems of AI.<ref name="ref_6d4649f9" />
# AG0 is a definite example of Weak AI, also known as narrow AI.<ref name="ref_6d4649f9" />
# AlphaGo Zero scored 20-0.<ref name="ref_b8b9e0b4">[https://homepages.cwi.nl/~aeb/go/games/games/AlphaGo/ AlphaGo]</ref>
# AlphaGo Zero scored 17-3.<ref name="ref_b8b9e0b4" />
===소스===
<references />

@@ 97번째 줄: / 97번째 줄: @@
   <references />
-== 메타데이터 ==
+==메타데이터==
 ===위키데이터===
 * ID :  [https://www.wikidata.org/wiki/Q42259287 Q42259287]

← 이전 판		2020년 12월 26일 (토) 12:26 판
96번째 줄:		96번째 줄:
	===소스===		===소스===
	<references />		<references />
		+
		+	== 메타데이터 ==
		+
		+	===위키데이터===
		+	* ID : [https://www.wikidata.org/wiki/Q42259287 Q42259287]

알파고 제로 - 편집 역사

2021년 2월 17일 (수) 08:27에 Pythagoras0님의 편집

Pythagoras0: /* 메타데이터 */ 새 문단

Pythagoras0: /* 노트 */ 새 문단