Mastering the Game of Go
• DeepMind problem domain
• Deep learning and reinforcement learning concepts
• Design of AlphaGo
• Execution
Reduce search space
• Reduce breadth
  – Not all moves are equally likely
  – Some moves are better
  – Leverage moves made by expert players
• Reduce depth
  – Evaluate strength of the board (likelihood of winning)
  – Collapse symmetrical or similar boards
  – Simulate the games
Reinforcement learning
• State: S_t
• Action: A_t
• Reward (feedback): R_t
• Feedback is delayed.
• No supervisor, only a reward signal.
• Rules of the game are unknown.
• The agent's actions affect the subsequent state.
[Diagram: the agent takes action A_t on the environment; the environment returns the next state S_t and reward R_t.]
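A minimal sketch of this loop in Python (ToyEnv and RandomAgent are illustrative stand-ins, not AlphaGo components; the point is the delayed, unsupervised reward signal):

```python
import random

class ToyEnv:
    """Toy environment: a fixed-length episode with reward only at the end."""
    def __init__(self, length=10):
        self.length, self.pos = length, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1
        done = self.pos >= self.length
        # Feedback is delayed: reward is zero until the terminal step.
        reward = (1.0 if action == 1 else -1.0) if done else 0.0
        return self.pos, reward, done

class RandomAgent:
    def select_action(self, state):
        return random.choice([0, 1])  # A_t

    def observe(self, state, action, reward, next_state):
        pass  # a learning agent would update its policy/value estimates here

env, agent = ToyEnv(), RandomAgent()
state, done = env.reset(), False
while not done:
    action = agent.select_action(state)
    next_state, reward, done = env.step(action)  # S_{t+1}, R_{t+1}
    agent.observe(state, action, reward, next_state)
    state = next_state
print("final reward:", reward)
```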
Predicting the move
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
• Expert Moves Imitator Model (w/ CNN)
• Training: Current Board → Next Action
[Figure: the current board is encoded as a grid of own stones (+1), opponent stones (−1), and empty points (0); the training target is a one-hot grid marking the expert's next move.]
• Prediction Model: g: s → p(a|s); next action a = argmax_a p(a|s)
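A hedged sketch of what such an imitator could look like in PyTorch; the layer sizes, inputs, and targets below are illustrative placeholders, not the 13-layer network from the paper:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small CNN mapping a board encoding to logits over 19x19 = 361 points."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one logit per board point
            nn.Flatten(),                     # -> (batch, 361)
        )

    def forward(self, s):
        return self.net(s)

policy = PolicyNet()
opt = torch.optim.SGD(policy.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

boards = torch.randn(8, 1, 19, 19)          # stand-in for encoded positions
expert_moves = torch.randint(0, 361, (8,))  # stand-in for recorded expert moves

opt.zero_grad()
logits = policy(boards)
loss = loss_fn(logits, expert_moves)  # imitate the expert's next move
loss.backward()
opt.step()

# At play time the greedy choice is a = argmax_a p(a|s):
best_move = logits.argmax(dim=1)
```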
Two kinds of policies
● used a large database of online expert games
● learned two versions of the neural network
○ a fast rollout network p_π for use in evaluation
○ an accurate network p_σ for use in selection
Step 1: learn to predict human moves
Further reduce search space: symmetries
[Figure: the input board, its rotations by 90, 180, and 270 degrees, and the vertical reflection of each give 8 equivalent boards.]
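A sketch of exploiting these symmetries for data augmentation with NumPy (the encoding is the same ±1/0 grid as above):

```python
import numpy as np

def board_symmetries(board):
    """Yield the 8 dihedral symmetries of a square board array."""
    for k in range(4):
        rotated = np.rot90(board, k)  # rotation by k * 90 degrees
        yield rotated
        yield np.flipud(rotated)      # reflection of each rotation

board = np.zeros((19, 19), dtype=np.int8)
board[3, 15] = 1                      # a single stone, for illustration

augmented = list(board_symmetries(board))  # 8 equivalent positions
assert len(augmented) == 8
```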
Reduce depth by board evaluation
2. Board Evaluation
• Updated Model (ver. 1,000,000)
• Training: Board Position → Win / Loss
• Value Prediction Model (regression): outputs Win (0~1)
• Adds a regression layer to the model
• Predicts values between 0 and 1
  – Close to 1: a good board position
  – Close to 0: a bad board position
Value follows from policy
Step 3: learn a board evaluation network, v_θ
● use random samples from the self-play database
● prediction target: probability that black wins from a given board
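A sketch of such a value network in PyTorch; the trunk, layer sizes, and data below are illustrative stand-ins, with a sigmoid regression head producing a win probability in (0, 1):

```python
import torch
import torch.nn as nn

# Convolutional trunk plus a regression head: the sigmoid squashes the
# output to (0, 1), read as P(win | board position).
value_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 19 * 19, 256), nn.ReLU(),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

opt = torch.optim.SGD(value_net.parameters(), lr=0.01)
positions = torch.randn(8, 1, 19, 19)           # stand-in for self-play samples
outcomes = torch.randint(0, 2, (8, 1)).float()  # 1 = win, 0 = loss

opt.zero_grad()
pred = value_net(positions)   # close to 1: good position; close to 0: bad
loss = nn.functional.binary_cross_entropy(pred, outcomes)
loss.backward()
opt.step()
```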
Putting it all together
Looking ahead (w/ Monte Carlo Tree Search)
• Action Candidates Reduction (Policy Network)
• Board Evaluation (Value Network)
• Rollout: a faster version of estimating p(a|s); uses shallow networks (3 ms → 2 µs)
Expansion
1. If the visit count exceeds a threshold, N_r(s, a) > n_thr, insert the node for the successor state s'.
2. For every possible a', initialize the statistics:
   N_v(s', a') = N_r(s', a') = 0
   W_v(s', a') = W_r(s', a') = 0
   P(s', a') = p_σ(a'|s')
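A sketch of this expansion rule in Python; the Node structure and the threshold value are illustrative, not DeepMind's implementation:

```python
N_THR = 40  # illustrative visit-count threshold n_thr

class Node:
    """Tree node holding per-edge statistics for every legal action a'."""
    def __init__(self, state, priors):
        self.state = state
        self.N_v = {a: 0 for a in priors}    # value-network visit counts
        self.N_r = {a: 0 for a in priors}    # rollout visit counts
        self.W_v = {a: 0.0 for a in priors}  # accumulated value estimates
        self.W_r = {a: 0.0 for a in priors}  # accumulated rollout rewards
        self.P = dict(priors)                # P(s', a') = p_sigma(a'|s')
        self.children = {}

def maybe_expand(node, action, successor_state, policy_priors):
    """Insert the node for s' once N_r(s, a) exceeds the threshold."""
    if node.N_r[action] > N_THR and action not in node.children:
        node.children[action] = Node(successor_state, policy_priors)
```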
Evaluation
1. Evaluate v_θ(s') by the value network v_θ.
2. Simulate the game with the rollout policy network p_π; when reaching a terminal state s_T, calculate the reward r(s_T).
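A sketch of the evaluation step in Python; rollout, value_net, and the game-interface callbacks are hypothetical stand-ins, and the λ = 0.5 mixing weight follows the paper:

```python
LAMBDA = 0.5  # mixing weight between value network and rollout outcome

def rollout(state, rollout_policy, step, is_terminal, reward):
    """Play to the end of the game with the fast rollout policy p_pi."""
    while not is_terminal(state):
        state = step(state, rollout_policy(state))
    return reward(state)  # r(s_T): e.g. +1 for a win, -1 for a loss

def evaluate_leaf(state, value_net, rollout_policy, step, is_terminal, reward):
    """Combine the two estimates: (1 - lambda) * v_theta(s') + lambda * z."""
    v = value_net(state)                                       # v_theta(s')
    z = rollout(state, rollout_policy, step, is_terminal, reward)
    return (1 - LAMBDA) * v + LAMBDA * z
```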
Distribute search through GPUs
• Main search tree: master CPU
• Policy & value networks: 176 GPUs
• Rollout policy networks: 1,202 CPUs
[Figure: the master CPU holds the search tree and dispatches leaves to the GPUs for p_σ(a'|s') and v_θ(s') evaluation, and to the CPUs for p_π rollouts returning r(s_T).]
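A toy sketch of this division of labor with Python's multiprocessing; run_rollout is a hypothetical stand-in for a p_π playout returning r(s_T):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def run_rollout(seed):
    """Hypothetical stand-in for one p_pi playout; returns a fake r(s_T)."""
    rng = random.Random(seed)
    return 1.0 if rng.random() > 0.5 else -1.0

if __name__ == "__main__":
    # The master process owns the search tree and farms rollouts out to
    # CPU workers; policy/value evaluation would be batched onto GPUs.
    with ProcessPoolExecutor(max_workers=4) as pool:
        rewards = list(pool.map(run_rollout, range(64)))
    print("mean rollout reward:", sum(rewards) / len(rewards))
```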
Takeaways
• Networks trained for a certain task (with different loss objectives) can be applied to several other tasks.
Single most important takeaway
• Feature abstraction is the key component of any machine learning algorithm
• Convolutional neural networks are great at automated feature abstraction
Reference
Silver, D., et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489 (January 2016).