Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1
コンピュータアーキテクチャ 演習の補足
情報工学系 吉瀬謙二
Kenji Kise, Department co Computer Sciencekise_at_c.titech.ac.jp
Ver. 2020-10-12a2020年度(令和2年)版
Course number: CSC.T363
Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 2
【重要】 ACRiルームのサーバの予約
• ACRiルームのアカウントを使って、次のURLからログインする。
• https://gw.acri.c.titech.ac.jp/
• 「予約ページトップ」から、vs で始まるサーバの 10月13日(火) 12:00~15:00 の枠と、15:00~18:00の枠を予約すること。
• 同じサーバーを連続して予約すること。
• 12:00開始の枠は、30分前の 11:30 までに予約する必要があるので注意。
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3
UART (Universal Asynchronous Receiver/Transmitter)
• 調歩同期方式によるシリアル信号をパラレル信号に変換したり,その逆方向の変換をおこなう集積回路をUARTと呼ぶ.8ビット(1バイト)単位でデータを送信・受信する.
• UARTを用いることで,FPGAとコンピュータの間でのお手軽なデータ通信が可能.
• 例えば,’a’ という文字を送信する場合,’a’ は 8’h61, 8’b01100001 (次スライドのASCII Tableを参照)なので,下図のタイミングで送信線TXDを制御する.
• データが送信されるまで送信線TXD を1とする.
• まず,青色で示した0 (これをスタートビットと呼ぶ)を送信することで,データ送信の開始を明示.
• 次に,黄色で示した様に送信したいデータ 8’b01100001 の最下位ビットから順番に送信する.
• 最後に,赤色で示した1(これをストップビットと呼ぶ)を送信する.
• 1ビットを送受信するための時間間隔は送信側と受信側で同じレートを用いる.これをボー・レート (baud) と呼ぶ.例えば、1000 baud であれば、1ビット送信の間隔は 1msec となる.
3
0 1 0 0 0 0 1 1 0
Time
1 1
Startbit
TXD
Stopbit
1
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 4
Byte Addresses
• Since 8-bit bytes are so useful, most architectures address individual bytes in memory
• The memory address of a word must be a multiple of 4 (alignment restriction)
• Big Endian:
• Leftmost byte is word address
• IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA
• Little Endian:
• Rightmost byte is word address
• Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
msb lsb
3 2 1 0little endian byte 0
0 1 2 3big endian byte 0
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 5
コンピュータアーキテクチャComputer Architecture
2. コンピュータの性能と消費電力の動向Trends in Performance and Power
Ver. 2020-10-05a2020年度(令和2年)版
Course number: CSC.T363
www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00
吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 6
Performance Factors
Want to distinguish elapsed time and the time spent on our task
CPU execution time (CPU time) : time the CPU spends working on a task
Does not include time waiting for I/O or running other programs
CPU execution time # CPU clock cycles
for a program for a program= x clock cycle time
CPU execution time # CPU clock cycles for a program
for a program clock rate = -------------------------------------------
◼ Can improve performance by reducing either the length of the clock cycleor the number of clock cycles required for a program
or
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 7
Performance Factors
Performance = f x IPCf: frequency (clock rate)
IPC: retired instructions per cycle
CPU execution time # CPU clock cycles for a program
for a program clock rate = -------------------------------------------
int flag = 1;
int foo(){
while(flag);
}
Performance = clock rate x 1 / # CPU clock cycles for a program
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 8
Pollack’s Rule
• Pollack's Rule states that microprocessor "performance increase due to microarchitecture advances is roughly proportional to the square root of the increase in complexity". Complexity in this context means processor logic, i.e. its area.
• Superscalar, vector
• Instruction level parallelism, data level parallelism
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 9
From multi-core era to many-core era
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, MICRO-36
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 10
Power within a Microprocessor
• Powerdynamic = 1/2 x Capacitive load x Voltage 2 x Frequency switched
• Pdynamic = 1/2 x C x V2 x F
• Power required per transistor
• The first 32-bit microprocessors like Intel 80386 consumed less than two watt.
• 3.3GHz Intel Core i7 consumes 130 watts.
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 11
From multi-core era to many-core era
Platform 2015: Intel® Processor and Platform Evolution for the Next Decade, 2005
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 12
Intel Sandy Bridge, January 2011
4 to 8 core
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 13
アーキテクチャの異なる視点による分類
Flynnによる命令とデータの流れに注目した並列計算機
の分類(1966年)
SISD (Single Instruction stream, Single Data stream)
SIMD( Single Instruction stream, Multiple Data stream)
MISD (Multiple Instruction stream, Single Data stream)
MIMD (Multiple Instruction stream, Multiple Data stream)
Instruction stream
Data stream
SISD SIMD MISD MIMD
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 14
アーキテクチャの異なる視点による分類
Flynnによる命令とデータの流れに注目した並列計算機
の分類(1966年)
SISD (Single Instruction stream, Single Data stream)
SIMD( Single Instruction stream, Multiple Data stream)
MISD (Multiple Instruction stream, Single Data stream)
MIMD (Multiple Instruction stream, Multiple Data stream)
MIMD
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 15
コンピュータアーキテクチャComputer Architecture
3. 半導体メモリMemory Technologies
Ver. 2020-10-12a
Course number: CSC.T363
吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp
2020年度(令和2年)版
www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 16
参考資料
• MIG を使って DRAM メモリを動かそう (1)
• https://www.acri.c.titech.ac.jp/wordpress/archives/6048
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 17
コンピュータの古典的な要素
出力制御
データパス
記憶
入力
出力
プロセッサ
コンピュータ
インタフェース
コンパイラ
性能の評価
Instruction Set Architecture (ISA), 命令セットアーキテクチャ
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 18
DRAM (dynamic random access memory)
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 19
Processor-Memory(DRAM) Performance Gap
1
10
100
1000
10000
1980
1983
1986
1989
1992
1995
1998
2001
2004
Year
Perf
orm
an
ce
“Moore’s Law”
Processor
55%/year
(2X/1.5yr)
DRAM
7%/year
(2X/10yrs)
Processor-Memory
Performance Gap
(grows 50%/year)
The “Memory Wall”
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 20
The Memory System’s Fact and Goal
Fact: Large memories are slow and fast memories are small
How do we create a memory that gives the illusionof being large, cheap and fast ?
With hierarchy (階層)
With parallelism (並列性)
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 21
A Typical Memory Hierarchy
Second
Level
Cache
(SRAM)
Control
Datapath
Secondary
Memory
(Disk)
On-Chip Components
RegF
ile
Main
Memory
(DRAM)Da
ta
Cache
Instr
Ca
ch
e
ITL
BD
TL
B
Speed (%cycles): ½’s 1’s 10’s 100’s 1,000’s
Size (bytes): 100’s K’s 10K’s M’s G’s to T’s
Cost: highest lowest
❑ By taking advantage of the principle of locality (局所性)
Present much memory in the cheapest technology
at the speed of fastest technology
TLB: Translation Lookaside Buffer
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 22
Cache
• Cache memory consists of a small, fast memory that acts as a buffer for the large memory.
• The nontechnical definition of cache is a safe place for hiding things.
Intel Core 2 Duo
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 23
Intel Sandy Bridge, January 2011
Processor
Main memoryDisk
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 24
Characteristics of the Memory Hierarchy
Increasing
distance from
the processor
in access
time
L1$
L2$
Main Memory
Secondary Memory
Processor
(Relative) size of the memory at each level
Inclusive (包括)–what is in L1$ is a
subset of what is in
L2$.
L2$ is a subset of
what is in MM.
MM is a subset of
is in SM.
4-8 bytes (word)
1 to 4 blocks
1,024+ bytes (disk sectors = page)
8-32 bytes (block)
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 25
Memory Hierarchy Technologies
◼ Caches use SRAM (static random access memory) for speed and technology compatibility◼ Low density (6 transistor cells),
high power, expensive, fast
◼ Static: content will last “forever” (until power turned off)
◼ Main Memory uses DRAM for size (density)◼ High density (1 transistor cells), low power, cheap, slow
◼ Dynamic: needs to be “refreshed” regularly (~ every 8 ms)
◼ 1% to 2% of the active cycles of the DRAM
◼ Addresses divided into 2 halves (row and column)
◼ RAS or Row Access Strobe triggering row decoder
◼ CAS or Column Access Strobe triggering column selector
Dout[15-0]
SRAM
2M x 16
Din[15-0]
Address
Chip select
Output enable
Write enable
16
16
21