yandex
Linux
MySQL MongoDB?
x86_64
Linux Kernel 2.6.32
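Resident memory: memory that currently resides in main memory (RAM).
Anonymous memory: memory without a backing store (no file behind it).
Page fault: a trap raised when a program accesses a page that is not currently mapped to physical memory.
The standard page size is 4 KB.
Huge Pages are 2 MB.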
[Figure: an address space from 0x0 to 0xFFFFFFFF divided into 4 KB pages.]
RAM
Swap
Paging/swapping
Overcommit
Settings:
sysctl vm.overcommit_memory: 0 (default), 1, 2
sysctl vm.overcommit_ratio / vm.overcommit_kbytes
Current overcommit state:
# cat /proc/meminfo
CommitLimit: 32973320 kB
Committed_AS: 5510988 kB
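The two meminfo fields above can be compared programmatically. The following is a minimal sketch, assuming the sample values shown in the slide; the helper names `parse_meminfo` and `commit_usage` are hypothetical. Note that the kernel enforces CommitLimit only when vm.overcommit_memory is 2.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_usage(fields):
    """Return Committed_AS as a fraction of CommitLimit."""
    return fields["Committed_AS"] / fields["CommitLimit"]


# Sample values copied from the /proc/meminfo output above.
SAMPLE = """CommitLimit:    32973320 kB
Committed_AS:    5510988 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"committed: {commit_usage(info):.1%} of the commit limit")
```

On a Linux host the same parser can be given the contents of /proc/meminfo instead of the sample string.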
NUMA and SMP (UMA)
[Figure: SMP (UMA): CPU 1 and CPU 2 reach RAM 1 and RAM 2 over a shared System Bus. NUMA: CPU 1 and CPU 2 each have local RAM on their own mem bus and are linked by an interconnect.]
NUMA topology:
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 32735 MB
node 0 free: 434 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32768 MB
node 1 free: 101 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
Interleave memory allocations across all NUMA nodes:
# numactl --interleave=all command
[Figure: two memory nodes, Node 1 and Node 2; labels: 56GB, 30GB.]
Memory Nodes
Memory Zones
ZONE_DMA
ZONE_DMA32
ZONE_NORMAL
List the zones:
# grep zone /proc/zoneinfo
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
Page Cache
Page Cache usage:
# free -m
             total       used       free     shared    buffers     cached
Mem:         64401      64101        299          0        161      60339
-/+ buffers/cache:       3600      60800
Swap:            0          0          0
# grep Cached /proc/meminfo
Cached: 61638200 kB
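Read and the Page Cache
[Figure: a read() syscall checks the Page Cache; on a hit (yes) the data is returned from the Page Cache, on a miss (no) it is first read from Disk Storage.]
mincore() reports which pages of a mapped file are resident in memory, that is, in the Page Cache.
The vmtouch utility shows how much of a file is in the Page Cache:
# vmtouch /var/lib/db/index
Files: 1
Directories: 0
Resident Pages: 21365/21365 83M/83M 100%
Elapsed: 0.004477 seconds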
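Write and the Page Cache
Writes go to the Page Cache first (unless the file was opened with open() and O_SYNC).
Modified pages are marked dirty.
Dirty pages are written to disk (writeback):
after vm.dirty_expire_centisecs has elapsed (fsflush/pdflush);
when memory runs low (kswapd);
on fsync() or msync();
when dirty pages exceed the configured limit (vm.dirty_ratio).
# grep Dirty /proc/meminfo
Dirty: 9604 kB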
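One of the writeback triggers listed above, an explicit fsync(), can be sketched as follows. This is an illustration only: the function name `write_durably` and the file name are hypothetical, and the sketch assumes a regular file on a local filesystem.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then force the file's dirty pages to disk before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # write() only copies into the Page Cache; the pages become dirty.
            written += os.write(fd, data[written:])
        # fsync() blocks until the kernel has written the dirty pages back.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")  # hypothetical file name
        print(write_durably(target, b"hello page cache\n"))
```

Without the fsync() call the data would still reach the disk eventually, through the timer-based or memory-pressure writeback paths.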
Process address space segments:
stack; mmap; heap; bss; init data; text.
[Figure: process address space layout, from the top: Stack (grows downwards, limited by RLIMIT_STACK), top of stack, mmap region, unallocated memory, program break (brk), Heap (grows upwards), Uninitialized data (bss), Initialized data, Text (program code).]
Tools for inspecting process memory:
ps
top
cat /proc//status
VmPeak: 8908 kB
VmSize: 8908 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 356 kB
VmRSS: 356 kB
VmData: 180 kB
VmStk: 136 kB
VmExe: 44 kB
VmLib: 1884 kB
VmPTE: 36 kB
VmSwap: 0 kB
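The Vm* fields of the status file are easy to read from a script. A minimal sketch, assuming the field layout shown above; `parse_vm_fields` is a hypothetical helper and the sample is an excerpt of the slide's output.

```python
def parse_vm_fields(status_text):
    """Return the Vm* fields of a /proc status file as {name: kilobytes}."""
    result = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and name.startswith("Vm") and parts:
            result[name] = int(parts[0])
    return result


# Excerpt of the sample output above.
SAMPLE = """VmPeak: 8908 kB
VmSize: 8908 kB
VmHWM: 356 kB
VmRSS: 356 kB
VmSwap: 0 kB
"""

if __name__ == "__main__":
    vm = parse_vm_fields(SAMPLE)
    print(f"resident: {vm['VmRSS']} kB of {vm['VmSize']} kB virtual")
    # On a Linux host the parser can be fed the live file of the current
    # process: parse_vm_fields(open("/proc/self/status").read())
```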
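Virtual Memory Area (VMA)
A virtual memory area (VMA) is a contiguous range of virtual addresses (for example, 08048000-0804c000).
Access permissions:
read (r);
write (w);
execute (x).
Mapping type:
private (p);
shared (s).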
VMA
Memory map of a process:
# pmap -x
Address           RSS   Dirty Mode  Mapping
00007f0356b23000     76    76 rwx-- [ anon ]
00007f0356b38000    392   392 rwx-- [ anon ]
00007f0356bb9000  34708     0 r-xs- some_mapped_file
00007f0359272000  21876     0 r-xs- some_mapped_file2
VMA list:
# cat /proc//maps
Detailed per-VMA statistics:
# cat /proc//smaps
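Each line of the maps file describes one VMA: address range, permissions, offset, device, inode and an optional path. A minimal parsing sketch follows; the `Vma` type, `parse_maps_line` and the sample line (built around the address range used earlier, with a made-up inode and path) are assumptions for illustration.

```python
from typing import NamedTuple


class Vma(NamedTuple):
    start: int
    end: int
    perms: str  # e.g. "r-xp": read, no write, execute, private
    path: str

    @property
    def size_kb(self):
        return (self.end - self.start) // 1024

    @property
    def shared(self):
        # The fourth permission character is "s" (shared) or "p" (private).
        return self.perms[3] == "s"


def parse_maps_line(line):
    """Parse one line of a /proc maps file into a Vma."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, fields[1], path)


# Hypothetical sample line; inode and path are made up.
SAMPLE = "08048000-0804c000 r-xp 00000000 08:01 1234 /bin/example"

if __name__ == "__main__":
    vma = parse_maps_line(SAMPLE)
    print(vma.size_kb, vma.perms, "shared" if vma.shared else "private")
```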
              Private                                               Shared
Anonymous     stack, malloc(), mmap(ANON, PRIVATE), brk()/sbrk()    mmap(ANON, SHARED)
File-backed   mmap(fd, PRIVATE), binary/shared libraries            mmap(fd, SHARED)
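malloc() and free()
glibc malloc() obtains memory in two ways:
from the heap for small requests (below 128 KB);
with mmap() for large requests.
free() releases the memory.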
malloc() and brk()
[Figure: 1. The Heap (grows upwards) ends at the program break (brk), with unallocated memory above it. 2. The Heap has grown and ends at the new program break (brk). Labels: 110 KB, 100 KB.]
The heap is grown with brk(), which moves the end of the heap.
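mmap() and munmap()
[Figure: the file /var/lib/db/index mapped into the mmap area with mmap(fd, ).]
mmap() creates a mapping in the address space. munmap() removes it.
mmap() flags
Visibility of changes:
MAP_PRIVATE: changes stay private to the process;
MAP_SHARED: changes are visible to other processes that map the same object.
Access protection:
PROT_READ;
PROT_WRITE.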
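The difference between private and shared mappings can be observed from Python's `mmap` module, where on Unix `ACCESS_COPY` corresponds to MAP_PRIVATE and `ACCESS_WRITE` to MAP_SHARED. This is a small sketch, not a benchmark; the helper names and the temporary file are hypothetical.

```python
import mmap
import os
import tempfile


def write_through_mapping(path, access):
    """Map the file, overwrite its first byte, return the file's bytes on disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:1] = b"X"
            if access == mmap.ACCESS_WRITE:
                m.flush()  # push the shared change to the file
    with open(path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index")  # hypothetical file
        with open(path, "wb") as f:
            f.write(b"abcdef")
        # Private mapping: the write stays in the process, the file is unchanged.
        private = write_through_mapping(path, mmap.ACCESS_COPY)
        # Shared mapping: the write reaches the file.
        shared = write_through_mapping(path, mmap.ACCESS_WRITE)
    return private, shared


if __name__ == "__main__":
    print(demo())
```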
Linux .
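Page fault (demand paging)
[Figure: the address space of a process consists of allocated and mapped memory, only allocated memory, and unallocated memory. On a write, the MMU translates the page's address to a physical one through the TLB and the Page Table; if the page has no mapping in RAM, a page fault occurs and the page mapping is created.]
Page fault types:
minor: the page is already in memory and only has to be mapped;
major: the page has to be read from disk;
invalid: the address is not valid (segmentation fault).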
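Page fault counters for the current process are available through getrusage(). The sketch below touches every page of a fresh anonymous mapping and reports how the counters changed; `count_faults` is a hypothetical helper, and the exact numbers depend on the kernel, on huge page settings and on the Python runtime, so none are claimed here.

```python
import mmap
import resource


def count_faults(size_bytes):
    """Touch every page of a new anonymous mapping.

    Returns the (minor, major) page fault deltas for this process.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    with mmap.mmap(-1, size_bytes) as region:  # allocated, not yet faulted in
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1  # first touch of a page triggers a page fault
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)


if __name__ == "__main__":
    minor, major = count_faults(16 * 1024 * 1024)
    print(f"minor faults: {minor}, major faults: {major}")
```

Anonymous memory has nothing to read from disk, so the faults seen here are expected to be minor ones.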
Page fault
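A page of the address space can be in one of these states:
1. Unallocated;
2. Allocated, but unmapped (not yet faulted);
3. Allocated, and mapped to main memory (RAM);
4. Allocated, and mapped to the physical swap device (disk).
It follows that:
RSS is the size of the pages in state 3;
Virtual Memory Size is the sum of states 2 + 3 + 4.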
Copy On Write (COW)
[Figure: 1. After fork(), the page tables of Parent and Child refer to the same pages in Real Memory. 2. When a page is changed (#3 in the figure), the changed page is copied, so Parent and Child no longer share it.]
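The visible effect of copy-on-write after fork() can be sketched as follows: the child modifies its copy of a buffer and the parent's copy stays intact. The function name is hypothetical, and the example shows only the semantics; it does not measure which physical pages were actually copied.

```python
import os


def fork_and_modify():
    """Fork, let the child change a buffer, return (parent_view, child_view)."""
    data = bytearray(b"original")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_end)
            data[:] = b"modified"  # the write lands on the child's private copy
            os.write(write_end, bytes(data))
        finally:
            os._exit(0)  # leave immediately, without running parent code
    os.close(write_end)
    child_view = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    print(fork_and_modify())
```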
malloc()
[Figure: read(fd, buf, 8192) into Heap pages obtained with malloc(). 1. The process reads /var/m.log. 2. The data is not in the Page Cache (miss). 3. The Kernel reads it from Disk Storage into the Page Cache (m.log#0, m.log#1). 4. The data is copied to the user space Heap. The Page Cache also holds /bin/ls, find and libc.so.]
Copying the data to user space costs CPU time.
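mmap
[Figure: pages #0 and #1 of the mmap area are mapped to m.log#0 and m.log#1 in the Page Cache, which also holds /bin/ls and libc.so; page #2 is not mapped yet.]
mmap() maps pages directly from the Page Cache.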
mmap minor page fault
[Figure: the process accesses page #2 of the mmap area; m.log#2 is already in the Page Cache.]
The page is not mapped, but it is in the Page Cache: minor page fault.
mmap major page fault (1)
[Figure: the process accesses page #2 of the mmap area; m.log#2 is not in the Page Cache and has to be read from Disk Storage.]
1. The page is not in the Page Cache: major page fault.
2. The page is read from Disk Storage.
mmap major page fault (2)
[Figure: m.log#2 is now in the Page Cache and page #2 of the mmap area is mapped to it.]
3. The page is mapped from the Page Cache.
mmap()
Lazy loading.
sar
-B: paging statistics:
02:46:04  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:46:05      0,00    134,00   1743,00      0,00   5978,00      0,00      0,00      0,00      0,00
02:46:06      0,00    108,00   9094,00      0,00  11801,00      0,00      0,00      0,00      0,00
-r: memory utilization:
02:41:50 kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
02:41:51    346644  65599996     99,47    191340  61669768   5410704      8,20  34115072  29384464
02:41:52    345900  65600740     99,48    191340  61669956   5410596      8,20  34114568  29384568
-R: memory statistics:
02:44:50   frmpg/s   bufpg/s   campg/s
02:44:51    393,00      4,00     45,00
02:44:52   -200,00      1,00     35,00
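Page Cache
1. Bypassing the Page Cache:
open() with O_DIRECT (used, for example, by MySQL InnoDB).
2. Evicting data that is no longer needed, and checking the result:
posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
madvise(addr, length, MADV_DONTNEED);
mincore().
3. Evicting a file with vmtouch (it uses posix_fadvise):
# vmtouch -e /var/lib/db/index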
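The posix_fadvise() call from the list above is exposed in Python as `os.posix_fadvise` on platforms that support it. A minimal sketch, with a hypothetical helper name; the call is only advice, and the kernel does not drop pages that are still dirty.

```python
import os
import tempfile


def drop_from_page_cache(path):
    """Advise the kernel that the cached pages of a file are no longer needed.

    Returns False when os.posix_fadvise is unavailable on this platform.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:  # hypothetical scratch file
        tmp.write(b"x" * 8192)
        tmp.flush()
        print(drop_from_page_cache(tmp.name))
```

The file's contents are unaffected; only its cached copy in memory may be released.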
readahead
Ways to control readahead:
readahead();
madvise();
posix_fadvise();
blockdev --report and blockdev --setra.
Page reclaiming
Page types:
unreclaimable;
swappable;
syncable;
discardable.
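free list
[Figure: a Memory request is served from the Free page list. The list is refilled from the Page Cache, by Swap (kswapd) and from Kernel memory (slab allocator); the OOM Killer is the last resort. vm.swappiness ranges from 0 (swap only to avoid an OOM) to 100 (swap aggressively).]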
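Page Scanning (kswapd)
[Figure: size of available free memory over time, with the high pages, low pages and min pages watermarks; scanning runs in the background at first and becomes synchronous as free memory keeps falling. The watermarks depend on vm.min_free_kbytes.]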
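LRU/2
[Figure: an Active List and an Inactive List, each with a head and a tail. Page allocation takes free pages from the Free List; referenced pages move towards the Active List; pages at the tail of the Inactive List are reclaimed into the Free List.]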
LRU
LRU lists are kept per memory Node and Zone, and per cgroup (kernel 3.3):
Active anon;
Inactive anon;
Active file;
Inactive file;
Unevictable.
File-backed pages have their own LRU lists.
# cat /proc/meminfo
Active: 32714084 kB
Inactive: 30755444 kB
Active(anon): 1612548 kB
Inactive(anon): 264 kB
Active(file): 31101536 kB
Inactive(file): 30755180 kB
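The totals in this output are the sums of the anonymous and file-backed lists, which can be checked with a short script. A sketch under the assumption that the sample above is representative; both helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lru_totals_consistent(fields):
    """Check that Active and Inactive equal the sums of their anon and file parts."""
    return (
        fields["Active"] == fields["Active(anon)"] + fields["Active(file)"]
        and fields["Inactive"] == fields["Inactive(anon)"] + fields["Inactive(file)"]
    )


# Sample values copied from the /proc/meminfo output above.
SAMPLE = """Active: 32714084 kB
Inactive: 30755444 kB
Active(anon): 1612548 kB
Inactive(anon): 264 kB
Active(file): 31101536 kB
Inactive(file): 30755180 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    file_kb = info["Active(file)"] + info["Inactive(file)"]
    total_kb = info["Active"] + info["Inactive"]
    print(lru_totals_consistent(info), f"file-backed share: {file_kb / total_kb:.1%}")
```

In this sample almost all pages on the LRU lists are file-backed, which matches the large Page Cache seen earlier.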
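Out Of Memory Killer (OOM)
Find OOM kills in the logs:
# grep -i kill /var/log/messages*
Adjust a process's OOM priority (from -16 to 15; -17 disables the OOM Killer for the process):
# echo -17 > /proc//oom_adj
OOM score of a pid:
# cat /proc//oom_score
0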
Memory cgroup limits and settings:
memory limit;
memory + swap limit;
OOM control;
swappiness.
Statistics:
# cat memory.stat
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
Cgroup page reclaiming
Global reclaiming.
Target reclaiming.
Systems Performance: Enterprise and the Cloud
Linux Kernel Development
Linux System Programming: Talking Directly to the Kernel and C Library