How Linux Works with Memory (Vyacheslav Biryukov)

  • Linux

  • MySQL MongoDB?

  • x86_64, Linux Kernel 2.6.32

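  • Resident memory: memory that currently resides in physical memory (RAM).

    Anonymous memory: memory that is not tied to a file (without backing store).

    Page fault: a trap raised when a process accesses a page that is not currently mapped to physical memory.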

  • Memory is managed in pages, not in individual bytes.

    The standard page size is 4KB.

    Huge Pages are 2MB.

    [Figure: an address space from 0x0 to 0xFFFFFFFF divided into 4KB pages]
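Because memory is handed out in whole pages, even a tiny request occupies a full page. The sketch below illustrates that rounding; `pages_needed` is a hypothetical helper written for this example, and the 4096-byte page size in the last call is an assumption matching the 4KB figure above.

```python
import mmap

def pages_needed(n_bytes, page_size=mmap.PAGESIZE):
    """Number of whole pages required to hold n_bytes (rounded up)."""
    return (n_bytes + page_size - 1) // page_size

print(mmap.PAGESIZE)                         # page size reported by this system
print(pages_needed(1))                       # a 1-byte request still occupies one page
print(pages_needed(2 * 1024 * 1024, 4096))   # 512 standard pages fit in one 2MB huge page
```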

  • RAM and Swap

    Paging/swapping

  • Overcommit

    Settings:

    sysctl vm.overcommit_memory: 0 (default), 1, 2

    sysctl vm.overcommit_ratio / vm.overcommit_kbytes

    Current overcommit state:

    # cat /proc/meminfo
    CommitLimit: 32973320 kB
    Committed_AS: 5510988 kB
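To make the two meminfo counters easier to read, the following sketch computes how much of the commit limit is already promised to processes. `parse_meminfo` and `commit_usage` are hypothetical helpers written for this example; the sample text reuses the numbers shown above.

```python
def parse_meminfo(text):
    """Return {field: value in kB} for the 'Name: value kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def commit_usage(fields):
    """Fraction of CommitLimit that is already committed (Committed_AS)."""
    return fields["Committed_AS"] / fields["CommitLimit"]

sample = "CommitLimit: 32973320 kB\nCommitted_AS: 5510988 kB\n"
usage = commit_usage(parse_meminfo(sample))
print(f"{usage:.1%}")   # about 16.7% for the numbers above
```

On a live system the same functions can be fed `open("/proc/meminfo").read()`.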

  • NUMA and SMP (UMA)

    [Figure: with SMP, CPU 1 and CPU 2 reach RAM 1 and RAM 2 over a shared System Bus; with NUMA, each CPU has its own RAM on a local mem bus and the CPUs are linked by an interconnect]

    NUMA topology:

    # numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
    node 0 size: 32735 MB
    node 0 free: 434 MB
    node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
    node 1 size: 32768 MB
    node 1 free: 101 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

  • NUMA

    Interleave allocations across all nodes:

    # numactl --interleave all command

    [Figure: Memory Nodes; Node 1 is labeled 56GB and Node 2 is labeled 30GB]

  • Memory Zones

    The memory of each node is divided into zones:

    ZONE_DMA

    ZONE_DMA32

    ZONE_NORMAL

    Zones on this system:

    # grep zone /proc/zoneinfo
    Node 0, zone DMA
    Node 0, zone DMA32
    Node 0, zone Normal
    Node 1, zone Normal

  • Page Cache

    File data read from or written to disk is kept in memory, in the Page Cache.

    Current size:

    # free -m
                 total       used       free     shared    buffers     cached
    Mem:         64401      64101        299          0        161      60339
    -/+ buffers/cache:       3600      60800
    Swap:            0          0          0

    # grep Cached /proc/meminfo
    Cached: 61638200 kB

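  • Read and the Page Cache

    [Figure: a read() syscall checks the Page Cache; on a hit ("yes") the data is served from the cache, on a miss ("no") it is first read from Disk Storage into the Page Cache]

    File reads go through the Page Cache.

    mincore() reports which pages of a file are resident in the Page Cache.

    vmtouch shows how much of a file is in the Page Cache:

    # vmtouch /var/lib/db/index
    Files: 1
    Directories: 0
    Resident Pages: 21365/21365 83M/83M 100%
    Elapsed: 0.004477 seconds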

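  • Write and the Page Cache

    Writes go to the Page Cache first, not straight to disk (unless the file was opened with open() and O_SYNC).

    Pages modified in the cache are marked dirty.

    Dirty pages are written to disk (writeback):

    after vm.dirty_expire_centisecs (fsflush/pdflush);

    when memory runs low (kswapd);

    on fsync() or msync();

    when too many dirty pages accumulate (vm.dirty_ratio).

    # grep Dirty /proc/meminfo
    Dirty: 9604 kB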
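As an illustration of explicit writeback, here is a minimal sketch that writes a file and then calls fsync() so the data does not merely sit as dirty pages in the Page Cache. `write_durably` and the scratch file name are hypothetical, chosen only for this example.

```python
import os
import tempfile

def write_durably(path, data):
    """Write data, then block until the kernel has written it back to storage."""
    with open(path, "wb") as f:
        f.write(data)           # data is in Python's userspace buffer
        f.flush()               # hand it to the kernel: pages are now dirty in the Page Cache
        os.fsync(f.fileno())    # force writeback of this file's dirty pages

path = os.path.join(tempfile.mkdtemp(), "example.dat")  # hypothetical scratch file
write_durably(path, b"payload")
```

Without the fsync() call the write would still succeed, but the data could stay dirty in memory until one of the other writeback triggers listed above fires.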

  • Process address space:

    stack; mmap; heap; bss; init data; text.

    [Figure: process address space, from top to bottom: top of stack; Stack (grows downwards, limited by RLIMIT_STACK); mmap region; unallocated memory; program break (brk); Heap (grows upwards); Uninitialized data (bss); Initialized data; Text (program code)]

  • ps

    top

    cat /proc//status
    VmPeak: 8908 kB
    VmSize: 8908 kB
    VmLck: 0 kB
    VmPin: 0 kB
    VmHWM: 356 kB
    VmRSS: 356 kB
    VmData: 180 kB
    VmStk: 136 kB
    VmExe: 44 kB
    VmLib: 1884 kB
    VmPTE: 36 kB
    VmSwap: 0 kB
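The Vm* counters are easy to consume from a script. The sketch below parses them into a dictionary; `parse_vm_fields` is a hypothetical helper, and the sample text is an excerpt of the output shown above.

```python
def parse_vm_fields(status_text):
    """Return {name: value in kB} for the Vm* lines of a /proc status file."""
    vm = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, _, rest = line.partition(":")
            vm[name] = int(rest.split()[0])
    return vm

sample = (
    "VmPeak: 8908 kB\nVmSize: 8908 kB\nVmHWM: 356 kB\n"
    "VmRSS: 356 kB\nVmSwap: 0 kB\n"
)
vm = parse_vm_fields(sample)
print(vm["VmRSS"], "kB resident of", vm["VmSize"], "kB virtual")
```

For the current process, the input can be read from `/proc/self/status`.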

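  • Virtual Memory Area (VMA)

    A process address space consists of virtual memory areas (VMA), each covering a contiguous address range (for example 08048000-0804c000).

    Access permissions:

    read (r);

    write (w);

    execute (x).

    Mapping type:

    private (p);

    shared (s).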

  • VMA

    Memory usage per area:

    # pmap -x
    Address           RSS    Dirty  Mode   Mapping
    00007f0356b23000  76     76     rwx--  [ anon ]
    00007f0356b38000  392    392    rwx--  [ anon ]
    00007f0356bb9000  34708  0      r-xs-  some_mapped_file
    00007f0359272000  21876  0      r-xs-  some_mapped_file2

    The list of VMAs:

    # cat /proc//maps

    Detailed statistics per VMA:

    # cat /proc//smaps
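To show how the address range, permissions and private/shared flag of a VMA fit together, here is a small parser for one line in the maps format. `Vma` and `parse_maps_line` are hypothetical names, and the sample line is made up for illustration (only its address range comes from the slides).

```python
from typing import NamedTuple

class Vma(NamedTuple):
    start: int
    end: int
    perms: str   # e.g. "r-xp": read, no write, execute, private
    path: str    # empty for anonymous areas

def parse_maps_line(line):
    """Parse 'start-end perms offset dev inode [path]' into a Vma."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, fields[1], path)

# Hypothetical sample line; device, inode and path are invented.
vma = parse_maps_line("08048000-0804c000 r-xp 00000000 08:01 1234 /bin/cat")
print(hex(vma.start), vma.end - vma.start, vma.perms, vma.path)
```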

  • Private and shared, anonymous and file-backed memory

                   Private                       Shared
    Anonymous      stack                         mmap(ANON, SHARED)
                   malloc()
                   mmap(ANON, PRIVATE)
                   brk()/sbrk()
    File-backed    mmap(fd, PRIVATE)             mmap(fd, SHARED)
                   binary/shared libraries

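  • malloc() and free()

    glibc malloc() obtains memory in two ways:

    from the heap, for small requests (below 128KB);

    with mmap(), for large requests.

    free() releases the memory.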

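  • malloc() and brk()

    [Figure: 1. the Heap (grows upwards) ends at the program break (brk), with unallocated memory above it; 2. the Heap ends at a new program break (brk). The figure is labeled with 100 KB and 110 KB]

    When the heap has no room left, brk() is called to move the program break and grow the heap.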

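  • mmap() and munmap()

    [Figure: the file /var/lib/db/index is mapped into the mmap area with mmap(fd, )]

    mmap() maps a file into the process address space. munmap() removes the mapping.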

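  • mmap()

    Mapping type:

    MAP_PRIVATE: changes stay private to the process;

    MAP_SHARED: changes are visible to other processes and reach the file.

    Protection:

    PROT_READ;

    PROT_WRITE.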
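The difference between the two mapping types can be observed directly. In this sketch, the same one-byte write is made through a private and through a shared mapping of a scratch file, and the file is read back afterwards. `write_through_mapping` is a hypothetical helper; the flags and protections are the ones named above, as exposed by Python's `mmap` module on Unix.

```python
import mmap
import os
import tempfile

def write_through_mapping(flags):
    """Overwrite byte 0 through a mapping created with `flags`; return the file's bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        with mmap.mmap(fd, 5, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[0:1] = b"X"
            if flags == mmap.MAP_SHARED:
                m.flush()            # msync(): push the change to the file
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)

print(write_through_mapping(mmap.MAP_PRIVATE))  # file is unchanged
print(write_through_mapping(mmap.MAP_SHARED))   # file carries the change
```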

  • Linux .


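  • Page fault (demand paging)

    [Figure: the address space of a process is split into "Allocated and mapped memory", "Only allocated" and "Unallocated". A write syscall touches a page; the MMU translates the address to physical through the TLB and the Page Table; for a page that is only allocated, this raises a page fault and a page mapping to RAM is created]

    A Minor Page Fault is resolved by creating the mapping, without reading from disk.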

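  • Page Fault types

    Minor: resolved without disk access;

    major: the page has to be read from disk;

    invalid: access to an address that is not allocated (segmentation fault).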
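Minor faults can be counted from inside a process. The sketch below allocates fresh anonymous memory, touches every page, and reports how much the process's minor-fault counter grew. `minor_faults_while_touching` is a hypothetical helper; the exact count depends on the kernel and its settings, so none is promised here.

```python
import mmap
import resource

def minor_faults_while_touching(n_pages):
    """Touch each page of a fresh anonymous mapping; return the growth of ru_minflt."""
    size = n_pages * mmap.PAGESIZE
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1          # first write to a page that is allocated but not yet mapped
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    buf.close()
    return after - before

print(minor_faults_while_touching(256))
```

The major-fault counterpart is available as `ru_majflt` in the same structure.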

  • Page fault

    Memory page states:

    1. Unallocated;

    2. Allocated, but unmapped (not yet faulted);

    3. Allocated, and mapped to main memory (RAM);

    4. Allocated, and mapped to the physical swap device (disk).

    It follows that:

    RSS is the size of the pages in state 3;

    Virtual Memory Size is the sum of states 2 + 3 + 4.

  • Copy On Write (COW)

    [Figure, step 1, after fork(): the Parent and Child page tables (#0 to #4) refer to the same pages of Real Memory, which still has free pages. Step 2, after a change: the changed pages are copied into free pages of Real Memory, so the Parent and Child no longer share them]
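The visible effect of copy-on-write is that a write made after fork() stays private to the process that made it. The sketch below demonstrates only that effect; it does not inspect page tables. `child_write_is_private` is a hypothetical helper, and a pipe carries the child's view back to the parent.

```python
import os

def child_write_is_private():
    """Fork, let the child overwrite a buffer, and return (parent's view, child's view)."""
    data = bytearray(b"parent")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child process
        try:
            os.close(read_end)
            data[:] = b"child!"           # the write triggers a private copy of the page
            os.write(write_end, bytes(data))
        finally:
            os._exit(0)                   # leave without running the parent's cleanup
    os.close(write_end)
    seen_by_child = os.read(read_end, 16)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(data), seen_by_child

print(child_write_is_private())
```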

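  • malloc()

    [Figure: reading a file with read(fd, buf, 8192) into Heap pages. The Page Cache holds /bin/ls, find and libc.so, plus free pages. 1. /var/m.log is read. 2. The Page Cache lookup is a miss. 3. The Kernel reads m.log#0 and m.log#1 from Disk Storage into free Page Cache pages. 4. The Kernel copies the data to the Heap in user space, where the pages become filled]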

  • malloc()

    Copying the data into user space costs CPU time.

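  • mmap

    [Figure: mmap() creates an mmap area with pages #0, #1 and #2; #0 and #1 refer to m.log#0 and m.log#1 in the Page Cache, which also holds /bin/ls, libc.so and a free page]

    mmap() gives the process direct access to the file's pages in the Page Cache.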

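  • mmap: minor page fault

    [Figure: page #2 of the mmap area is accessed; m.log#2 is already in the Page Cache next to m.log#0, m.log#1, /bin/ls and libc.so]

    The page is not mapped yet, but it is already in the Page Cache: minor page fault.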

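  • mmap: major page fault (1)

    [Figure: page #2 of the mmap area is accessed while the Page Cache holds only m.log#0, m.log#1, /bin/ls, libc.so and a free page; m.log#2 is then read from Disk Storage into the Page Cache]

    The page is not mapped and not in the Page Cache: major page fault.

    1. Accessing a page that is missing from the Page Cache raises a major page fault.

    2. The page is read from disk.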

  • mmap: major page fault (2)

    [Figure: page #2 of the mmap area now refers to m.log#2 in the Page Cache]

    3. The page is mapped from the Page Cache.

  • mmap()

    Lazy loading.

  • sar

    -B: paging statistics:

    02:46:04  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
    02:46:05      0,00     134,00  1743,00      0,00   5978,00       0,00       0,00       0,00    0,00
    02:46:06      0,00     108,00  9094,00      0,00  11801,00       0,00       0,00       0,00    0,00

    -r: memory utilization:

    02:41:50  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact
    02:41:51     346644   65599996     99,47     191340  61669768   5410704     8,20  34115072  29384464
    02:41:52     345900   65600740     99,48     191340  61669956   5410596     8,20  34114568  29384568

    -R: memory statistics:

    02:44:50  frmpg/s  bufpg/s  campg/s
    02:44:51   393,00     4,00    45,00
    02:44:52  -200,00     1,00    35,00

  • Controlling the Page Cache

    1. Bypassing the Page Cache:

    open(path, O_DIRECT) (as MySQL InnoDB does).

    2. Managing cached pages from the application:

    posix_fadvise(fd, POSIX_FADV_DONTNEED);

    madvise(addr, MADV_DONTNEED);

    mincore().

    3. The vmtouch utility (uses posix_fadvise):

    vmtouch -e /var/lib/db/index
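A minimal sketch of the POSIX_FADV_DONTNEED hint from application code, using the `os.posix_fadvise` wrapper that Python provides on Linux. `drop_from_page_cache` and the scratch file are hypothetical; the call is advisory, so the kernel may keep some pages cached.

```python
import os
import tempfile

def drop_from_page_cache(path):
    """Advise the kernel that the cached pages of `path` are no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)   # dirty pages are not dropped, so write them back first
        # offset 0 and length 0 together mean "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

# Hypothetical scratch file used only to exercise the call.
scratch = os.path.join(tempfile.mkdtemp(), "index")
with open(scratch, "wb") as f:
    f.write(b"x" * 8192)
drop_from_page_cache(scratch)
```

The file contents are untouched; only the cached copy in memory is affected.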

  • readahead

    readahead can be controlled with:

    readahead();

    madvise();

    posix_fadvise();

    blockdev --report

    blockdev --setra

  • Memory reclamation (page reclaiming)

    Page types:

    unreclaimable;

    swappable;

    syncable;

    discardable.

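  • Refilling the free list

    [Figure: a Memory request is served from the Free page list, which is refilled from the Page Cache, from Swap (kswapd) and from Kernel memory (slab allocator); the last resort is the OOM Killer. vm.swappiness ranges from 0 (swap only to avoid an OOM) to 100 (swap aggressively)]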

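  • Page Scanning (kswapd)

    [Figure: size of available free memory over time, plotted against three watermarks: high pages, low pages and min pages. Below low pages, scanning runs in the background; below min pages, it becomes synchronous. The watermarks are derived from vm.min_free_kbytes]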
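The watermark behavior can be sketched as a tiny state machine. This is a toy model under simplifying assumptions, not kernel code: `WatermarkModel` and its threshold values are hypothetical, and it assumes background scanning starts below the low watermark, continues until the high watermark is reached, and that falling below the min watermark forces synchronous reclaim.

```python
class WatermarkModel:
    """Toy model of reclaim activity driven by min/low/high free-page watermarks."""

    def __init__(self, min_pages, low_pages, high_pages):
        self.min_pages = min_pages
        self.low_pages = low_pages
        self.high_pages = high_pages
        self.background_running = False

    def observe(self, free_pages):
        """Return the reclaim activity for the current amount of free memory."""
        if free_pages < self.low_pages:
            self.background_running = True     # wake the background scanner
        elif free_pages >= self.high_pages:
            self.background_running = False    # enough memory: scanner sleeps
        if free_pages < self.min_pages:
            return "synchronous"               # the allocating task must reclaim itself
        return "background" if self.background_running else "idle"

model = WatermarkModel(min_pages=100, low_pages=200, high_pages=300)  # made-up thresholds
for free in (500, 150, 250, 50, 350):
    print(free, model.observe(free))
```

Note the hysteresis: at 250 free pages the model is still scanning, because it started below the low watermark and has not yet reached the high one.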

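  • LRU/2

    [Figure: an Active List and an Inactive List, each with a head and a tail. A page allocation takes a free page from the Free List; referenced pages move towards the Active List; pages leaving the tail of the Inactive List are reclaimed and return to the Free List as free pages]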
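A two-list scheme of this kind can be sketched in a few lines. This is a deliberately simplified toy, not the kernel's algorithm: `TwoListLRU` is a hypothetical class that assumes a page enters the inactive list on first use, is promoted to the active list when referenced again, and is reclaimed from the inactive tail.

```python
from collections import OrderedDict

class TwoListLRU:
    """Toy active/inactive page lists; the first item of each dict is the tail."""

    def __init__(self):
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def touch(self, page):
        """Record a reference to `page`."""
        if page in self.active:
            self.active.move_to_end(page)          # back to the head of the active list
        elif page in self.inactive:
            del self.inactive[page]
            self.active[page] = True               # second reference: promote
        else:
            self.inactive[page] = True             # first reference: inactive head

    def reclaim(self):
        """Evict one page, preferring the inactive tail; return it or None."""
        if not self.inactive and self.active:
            page, _ = self.active.popitem(last=False)   # demote the active tail
            self.inactive[page] = True
        if self.inactive:
            page, _ = self.inactive.popitem(last=False)
            return page
        return None

lru = TwoListLRU()
for page in ["a", "b", "c", "a"]:
    lru.touch(page)
print(lru.reclaim())   # "b": the oldest page that was referenced only once
```

The page "a" survives longest because it was referenced twice, which is the point of keeping two lists instead of one.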

  • LRU

    LRU lists are kept per memory Node, per Zone and per cgroup (kernel 3.3):

    Active anon;

    Inactive anon;

    Active file;

    Inactive file;

    Unevictable.

    File-backed pages have their own LRU lists.

    # cat /proc/meminfo
    Active: 32714084 kB
    Inactive: 30755444 kB
    Active(anon): 1612548 kB
    Inactive(anon): 264 kB
    Active(file): 31101536 kB
    Inactive(file): 30755180 kB

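  • Out Of Memory Killer (OOM)

    Find OOM kills in the logs:

    grep -i kill /var/log/messages*

    Adjust a process's priority for the OOM Killer (from -16 to 15; -17 disables it):

    echo -17 > /proc//oom_adj

    Show the current score for a pid:

    cat /proc//oom_score
    0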
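The score can also be read programmatically. In this sketch, `read_oom_score` is a hypothetical helper; its default path assumes a Linux `/proc` filesystem and refers to the calling process itself.

```python
import os

def read_oom_score(path="/proc/self/oom_score"):
    """Return the OOM score stored in a procfs oom_score file as an integer."""
    with open(path) as f:
        return int(f.read().strip())

if os.path.exists("/proc/self/oom_score"):
    print(read_oom_score())   # a higher score makes the process a more likely victim
```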

  • Memory cgroup

    A memory cgroup limits and controls:

    memory usage;

    memory + swap usage;

    OOM behavior;

    swappiness.

    Statistics:

    # cat memory.stat
    inactive_anon 0
    active_anon 0
    inactive_file 0
    active_file 0
    unevictable 0

  • Cgroup page reclaiming

    Global reclaiming.

    Target reclaiming.


  • Systems Performance: Enterprise and the Cloud

    Linux Kernel Development

    Linux System Programming: Talking Directly to the Kernel and C Library

  • Thank you!