
  • Greenplum on PKS

    Containerized MPP Database

    翁岩青 (Weng Yanqing), R&D Manager, Pivotal Greenplum

    © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0

  • Agenda

    ■  Greenplum Architecture

    ■  Greenplum Data Platform

    ■  Kubernetes on PCF

    ■  Greenplum on Kubernetes

    ■  Q+A

  • WHAT IS GREENPLUM?

    OPEN SOURCE MASSIVELY PARALLEL PROCESSING DATA WAREHOUSE

  • Greenplum = Massively Parallel Postgres for Analytics

    SQL

    Master Servers (Master Host, Standby Master): query planning and dispatch

    Interconnect

    Segment Servers (Segment Hosts Node1, Node2, Node3, ... NodeN): query processing and data storage

    External Sources & Pipelines (Local Storage, Other RDBMSes, Spark, GemFire, Cloud Object Storage, HDFS, Kafka, ETL, Spring Cloud Data Flow): parallel loading and streaming
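
The split between master and segment servers can be illustrated with a toy scatter/gather model. This is a minimal sketch, not Greenplum code: the function names, the CRC32-based placement, and the sample `sales` rows are all assumptions made for illustration. Rows are spread across segments by a distribution key, each segment computes a partial result over its own rows, and the master combines the partial results.

```python
import zlib


def segment_for(key, num_segments):
    """Pick a segment for a distribution key (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key, num_segments):
    """Scatter rows across segments according to their distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments


def segment_partial_sum(segment_rows, column):
    """Work done on one segment: aggregate only the rows stored locally."""
    return sum(row[column] for row in segment_rows)


def master_total(rows, key, column, num_segments):
    """Work done by the master: dispatch to every segment, then combine."""
    segments = distribute(rows, key, num_segments)
    partials = [segment_partial_sum(seg, column) for seg in segments]
    return sum(partials)


# Hypothetical sample data: eight rows spread over four segments.
sales = [{"id": i, "amount": i * 10} for i in range(1, 9)]
print(master_total(sales, "id", "amount", 4))  # prints 360
```

The total does not depend on how rows land on segments, which is why the per-segment work can run in parallel.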

  • Greenplum Data Platform

    NEXT GENERATION DATA PLATFORM

    ANALYTICAL APPLICATIONS: Custom Apps, BI / Reporting, Machine Learning, AI (DS, Analysts, IT, Dev)

    NATIVE INTERFACES
    ●  Structured Data: JDBC, ODBC
    ●  SQL: ANSI SQL
    ●  JSON, Apache AVRO, Apache Parquet and XML
    ●  Teradata SQL
    ●  Other DB SQL
    ●  ML/Statistics/Graph: Apache MADlib
    ●  Programmatic: Python, R, Java, Perl, C
    ●  Text: Apache SOLR
    ●  GeoSpatial: PostGIS

    PIVOTAL GREENPLUM PLATFORM
    ●  Massively Parallel (MPP)
    ●  PostgreSQL Kernel
    ●  Petabyte Scale Loading
    ●  Query Optimizer (GPORCA)
    ●  Workload Manager
    ●  Polymorphic Storage
    ●  Command Center
    ●  SQL Compatibility (Hyper-Q)

    MULTI-STRUCTURED DATA SOURCES & PIPELINES: Local Storage, Other RDBMSes, Spark, GemFire, Cloud Object Storage, HDFS, Kafka, ETL, Spring Cloud Data Flow

    FLEXIBLE DEPLOYMENT: On-Premises, Public Clouds, Private Clouds, Fully Managed Clouds

  • Faster Deployments… How?

  • Experienced these before with any database?

    Have you...

    ●  Run out of disk space?

    ●  Been able to provision more than 100 Postgres instances in a few minutes?

    ●  Faced issues recovering from failures?

    ●  Faced issues expanding the database?

  • WHAT IS PKS?

    RELIABLY DEPLOY AND RUN CONTAINERIZED WORKLOADS.

  • Kubernetes on Pivotal Cloud Foundry

    Continuously deliver any app to every major private and public cloud with a single platform.

  • Faster Deployments… How?

  • Greenplum Data Platform + PKS

  • Kubernetes 101

    Diagram (built up across several slides):

    Kubectl

    Kubernetes Master

    Load Balancer Service

    Nodes, each running kubelet, kube-proxy and docker, and hosting Pods

    Storage volumes

  • Greenplum on Kubernetes

    Diagram: a Greenplum Service, four Pods (master, standby, primary, mirror) running on Nodes with kubelet, kube-proxy and docker, and Storage volumes.

  • Benefits

    Greenplum on PKS

  • 1. On Demand Cluster Provisioning

    Diagram (built up across several slides): Radar and Amy each get their own cluster from PKS and connect to it with psql.

    Radar: Cluster Radar, psql gpdb-radar:5432

    Amy: Cluster Amy, psql gpdb-amy:5432

  • 2. Service Discovery

    We can always discover a container by DNS. For example, the DNS addresses for the different roles:

    master.greenplum.svc.cluster.local
    standby.greenplum.svc.cluster.local
    segment-0a.greenplum.svc.cluster.local
    segment-0b.greenplum.svc.cluster.local

    Each role runs in its own container Pod: master, standby, primary (segment-0a) and mirror (segment-0b).
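
The naming pattern on this slide can be generated for a cluster of any size. The sketch below is illustrative only: `role_dns_names` is a hypothetical helper, and treating `greenplum` as a configurable namespace-like label is an assumption read off the four example addresses. The `a` suffix is used for primaries and the `b` suffix for mirrors, following the slide's labels.

```python
def role_dns_names(num_segments, namespace="greenplum",
                   cluster_domain="svc.cluster.local"):
    """Build the DNS names for master, standby, and each primary/mirror pair.

    Hypothetical helper: the naming scheme is copied from the slide's examples.
    """
    suffix = f"{namespace}.{cluster_domain}"
    names = [f"master.{suffix}", f"standby.{suffix}"]
    for i in range(num_segments):
        names.append(f"segment-{i}a.{suffix}")  # primary for content i
        names.append(f"segment-{i}b.{suffix}")  # mirror for content i
    return names


for name in role_dns_names(2):
    print(name)
```

With `num_segments=1` the function returns exactly the four addresses listed on the slide.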

  • 3. HA without Rebalancing

    Diagram: four container Pods, seg-0a, seg-0b, seg-1a and seg-1b, each marked as Primary or Mirror.

    When a primary fails, its mirror becomes the primary, and no rebalancing is needed.
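
The role swap can be modelled in a few lines. This is a toy sketch, not how Greenplum or PKS implements failover: the `SegmentPod` class and `fail_over` function are hypothetical, and the model assumes, as the diagram suggests, that every segment instance runs in its own Pod. Under that assumption a promotion only changes a role label, so no Pod ends up hosting more primaries than before.

```python
from dataclasses import dataclass


@dataclass
class SegmentPod:
    name: str       # Pod name as shown on the slide, e.g. "seg-0a"
    content: int    # which slice of the data this Pod holds
    role: str       # "primary" or "mirror"
    up: bool = True


def fail_over(pods, failed_name):
    """Mark a Pod as down and, if it was a primary, promote its live mirror."""
    failed = next(p for p in pods if p.name == failed_name)
    failed.up = False
    if failed.role == "primary":
        for pod in pods:
            if pod.content == failed.content and pod is not failed and pod.up:
                pod.role = "primary"     # mirror becomes primary
                failed.role = "mirror"   # old primary rejoins later as mirror
                break
    return pods


cluster = [
    SegmentPod("seg-0a", 0, "primary"),
    SegmentPod("seg-0b", 0, "mirror"),
    SegmentPod("seg-1a", 1, "primary"),
    SegmentPod("seg-1b", 1, "mirror"),
]
fail_over(cluster, "seg-0a")
for pod in cluster:
    print(pod.name, pod.role, "up" if pod.up else "down")
```

After the call, each content still has exactly one primary, and the pair for content 1 is untouched.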

  • 4. Kubernetes Plugins Support: Container Storage Interface

    Diagram: container Pod seg-1b attached to Cloud Storage.

  • 4. Kubernetes Plugins Support: Logging

    Diagram: container Pods seg-0b and seg-1b write to syslog / stderr; a Logging Agent forwards the output to a Log Store.

  • GREENPLUM ON PKS DEMO

    HEY PKS! GIVE ME A GREENPLUM CLUSTER OF “N” SEGMENTS

  • Deploy Greenplum on PKS

    Demo

  • Expand Greenplum on PKS

    Demo

  • Greenplum Segment Failover

    Demo

  • Automate Greenplum operations on PKS

    How to...

    ●  Automatically fail over segments

    ●  Automatically activate the standby if the master fails

    ●  Automatically expand the cluster
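
The three automation goals can be read as a mapping from observed cluster state to actions. The sketch below is a hypothetical illustration: `plan_actions`, the keys of the `observed` dictionary, and the action strings are invented for this example and are not part of any Greenplum or PKS API.

```python
def plan_actions(observed, desired_segments):
    """Return the actions implied by the observed state (hypothetical names)."""
    actions = []
    # Auto activate standby if the master failed.
    if not observed["master_up"] and observed["standby_up"]:
        actions.append("activate_standby")
    # Auto segment failover: promote the mirror of every failed primary.
    for segment in observed["failed_primaries"]:
        actions.append(f"promote_mirror:{segment}")
    # Auto expand: add segments until the desired count is reached.
    missing = desired_segments - observed["segments"]
    if missing > 0:
        actions.append(f"expand:{missing}")
    return actions


observed = {
    "master_up": False,
    "standby_up": True,
    "failed_primaries": ["seg-0a"],
    "segments": 2,
}
print(plan_actions(observed, desired_segments=4))
```

A healthy cluster that already has the desired number of segments yields an empty action list.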

  • Greenplum Operator on Kubernetes

    Greenplum on PKS

  • Kubernetes Operator

    Custom Resource Definitions (CRD)
    ●  Provide an API object framework for a new resource
    ●  Used to implement new resource types
        ○  Data model for Controllers

    Controllers
    ●  Active reconciliation process
        ○  Current State -> Desired State
    ●  Used to automate app administration
        ○  {add, extend, replace} cluster functionality
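
The "Current State -> Desired State" reconciliation idea can be sketched as a loop that repeatedly diffs the two states and applies the difference. This is a minimal, self-contained model and not the Kubernetes controller API: `reconcile`, `run_controller`, and the use of plain sets of Pod names as "state" are assumptions chosen to keep the example runnable without a cluster.

```python
def reconcile(current, desired):
    """One pass: list the steps that move the current state toward the desired one."""
    steps = [("create", name) for name in sorted(desired - current)]
    steps += [("delete", name) for name in sorted(current - desired)]
    return steps


def run_controller(current, desired, max_passes=10):
    """Apply reconciliation passes until nothing is left to do."""
    current, desired = set(current), set(desired)
    for _ in range(max_passes):
        steps = reconcile(current, desired)
        if not steps:
            break  # current state matches desired state
        for operation, name in steps:
            if operation == "create":
                current.add(name)
            else:
                current.discard(name)
    return current


# Hypothetical example: the desired state asks for a second segment pair.
running = {"seg-0a", "seg-0b"}
wanted = {"seg-0a", "seg-0b", "seg-1a", "seg-1b"}
print(reconcile(running, wanted))
print(sorted(run_controller(running, wanted)))
```

Because the loop compares states rather than reacting to individual events, running it again after convergence produces no further steps.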

  • Kubernetes Operator

    Custom Resource Definitions (CRD)

    Greenplum Custom Resource and Usage

  • Greenplum on Kubernetes

    How the Greenplum operator works on Kubernetes

  • Future Work

    ●  More Persistent Volumes

    ●  Custom Scheduler

    ●  Resource Management

  • QUESTIONS?

  • Transforming How The World Builds Software