Presentation name | Aharon Shilo | CEO | DBCS Ltd. (DBCS)

Sql Explore Hebrew


DESCRIPTION

SQL Server code name 'Denali' HADR (High Availability and Disaster Recovery)


Page 1: Sql Explore   Hebrew

Presentation name | Aharon Shilo | CEO | DBCS Ltd. (DBCS)

Page 2: Sql Explore   Hebrew

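A little about me

Who am I: DBA

• Married + 3
• Over 10 years in the field
• PRO certified in SQL Server and Oracle technologies
• Formerly CTO and domain lead at John Bryce Training
• CEO of DBCS, a company active in this field
• Consultant to Bezeq International, the Central Bureau of Statistics (CBS), Pontis, Traffilog, storenext, galcomm, and others.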

Page 3: Sql Explore   Hebrew

• Introduction to High Availability in SQL Server: Hardware and software solutions

• Features and techniques comparison
– Log Shipping
– Database Mirroring
– Replication
– Database Snapshots
– Backup improvements
– Online operations

• HADR deep dive: How to implement the next generation of high availability and disaster recovery solution with SQL Server

Page 4: Sql Explore   Hebrew

Introduction to High Availability and Disaster Recovery

• Definitions

– Introduce key terms and concepts

• Business Continuity Planning

– Overview of the BCP process

• SQL Server High Availability Planning

– How does BCP apply to SQL Server availability?

Page 5: Sql Explore   Hebrew

High Availability and Disaster Recovery: Definition

• High Availability
– High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period
– Availability defined in terms of service level agreements (SLA)
  – Recovery Time
  – Data loss during unplanned downtime
– A highly available application should be accessible by users x% of the time
• Disaster Recovery
– Processes and procedures designed to restore business operations after a natural or human-induced disaster
– Typically involves providing redundancy spanning multiple sites or across geographic regions

Page 6: Sql Explore   Hebrew

Defining x and SLA

• Recovery Time Objective (RTO) guided by availability requirements

– How much downtime can you tolerate?

• Recovery Point Objective (RPO) guided by criticality of application data

– How much data can you lose?

Availability Class | Acceptable Downtime (hrs/yr) or RTO | Acceptable Data Loss (time of last copy) or RPO
Tier 1 (>99.99%) | 1 hr or less | 5 min or less
Tier 2 (99.9% - 99.99%) | 1-8.5 hrs | 5 mins to 8.5 hrs
Tier 3 (<99.9%) | Hours to days | Hours to days
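The downtime column in the tier table follows from simple arithmetic on a year of 8,760 hours. A minimal T-SQL sketch of that conversion, where the sample percentages are illustrative and not taken from the slides:

```sql
-- Convert an availability percentage into allowed downtime per year.
-- 8760 = 24 hours * 365 days; leap years are ignored.
SELECT
    t.availability_pct,
    CAST((100.0 - t.availability_pct) / 100.0 * 8760 AS decimal(9, 3))
        AS downtime_hours_per_year
FROM (VALUES (99.99), (99.9), (99.0)) AS t(availability_pct);
```

For 99.99% this works out to a little under one hour per year, and for 99.9% to a little under nine hours, in line with the Tier 1 and Tier 2 boundaries.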

Page 7: Sql Explore   Hebrew

Protection Levels

Local HA
• Protection against resource failures
– Machine
– Database Corruption
– Disk
• Location Redundancy
– Building
– < 10 miles

Regional DR
• Protection against
– Network Outages
– Site Failures
• Location Redundancy
– City, County
– < 100-200 miles

Geographic DR
• Protection against
– Natural Disasters
• Location Redundancy
– State, Country
– > 100-200 miles

Page 8: Sql Explore   Hebrew

Business Continuity Planning

[Diagram: cycle of Analysis, Solution Design, Implementation, Testing, Maintenance]

• Impact Analysis

– Critical Functions

– Threat Identification

– Recovery Objectives

• Solution Design

– Achieve recovery objectives for relevant threats within specified constraints such as budget and human resources

– Cost/benefit analysis of solutions

• Implementation

– Deploy the recommended solution

• Testing

– Test to see if the solution meets the recovery requirements

• Maintenance

– Yearly testing and review of procedures

Page 9: Sql Explore   Hebrew

SQL Server High Availability Planning

• Analysis

– Application tiers serviced by the databases

– Causes of database downtime

– Protection levels: Local HA, Regional DR, Geographic DR

• Solution Design

– What solutions exist?

– What are the characteristics and cost of the solution?

• Implementation

– What are the deployment steps and best practices?

• Testing

– How do I test my implementation?

• Maintenance

– How do I monitor and maintain the solution?


Page 10: Sql Explore   Hebrew

Database Downtime Drivers

Database Downtime
• Unplanned Downtime
– Failure Protection
– User Errors
• Planned Downtime
– Online Administration
– Predictable Resourcing

Analysis

Page 11: Sql Explore   Hebrew

Solution Design

Solution Architecture

HA Capabilities

Limitations and Caveats

Cost Vector

• Understand the solutions and choices before making a decision

Solution Design

Page 12: Sql Explore   Hebrew

SQL Server

Always On Technologies Solution Design

Page 13: Sql Explore   Hebrew

Always On Technologies

• Provides a full range of options to minimize downtime and maintain appropriate levels of application availability

Increases Availability
• Backup and Restore
• Log Shipping
• Database Mirroring
• Failover Clustering
• Peer-Peer Replication

Decreased Downtime
• Online Index Operations
• Table Partitioning
• Enhanced Locking
• Resource Governor
• Database Snapshot
• Dedicated Admin Connection
• Dynamic Configuration

Solution Design

Page 14: Sql Explore   Hebrew

Always On Technology Overview

• Architecture Overview
– How does it work?
• Solution Characteristics
– Data Loss Guarantees
– Failover Characteristics
– Redundancy Levels and Utilization
– Cost
– Limitations and Caveats

Solution Design

• Backup and Restore

• Log Shipping

• Database Mirroring

• Failover Clustering

• Peer-Peer Replication

Increases Availability

Page 15: Sql Explore   Hebrew

What’s New in SQL Server 2008

• New Features
– Resource Governor
  – Manage SQL Server workloads and resources by specifying limits on resource consumption
– Backup Compression
  – Reduce backup and restore time
• Feature Enhancements
– Database Mirroring
  – Automatic recovery from page corruption
  – Log stream compression
  – Faster recovery on failover
– Log Shipping
  – Sub-Minute Log Shipping
  – Backup compression
– Failover Clustering
  – 16 nodes
  – Rolling upgrade
– Peer-Peer Replication
  – Hot add new nodes

Page 16: Sql Explore   Hebrew

Backup & restore

Page 17: Sql Explore   Hebrew

Backup and Restore

• Base availability technology for any solution
– Protects against failures and enables recovery from errors
– Provides Local HA and Site DR
  – Need to ensure the backups are accessible if site goes down
– High RTO due to restore time
– RPO=0 can never be guaranteed
• Types: Full, Differential, and Transaction Log
– File-group backup/restore for large databases
• Backup Compression provides faster and smaller backups in SQL Server 2008
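The three backup types listed above can be sketched in T-SQL as follows. The database name and the `D:\Backup` paths are assumptions for illustration, and `COMPRESSION` is only accepted on SQL Server 2008 or later, in editions that support backup compression:

```sql
-- Full backup: the base of every restore sequence.
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backup\AdventureWorks_full.bak'   -- hypothetical path
WITH INIT, COMPRESSION;

-- Differential backup: only the extents changed since the last full backup.
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backup\AdventureWorks_diff.bak'   -- hypothetical path
WITH DIFFERENTIAL, INIT, COMPRESSION;

-- Transaction log backup: needs the FULL or BULK_LOGGED recovery model.
BACKUP LOG AdventureWorks
TO DISK = N'D:\Backup\AdventureWorks_log.trn'    -- hypothetical path
WITH INIT, COMPRESSION;
```

The interval between log backups bounds the data that can be lost, which is why this approach cannot guarantee RPO=0.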

Solution Design

Page 18: Sql Explore   Hebrew

Enhanced Error Detection

• In SQL Server 2000 RESTORE VERIFYONLY does not guarantee that the backup is good
– Data may be corrupt
• In SQL Server 2005 RESTORE VERIFYONLY performs far more extensive checks, including page checksums when the backup contains them
– Gives much stronger assurance that the data is correct

Page 19: Sql Explore   Hebrew

Database Checksums

• SQL Server 2000 had TornPageDetection to detect incomplete I/O operations caused by power failures
• SQL Server 2005 adds checksums to data pages
– Header of every page contains a checksum value
– When reading a page, it re-computes the checksum and compares it with the stored checksum
– Returns error (824) if a difference is found
– Detects errors not reported by the I/O subsystem
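As a hedged illustration of the page checksum feature, the statements below turn the option on and inspect its effect. The database name is an assumption; `PAGE_VERIFY`, `sys.databases` and `msdb.dbo.suspect_pages` are standard SQL Server 2005+ objects:

```sql
-- Enable page checksums; they are written the next time each page is modified.
ALTER DATABASE AdventureWorks SET PAGE_VERIFY CHECKSUM;

-- Confirm which verification option is active.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'AdventureWorks';

-- Pages that failed with errors such as 823 or 824 are recorded here.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;
```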

Page 20: Sql Explore   Hebrew

Backup Checksums

• Detect errors introduced by backup hardware but not reported by hardware or operating system
– Backup media error detection
– Backup devices do not always detect errors
– Works with
  • RESTORE
  • RESTORE VERIFYONLY
• Restore also checks page checksums, if present
– Disk error detection on data pages prior to backup
• Can continue past errors if desired
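A minimal sketch of backup checksums in use, assuming a hypothetical backup path. `CHECKSUM` and `CONTINUE_AFTER_ERROR` are standard options of `BACKUP` and `RESTORE`:

```sql
-- Verify page checksums while reading and write a checksum over the backup itself.
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backup\AdventureWorks_full.bak'   -- hypothetical path
WITH CHECKSUM, INIT;

-- Re-check the backup media without restoring it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backup\AdventureWorks_full.bak'
WITH CHECKSUM;

-- To continue past detected errors, add CONTINUE_AFTER_ERROR, for example:
-- RESTORE DATABASE AdventureWorks
-- FROM DISK = N'D:\Backup\AdventureWorks_full.bak'
-- WITH CHECKSUM, CONTINUE_AFTER_ERROR;
```

The last statement is left commented out because restoring over a live database is destructive.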

Page 21: Sql Explore   Hebrew

Backup Compression

• Common questions:
– "How much compression will I see?"
– "Will it be comparable to, say, SQL Litespeed?"
• One simple answer: "It depends!"
• All data compresses differently – the compression ratio achieved depends on:
– The type of data in the database
– Whether the data in the database is already compressed
– Whether the data/database is encrypted
• "We saw an 85 percent reduction in file size using SQL Server 2008 Backup Compression," says Colin Neller, Senior Software Engineer at ServiceU and part of the company's SQL Server 2008 implementation team. "A backup file that was previously over 300 GB is now only 40 GB, and the job runs in about half the time."

Page 22: Sql Explore   Hebrew

Backup Compression: Backup Performance

• Backup of a 322 MB AdventureWorks database
– Uncompressed: hardly any CPU used (avg 5%), runtime = 39.5s, no compression
– Compressed: a lot more CPU used (avg 25%), but runtime = 21.6s (45% improvement) and backup stored in 76.7 MB (4.2x compression ratio)
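The compression ratio quoted on this slide is the uncompressed size divided by the compressed size (322 MB / 76.7 MB is roughly 4.2). A sketch of how the same figure could be read from backup history, using the standard `msdb.dbo.backupset` table on SQL Server 2008 or later:

```sql
-- Compression ratio of the ten most recent full database backups.
SELECT TOP (10)
    database_name,
    backup_finish_date,
    backup_size / 1048576.0            AS backup_mb,
    compressed_backup_size / 1048576.0 AS compressed_mb,
    backup_size / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE type = 'D'                       -- 'D' = full database backup
ORDER BY backup_finish_date DESC;
```

An uncompressed backup shows a ratio of 1, since both size columns hold the same value.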

Page 23: Sql Explore   Hebrew

DEMO

Page 24: Sql Explore   Hebrew

DATABASE SNAPSHOTS

Page 25: Sql Explore   Hebrew

Database Snapshots

• Read-only, consistent view of a database
– Specified point-in-time
• Modifying data
– Copy-on-write of affected pages
• Reading data
– Accesses snapshot if data has changed
– Redirected to original database otherwise

[Diagram: 12:00 snapshot holding the pages copied from the source database]
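A hedged sketch of creating and reading a database snapshot. The logical file name `AdventureWorks_Data` and the `D:\Snapshots` path are assumptions; the real logical name is listed in `sys.database_files` of the source database:

```sql
-- Create a point-in-time, read-only snapshot of the source database.
CREATE DATABASE AdventureWorks_dbsnapshot_1800
ON ( NAME = AdventureWorks_Data,   -- assumed logical data file name
     FILENAME = N'D:\Snapshots\AdventureWorks_dbsnapshot_1800.ss' )
AS SNAPSHOT OF AdventureWorks;

-- Query the snapshot like any read-only database.
SELECT COUNT(*) AS rows_at_snapshot_time
FROM AdventureWorks_dbsnapshot_1800.Production.WorkOrderRouting;

-- Remove the snapshot when it is no longer needed:
-- DROP DATABASE AdventureWorks_dbsnapshot_1800;
```

The snapshot file is sparse and grows only as pages in the source database are modified.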

Page 26: Sql Explore   Hebrew

Using Database Snapshot to Recover Data

Scenario: Undeleting rows
Example code:

-- Add a WHERE clause to copy back only the rows that were deleted
INSERT INTO Production.WorkOrderRouting
SELECT * FROM
AdventureWorks_dbsnapshot_1800.Production.WorkOrderRouting

Scenario: Undoing an update
Example code:

UPDATE HR.Department
SET Name = ( SELECT Name FROM
AdventureWorks_dbsnapshot_1800.HR.Department
WHERE DepartmentID = 1)
WHERE DepartmentID = 1

Scenario: Recovering a dropped object
Steps:
1. Script the object in the database snapshot
2. Execute the script in the source database
3. Repopulate the object (if appropriate)

Caution: Not a substitute for a comprehensive backup and restore strategy

Page 27: Sql Explore   Hebrew

DEMO

Page 28: Sql Explore   Hebrew

Log Shipping

Page 29: Sql Explore   Hebrew

Log Shipping

• Automated transaction log backup and restore provides redundancy at the database level

• SQLLogship.exe provides the underlying framework for doing automated backup, copy and restore

– Backup on primary instance

– Restore on secondary instance(s)

• Scheduling is done through SQL Server Agent jobs

– SQL Server 2008 provides sub-minute scheduling interval providing the ability to do quick backup and restores

• No automatic failover capabilities
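The log shipping jobs automate a backup, copy and restore cycle that can be sketched by hand as below. The share and file paths are assumptions, and the secondary is assumed to have been initialized from a full backup restored `WITH NORECOVERY` or `WITH STANDBY`:

```sql
-- On the primary: back up the transaction log to a share the secondary can read.
BACKUP LOG AdventureWorks
TO DISK = N'\\backupshare\logship\AdventureWorks_1.trn';   -- hypothetical share

-- On the secondary: apply the log backup and keep the database readable.
RESTORE LOG AdventureWorks
FROM DISK = N'\\backupshare\logship\AdventureWorks_1.trn'
WITH STANDBY = N'D:\Standby\AdventureWorks_undo.dat';      -- hypothetical undo file

-- Manual failover: after the last available log backup is applied,
-- bring the secondary online.
RESTORE DATABASE AdventureWorks WITH RECOVERY;
```

Because the final step is issued by an operator, clients also have to be redirected manually, which is what "no automatic failover" means in practice.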

Solution Design

Page 30: Sql Explore   Hebrew

Log Shipping (Key terms)

• Primary Server:
– Contains your primary database.
– SQL Server Agent makes periodic transaction log backups to capture changes.
• Secondary Server:
– Contains an unrecovered copy of the production database.
– One standby server can contain standby databases from multiple primary servers.

Page 31: Sql Explore   Hebrew

Log Shipping (Key terms) cont…

• Monitor Server (Optional)
– Monitors the status of the log-shipping jobs on the primary and each standby server.
– One monitoring server can monitor multiple primary-standby server pairs.
– Should use a server other than the primary or the standby to detect problems on either server.

Page 32: Sql Explore   Hebrew

Log Shipping

[Diagram: the primary database performs backups; each of three secondary databases copies and restores the backups; the monitor database raises alerts]

Page 33: Sql Explore   Hebrew

Strength & weakness

• Strengths

– Can Ship Logs Across WAN (Wide-Area Network)

– Protects an Entire Database

• Weaknesses

– Configured Per Database

– NO AUTOMATIC FAILOVER

Page 34: Sql Explore   Hebrew

DEMO

Page 35: Sql Explore   Hebrew

Mirroring

Page 36: Sql Explore   Hebrew

Database Mirroring

• A database level high availability solution that provides complete protection against data loss and fast recovery through automatic failover

• Maintains a redundant database by shipping log blocks when the transactions are committed on the principal

• Synchronous and Asynchronous modes provide the spectrum of options to choose between availability and performance

• Automatic failover when using witness server

Solution Design

Page 37: Sql Explore   Hebrew

Database Mirroring Modes

• High-Availability Mode
– Safety Full; Synchronous operation
– Database is available whenever a quorum exists
– Automatic failover
• High-Protection Mode
– Safety Full; Synchronous operation
– No witness – quorum provided by partners
– If Principal loses quorum, it stops servicing the database
  • Ensures high protection; database is never in 'exposed' state
– Manual failover only; no automatic failover
– A transition mode; should not be in this mode for long
• High-Performance Mode
– Safety Off; Asynchronous operation
– Manual failover only
  • Supports only one form of role switching: forced service (with possible data loss)
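The three modes map onto `ALTER DATABASE ... SET PARTNER` and `SET WITNESS` options. The statements below are alternatives rather than one script; server addresses are assumptions, and mirroring endpoints are assumed to exist already on each instance:

```sql
-- Establish the session: run on the mirror first, then on the principal.
ALTER DATABASE AdventureWorks SET PARTNER = N'TCP://principal.example.com:5022';  -- on the mirror
ALTER DATABASE AdventureWorks SET PARTNER = N'TCP://mirror.example.com:5022';     -- on the principal

-- High-Availability mode: synchronous, with a witness for automatic failover.
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
ALTER DATABASE AdventureWorks SET WITNESS = N'TCP://witness.example.com:5022';

-- High-Protection mode: synchronous, no witness.
ALTER DATABASE AdventureWorks SET WITNESS OFF;

-- High-Performance mode: asynchronous.
ALTER DATABASE AdventureWorks SET PARTNER SAFETY OFF;

-- Manual failover (run on the principal while the session is synchronized).
ALTER DATABASE AdventureWorks SET PARTNER FAILOVER;

-- Forced service (run on the mirror when the principal is lost; may lose data).
ALTER DATABASE AdventureWorks SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
```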

Page 38: Sql Explore   Hebrew

Database Mirroring How it works

[Diagram: Application, Principal, Mirror and Witness; each SQL Server has its own Data and Log; numbered arrows 1-5 trace a commit, but their captions did not survive extraction]

Mirror is always redoing – it remains current

Page 39: Sql Explore   Hebrew

DBM – Automatic Page Recovery

[Diagram: Client, Principal, Mirror and Witness; each partner has its own Data and Log]

1. Bad page detected
2. Request page
3. Find page
4. Retrieve page
5. Transfer page
6. Write page

Page 40: Sql Explore   Hebrew

Database Mirroring Enhancements

• Enhancements in SQL 2008

– Compression of stream data for which at least a 12.5 percent compression ratio can be achieved.

– Automatic Recovery from Corrupted Pages.

– Page read-ahead during the undo phase.

– Improved use of log send buffers.

Page 41: Sql Explore   Hebrew

Strength & Weakness

• Strengths

– Can Mirror Across WAN

– Automatic failover that is nearly instantaneous, faster than failover clustering
– Protects an Entire Database
• Weaknesses
– High-performance (asynchronous) mode requires Enterprise Edition
– Must be Configured Per Database

Page 42: Sql Explore   Hebrew

DEMO

Page 43: Sql Explore   Hebrew

Replication

Page 44: Sql Explore   Hebrew

Replication

• Primarily used where availability is required in conjunction with scale out of read activity

• Failover possible; a custom solution

• Not limited to entire database; Can define subset of source database or tables

• Copy of database is continuously accessible for read activity

• Latency between source and copy can be as low as seconds

Page 45: Sql Explore   Hebrew

Transactional Replication

• A high performance data replication solution that provides granular table level replication

– Logical data movement provides flexibility and better hardware utilization

• Key scenarios:

– Customized application-specific DR

– Real-time reporting on a secondary server that can be used for Site DR

– Scale out application queries with ability to use any one database copy for Site DR

• Two types relevant for HA and DR

– Transactional and Peer-to-Peer

Solution Design

Page 46: Sql Explore   Hebrew

Peer-to-Peer Replication

• Provides high availability and read scalability

• Builds redundancy by eliminating single point of failure

• Enable online upgrades of servers

• Maximize Application Uptime

• Support for both Ring and Grid Topology

• Centralized Management using Management Studio

[Diagram: four peer nodes]

Page 47: Sql Explore   Hebrew

New Features

[Diagram labels: Application Server, Load Balancing, Read, Write, Replicated Data, User Requests]

Page 48: Sql Explore   Hebrew

Strength & Weakness

• Strengths

– Perpetual or on-demand replication of data, local or remote
– Protects (duplicates or merges) the exact portion of the database I want
• Weaknesses
– Configured per database, even per table
– Generally does not protect or duplicate an entire database

Page 49: Sql Explore   Hebrew

DEMO

Page 50: Sql Explore   Hebrew

Failover Clustering

Page 51: Sql Explore   Hebrew

Failover Clustering

• Instance level protection built on Windows Failover Clustering shared disk model
– Cluster nodes typically co-located within the same site to provide local HA
– Regional DR possible using VLAN and stretch storage level replication
• No built in data redundancy like database mirroring and log shipping
– Data protection has to be provided at the storage level or by combining with other solutions

Solution Design

Page 52: Sql Explore   Hebrew

Failover Clustering

[Diagram: Node 1, Node 2 and Node 3 attached to a shared disk and presented as one virtual server]

Page 53: Sql Explore   Hebrew

SQL Server Cluster Topologies

• Supports many scenarios:
– Single Instance
– Multiple Instance
– Multiple Active Nodes
– N+1: N active, 1 inactive nodes
– N+M: N active, M inactive nodes

[Diagram: failover clusters showing instances Inst1, Inst2 and Inst3 placed on nodes for the Multiple Active Nodes, N+1 and N+M topologies]

Page 54: Sql Explore   Hebrew

Failover Clustering (Facts)

• Redundancy at database instance level
– All databases fail over together
– Shared copy of system databases
• Single data copy on shared storage device
– No I/O overhead reducing throughput
– Storage unit is single point of failure for cluster
• All database services are clustered
– SQL Agent, Analysis Services, Full-Text engine, MS DTC
• Automatic failover (up to minutes)
• DBMS accessed over virtual IP
• Storage is controlled by one cluster node at a time
• Requires hardware certified by Microsoft for Microsoft Cluster Service

Page 55: Sql Explore   Hebrew

Strength & Weakness

• Strengths

– Provides Protection Against a Node Failure, Protects the Entire SQL Instance
– Automatic Failover Supported
• Weaknesses
– Generally Expensive, Requires Specialty Hardware
– Not Trivial to Configure and Manage
– Doesn't Protect Against a Complete Site Failure

Page 56: Sql Explore   Hebrew

DEMO

Page 57: Sql Explore   Hebrew

Best Practices

• Back up your system databases after modifications.

• Test if backups are restorable.

• Practice / Test your disaster recovery plans.

• Documentation is not only for you.

• Keep dedicated DR Server ready.

• Use BACKUP CHECKSUM features.

• Run DBCC CHECKDB regularly.

• Don't ignore any runtime errors.
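Two of these practices translate directly into short statements. A sketch, with the database name and backup paths as assumptions:

```sql
-- Regular consistency check; report every error and suppress informational output.
DBCC CHECKDB (N'AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Back up system databases after changes such as new logins or Agent jobs.
BACKUP DATABASE master
TO DISK = N'D:\Backup\master.bak'   -- hypothetical path
WITH CHECKSUM, INIT;

BACKUP DATABASE msdb
TO DISK = N'D:\Backup\msdb.bak'     -- hypothetical path
WITH CHECKSUM, INIT;
```

Scheduling both through SQL Server Agent keeps the checks from depending on someone remembering to run them.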

Page 58: Sql Explore   Hebrew

What Solution Is Best for Us?

Page 59: Sql Explore   Hebrew

Always On Solution Characteristics

[Table: for each solution the slide marks No Data Loss (RPO=0), Failover Unit (Inst / DB / Tab), Auto Failover (RTO), Read, and Multiple Write; the check marks did not survive extraction.]

Cost

Solution | Hardware | App Perf Impact | Manageability
Log Shipping | Low | Low | Low
DBM Sync | Low | High | Low
DBM Async | Low | Low | Low
Cluster | High*** | Low*** | Low***
Transactional Replication | Low | Low | High
Peer-Peer Replication | Low | Low | High

Solution Design

* Database Mirroring and Log Shipping can provide point in time read capability using database snapshots or STANDBY respectively
** Database Mirroring provides fastest failover to hot secondary
*** Depends on SAN technology

Page 60: Sql Explore   Hebrew

Recap

• Application availability requirements or SLA drive primary solution choices
– RPO and RTO are the key metrics used to define the SLA
• Need mitigation against planned and unplanned downtimes
• Multiple solution choices that provide varying cost/benefits
• Other requirements apart from application SLA factor into the choice
• Understand constraints and tradeoffs you can make

Database Mirroring

Clustering

Log Shipping

Peer-PeerReplication

Application Availability

Unplanned downtime

Planned Downtime

Solution Design

Page 62: Sql Explore   Hebrew

AdventureWorks Inc Scenario

Adventureworks Inc is a manufacturing company that manufactures and sells bicycles across the world. There are a number of applications, some of them mission critical, that run on multiple SQL Server instances

• The DBA team is run by Darren who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA

• One datacenter located in Omaha

• Three applications

– Manufacturing – Tier 1

– Finance – Tier 2

– Scheduling – Tier 3

• Manufacturing application runs on a dedicated SQL Server 2008 Instance

– All other applications run on a second instance

• Availability of manufacturing application is critical

• Implement a solution at the lowest possible cost

Solution Design

Page 63: Sql Explore   Hebrew

Application Requirements

• Manufacturing application has strict SLAs
• Finance application requires readability on the secondary
– The reports are run every 4 hours and need to be fresh as of the last one hour. To offload the reporting load from the main system they would like to utilize the mirror

[Table: requirements of the Manufacturing, Finance and Scheduling applications against Data Loss (RPO=0), RTO in secs, Failover Unit (Inst / DB / Tab), Auto Failover, Read, Multiple Sites, and Read Write; the cell values did not survive extraction.]

Solution Design

Page 64: Sql Explore   Hebrew

Solution Choice for Manufacturing Application

• Clustering can provide a zero data loss solution that can also provide fast instance level failover
• Use RAID configuration to provide data redundancy on the SAN
• If a redundant copy is required that can provide instance failover with zero data loss, use SAN replication
– High Cost Solution
• Use synchronous database mirroring if instance failover is not needed

[Table: solutions compared (Cluster, SAN Replication, DBM - Sync, Log Shipping, Transactional Replication, Peer-Peer Replication, DBM - Async) against Data Loss (RPO=0), Fast RTO, Failover Unit (Inst / DB / Tab), Auto Failover, Read > 1 Sites\Copy, and Read Write; the cell values did not survive extraction. Highlighted choice: Clustering with RAID.]

Solution Design

Page 65: Sql Explore   Hebrew

Solution Choice for Finance Application

• For database level redundancy with acceptable data loss and minimal perf impact, asynchronous database mirroring is an optimal choice
• Use database snapshots at periodic intervals to provide a readable snapshot of the data for reporting
• Low cost solution

[Table: solutions compared (DBM - Async, Cluster, SAN Replication, Log Shipping, Transactional Replication, Peer-Peer Replication, DBM - Sync) against Data Loss (RPO=0), Fast RTO, Failover Unit (Inst / DB / Tab), Auto Failover, Read > 1 Sites\Copy, and Read Write; the cell values did not survive extraction.]

[Diagram labels: Omaha Datacenter, Finance, Async Database Mirroring, Db Snapshot every hour, Reports, Scheduling]

Solution Design

Page 66: Sql Explore   Hebrew

Adding a Regional Datacenter Into the Mix

• Regulatory and compliance requirements drive the need for having an additional datacenter within a 10 mile radius to provide redundancy against site level failure.
– It is now required that all applications have the ability to fail over to the regional datacenter across the river in Council Bluffs
• The SLA needs to be maintained for tier 1 applications even in the case of site failures

Solution Design

Page 67: Sql Explore   Hebrew

Regional Site Solution Choices

[Diagram labels: Omaha Datacenter, CB Datacenter, Manufacturing, Cluster with SAN, Sync Mirroring (no witness), Log Shipping, Finance, Async Database Mirroring, Db Snapshot every hour, Reports, Scheduling]

Solution Design

Page 68: Sql Explore   Hebrew

A Complete Topology

• Considering the potential of floods and tornadoes destroying the regional data centers, Adventureworks Inc wants to maintain a disaster recovery site in San Antonio, TX

• The disaster recovery site has lower SLA requirements for all applications
– The manufacturing application can have an RPO of 1 hour
– The RTO is set at 4 hours

Solution Design

Page 69: Sql Explore   Hebrew

Topology Diagram

Sync Mirroring, no witness

Cluster with SAN

Log Shipping

Manufacturing

Solution Design

Page 70: Sql Explore   Hebrew

Scale Out and Availability

Scenario
• Adventureworks is building a new web based order management system that allows customers from all over the world to access the system and place orders
• The core group of customers are in Western Europe, South East Asia and North America

Requirements
– Geo Redundancy
– Data Locality
– High Availability
– Local Read-Scale

Workload Characteristics
– Mainly reads
– Few writes

Application Characteristics
– Each user logging in connects to a particular server
  • Partitioned based on user-id and region
  • Writes from a user always happen on one server regardless of the region the user logs in from
– All reads redirected to the closest geo-location
  • Reasonable tolerance for latency (5-10 minutes)

Solution Design

Page 71: Sql Explore   Hebrew

Replication Topology

Peer Nodes

Read-Only Servers

Asia1 Asia2

Solution Design

Page 72: Sql Explore   Hebrew

Licensing Facts

• Passive servers are the mirror, the log-shipped secondary and the passive cluster node

• No license required on passive if it is truly passive

• A passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server.

• The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.

Page 73: Sql Explore   Hebrew

HA Features Edition Support

Columns: Feature, Express, Workgroup, Standard, Enterprise, Comments (the per-edition check marks did not survive extraction)

• Database Mirroring (1): Advanced high availability solution that includes fast failover and automatic client redirection
• Failover Clustering (2)
• Backup Log-shipping: Data backup and recovery solution
• Online System Changes: Includes Hot Add Memory, dedicated administrative connection, and other online operations
• Online Indexing
• Online Restore
• Fast Recovery: Database available when undo operations begin

(1) Single thread redo
(2) Limited to 2 node cluster

Page 74: Sql Explore   Hebrew

Summary

• There is no "one size fits all" solution

• Consider the cost/benefits/constraints and compare that to availability requirements of the organization to determine the best solution

• Use the charts to understand cost, benefit and constraints of the various SQL Server High Availability solutions

• TEST the solution to ensure it can meet the availability requirements and meet SLAs

Page 75: Sql Explore   Hebrew

• Questions & Answers

Page 76: Sql Explore   Hebrew

SQL Server AlwaysOn: Mission Critical Capabilities in SQL Server "Denali"

• Jon Jahren

• Exec VP, Prediktor


Page 77: Sql Explore   Hebrew

High Availability and Disaster Recovery

SQL Server “Denali” AlwaysOn

• Faster failover, easier administration with Availability Groups

• Identify databases to failover as a unit to reduce unplanned downtime

• Faster application failover using virtual name

• Increase application uptime using flexible failover policy

• Enable better data redundancy and protection with up to four secondaries and up to two synchronous secondaries

• Limited downtime with enhanced online operations

• Run Microsoft SQL Server® on Windows Server® Core to reduce planned downtime (50-60% fewer OS patch reboots)

[Diagram: availability replicas on shared storage and non-shared storage, with a disaster recovery site]

Page 78: Sql Explore   Hebrew

Maximize Resources

Higher return on high availability investments

• Increase hardware utilization through active secondaries for backups, reporting, and ad hoc queries

• Reuse existing infrastructure with support for both SAN and direct attached storage

Simplify management and administration

• Integrated manageability for one-stop configuration

• Easy setup and monitoring integrated into Microsoft SQL Server Management Studio

• Availability Groups that provide failover units with contained dependencies (such as logons)

Page 79: Sql Explore   Hebrew

Breakthrough Performance and Scale

• Dramatically faster star-join query processing, much faster than current SQL Server (~10X)

• Query speed increase varies with query and data

• Reduced I/O

• Consistent query performance

• Reduced performance tuning effort


Page 80: Sql Explore   Hebrew

Mission Critical High Availability Solution

Meets mission critical high availability SLA

Integrated, Efficient, Flexible

Microsoft recommended prescriptive HA solutions and customer references

Page 81: Sql Explore   Hebrew

Introducing SQL Server AlwaysOn

Integrated, flexible, efficient high availability for mission critical business

• Multisite Clustering
• Flexible Failover Policy
• Improved Diagnostics
• Built for consolidation scenarios
• Multi-Database Failover
• Multiple Secondaries
• Active Secondaries
• Integrated HA Management

AlwaysOn Availability Groups for database protection

AlwaysOn Failover Cluster Instances for instance level protection

AlwaysOn provides database level and instance level protection

A high availability platform built for the future

Page 82: Sql Explore   Hebrew

AlwaysOn – A flexible solution

AlwaysOn provides the flexibility of different HA configurations

[Diagram: two configurations combining synchronous and asynchronous data movement: (1) shared storage with regional and geo secondaries; (2) direct attached storage with local, regional and geo targets]

Page 83: Sql Explore   Hebrew
Page 84: Sql Explore   Hebrew

AlwaysOn Availability Groups

AlwaysOn Availability Groups is a new feature that enhances and combines database mirroring and log shipping capabilities

Slide columns: Flexible, Integrated, Efficient

• Multi-database failover
• Multiple secondaries
– Total of 4 secondaries
– 2 synchronous secondaries
– 1 automatic failover pair
• Synchronous and asynchronous data movement
• Built in compression and encryption
• Automatic and manual failover
• Flexible failover policy
• Application failover using virtual name
• Configuration Wizard
• Dashboard
• System Center Integration
• Rich diagnostic infrastructure
• File-stream replication
• Replication
• Active Secondary
– Readable Secondary
– Backup from Secondary
• Automation using PowerShell

Page 85: Sql Explore   Hebrew

Availability Groups Virtual Name

Availability Groups Virtual Name allows applications to fail over seamlessly to any secondary

– Application reconnects using a virtual name after a failover to a secondary
– Application retries during failover
– Connects to the new primary once failover is complete and the virtual name is online

Connection parameters: -server HR_VNN; -catalog HRDB

[Diagram: availability group AG_HR with database HRDB and virtual name HR_VNN; replicas ServerA, ServerB and ServerC in primary and secondary roles]
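As a hedged sketch, the virtual name corresponds to an availability group listener. The syntax below is the one shipped in SQL Server 2012, so the "Denali" preview builds described in this deck may differ. The IP address, subnet mask and port are assumptions:

```sql
-- Run on the primary replica: create the virtual network name for AG_HR.
ALTER AVAILABILITY GROUP AG_HR
ADD LISTENER N'HR_VNN' (
    WITH IP ((N'10.0.0.50', N'255.255.255.0')),   -- hypothetical static IP and mask
    PORT = 1433
);

-- Applications then connect to the listener rather than to a server name, e.g.
-- Server=tcp:HR_VNN,1433;Database=HRDB;Integrated Security=SSPI;
```

After a failover the listener moves with the primary role, so the connection string stays the same and the application only needs its retry logic.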

Page 86: Sql Explore   Hebrew

Backward Compatible

Page 87: Sql Explore   Hebrew

What about Server Objects?

Introducing Contained Databases, or CDBs
• Unit of application programmability in Denali
– A DB which establishes a boundary between application and server
• CDBs sever the user–login relationship
– Windows users no longer need matching logins
– Users with passwords replace SQL logins
– Authentication information moves with the CDB

Page 88: Sql Explore   Hebrew

Databases are not always easy to move

[Diagram: a user database depends on instance-level objects outside it: master, msdb and tempdb, instance collation, logins, credentials, linked server definitions, CLR, Agent, replication, DB Mail, tempdb collation, other applications and other databases]

Page 89: Sql Explore   Hebrew

Introducing the Contained Database

• New database option: CONTAINMENT
– Only option supported in Denali is PARTIAL, meaning non-enforced containment
• Partially contained databases solve problems related to:
– Logins: database users with passwords or mapped directly to Windows principals
– System collation: temp tables use the database's collation
• sys.dm_db_uncontained_entities will display all potential containment breaches
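A minimal sketch of enabling partial containment, using the syntax released in SQL Server 2012. The database name `HRDB` is borrowed from the earlier slide, and the user name and password placeholder are assumptions:

```sql
-- Allow contained database authentication on the instance.
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;

-- Switch the database to partial containment.
ALTER DATABASE HRDB SET CONTAINMENT = PARTIAL;

USE HRDB;

-- A database user with its own password; no server login is required.
CREATE USER hr_app WITH PASSWORD = N'<replace with a strong password>';  -- hypothetical user

-- List objects and statements that still cross the database boundary.
SELECT class_desc, major_id, statement_type, feature_name, feature_type_name
FROM sys.dm_db_uncontained_entities;
```

Because the user and its credentials live inside the database, they travel with it when it fails over to another replica.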

Page 90: Sql Explore   Hebrew

Availability Group Architecture

• Availability Group uses Windows Server Failover Cluster (WSFC) for:

– Inter-node health detection

– Failover coordination

– Primary health detection

– Distributed data store for settings and state

– Distributed change notifications

• WSFC is the common Microsoft availability platform for:

– SQL Server AlwaysOn Failover Cluster Instances

– SQL Server AlwaysOn Availability Groups

– Microsoft Hyper-V

– Microsoft Exchange

– Built-in WSFC workloads (e.g. file share, NLB) and third-party workloads

Diagram: Database Active Log Synchronization between the replicas.

Page 91: Sql Explore   Hebrew

AlwaysOn Availability Group

Instance Preparation

1. Install WSFC on each machine and create a single WSFC cluster

2. Install SQL Server Instances on each machine

3. Enable AlwaysOn through SQL Server Configuration Manager

4. CREATE ENDPOINT on each instance

• Notes:

– Steps 1 and 2 can occur in any order, except for an AlwaysOn Failover Cluster Instance (FCI) installation, which requires WSFC to be installed first
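
Step 4 could look like the following sketch, which uses the database mirroring endpoint syntax that availability groups rely on in SQL Server 2012. The endpoint name and port are placeholders, not values from the deck.

```sql
-- Run on every instance that will host an availability replica.
-- Endpoint name and port are placeholders.
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = WINDOWS NEGOTIATE,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );
GO

-- Check that AlwaysOn is enabled (step 3) and that the endpoint is started.
SELECT SERVERPROPERTY('IsHadrEnabled') AS is_hadr_enabled;

SELECT name, state_desc, role_desc
FROM sys.database_mirroring_endpoints;
```

Depending on the service accounts in use, each instance may also need CONNECT permission on the other instances' endpoints.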

Page 92: Sql Explore   Hebrew

WSFC Cluster vs. SQL Server “Cluster” Setup

• Install WSFC feature

• Set up WSFC cluster

• Configure SAN and Shared Disks

• Install SQL Server Failover Cluster Instance (FCI):

– Specify resource group

– Select shared disks

– Configure virtual IPs

– Configure virtual network names

– Specify domain accounts for services

– Configure domain groups*

Page 93: Sql Explore   Hebrew

Simplified WSFC Cluster setup in Windows Server 2008+

Page 94: Sql Explore   Hebrew

Availability Group Concepts Recap

• Availability Group

– Defines the high availability requirements: databases, replicas, availability mode, failover mode, etc.

• Availability Replica

– A SQL Server instance that is part of the availability group and hosts a physical copy of the database

– Role: Primary, Secondary, Resolving

• Availability Database

– A SQL Server database that is part of an availability group

– This can be a regular database or a contained database
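
The concepts above map onto catalog views and DMVs. The queries below are a sketch that assumes the view and column names from SQL Server 2012, which may differ in the CTP.

```sql
-- One row per replica: group, instance, availability mode, failover mode, current role.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,                    -- PRIMARY, SECONDARY, or RESOLVING
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;

-- One row per availability database in each group.
SELECT ag.name AS availability_group,
       adc.database_name
FROM sys.availability_databases_cluster AS adc
JOIN sys.availability_groups AS ag
    ON ag.group_id = adc.group_id;
```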

Page 95: Sql Explore   Hebrew

Availability Group Architecture Drilldown

Diagram: three nodes, each running the WSFC service, a SQL Server instance, and the AG resource DLL, hosting Availability Group 1 and Availability Group 2. The failover flow shown:

• User tells SQL to fail over Availability Group 2 to Node1

• SQL confirms and tells WSFC

• WSFC tells the AG resource DLL to bring AG2 offline

• WSFC tells the AG resource DLL to bring AG2 online

• Notification of the new primary is sent to each secondary

• Secondaries request a connection to the primary

• Clients are disconnected from AG2

• Client connections are transparently redirected to the primary via the IP and network name resources
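
The user-initiated failover in this flow could be issued as shown in this sketch, which assumes the SQL Server 2012 syntax. AG2 is the slide's shorthand for Availability Group 2; the real group name would be used instead.

```sql
-- Connect to the secondary replica that should become the new primary (Node1 on the slide)
-- and run the failover there. The planned form requires a synchronized secondary.
ALTER AVAILABILITY GROUP [AG2] FAILOVER;

-- Forced failover for disaster recovery; this form can lose data.
-- ALTER AVAILABILITY GROUP [AG2] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```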

Page 96: Sql Explore   Hebrew

Active Secondary – Making Secondary Readable

• A readable secondary allows read queries to be offloaded to the secondary

• Data is close to real time; log synchronization latency affects data freshness

• Read applications can reconnect to another secondary on failover

• Not a replacement for replication scenarios

Diagram: InstanceA and InstanceB each run sqlservr.exe hosting DB1 and DB2 as primary and secondary; a Reports workload reads from the secondary.
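
Making a secondary readable could be sketched as follows, assuming the SQL Server 2012 syntax. InstanceB and DB1 come from the diagram; the availability group name AG1 is hypothetical.

```sql
-- Run on the primary. N'InstanceB' must match the replica's server name;
-- the group name AG1 is a placeholder.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'InstanceB'
    WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Run on the secondary to see how the database presents itself to readers.
SELECT DATABASEPROPERTYEX(N'DB1', 'Updateability') AS updateability;
```

Reporting applications would then point their connections at the secondary instead of the primary.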

Page 97: Sql Explore   Hebrew

Active Secondary: Enabling Backup on Secondary

• Backups can be done on any replica of a database

• The secondary replica may be synchronous or asynchronous

• Backups on the primary replica still work

• Log backups done on all replicas form a single log chain

• Recovery Advisor makes restores simple

Diagram: the primary serves the R/W workload; backups can be taken on the primary and on each secondary.
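
A backup job that runs on every replica but only does work on the preferred one could be sketched as below. It assumes the SQL Server 2012 function and options; the database name DB1 and the backup share path are placeholders.

```sql
-- Only back up on the replica that the availability group prefers for backups.
IF sys.fn_hadr_backup_is_preferred_replica(N'DB1') = 1
BEGIN
    -- Full backups taken on a secondary must be copy-only.
    BACKUP DATABASE [DB1]
        TO DISK = N'\\backupshare\sql\DB1_full.bak'   -- placeholder path
        WITH COPY_ONLY, COMPRESSION;

    -- Log backups taken on a secondary are regular log backups
    -- and extend the single log chain shared by all replicas.
    BACKUP LOG [DB1]
        TO DISK = N'\\backupshare\sql\DB1_log.trn'    -- placeholder path
        WITH COMPRESSION;
END;
```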

Page 98: Sql Explore   Hebrew

Readable Secondary Latency

• Updated data is visible on the readable secondary as and when the page is redone

• Redo happens asynchronously after log hardening on the secondary

Diagram: on the primary, a commit triggers a log flush from the log cache to the DB1 log, and log capture sends the log over the network. On the secondary, log apply receives it into the log cache, the log is hardened to the DB1 log, and the commit is acknowledged. The redo thread then applies redo pages to the DB1 data, at which point the page is updated.
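
The gap between log hardening and redo can be observed from the primary with a query like this sketch, which assumes the DMV columns available in SQL Server 2012.

```sql
-- Per database and replica: how much log is waiting to be sent and to be redone.
SELECT DB_NAME(drs.database_id) AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,   -- KB of log not yet sent to the secondary
       drs.redo_queue_size,       -- KB of log hardened but not yet redone
       drs.redo_rate,             -- KB per second
       drs.last_hardened_time,
       drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```

A growing redo_queue_size means readers on the secondary see increasingly stale data even though the log is already hardened there.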

Page 99: Sql Explore   Hebrew

Readable Secondary Behavior

• Contention between the redo thread and query threads is avoided by:

– Internally mapping the read workload to non-blocking isolation levels

• Read Uncommitted → Snapshot Isolation

• Read Committed → Snapshot Isolation

• Repeatable Read → Snapshot Isolation

• Serializable → Snapshot Isolation

– Ignoring all locking hints

• Maintains query performance on the secondary compared to the primary

– Statistics are auto-created on the secondary replica but persisted in tempdb

Page 100: Sql Explore   Hebrew

Providing Instance Availability

Page 101: Sql Explore   Hebrew

Key Enhancements

• Flexible failover policy

– Eliminates false failover

– Configurable failure condition levels

– Better diagnostics

• Native support for multi-site clustering across subnets enables DR using failover cluster instances

• SMB support enables consolidation of more than 26 instances

• Support for tempdb on a local drive

• Fast instance failover through predictable database recovery time

Page 102: Sql Explore   Hebrew

AlwaysOn Failover Cluster Instance

• AlwaysOn Failover Cluster Instance provides instance level failover

• Key Enhancements

– Multi-site clustering across subnets

– Flexible Failover Policy

– Improved system diagnostics

– Support for network attached storage (NAS) using SMB

– Support for tempdb on local drive

Page 103: Sql Explore   Hebrew

Multi-Site Clustering

• Multi-site clustering provides protection from site failures

• AlwaysOn Failover Cluster Instance natively supports multi-site clustering without requiring a VLAN

– Each site can have separate IP subnet

– DNS entry updated to reflect current IP address on failover

Page 104: Sql Explore   Hebrew

Flexible Failover Policy

• FailureConditionLevel (0 to 5):

– 5 – Failover or restart on any qualified failure condition

– 4 – Failover or restart on moderate SQL Server errors

– 3 – Failover or restart on critical SQL Server errors

– 2 – Failover or restart on SQL Server unresponsive

– 1 – Failover or restart on SQL Server down

– 0 – No Automatic Failover or restart

• Diagnostics returned regardless of FailureConditionLevel

• All levels optimized to minimize false failures

Diagram: health-check flow between the WSFC service, the FCI resource DLL, and the SQL Server failover cluster instance:

• User sets the new cluster properties HealthCheckTimeout and FailureConditionLevel

• WSFC asks the resource DLL whether the SQL FCI is alive (IsAlive / LooksAlive)

• The resource DLL runs exec sp_server_diagnostics

• Diagnostics are generated for the health state components (System, Resource, Query Processing, IO Subsystem, Events) and returned periodically

• The IsAlive / LooksAlive result is based on the diagnostics and FailureConditionLevel
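
The two cluster properties can be set from T-SQL, as in this sketch that assumes the SQL Server 2012 syntax. The values are examples only, not recommendations from the deck.

```sql
-- Run on the failover cluster instance. The values are examples only.
ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;

ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 60000;  -- milliseconds

-- One-off snapshot of the health state components
-- (system, resource, query_processing, io_subsystem, events).
EXEC sp_server_diagnostics;
```

Running sp_server_diagnostics by hand returns the same kind of component rows that the resource DLL evaluates against FailureConditionLevel.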

Page 105: Sql Explore   Hebrew

Reducing Planned Downtime

• Support for Windows Server Core

– Reduces OS patching by as much as 50-60%

• Support for rolling upgrade and patching of SQL Server for both Availability Groups and Failover Cluster Instances

• Fast failover time for both Availability Groups and Failover Cluster Instances

• New online operations supported

– Index operations on LOB columns

– Adding a column with a default
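
The two new online operations could be exercised as in the sketch below. The table, index, and constraint names are hypothetical, and online operations require an edition of SQL Server that supports them.

```sql
-- Hypothetical table with a LOB column (nvarchar(max)) in its clustered index rows.
CREATE TABLE dbo.Documents (
    DocumentId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED,
    Title      nvarchar(200) NOT NULL,
    Body       nvarchar(max) NULL
);
GO

-- Rebuild an index that contains LOB data while the table stays available.
ALTER INDEX PK_Documents ON dbo.Documents
    REBUILD WITH (ONLINE = ON);
GO

-- Add a NOT NULL column with a default to an existing table.
ALTER TABLE dbo.Documents
    ADD IsArchived bit NOT NULL
        CONSTRAINT DF_Documents_IsArchived DEFAULT (0);
```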

Page 106: Sql Explore   Hebrew

AlwaysOn Solution Guidance

Page 107: Sql Explore   Hebrew

Flexible Solution Choices

• AlwaysOn Availability Groups

• AlwaysOn Failover Cluster Instances

• AlwaysOn Multi-site Failover Cluster Instances

• Optionally combine with Availability Groups for DR

Page 108: Sql Explore   Hebrew

Virtualization with AlwaysOn Guidance

• Virtualization provides the best isolation for consolidation

• Virtualization without AlwaysOn: simplest management story for limited HA/DR:

– Host, planned downtime: Live Migration

– Host, unplanned downtime: VM failover (OS restart)

– Guest, planned downtime: downtime during patch

– Guest, unplanned downtime: no protection from virtualization

• When to use AlwaysOn for the guest: need better HA/DR protection than a standalone VM

Page 109: Sql Explore   Hebrew

Available Now – CTP1

• SQL Server Code Name Denali CTP1 is now public

• CTP1 has the following feature set that you can test and provide feedback on

– AlwaysOn Failover Cluster Instance features are RTM quality:

• Multi-Subnet Failover

• Flexible Failover Policy

– AlwaysOn Availability Groups preview

• Ability to configure availability groups through T-SQL, SSMS, and PowerShell

• Multiple databases support in availability groups

• Read-only access to the secondary

• Support for Filestream data type

• Manually failing over and resynchronizing without reseeding

• Failing over client connections using the new connectivity story based on virtual network names and virtual IP addresses

• Including logins in user databases through a Contained Database

• SSMS, Catalog Views, and DMVs to view and monitor state

• Support for multiple availability groups on the same instance

• Support for availability groups on standalone instances and/or failover cluster instances

Page 110: Sql Explore   Hebrew

Conclusion

• SQL Server AlwaysOn is a comprehensive high availability solution

– Better application availability,

– Higher return on investment and

– Simplified deployment and management

• AlwaysOn Availability Group and AlwaysOn Failover Cluster Instance provide flexibility in HA configuration

• Windows Server Core support significantly reduces downtime due to patching

• SQL Server AlwaysOn Availability Group

– Multi-database failover

– Multiple secondaries

– Synchronous and asynchronous data movement

– Built in compression and encryption

– Automatic and manual Failover

– Flexible failover policy

– Automatic Page Repair

– Readable secondary

– Secondary backup

– Automatic application redirection using virtual name

– Configuration Wizard

– AlwaysOn Dashboard

– System Center Integration

– Automation using PowerShell

– Rich diagnostic infrastructure

• SQL Server AlwaysOn Failover Cluster Instance

– Multi-site clustering across subnets

– Flexible Failover Policy

– Improved system diagnostics

– Support for network attached storage (NAS) using SMB

– Support for tempdb on local drive
