SQL Server Code Name 'Denali' HADR (High Availability and Disaster Recovery)
Presentation title | Aharon Shilo | CEO | DBCS Ltd.
A bit about me
Who am I: DBA
• Married + 3
• More than 10 years in the field
• PRO-certified in SQL Server and Oracle technologies
• Formerly CTO and training lead at John Bryce
• CEO of DBCS, a company working in this field
• Consultant to Bezeq International, the Central Bureau of Statistics (CBS), Pontis, Traffilog, storenext, galcomm and others
• Introduction to High Availability in SQL Server: hardware and software solutions
• Features and techniques comparison
– Log Shipping
– Database Mirroring
– Replication
– Database Snapshots
– Backup improvements
– Online operations
• HADR deep dive: how to implement the next generation of high availability and disaster recovery solutions with SQL Server
Introduction to High Availability
and Disaster Recovery
• Definitions
– Introduce key terms and concepts
• Business Continuity Planning
– Overview of the BCP process
• SQL Server High Availability Planning
– How does BCP apply to SQL Server availability?
High Availability and Disaster
Recovery: Definition
• High Availability
• High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period
• Availability defined in terms of service level agreements (SLA)
– Recovery Time
– Data loss during unplanned downtime
• A highly available application should be accessible by users x% of the time
• Disaster Recovery
• Processes and procedures designed to restore business operations after a natural or human-induced disaster
– Typically involves providing redundancy spanning multiple sites or geographic regions
Defining x and SLA
• Recovery Time Objective (RTO) guided by availability requirements
– How much downtime can you tolerate?
• Recovery Point Objective (RPO) guided by criticality of application data
– How much data can you lose?
Availability Class | Acceptable Downtime (hrs/yr) or RTO | Acceptable Data Loss (time of last copy) or RPO
Tier 1 (>99.99%) | 1 hr or less | 5 min or less
Tier 2 (99.9% - 99.99%) | 1 to 8.5 hrs | 5 min to 8.5 hrs
Tier 3 (<99.9%) | Hours to days | Hours to days
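The downtime figures in the tier table follow directly from the availability percentage. As a rough sketch (the 8760-hour year and the sample percentages are assumptions chosen for illustration), the conversion can be done in T-SQL:

```sql
-- Hours of downtime per year allowed by an availability percentage.
-- 8760 = 24 * 365 hours in a non-leap year.
SELECT v.availability_pct,
       CAST((100.0 - v.availability_pct) / 100.0 * 8760 AS decimal(10, 2))
           AS max_downtime_hours_per_year
FROM (VALUES (99.99), (99.9), (99.0)) AS v(availability_pct);
```

99.99% allows a little under one hour per year and 99.9% allows roughly 8.8 hours, which is where the tier boundaries above come from.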
Protection Levels
• Local HA
– Protection against resource failures: machine, database corruption, disk
– Location redundancy: building, < 10 miles
• Regional DR
– Protection against network outages and site failures
– Location redundancy: city, county, < 100-200 miles
• Geographic DR
– Protection against natural disasters
– Location redundancy: state, country, > 100-200 miles
Business Continuity Planning
Analysis
Solution Design
Implementation
Testing
Maintenance
• Impact Analysis
– Critical Functions
– Threat Identification
– Recovery Objectives
• Solution Design
– Achieve recovery objectives for relevant threats within specified constraints like budget, human resources etc
– Cost\Benefit analysis of solutions
• Implementation
– Deploy the recommended solution
• Testing
– Test to see if the solution meets the recovery requirements
• Maintenance
– Yearly testing and review of procedures
SQL Server High Availability Planning
• Analysis
– Application tiers serviced by the databases
– Causes of database downtime
– Protection levels: Local HA, Regional DR, Geographic DR
• Solution Design
– What solutions exist?
– What are the characteristics and cost of the solution?
• Implementation
– What are the deployment steps and best practices?
• Testing
– How do I test my implementation?
• Maintenance
– How do I monitor and maintain the solution?
Database Downtime Drivers
• Unplanned Downtime
– Failure Protection
– User Errors
• Planned Downtime
– Online Administration
– Predictable Resourcing
Solution Design
• Understand the solutions and choices before making a decision
– Solution Architecture
– HA Capabilities
– Limitations and Caveats
– Cost Vector
SQL Server Always On Technologies
• Provides a full range of options to minimize downtime and maintain appropriate levels of application availability
Increases Availability
• Backup and Restore
• Log Shipping
• Database Mirroring
• Failover Clustering
• Peer-Peer Replication
Decreased Downtime
• Online Index Operations
• Table Partitioning
• Enhanced Locking
• Resource Governor
• Database Snapshot
• Dedicated Admin Connection
• Dynamic Configuration
Always On Technology Overview
• Architecture Overview
– How does it work?
• Solution Characteristics
– Data Loss Guarantees
– Failover Characteristics
– Redundancy Levels and Utilization
– Cost
– Limitations and Caveats
What’s New in SQL Server 2008
• New Features
• Resource Governor
– Manage SQL Server
workloads and resources
by specifying limits on
resource consumption
• Backup Compression
– Reduce backup and restore
time
• Feature Enhancements
• Database Mirroring
– Automatic recovery from
page corruption
– Log stream compression
– Faster recovery on failover
• Log Shipping
– Sub-Minute Log Shipping
– Backup compression
• Failover Clustering
– 16 nodes
– Rolling upgrade
• Peer-Peer Replication
– Hot add new nodes
Backup & restore
Backup and Restore
• Base availability technology for any solution
– Protects against failures and enables recovery from errors
– Provides Local HA and Site DR
• Need to ensure the backups are accessible if the site goes down
– High RTO due to restore time
– RPO=0 can never be guaranteed
• Types: Full, Differential, and Transaction Log
– File-group backup/restore for large databases
• Backup Compression provides faster and smaller backups in SQL Server 2008
Enhanced Error Detection
• In SQL Server 2000 RESTORE
VERIFYONLY does not guarantee that the
backup is good
– Data may be corrupt
• In SQL Server 2005 RESTORE
VERIFYONLY checks everything
– Ensures that the data is correct
Database Checksums
• SQL Server 2000 had TornPageDetection to detect incomplete I/O operations caused by power failures
• SQL Server 2005 adds checksums to data pages
– Header of every page contains a checksum value
– When reading a page, it re-computes the checksum and compares it with the stored checksum
– Returns error (824) if a difference is found
– Detects errors not reported by the I/O subsystem
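Page checksums are controlled per database. A minimal sketch, assuming a database named AdventureWorks (substitute your own):

```sql
-- Enable page checksums; pages receive a checksum the next time they are written.
ALTER DATABASE AdventureWorks SET PAGE_VERIFY CHECKSUM;

-- Confirm which page verification option each database currently uses.
SELECT name, page_verify_option_desc
FROM sys.databases;
```

Databases upgraded from SQL Server 2000 may still report TORN_PAGE_DETECTION here until the option is changed.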
Backup Checksums
• Detect errors introduced by backup hardware but not reported by hardware or operating system
– Backup media error detection
– Backup devices do not always detect errors
– Works with
  • RESTORE
  • RESTORE VERIFYONLY
• Restore also checks page checksums, if present
– Disk error detection on data pages prior to backup
• Can continue past errors if desired
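As an illustration of the options above, a backup can be written with a checksum and verified afterwards. The database name and file path are assumptions, not taken from the slides:

```sql
-- Write a backup checksum and validate page checksums while backing up.
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM, INIT;

-- Re-read the backup and verify the checksums without restoring it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM;

-- To continue past errors instead of stopping, add CONTINUE_AFTER_ERROR
-- to the WITH clause of BACKUP or RESTORE.
```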
Backup Compression
• Common questions:
– "How much compression will I see?"
– "Will it be comparable to, say, SQL LiteSpeed?"
• One simple answer: "It depends!"
• All data compresses differently – the compression ratio achieved depends on:
– The type of data in the database
– Whether the data in the database is already compressed
– Whether the data/database is encrypted
• "We saw an 85 percent reduction in file size using SQL Server 2008 Backup Compression," says Colin Neller, Senior Software Engineer at ServiceU and part of the company's SQL Server 2008 implementation team. "A backup file that was previously over 300 GB is now only 40 GB, and the job runs in about half the time."
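Since the answer is "it depends", the practical approach is to take a compressed backup and measure it. A sketch under assumed names (database AdventureWorks, a local backup path):

```sql
-- Take a compressed backup (SQL Server 2008 and later).
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_compressed.bak'
WITH COMPRESSION, INIT;

-- Compare logical and compressed sizes from the backup history in msdb.
SELECT TOP (5)
       database_name,
       backup_finish_date,
       backup_size / 1048576.0            AS backup_mb,
       compressed_backup_size / 1048576.0 AS compressed_mb,
       backup_size / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE database_name = N'AdventureWorks'
ORDER BY backup_finish_date DESC;
```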
Backup Compression: Backup Performance
• Backup of a 322 MB AdventureWorks database
– Uncompressed: hardly any CPU used (avg 5%), runtime = 39.5 s, no compression
– Compressed: a lot more CPU used (avg 25%), but runtime = 21.6 s (45% improvement) and backup stored in 76.7 MB (4.2x compression ratio)
DEMO
DATABASE SNAPSHOTS
Database Snapshots
• Read-only, consistent view of a database
– Specified point-in-time
• Modifying data
– Copy-on-write of affected pages
• Reading data
– Accesses snapshot if data has changed
– Redirected to original database otherwise
[Diagram: 12:00 snapshot holding copies of changed pages]
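A snapshot is created with CREATE DATABASE ... AS SNAPSHOT OF. The sketch below assumes a source database named AdventureWorks with one data file whose logical name is AdventureWorks_Data; both the logical name and the snapshot file path are assumptions to be replaced with real values.

```sql
-- Find the logical data file name(s) of the source database.
SELECT name
FROM AdventureWorks.sys.database_files
WHERE type_desc = N'ROWS';

-- Create the snapshot; one sparse file is needed per source data file.
CREATE DATABASE AdventureWorks_dbsnapshot_1800
ON ( NAME = AdventureWorks_Data,
     FILENAME = N'D:\Snapshots\AdventureWorks_dbsnapshot_1800.ss' )
AS SNAPSHOT OF AdventureWorks;
```

The snapshot name follows the `_1800` convention used in the recovery examples, which encodes the time the snapshot was taken.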
Using Database Snapshot to Recover Data
• Undeleting rows
    INSERT INTO Production.WorkOrderRouting
    SELECT * FROM AdventureWorks_dbsnapshot_1800.Production.WorkOrderRouting
• Undoing an update
    UPDATE HR.Department
    SET Name = ( SELECT Name FROM AdventureWorks_dbsnapshot_1800.HR.Department
                 WHERE DepartmentID = 1 )
    WHERE DepartmentID = 1
• Recovering a dropped object
    1. Script the object in the database snapshot
    2. Execute the script in the source database
    3. Repopulate the object (if appropriate)
Caution: Not a substitute for a comprehensive backup and restore strategy
DEMO
Log Shipping
Log Shipping
• Automated transaction log backup and restore provides redundancy at the database level
• SQLLogship.exe provides the underlying framework for doing automated backup, copy and restore
– Backup on primary instance
– Restore on secondary instance(s)
• Scheduling is done through SQL Server Agent jobs
– SQL Server 2008 provides sub-minute scheduling interval providing the ability to do quick backup and restores
• No automatic failover capabilities
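What the log shipping jobs automate can be sketched by hand as a backup on the primary followed by a restore on the secondary. The database name, share and local paths below are assumptions for illustration:

```sql
-- On the primary instance: back up the transaction log to a share.
BACKUP LOG AdventureWorks
TO DISK = N'\\fileshare\logship\AdventureWorks_1200.trn';

-- On the secondary instance, after the file has been copied locally:
-- restore the log and leave the database ready for further restores.
RESTORE LOG AdventureWorks
FROM DISK = N'E:\logship\AdventureWorks_1200.trn'
WITH NORECOVERY;
-- Alternative for read-only access between restores:
--   WITH STANDBY = N'E:\logship\AdventureWorks_undo.dat'

-- Manual failover: bring the secondary online once the last log is restored.
-- RESTORE DATABASE AdventureWorks WITH RECOVERY;
```

Because there is no automatic failover, the final RESTORE ... WITH RECOVERY and the redirection of clients are manual steps.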
Log Shipping (Key terms)
• Primary Server:
– Contains your primary database.
– SQL Server Agent makes periodic transaction log
backups to capture changes.
• Secondary Server
– Contains an unrecovered copy of the production database.
– One standby server can contain standby databases
from multiple primary servers.
Log Shipping (Key terms) cont…
• Monitor Server (Optional)
– Monitors the status of the log-shipping jobs on the
primary and each standby server.
– One monitoring server can monitor multiple primary-
standby server pairs.
– Should use a server other than the primary or the
standby to detect problems on either server.
Log Shipping
[Diagram: the primary database performs backups; three secondary databases each copy and restore the backups; a monitor database raises alerts]
Strength & weakness
• Strengths
– Can Ship Logs Across WAN (Wide-Area Network)
– Protects an Entire Database
• Weaknesses
– Configured Per Database
– NO AUTOMATIC FAILOVER
DEMO
Mirroring
Database Mirroring
• A database level high availability solution that provides complete protection against data loss and fast recovery through automatic failover
• Maintains a redundant database by shipping log blocks when the transactions are committed on the principal
• Synchronous and Asynchronous modes provide the spectrum of options to choose between availability and performance
• Automatic failover when using witness server
Database Mirroring Modes
• High-Availability Mode
– Safety Full; synchronous operation
– Database is available whenever a quorum exists
– Automatic failover
• High-Protection Mode
– Safety Full; synchronous operation
– No witness; quorum provided by partners
– If the principal loses quorum, it stops servicing the database
  • Ensures high protection; database is never in an 'exposed' state
– Manual failover only; no automatic failover
– A transition mode; should not be in this mode for long
• High-Performance Mode
– Safety Off; asynchronous operation
– Manual failover only
  • Supports only one form of role switching: forced service (with possible data loss)
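The three modes map onto ALTER DATABASE settings. The sketch below is not a complete setup script: it assumes mirroring endpoints already exist, that the mirror copy was restored WITH NORECOVERY, and that the host names (example.com) and port 5022 are placeholders. Statements that apply to only one mode are left commented out.

```sql
-- Establish the session: run on the mirror first, then on the principal.
-- On the mirror:
ALTER DATABASE AdventureWorks SET PARTNER = N'TCP://principal.example.com:5022';
-- On the principal:
-- ALTER DATABASE AdventureWorks SET PARTNER = N'TCP://mirror.example.com:5022';

-- High-availability mode (on the principal): synchronous plus a witness.
-- ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
-- ALTER DATABASE AdventureWorks SET WITNESS = N'TCP://witness.example.com:5022';

-- High-protection mode: synchronous, no witness.
-- ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
-- ALTER DATABASE AdventureWorks SET WITNESS OFF;

-- High-performance mode: asynchronous.
-- ALTER DATABASE AdventureWorks SET PARTNER SAFETY OFF;

-- Manual failover (synchronous modes, run on the principal):
-- ALTER DATABASE AdventureWorks SET PARTNER FAILOVER;

-- Forced service (run on the mirror; data loss is possible):
-- ALTER DATABASE AdventureWorks SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
```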
Database Mirroring: How it works
[Diagram: application, principal SQL Server, mirror SQL Server and witness; each partner has data and log files; numbered arrows trace the commit path from principal to mirror]
Mirror is always redoing – it remains current
DBM – Automatic Page Recovery
1. Bad page detected
2. Request page
3. Find page
4. Retrieve page
5. Transfer page
6. Write page
[Diagram: client, principal, mirror and witness, each partner with data and log files]
Database Mirroring Enhancements
• Enhancements in SQL 2008
– Compression of stream data for which at least a 12.5
percent compression ratio can be achieved.
– Automatic Recovery from Corrupted Pages.
– Page read-ahead during the undo phase.
– Improved use of log send buffers.
Strength & Weakness
• Strengths
– Can mirror across WAN
– Automatic, nearly instantaneous failover; faster than failover clustering
– Protects an entire database
• Weaknesses
– Asynchronous (high-performance) mode requires Enterprise Edition
– Must be configured per database
DEMO
Replication
Replication
• Primarily used where availability is required in conjunction with scale out of read activity
• Failover possible; a custom solution
• Not limited to entire database; Can define subset of source database or tables
• Copy of database is continuously accessible for read activity
• Latency between source and copy can be as low as seconds
Transactional Replication
• A high performance data replication solution that provides granular table level replication
– Logical data movement provides flexibility and better hardware utilization
• Key scenarios:
– Customized application-specific DR
– Real-time reporting on a secondary server that can be used for Site DR
– Scale out application queries with ability to use any one database copy for Site DR
• Two types relevant for HA and DR
– Transactional and Peer-to-Peer
Peer-to-Peer Replication
• Provides high availability and read scalability
• Builds redundancy by eliminating single point of failure
• Enable online upgrades of servers
• Maximize Application Uptime
• Support for both Ring and Grid Topology
• Centralized Management using Management Studio
[Diagram: four peer nodes behind an application server; load balancing distributes user read and write requests, and data is replicated between the peers]
Strength & Weakness
• Strengths
– Perpetual or on-demand replication of data, local or
remote
– Protects (duplicates or merges) the exact portion of the
database I want
• Weaknesses
– Configured per database, even per table
– Generally does not protect or duplicate an entire
Database
DEMO
Failover Clustering
Failover Clustering
• Instance level protection built on Windows Failover Clustering shared disk model
– Cluster nodes typically co-located within the same site to provide local HA
– Regional DR possible using VLAN and stretched storage-level replication
• No built-in data redundancy like database mirroring and log shipping
– Data protection has to be provided at the storage level or by combining with other solutions
[Diagram: a virtual server in front of Node 1, Node 2 and Node 3, all attached to a shared disk]
Failover Clustering
• Supports many scenarios:
– Single Instance
– Multiple Instance
– Multiple Active Nodes
– N+1: N active, 1 inactive node
– N+M: N active, M inactive nodes
[Diagram: SQL Server cluster topologies, showing instance placement (Inst1, Inst2, Inst3) for each scenario]
Failover Clustering (Facts)
• Redundancy at database instance level
– All databases fail over together
– Shared copy of system databases
• Single data copy on shared storage device
– No I/O overhead reducing throughput
– Storage unit is single point of failure for cluster
• All database services are clustered
– SQL Agent, Analysis Services, Full-Text engine, MS DTC
• Automatic failover (up to minutes)
• DBMS accessed over virtual IP
• Storage is controlled by one cluster node at a time
• Requires hardware certified by Microsoft for Microsoft Cluster Service
Strength & Weakness
• Strengths
– Provides Protection Against a Node Failure, Protects
the Entire SQL Instance
– Automatic Failover Supported
• Weaknesses
– Generally expensive; requires specialty hardware
– Not trivial to configure and manage
– Doesn't protect against a complete site failure
DEMO
Best Practices
• Backup your system databases after
modifications.
• Test if backups are restorable.
• Practice / Test your disaster recovery plans.
• Documentation is not only for you.
• Keep dedicated DR Server ready.
• Use the CHECKSUM option of BACKUP.
• Run DBCC CHECKDB regularly.
• Don't ignore any runtime errors.
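A minimal sketch of the routine integrity check recommended above, assuming a database named AdventureWorks:

```sql
-- Full logical and physical consistency check; report errors only.
DBCC CHECKDB (N'AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Pages that SQL Server has flagged as suspect (for example after error 824).
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;
```

Scheduling both through SQL Server Agent and alerting on any output is one way to make sure runtime errors are not ignored.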
What Solution Is Best for Us?
Always On Solution Characteristics
[Table: Log Shipping, DBM Sync, DBM Async, Cluster, Transactional Replication and Peer-Peer Replication compared on RPO (no data loss, RPO=0), failover (auto failover and RTO; failover unit: instance, database or table) and redundancy and utilization (read, multiple, write)]
Cost
Solution | Hardware | App Perf Impact | Manageability
Log Shipping | Low | Low | Low
DBM Sync | Low | High | Low
DBM Async | Low | Low | Low
Cluster | High*** | Low*** | Low***
Transactional Replication | Low | Low | High
Peer-Peer Replication | Low | Low | High
* Log Shipping and Database Mirroring can provide point-in-time read capability using STANDBY or database snapshots respectively
** Database Mirroring provides fastest failover to hot secondary
*** Depends on SAN technology
Recap
• Application availability requirements
or SLA drive primary solution choices
– RPO and RTO are the key metrics
used to define the SLA
• Need mitigation against planned and
unplanned downtimes
• Multiple solution choices that provide varying costs and benefits
• Other requirements apart from application SLA factor into the choice
• Understand constraints and tradeoffs you can make
[Diagram: Database Mirroring, Clustering, Log Shipping and Peer-Peer Replication mapped against application availability, unplanned downtime and planned downtime]
AdventureWorks Inc Scenario
AdventureWorks Inc is a manufacturing company that manufactures and sells bicycles across the world. It runs a number of applications, some of them mission critical, on multiple SQL Server instances.
• The DBA team is run by Darren who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA
• One datacenter located in Omaha
• Three applications
– Manufacturing – Tier 1
– Finance – Tier 2
– Scheduling – Tier 3
• Manufacturing application runs on a dedicated SQL Server 2008 Instance
– All other applications run on a second instance
• Availability of manufacturing application is critical
• Implement a solution at the lowest possible cost
Application Requirements
• Manufacturing application has strict SLAs
• Finance application requires readability on the secondary
– The reports are run every 4 hours and need to be fresh as of the last hour. To offload the reporting load from the main system they would like to utilize the mirror
[Table: requirements of the Manufacturing, Finance and Scheduling applications against data loss (RPO=0), RTO in seconds, failover unit (instance, database, table), auto failover, read, multiple sites and read-write]
Solution Choice for Manufacturing Application
• Clustering can provide a zero data loss solution that can also provide fast instance level failover
• Use RAID configuration to provide data redundancy on the SAN
• If a redundant copy is required that can provide instance failover with zero data loss, use SAN replication
– High cost solution
• Use synchronous database mirroring if instance failover is not needed
[Table: Cluster, SAN Replication, DBM - Sync, DBM - Async, Log Shipping, Transactional Replication and Peer-Peer Replication compared on data loss (RPO=0), fast RTO, failover unit (instance, database, table), auto failover, read, more than one site or copy, and read-write; highlighted choice: Clustering with RAID]
Solution Choice for Finance Application
• For database level redundancy with acceptable data loss and minimal performance impact, asynchronous database mirroring is an optimal choice
• Use database snapshots at periodic intervals to provide a readable snapshot of the data for reporting
• Low cost solution
[Table: Cluster, SAN Replication, DBM - Sync, DBM - Async, Log Shipping, Transactional Replication and Peer-Peer Replication compared on data loss (RPO=0), fast RTO, failover unit (instance, database, table), auto failover, read, more than one site or copy, and read-write; highlighted choice: DBM - Async]
[Diagram: Omaha Datacenter with async database mirroring for Finance, a database snapshot every hour for Reports, and Scheduling]
Adding a Regional Datacenter Into the Mix
• Regulatory and compliance requirements drive the need for an additional datacenter within a 10 mile radius to provide redundancy against site level failure
– It is now required that all applications have the ability to fail over to the regional datacenter across the river in Council Bluffs
• The SLA needs to be maintained for tier 1 applications even in the case of site failures
Regional Site Solution Choices
[Diagram: Omaha Datacenter and CB Datacenter, showing Manufacturing (cluster with SAN), Finance (async database mirroring, database snapshot every hour for Reports) and Scheduling, with sync mirroring (no witness) and log shipping between the sites]
A Complete Topology
• Considering the potential of floods and tornadoes destroying the regional data centers, Adventureworks Inc wants to maintain a disaster recovery site in San Antonio, TX
• The disaster recovery site has lower SLA requirements for all applications
– The manufacturing application can have an RPO of 1 hour
– The RTO is set at 4 hours
Topology Diagram
[Diagram: Manufacturing on a cluster with SAN, with sync mirroring (no witness) and log shipping to the other sites]
Scale Out and Availability
Scenario
• AdventureWorks is building a new web based order management system that allows customers from all over the world to access the system and place orders
• The core group of customers are in Western Europe, South East Asia and North America
Requirements
– Geo Redundancy
– Data Locality
– High Availability
– Local Read-Scale
Workload Characteristics
– Mainly reads
– Few writes
Application Characteristics
– Each user logging in connects to a particular server
  • Partitioned based on user-id and region
  • Writes from a user always happen on one server regardless of the region the user logs in from
– All reads redirected to the closest geo-location
  • Reasonable tolerance for latency (5-10 minutes)
Replication Topology
[Diagram: peer nodes with read-only servers in each region, including Asia1 and Asia2]
Licensing Facts
• Passive servers are the mirror, the log-shipping secondary and the passive cluster node
• No license required on passive if it is truly passive
• A passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server.
• The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.
HA Features Edition Support
[Table: support by edition (Express, Workgroup, Standard, Enterprise) for the features below]
• Database Mirroring (1): Advanced high availability solution that includes fast failover and automatic client redirection
• Failover Clustering (2)
• Backup Log-shipping: Data backup and recovery solution
• Online System Changes: Includes Hot Add Memory, dedicated administrative connection, and other online operations
• Online Indexing
• Online Restore
• Fast Recovery: Database available when undo operations begin
(1) Single thread redo
(2) Limited to 2 node cluster
Summary
• There is no "one size fits all" solution
• Consider the costs, benefits and constraints and compare them to the availability requirements of the organization to determine the best solution
• Use the charts to understand cost, benefit and constraints of the various SQL Server High Availability solutions
• TEST the solution to ensure it can meet the availability requirements and SLAs
• Questions & answers
SQL Server AlwaysOn:
Mission Critical Capabilities in SQL
Server “Denali”
• Jon Jahren
• Exec VP, Prediktor
High Availability and Disaster
Recovery
SQL Server “Denali” AlwaysOn
• Faster failover, easier administration with Availability Groups
• Identify databases to failover as a unit to reduce unplanned downtime
• Faster application failover using virtual name
• Increase application uptime using flexible failover policy
• Enable better data redundancy and protection with up to four secondaries and up to two synchronous secondaries
• Limited downtime with enhanced online operations
• Run Microsoft SQL Server® on Windows Server® Core to reduce planned downtime (50-60% fewer OS patch reboots)
[Diagram: availability group replicas (A) on shared storage, on non-shared storage and at a disaster recovery site]
Maximize Resources
Higher return on high availability investments
• Increase hardware utilization through active secondaries for backups, reporting, and ad hoc queries
• Reuse existing infrastructure with support for both SAN and direct attached storage
Simplify management and administration
• Integrated manageability for one-stop configuration
• Easy setup and monitoring integrated into Microsoft SQL Server Management Studio
• Availability Groups that provide failover units with contained dependencies (such as logons)
Breakthrough Performance and Scale
• Dramatically faster star-join query processing—
much faster than current SQL Server (~10X)
• Query speed increase varies with query and data
• Reduced I/O
• Consistent query performance
• Reduced performance tuning effort
Mission Critical High Availability Solution
• Meets mission critical high availability SLA
• Integrated, Flexible, Efficient
• Microsoft recommended prescriptive HA solutions and customer references
Introducing SQL Server AlwaysOn
Integrated, Flexible, Efficient high availability for mission critical business
AlwaysOn provides database level and instance level protection
AlwaysOn Availability Groups for database protection
• Multi-Database Failover
• Multiple Secondaries
• Active Secondaries
• Integrated HA Management
AlwaysOn Failover Cluster Instances for instance level protection
• Multisite Clustering
• Flexible Failover Policy
• Improved Diagnostics
• Built for consolidation scenarios
A high availability platform built for the future
AlwaysOn – A flexible solution
AlwaysOn provides the flexibility of different HA configurations
[Diagram: synchronous and asynchronous data movement between replicas (A); shared storage with regional and geo secondaries; direct attached storage with local, regional and geo targets]
AlwaysOn Availability Groups
AlwaysOn Availability Groups is a new feature that enhances and combines database mirroring and log shipping capabilities
Flexible, Integrated, Efficient:
• Multi-database failover
• Multiple secondaries
– Total of 4 secondaries
– 2 synchronous secondaries
– 1 automatic failover pair
• Synchronous and asynchronous data movement
• Built-in compression and encryption
• Automatic and manual failover
• Flexible failover policy
• Application failover using virtual name
• Configuration Wizard
• Dashboard
• System Center integration
• Rich diagnostic infrastructure
• File-stream replication
• Replication
• Active Secondary
– Readable Secondary
– Backup from Secondary
• Automation using PowerShell
Availability Groups Virtual Name
Availability Groups Virtual Name allows applications to fail over seamlessly to any secondary
– Application reconnects using a virtual name after a failover to a secondary
– Application retries during failover
– Connects to the new primary once failover is complete and the virtual name is online
– Backward compatible
Connection string: -server HR_VNN; -catalog HRDB
[Diagram: availability group AG_HR with database HRDB on ServerA, ServerB and ServerC (one primary, two secondaries), reached through the virtual network name HR_VNN]
What about Server Objects?
Databases are not always easy to move
[Diagram: a user database depends on objects outside it: master, msdb and tempdb; instance collation, logins, credentials, linked server definitions, CLR; Agent, Replication, DB Mail; tempdb collation; other apps and other databases]
Introducing Contained Databases (CDBs)
• Unit of application programmability in Denali
• A DB which establishes a boundary between application and server
• CDBs sever the user–login relationship
– Windows users no longer need matching logins
– Users with passwords replace SQL logins
– Authentication information moves with the CDB
Introducing the Contained Database
• New database option – CONTAINMENT
– Only option supported in Denali is PARTIAL, meaning non-enforced containment
• Partially contained databases solve problems related to:
– Logins: database users with passwords or mapped directly to Windows principals
– System collation: temp tables use the database's collation
• sys.dm_db_uncontained_entities will display all potential containment breaches
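A sketch of turning on partial containment, assuming a database named HRDB; the user name and password are hypothetical, and the syntax shown is the one in released SQL Server 2012, which may differ from Denali pre-release builds.

```sql
-- Instance level: allow contained database authentication.
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;

-- Database level: switch the database to partial containment.
ALTER DATABASE HRDB SET CONTAINMENT = PARTIAL;

-- Create a database user with its own password (no server login needed).
USE HRDB;
CREATE USER hr_app_user WITH PASSWORD = N'Str0ng!Passw0rd#2024';

-- List objects and statements that still cross the containment boundary.
SELECT class_desc, major_id, statement_type, feature_name
FROM sys.dm_db_uncontained_entities;
```

Because the user and its password live inside the database, they move with it when it fails over as part of an availability group.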
Availability Group Architecture
Availability Group uses Windows Server Failover Cluster (WSFC) for
• Inter-node health detection
• Failover coordination
• Primary health detection
• Distributed data store for settings and state
• Distributed change notifications
WSFC: Common Microsoft Availability Platform
• SQL Server AlwaysOn Failover cluster instances
• SQL Server AlwaysOn Availability Group
• Microsoft Hyper-V
• Microsoft Exchange
• Built-in WSFC workloads (e.g. file share, NLB, etc.) and third party workloads
[Diagram: a Windows Server Failover Cluster hosting an AlwaysOn Availability Group, with database active log synchronization between replicas]
Instance Preparation
1. Install WSFC on each machine and create a single WSFC cluster
2. Install SQL Server Instances on each machine
3. Enable AlwaysOn through SQL Configuration Manager
4. CREATE ENDPOINT on each instance
• Notes:
– Steps 1 and 2 can occur in any order (except for AlwaysOn Failover Cluster Instance (FCI) installation, which requires WSFC to be installed first)
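Step 4 can be sketched as follows. The endpoint name Hadr_endpoint and port 5022 are assumptions, and the service account of each partner instance still needs CONNECT permission on the endpoint.

```sql
-- Run on every instance that will host an availability replica.
CREATE ENDPOINT Hadr_endpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = WINDOWS NEGOTIATE,
        ENCRYPTION = REQUIRED ALGORITHM AES);

-- Check that the endpoint exists and is started.
SELECT name, state_desc, role_desc
FROM sys.database_mirroring_endpoints;
```

Availability groups reuse the database mirroring endpoint type, which is why the FOR DATABASE_MIRRORING clause appears here.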
WSFC Cluster vs. SQL Server “Cluster”
Setup
• Install WSFC feature
• Setup WSFC cluster
• Configure SAN and Shared Disks
• Install SQL Server Failover Cluster Instance (FCI):
– Specify resource group
– Select shared disks
– Configure virtual IPs
– Configure virtual network names
– Specify domain accounts for services
– Configure domain groups*
Simplified WSFC Cluster setup in Windows
Server 2008+
Availability Group Concepts Recap
• Availability Group
– Defines the high availability requirements
  • Databases, Replicas, Availability Mode, Failover Mode etc.
• Availability Replica
– SQL Server instances that are part of the availability group and host a physical copy of the database
– Role: Primary, Secondary, Resolving
• Availability Database
– SQL Server database that is part of an availability group
– This can be a regular database or a contained database
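The concepts above come together in CREATE AVAILABILITY GROUP. This is a sketch using the names from the virtual name slide (AG_HR, HRDB, ServerA to ServerC, HR_VNN); the endpoint URLs, IP address and subnet mask are assumptions, and the syntax is that of released SQL Server 2012, which may differ from Denali pre-release builds.

```sql
-- Run on the instance that will host the primary replica.
CREATE AVAILABILITY GROUP AG_HR
FOR DATABASE HRDB
REPLICA ON
    N'ServerA' WITH (
        ENDPOINT_URL      = N'TCP://ServerA.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'ServerB' WITH (
        ENDPOINT_URL      = N'TCP://ServerB.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'ServerC' WITH (
        ENDPOINT_URL      = N'TCP://ServerC.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL);

-- Virtual network name that applications connect to.
ALTER AVAILABILITY GROUP AG_HR
ADD LISTENER N'HR_VNN' (
    WITH IP ((N'10.0.0.50', N'255.255.255.0')),
    PORT = 1433);

-- On each secondary instance, after the group exists:
-- ALTER AVAILABILITY GROUP AG_HR JOIN;
```

ServerA and ServerB form the automatic failover pair; ServerC is an asynchronous replica of the kind typically placed at a remote site.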
Availability Group Architecture Drilldown
[Diagram: three nodes, each running the WSFC service, a SQL Server instance and the AG resource DLL, hosting Availability Group 1 and Availability Group 2]
• User tells SQL to fail over Availability Group 2 to Node1
• SQL confirms and tells WSFC
• WSFC tells AG Res DLL to bring AG2 offline
• WSFC tells AG Res DLL to bring AG2 online
• Notification of new primary
• Secondaries request primary connection
• Clients disconnected from AG2
• Client connections transparently redirected to primary via IP and network name resources
Active Secondary – Making Secondary Readable
• Readable secondary allows offloading read queries to the secondary
• Close to real-time data; latency of log synchronization impacts data freshness
• Read applications can reconnect to another secondary on failover
• Not a replacement for replication scenarios
[Diagram: InstanceA (primary) and InstanceB (secondary), each a sqlservr.exe process hosting DB1 and DB2; reports run against the secondary]
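Read access is a per-replica setting. A sketch assuming an availability group named AG_HR with a replica on ServerB (names borrowed from the virtual name slide), using released SQL Server 2012 syntax:

```sql
-- Run on the primary: allow read-only connections when ServerB is a secondary.
ALTER AVAILABILITY GROUP AG_HR
MODIFY REPLICA ON N'ServerB'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Review the connection setting of every replica.
SELECT replica_server_name, secondary_role_allow_connections_desc
FROM sys.availability_replicas;
```

With READ_ONLY, clients must declare read intent (for example ApplicationIntent=ReadOnly in the connection string); ALL accepts any connection but still serves reads only.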
Active Secondary: Enabling Backup on Secondary
• Backups can be done on any replica of a database
• Secondary replica may be synchronous or asynchronous
• Backups on the primary replica still work
• Log backups done on all replicas form a single log chain
• Recovery Advisor makes restores simple
[Diagram: R/W workload on the primary; backups taken on the primary and on two secondaries, each with its log cache]
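A backup job can run the same script on every replica and let each one decide whether it should act. The sketch assumes a database named HRDB and a backup share; both are placeholders, and the function shown is the one in released SQL Server 2012.

```sql
-- Back up the log only on the replica currently preferred for backups.
IF sys.fn_hadr_backup_is_preferred_replica(N'HRDB') = 1
BEGIN
    BACKUP LOG HRDB
    TO DISK = N'\\fileshare\backups\HRDB_log.trn';
END;

-- Full database backups taken on a secondary replica must be copy-only:
-- BACKUP DATABASE HRDB
-- TO DISK = N'\\fileshare\backups\HRDB_full.bak'
-- WITH COPY_ONLY;
```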
Readable Secondary Latency
• Updated data is visible on the readable secondary as and when the page is redone
• Redo happens asynchronously after log hardening on the secondary
[Diagram: on the primary, a commit flushes the log cache to the DB1 log and log capture sends it across the network; on the secondary, log apply hardens it to the DB1 log and acknowledges the commit, then the redo thread redoes pages in the DB1 data file]
Readable Secondary Behavior
• Contention between the redo thread and query threads is avoided by
– Internally mapping the read workload to non-blocking isolation levels
  • Read Uncommitted → Snapshot Isolation
  • Read Committed → Snapshot Isolation
  • Repeatable Read → Snapshot Isolation
  • Serializable → Snapshot Isolation
– Ignoring all locking hints
• Maintains query performance on the secondary compared to the primary
– Statistics are auto-created on the secondary replica but persisted in tempdb
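The statistics that a readable secondary creates for itself can be listed from the catalog. This sketch assumes the is_temporary column that sys.stats exposes in the released SQL Server 2012 product, and is meant to be run in the user database while connected to the readable secondary.

```sql
-- Statistics created on the readable secondary and kept in tempdb.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.auto_created,
       s.is_temporary
FROM sys.stats AS s
WHERE s.is_temporary = 1;
```

Because these statistics live in tempdb, they are lost when the instance restarts and are rebuilt on demand.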
Providing Instance Availability
Key Enhancements
• Flexible Failover Policy
– Eliminates false failover
– Configurable failure condition levels
– Better diagnostics
• Native support for multi-site clustering across subnets enables DR using failover cluster instances
• SMB support enables consolidation of more than 26 instances
• Support for tempdb on local drive
• Fast instance failover through predictable database recovery time
AlwaysOn Failover Cluster Instance
• AlwaysOn Failover Cluster Instance provides instance-level failover
• Key Enhancements
– Multi-site clustering across subnets
– Flexible Failover Policy
– Improved system diagnostics
– Support for network attached storage (NAS) using SMB
– Support for tempdb on local drive
Multi-Site Clustering
• Multi-site clustering provides protection from site failures
• AlwaysOn Failover Cluster Instance natively supports multi-site clustering without requiring a VLAN
– Each site can have a separate IP subnet
– DNS entry updated to reflect current IP address on failover
Flexible Failover Policy
• FailureConditionLevel (0 to 5):
– 5 – Fail over or restart on any qualified failure conditions
– 4 – Fail over or restart on moderate SQL Server errors
– 3 – Fail over or restart on critical SQL Server errors
– 2 – Fail over or restart on SQL Server unresponsive
– 1 – Fail over or restart on SQL Server down
– 0 – No automatic failover or restart
• Diagnostics returned regardless of FailureConditionLevel
• All levels optimized to minimize false failures
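The two policy properties can be set from T-SQL as well as from the cluster tools. The statements below are a sketch that assumes the released SQL Server 2012 syntax and is run on the failover cluster instance; the values shown are illustrative choices, not recommendations from the deck.

```sql
-- Fail over or restart on critical SQL Server errors (level 3 in the list above).
ALTER SERVER CONFIGURATION
SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;

-- How long the resource DLL waits for health information, in milliseconds.
ALTER SERVER CONFIGURATION
SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 60000;
```

Each level includes the conditions of the levels below it, so raising the level makes failover more aggressive.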
[Diagram: the WSFC Service, the FCI Res DLL, and the SQL Server Failover Cluster Instance. Callouts:]
• User sets new cluster properties HealthCheckTimeout and FailureConditionLevel
• IsAlive/LooksAlive: WSFC asks the Res DLL if the SQL FCI is alive
• exec sp_server_diagnostics
• Diagnostics generated for health state components and periodically returned
– System
– Resource
– Query Processing
– IO Subsystem
– Events
• IsAlive/LooksAlive result based on diagnostics and FailureConditionLevel
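The same diagnostics that the resource DLL consumes can be inspected by hand. The sketch below captures one round of sp_server_diagnostics output into a temporary table; the column list follows the result set documented for the released SQL Server 2012 product, and the table name #diag is made up for this example.

```sql
-- Temporary table matching the procedure's result set (name is hypothetical).
CREATE TABLE #diag (
    create_time    datetime,
    component_type sysname,
    component_name sysname,
    [state]        int,
    state_desc     sysname,
    data           varchar(max)
);

-- Without @repeat_interval the procedure returns once and exits.
INSERT INTO #diag
EXEC sp_server_diagnostics;

-- One row per health component (system, resource, query_processing, ...).
SELECT component_name, state_desc, create_time
FROM #diag
ORDER BY component_name;

DROP TABLE #diag;
```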
Reducing Planned Downtime
• Support for Windows Server Core: reduces OS patching by as much as 50-60%
• Support for rolling upgrade and patching of SQL Server for both Availability Groups and Failover Cluster Instances
• Fast failover time for both Availability Groups and Failover Cluster Instances
• New online operations supported
– LOB index
– Adding a column with a default
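The two new online operations can be illustrated on a small table. This is a sketch: the table dbo.Documents and its columns are invented for the example, and online index operations are assumed to be available only in editions that support them.

```sql
-- Hypothetical table with a LOB column.
CREATE TABLE dbo.Documents (
    DocumentId int           NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    Title      nvarchar(200) NOT NULL,
    Body       nvarchar(max) NULL      -- LOB column
);

-- Rebuild the clustered index online even though it carries a LOB column.
ALTER INDEX PK_Documents ON dbo.Documents
REBUILD WITH (ONLINE = ON);

-- Add a NOT NULL column with a default value.
ALTER TABLE dbo.Documents
ADD IsArchived bit NOT NULL
    CONSTRAINT DF_Documents_IsArchived DEFAULT (0);
```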
AlwaysOn Solution Guidance
Flexible Solution Choices
• AlwaysOn Availability Groups
• AlwaysOn Failover Cluster Instances
• AlwaysOn Multi-site Failover Cluster Instances
• Optionally combine with Availability Groups for DR
Virtualization with AlwaysOn Guidance
• Virtualization provides the best consolidation isolation
• Virtualization without AlwaysOn: simplest management story for limited HA/DR:

         Planned                  Unplanned
Host     Live Migration           VM failover (OS restart)
Guest    Downtime during patch    No protection from virtualization

• When to use AlwaysOn for the guest: need better HA/DR protection than a standalone VM
Available Now – CTP1
• SQL Server Code Name Denali CTP1 is now public
• CTP1 has the following feature set that you can test and provide feedback on
– AlwaysOn Failover Cluster Instance features are RTM quality:
  • Multi-Subnet Failover
  • Flexible Failover Policy
– AlwaysOn Availability Groups preview:
  • Ability to configure availability groups through T-SQL, SSMS, and PowerShell
  • Multiple databases support in availability groups
  • Read-only access to the secondary
  • Support for the Filestream data type
  • Manually failing over and resynchronizing without reseeding
  • Failing over client connections using the new connectivity story based on virtual network names and virtual IP addresses
  • Including logins in user databases through a Contained Database
  • SSMS, Catalog Views, and DMVs to view and monitor state
  • Support for multiple availability groups on the same instance
  • Support for availability groups on standalone instances and/or failover cluster instances
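For the catalog view and DMV monitoring mentioned in the list above, a query along the following lines shows every group, its replicas, and their current state. The object and column names are those of the released SQL Server 2012 product and are an assumption with respect to CTP1, where names may differ.

```sql
SELECT ag.name                          AS availability_group,
       ar.replica_server_name,
       ar.availability_mode_desc,       -- synchronous or asynchronous commit
       ar.failover_mode_desc,           -- automatic or manual
       ars.role_desc,                   -- PRIMARY or SECONDARY
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;
```

Run on the primary, this returns state for all replicas; on a secondary, state is reported only for the local replica, which is why the join to the state DMV is an outer join.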
Conclusion
• SQL Server AlwaysOn is a comprehensive high availability solution
– Better application availability,
– Higher return on investment and
– Simplified deployment and management
• AlwaysOn Availability Group and AlwaysOn Failover Cluster Instance provide flexibility in HA configuration
• Windows Server Core support significantly reduces downtime due to patching
• SQL Server AlwaysOn Availability Group
– Multi-database failover
– Multiple secondaries
– Synchronous and asynchronous data movement
– Built-in compression and encryption
– Automatic and manual failover
– Flexible failover policy
– Automatic Page Repair
– Readable secondary
– Secondary backup
– Automatic application redirection using virtual name
– Configuration Wizard
– AlwaysOn Dashboard
– System Center Integration
– Automation using PowerShell
– Rich diagnostic infrastructure
• SQL Server AlwaysOn Failover Cluster Instance
– Multi-site clustering across subnets
– Flexible Failover Policy
– Improved system diagnostics
– Support for network attached storage (NAS) using SMB
– Support for tempdb on local drive
AlwaysOn Resources
"Denali" AlwaysOn Resource Center: http://msdn.microsoft.com/en-us/sqlserver/gg490638(en-us,MSDN.10)
CTP download
Documentation
MSDN forums
Microsoft Connect
AlwaysOn Blog
Credits :
Vinod Kumar
Balmukund Lakhani
Matt Hollingsworth
Jon Jahren