Hakan YÜKSEL hakan.yuksel@turkiyefinans.com.tr http://yukselis.wordpress.com Windows 2008 R2 Failover Cluster Architecture and Troubleshooting

Webcast - Failover Cluster Architecture



Page 2: Webcast - Failover Cluster Architecture

Agenda

• What is a cluster, and why do we use one
• Cluster architecture and concepts
• Failover cluster requirements and installation
• (Demo) Configuring the cluster and server roles
• (Demo) Tuning and maintenance
• Multi-Site Cluster (GeoCluster)
• Questions and answers

Page 3: Webcast - Failover Cluster Architecture

What Is a Cluster, and Why Do We Use One

A cluster is a group of servers joined together so that they behave as a single server. This provides high availability and redundancy. The data used by the servers is kept on a shared disk area, and only one member of the cluster is allowed to access that shared disk at any given time.

High Availability (HA): we use clusters to ensure that the services and applications hosted on them stay continuously available.

Clusters improve manageability by:

• Disaster recovery: Clusters help recover applications in case of a failure
• Update management: Clusters enable applications to continue to be available when updates are applied to applications or the node operating system

Page 4: Webcast - Failover Cluster Architecture

Cluster Concepts

– Node, Active, Passive
– Virtual IP, Name
– Group, Resource, Service, Application
– Split Brain, SCSI Bus Reset, SCSI-3 Reservation
– Quorum, MNS, Arbitration Process
– Heartbeat, Private, Public Network
– Cluster-Aware Software
– Failover, Failback, Dependency
– SAN concepts
• HBA, LUN, Multipath, Target, Initiator

Page 5: Webcast - Failover Cluster Architecture

What Is Failover?

• The active node becomes inactive for any reason
• A group, or a resource within a group, goes into a failed or offline state
• An administrator intervenes manually

In each of these cases, failover is the moving of the groups and resources hosted on the cluster from one node to another.

When a failover occurs, a problem is assumed to exist in one of the following:

Node, Interface, Group, Resource, Disk

Page 6: Webcast - Failover Cluster Architecture

Cluster Failover

[Diagram: Client PCs, Node A, Node B, Disk cabinet A, Disk cabinet B, Heartbeat, SQL, Passive Node. Callouts: "Failure Occurs!", "SCSI Reserve Broken", "New Reservation Established", "SQL fails over and is available to clients".]

Page 7: Webcast - Failover Cluster Architecture

Quorum and Majority Node Set

• The quorum is where the cluster configuration and state information is kept.

• Majority Node Set (MNS) is a democratic system. With a quorum disk there is only one vote, and whichever node owns it can own the cluster; with MNS, the majority owns the cluster. For example, if a split-brain scenario occurs in a 5-node cluster, each node checks how many nodes in total it can communicate with. If a node can communicate with two other nodes, those 3 nodes form a majority of the 5 and take ownership of the cluster. The other two nodes recognize that they are in the minority and assume that the other 3 nodes can communicate with each other.

• Windows Server 2008 also brings a new quorum model (Node and Disk Majority), in which the quorum disk is used a little differently: the quorum disk counts as one vote alongside the node votes.

• http://yukselis.wordpress.com/2010/06/28/quorum-nedir/#comments Başar Güner
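
The majority rule in the 5-node example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the Cluster service is implemented; the function name `has_majority` and the partition names are assumptions made for the example.

```python
def has_majority(reachable_votes: int, total_votes: int) -> bool:
    """Return True when a partition holds strictly more than half of all votes."""
    return 2 * reachable_votes > total_votes


# Hypothetical 5-node cluster split in two by a network failure.
total_nodes = 5
partitions = {"side_a": 3, "side_b": 2}  # nodes that can still see each other

# Each side counts itself plus the peers it can reach and compares with the total.
owners = [name for name, size in partitions.items()
          if has_majority(size, total_nodes)]
```

Here only `side_a` ends up in `owners`; the two nodes on `side_b` fall short of a majority and give up the cluster.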

Page 8: Webcast - Failover Cluster Architecture

A Look at Quorum

4 Quorum Types

• Node majority
• Node and File Share majority
• Disk only (not recommended)
• Node and Disk majority

Majority is greater than 50%

Possible Voters: Nodes (1 each), Disk Witness (1 max), File Share Witness (1 max)

[Diagram: five voters, each labeled "Vote"]

Page 9: Webcast - Failover Cluster Architecture

Quorum Model Summary

• No Majority: Disk Only
– Not recommended
– Only use as directed by vendor

• Node and Disk Majority
– Only use as directed by vendor

• Node Majority
– Odd number of nodes

• Node and File Share Majority
– Best availability solution
– Recommended for Exchange Server 2007 CCR

Page 10: Webcast - Failover Cluster Architecture

Choosing the Quorum Model

Considerations for choosing a quorum mode include:

• By default, failover clustering chooses:
– Node Majority if there are an odd number of nodes in the cluster
– Node and Disk Majority if there are an even number of nodes in the cluster

• Node and File Share Majority is recommended for geographically dispersed clusters

• No Majority: Disk Only is not recommended, because of the disk subsystem’s single point of failure

• Plan changes to the quorum mode carefully to avoid a mode that may result in loss of quorum
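
The default choice listed above can be written as a small Python sketch. It only mirrors the odd/even rule from this slide; the function names are hypothetical and nothing here queries a real cluster.

```python
def default_quorum_mode(node_count: int) -> str:
    """Pick the default mode from the node count, following the odd/even rule."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return "Node Majority" if node_count % 2 == 1 else "Node and Disk Majority"


def total_votes(node_count: int) -> int:
    """One vote per node, plus one disk witness vote when the default adds a disk."""
    witness = 1 if default_quorum_mode(node_count) == "Node and Disk Majority" else 0
    return node_count + witness


for n in (2, 3, 4, 5):
    print(n, default_quorum_mode(n), total_votes(n))
```

Under this rule the total number of votes is always odd, so a clean split can never end in a tie.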

Page 11: Webcast - Failover Cluster Architecture

Features Introduced with Windows Server 2008 R2

• Improvements to the validation process
– Windows Server 2008 R2 includes a Best Practices Analyzer (BPA) for all major server roles, including Failover Clustering. This analyzer examines the best practices configuration settings for a cluster and cluster nodes.

• Improved cluster node fault tolerance
– Because of the architecture of CSV, there is improved cluster node connectivity fault tolerance that directly affects VMs running on the cluster. The CSV architecture implements a mechanism, known as dynamic I/O redirection, where I/O can be rerouted within the failover cluster based on connection availability.

• The addition of a Windows PowerShell interface.

• Additional options for migrating settings from one cluster to another.
– Administrators can migrate cluster workloads currently running on Windows Server 2003 and Windows Server 2008 to Windows Server 2008 R2.

Page 12: Webcast - Failover Cluster Architecture

Cluster Requirements

Review hardware and infrastructure requirements for a failover cluster.

• Servers: Microsoft supports a failover cluster solution only if all the hardware components are marked as "Certified for Windows Server 2008 R2." In addition, the complete configuration (servers, network, and storage) must pass all tests in the Validate a Configuration Wizard, which is included in the Failover Cluster Manager snap-in.

• Storage: You must use shared storage that is compatible with Windows Server 2008 R2.

• Network adapters and cable (for network communication): The network hardware, like other components in the failover cluster solution, must be marked as "Certified for Windows Server 2008 R2." If you use iSCSI, your network adapters should be dedicated to either network communication or iSCSI, not both.

• Account for administering the cluster: When you first create a cluster or add servers to it, you must be logged on to the domain with an account that has administrator rights and permissions on all servers in that cluster. The account does not need to be a Domain Admins account; it can be a Domain Users account that is in the Administrators group on each clustered server. In addition, if the account is not a Domain Admins account, the account (or the group that the account is a member of) must be delegated Create Computer Objects and Read All Properties permissions in the domain.

• Failover clustering requires Windows Server 2008 R2 Enterprise or Datacenter edition; the feature is not available on Standard Edition
• SCSI-3 commands
• Basic GPT and MBR disks supported
• Multipath I/O (MPIO) recommended
• Persistent Reservations (PRs) required

Page 13: Webcast - Failover Cluster Architecture

Failover Cluster Installation Steps

Failover Cluster Prerequisites
• Establish a Network Naming Convention
• TCP/IP Network Configuration
– Public Network
– Storage Network
– Heartbeat Network

Procedures
• Prepare the Failover Cluster
– Create a Domain User Account
– Add Nodes to an Active Directory Domain
– Expose Storage to Cluster Nodes
– Install the Failover Cluster Feature
– Run Cluster Validation
• Create and Configure the Failover Cluster
– Create a Cluster
– Set Cluster Network Properties and Apply Naming Convention
• Create Highly Available Services
• Create a Highly Available iSCSI Target
– Configuring Windows Firewall for Microsoft iSCSI Software Target
– Installing the Microsoft iSCSI Software Target
– Create the Failover iSCSI Target Resource Group
– Create an iSCSI Target in the Microsoft iSCSI Target MMC
– Create and Configure Virtual Disks
– Connect Initiators
• Testing Your Failover Cluster Configuration

Demo

Page 14: Webcast - Failover Cluster Architecture

Validation Features Introduced with R2

• Cluster Configuration
– List Information (Core Group, Networks, Resources, Storage, Services and Applications)
– Validate Quorum Configuration
– Validate Resource Status
– Validate Service Principal Name
– Validate Volume Consistency

• Network
– List Network Binding Order
– Validate Multiple Subnet Properties

• System Configuration
– Validate Cluster Service & Driver Settings
– Validate Memory Dump Settings
– Validate OS Installation Options (replaced Validate Operating Systems)
– Validate System Drive Variable

Demo

Page 15: Webcast - Failover Cluster Architecture

Troubleshooting

• Reviewing cluster events
• Reviewing hardware events
• Using the Validate a Configuration Wizard
• Reviewing storage/SAN events
• Troubleshooting methodologies for cluster issues, whether in Windows 2003 or Windows 2008, are fairly similar. Most of the typical support issues in the cluster category fall under the following categories:
– Cluster Service fails to start.
– Cluster resources in a failed state or fail to come online.
– Determine root cause of cluster failure.
– Initial configuration of the cluster.

• The Win 2003 legacy CLUSTER.LOG text file no longer exists. In Win 2008 the cluster log is handled by Event Tracing for Windows (ETW). This is the same logging infrastructure that handles events for other aspects you are already well familiar with, such as the System or Application Event logs you view in Event Viewer.

• Command Line
– c:\>cluster log /gen

• PowerShell
– C:\PS> Get-ClusterLog

• ForceQuorum
– net start clussvc /forcequorum (or /fq)

Demo

Page 16: Webcast - Failover Cluster Architecture

Cluster Events

• Cluster Events

• Recent Cluster Events shows the events from the last 24 hours.

• Monitoring Cluster Events
– Fully featured Failover Cluster Management Packs

• Cluster logging level
– Set-ClusterLog -Level 3

Page 17: Webcast - Failover Cluster Architecture

The Failover Process

When 2 nodes cannot reach each other, each tries to access the quorum disk; this is called the arbitration process. The Clusdisk.sys driver controls disk access so that both nodes cannot access the disks at the same time. With the MNS architecture, the quorum information is maintained through registry replication. These files can be found under %\windows\system32\config. When the cluster starts, the clusdb file is loaded from the registry and the cluster begins operating. This configuration file records which disks the cluster is allowed to access. Windows Server 2008 cluster systems operate on the basis of registry replication.

Page 18: Webcast - Failover Cluster Architecture

SCSI Bus Reset, SCSI-3 Persistent Reservation

• Split-brain scenario: the two nodes lose network communication with each other. In this case the Cluster service (clusdisk.sys) uses the Challenge/Defense protocol and SCSI reserve commands: it first sends a reset command, then reserves the quorum disk with a reserve command and brings it online, and finally takes ownership and brings all resources online.

• Starting with Windows Server 2008, SCSI bus resets are no longer used; SCSI-3 persistent reservations are used instead. A SCSI bus reset affected not only the one disk but every disk on the same bus, and depending on the configuration each node could send a bus reset for every disk. As a result the cluster took longer to bring itself online, disks could be left offline, and they sometimes had to be brought online manually.

Page 19: Webcast - Failover Cluster Architecture

Failover Cluster Architecture

• Microsoft Cluster Service (MSCS) uses the shared-nothing model. This means that only one server at a time can own a given resource, such as a disk, virtual server, or IP address.

• The clusdb file is loaded from the HKLM\Cluster registry hive.

• When the computer is started, the Cluster Disk Driver (Clusdisk.sys) reads the following local registry key to obtain a list of the signatures of the shared disks under cluster management: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures

• Recommendation: the private network should carry heartbeat only, and the public network should be mixed.

• Resource monitors check whether the resource groups on the cluster are working correctly. The resource monitor consists of DLLs running under clussvc. In Windows Server 2008 it is named RHS.exe.

The Resource Hosting Subsystem (RHS) conducts periodic health checks of all cluster resources to ensure they are functioning properly. This is accomplished by executing IsAlive and LooksAlive processes which are specific to the type of resource.
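
The periodic health checks can be pictured with a small Python simulation. It is a sketch under stated assumptions, not RHS itself: it assumes a quick LooksAlive check polled often and a more thorough IsAlive check polled less often, the `Resource` class and `run_health_checks` function are hypothetical, and the 5-second and 60-second intervals are illustrative values. What the cluster does after a failed check (restart or failover) is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    """Hypothetical stand-in for a clustered resource and its two checks."""
    name: str
    looks_alive: Callable[[], bool]  # quick, cursory check
    is_alive: Callable[[], bool]     # slower, more thorough check


def run_health_checks(resource: Resource, ticks: int,
                      looks_alive_every: int = 5,
                      is_alive_every: int = 60) -> List[str]:
    """Simulate `ticks` seconds of polling and return a log of what happened.

    The intervals are illustrative values for this sketch, not settings
    read from a real cluster.
    """
    log: List[str] = []
    for t in range(1, ticks + 1):
        if t % is_alive_every == 0:
            ok, check = resource.is_alive(), "IsAlive"
        elif t % looks_alive_every == 0:
            ok, check = resource.looks_alive(), "LooksAlive"
        else:
            continue
        if not ok:
            log.append(f"t={t}s {check} failed for {resource.name}: mark failed")
            break
        log.append(f"t={t}s {check} ok")
    return log


# Example: a resource whose quick check passes but whose thorough check fails.
disk = Resource("Cluster Disk 1", looks_alive=lambda: True, is_alive=lambda: False)
events = run_health_checks(disk, ticks=120)
```

In this run the quick checks pass every 5 seconds until the thorough check at 60 seconds reports the failure, which shows why both kinds of check are worth having.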

Page 20: Webcast - Failover Cluster Architecture

Microsoft Failover Cluster Virtual Adapter

• In cluster environments Microsoft creates an interface named “Microsoft Failover Cluster Virtual Adapter”. It is a hidden interface that emulates the NetFT (Network Fault Tolerant) driver, carries communication between cluster nodes, and provides redundancy for the heartbeat. This interface binds on top of the existing interfaces, and traffic from SMB to the SAN is carried over this adapter. NetFT is visible in ipconfig /all and assigns itself an APIPA address (169.254.1.2). No data is actually transferred over this IP; because it binds to a physical adapter, its utilization shows up in Task Manager.

Page 21: Webcast - Failover Cluster Architecture

What Is a Multi-Site Cluster (GeoCluster)?

• 2+ physically separate sites
• 1+ node at each site
• Storage at each site with data replication
• Application moves during a failover

[Diagram: Site A and Site B, each with its own SAN]

GeoCluster, short for geographical cluster and also called a multi-site cluster, means running servers located in different, geographically distributed regions as one cluster. Conceptually, when the cluster members are located at two separate sites, this is called geo-clustering or multi-site clustering.

Page 22: Webcast - Failover Cluster Architecture

Benefits of a Multi-Site Cluster

• The goal is service continuity. In a classic cluster the servers access a single storage environment, whereas in a geocluster the servers at each site access data that is replicated between the sites. Replication can be synchronous (immediate) or asynchronous (delayed). Replication levels:
– Storage-based (block-level)
– Software-based (host-based)

• In a synchronous setup, after the server writes data to disk, the data is immediately written to the storage at the second site as well; the storage at the first site sends the write acknowledgement to the server only after the data has been written at the second site.

• Protects Against Loss of an Entire Datacenter
– Power outage, fires, hurricanes, floods, earthquakes, terrorism

• Automates Failover
– Reduced downtime
– Lower complexity of disaster recovery plan

• Reduces Administrative Overhead
– Automatically synchronize application and cluster changes
– Easier to keep consistent than unclustered servers

Page 23: Webcast - Failover Cluster Architecture

Synchronous Replication

• Host receives “write complete” response from the storage after the data is successfully written on both storage devices

[Diagram: Primary Storage and Secondary Storage, with arrows labeled Write Request, Replication, Acknowledgement, and Write Complete]
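
The ordering of a synchronous write can be sketched in Python as follows. This is a toy in-memory model, not a storage vendor's replication API; the `Storage` class and `synchronous_write` function are hypothetical names introduced for the illustration.

```python
from typing import Dict


class Storage:
    """Hypothetical in-memory stand-in for a storage array."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.blocks: Dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> bool:
        """Store the block and return True as the acknowledgement."""
        if not self.online:
            return False
        self.blocks[lba] = data
        return True


def synchronous_write(primary: Storage, secondary: Storage,
                      lba: int, data: bytes) -> bool:
    """Report "write complete" only after both arrays hold the data."""
    if not primary.write(lba, data):
        return False
    # Replicate and wait for the secondary's acknowledgement before answering the host.
    return secondary.write(lba, data)


site_a = Storage("primary")
site_b = Storage("secondary")
completed = synchronous_write(site_a, site_b, lba=0, data=b"payload")
```

The host sees `completed` as true only once the secondary has acknowledged the block, which is the property that distinguishes synchronous from asynchronous replication. How a real array handles a secondary that does not answer is outside this sketch.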

Page 24: Webcast - Failover Cluster Architecture

Multi-Site Clustering Review

[Diagram: Site A and Site B, each with a SAN, connected over a WAN, with replicated storage from the vendor; a File Share Witness at Site C]

• 4, 6, 8… nodes + FSW = odd # votes
• Local failover first (preferred owner)
• Site failover second (possible owner)
• AntiAffinityClassNames
• Faster DNS updates
– Register all IPs for a Network Name
– Shorten client’s DNS record TTL
– Ensure application tries all IPs
• Encrypt WAN traffic for security
• Adjust health checks for latency
• Configure ‘OR’ dependencies
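
The "even number of nodes + FSW = odd number of votes" rule can be checked with a short worked example in Python. The layout below (two nodes per site, witness at a third site) is a hypothetical configuration chosen for illustration, and `has_quorum` is an assumed helper name.

```python
def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """Majority means strictly more than half of all votes."""
    return 2 * surviving_votes > total_votes


# Hypothetical layout: 2 nodes at Site A, 2 nodes at Site B, witness at Site C.
votes = {"Site A": 2, "Site B": 2, "Site C (FSW)": 1}
total = sum(votes.values())  # 5 votes, an odd number

# Lose each site in turn and check whether the remainder still has a majority.
survives = {lost: has_quorum(total - count, total) for lost, count in votes.items()}

# Without the witness, the same four nodes split 2/2 and neither side wins.
tie_without_witness = has_quorum(2, 4)
```

With the witness at a third site, the cluster keeps quorum after losing any single site; without it, a WAN break between the two sites would leave both halves in the minority.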

Page 25: Webcast - Failover Cluster Architecture

Questions & Thanks

Thank you
http://yukselis.wordpress.com

Page 26: Webcast - Failover Cluster Architecture

What Is a Cluster, and Why Do We Use One

• Cluster Blog
– http://blogs.msdn.com/b/clustering/

• Technet Failover Cluster
– http://technet.microsoft.com/en-us/library/cc754482.aspx

• Configuring Auditing for a Windows Server 2008 Failover Cluster
– http://blogs.technet.com/b/askcore/archive/2009/01/19/configuring-auditing-for-a-windows-server-2008-failover-cluster.aspx

• Top Issues for Microsoft Support for Windows 2008 Failover Clusters
– http://blogs.technet.com/b/askcore/archive/2008/10/13/top-issues-for-microsoft-support-for-windows-2008-failover-clusters.aspx

• Checklist: Create a Clustered Virtual Machine
– http://technet.microsoft.com/en-us/library/dd759220.aspx

• Failover Clusters in Windows Server 2008 R2
– http://technet.microsoft.com/en-us/library/ff182338(WS.10).aspx

• TechEd 2011 demo install step-by-step (Hyper-V, AD, DNS, iSCSI Target, File Server Cluster, SQL Server over SMB2)
– http://blogs.technet.com/b/josebda/archive/2011/05/19/teched-2011-demo-install-step-by-step-hyper-v-ad-dns-iscsi-target-file-server-cluster-sql-server-over-smb2.aspx