IT Infrastructure Scale in Numbers

Academic year: 2023

(1)

Components and Their Operation

Balázs Kuti

(2)

Challenges faced by enterprises today, scale of the IT plant

Diversity of an IT plant

Key Server Infrastructure Components

Configuration Management

ITIL, IT Support Models

Change and Risk Management

Data Centers

Q&A

(3)

IT Challenges of Enterprises today

• Challenges:

Scale

Deployment and OS build

OS & Configuration Diversity/Hygiene

Support personnel

High availability/resiliency

Special HW (trader desktops)

Environment, power saving

(4)

IT Infrastructure Scale in Numbers

The most popular social network’s server count: 60,000+

• Physical expansion

• Capacity planning

(5)

IT Infrastructure Scale in Numbers

Unix / Linux

Windows

SAN / NAS

(6)

Diversity of an IT plant

Every effort is made to have uniform components (e.g. hw models, software components)

Avoid vendor lock-in (price competition, delivery capability, service quality)

Lifecycle management (HW and SW), decommission is often a pain

Custom solutions

Wrappers, for easier work

Central configuration database

Access and auditing

Protection from mistakes

Examples: managing VMware servers from the Unix command line, manipulating NAS filers and shares, managing SAN configuration

Self service, post-build custom application profiles

(7)

Key Components of the IT Infrastructure

• Network and Boot services

DNS, DHCP, PXE, Printing, Monitoring

• Security components

Firewalls, network monitoring

• Store user information (authentication/authorization)

Active Directory, LDAP

• Cross-platform authentication

Kerberos

• Lifecycle and configuration management

Distribution servers, Configuration and patch management, CMDB

(8)

Grid Node management

• Configuration management for tens of thousands of nodes

• Utilization and health monitoring

• Managing node allocations and chargeback

• Single or multiple schedulers

• Low HW specification

• Special network configuration

• Storage issues

(9)

Change and Risk Management

• What is change management?

• Change / Configuration / Release Management

Development and testing

Approval process

Importance of checkout and backout

• Major incidents can be caused by minor changes

• Blackout periods
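
A blackout period can be checked mechanically before a change is approved. The sketch below is a minimal illustration, assuming blackout windows are kept as (start, end) datetime pairs; the name `in_blackout` and the sample year-end freeze are hypothetical and not taken from the slides.

```python
from datetime import datetime

# Hypothetical blackout windows as (start, end) pairs; the end is exclusive.
BLACKOUTS = [
    (datetime(2023, 12, 22), datetime(2024, 1, 3)),  # assumed year-end freeze
]

def in_blackout(planned: datetime, windows=BLACKOUTS) -> bool:
    """Return True if the planned change time falls inside any blackout window."""
    return any(start <= planned < end for start, end in windows)

print(in_blackout(datetime(2023, 12, 27, 2, 0)))
print(in_blackout(datetime(2024, 1, 10, 2, 0)))
```

A check like this would typically sit in the approval step, so that a change scheduled inside a freeze is rejected before it reaches implementation.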

(10)

Change and Risk Management

• How to make it measurable?

• Identify – Prioritize – Plan and Schedule – Track and Report

• Examples

Data Center in Iceland

(11)

Support model

• Why do we need a support model?

• Who are the customers?

• ITIL (Service Desk, L1-L2-L3-Eng, ECC, local IT support), Service Managers, SLA

• Follow the Sun

Availability    Downtime per year [mins]
99.9%           525
99.99%          52
99.999%         5
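
The downtime budget follows directly from the availability percentage: allowed downtime = (1 - availability) x period. A minimal sketch of that arithmetic, assuming a 365-day year (525,600 minutes); the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year

def downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year, in minutes, for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes(pct):.1f} min/year")
```

Three, four and five nines allow roughly 525, 52 and 5 minutes of downtime per year; each additional nine cuts the budget by a factor of ten.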

(12)

Data Centers

Problem

Safe and reliable centralized operation of the IT infrastructure under extreme circumstances

Design

Many engineering disciplines involved

Site selection criteria

Accommodate computers, storage, backup, network equipment

Accommodate supplementary equipment:

Fire extinguishers, cooling, UPS, generators, fuel, etc.

Redundant network (IP, FC) and grid connection on physically different paths

Security (physical, internal, external)

Change, risk, vendor management

CO2 emission, green technologies

(13)

Figure: map of free-cooling hours per year (legend: 8000 down to 5000 hours, in steps of 500)

Datacenter Site Strategy

Property price

Risk assessment:

Political stability

Economy

Natural disasters, terrorist attacks

Green energy sources:

Hydro, solar, wind power

Waste heat recycling opportunities

IBM’s DC in Switzerland heats a town swimming pool

Cheap cooling (air and/or water)

Independent and high capacity power sources

Dark Blue Zone: free cooling available for circa 8000 hrs per year (91%)

(1 year = 8760 hours)

Data hall recommended range: 18 °C to 27 °C

Google - St. Ghislain

HP - Wynyard

Microsoft - Dublin
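
The percentage quoted for the Dark Blue Zone is simply the free-cooling hours divided by the hours in a year. A short sketch of that arithmetic, using the hour values from the map legend; the function name is illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, as stated on the slide

def free_cooling_share(hours: int) -> float:
    """Fraction of the year during which free cooling is available."""
    return hours / HOURS_PER_YEAR

for hours in (8000, 7500, 7000, 6500, 6000, 5500, 5000):
    print(f"{hours} h/year -> {free_cooling_share(hours):.0%}")
```

At the top of the scale, 8000 hours is about 91% of the year; at the bottom, 5000 hours is about 57%.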

(14)

Data Center Scale and Management

IT vs. non-IT floor space up to 1:1

Power usage monitoring (Powerdown events)

Finding and fixing cooling inefficiencies

(15)

Classification and Operation Models

Resiliency Levels: Tier 1-2-3-4

Operation model

Rent computing power from the “Cloud” (Amazon, HP, Oracle)

Rent a facility with personnel

Buy a facility

BCP site ration models

Tier Level Requirements

Tier 1

• Single non-redundant distribution path serving the IT equipment

• Non-redundant capacity components

• Basic site infrastructure guaranteeing 99.671% availability

Tier 2

• Fulfils all Tier 1 requirements

• Redundant site infrastructure capacity components guaranteeing 99.741% availability

Tier 3

• Fulfils all Tier 1 and Tier 2 requirements

• Multiple independent distribution paths serving the IT equipment

• All IT equipment must be dual-powered and fully compatible with the topology of a site's architecture

• Concurrently maintainable site infrastructure guaranteeing 99.982% availability

Tier 4

• Fulfils all Tier 1, Tier 2 and Tier 3 requirements

• All cooling equipment is independently dual-powered, including chillers and Heating, Ventilating and Air Conditioning (HVAC) systems

• Fault tolerant site infrastructure with electrical power storage and distribution facilities guaranteeing 99.995% availability
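
The availability guarantees per tier are easier to compare once converted into yearly downtime. The sketch below does that conversion for the four figures listed on the slide, assuming a 365-day year; the dictionary and function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours

# Availability guarantees per tier, as listed on the slide.
TIER_AVAILABILITY_PCT = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}

def downtime_hours(availability_pct: float) -> float:
    """Maximum yearly downtime in hours implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for tier, pct in TIER_AVAILABILITY_PCT.items():
    print(f"Tier {tier}: {pct}% -> {downtime_hours(pct):.1f} h/year")
```

Tier 1 allows roughly 29 hours of downtime a year, while Tier 4 allows under half an hour.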

(16)

Hardware Implementation

The Google Way

Traditional solutions: blade chassis, IBM iDataPlex, HP Spartans with top-of-rack switch

(17)

Q & A

(18)

Questions for an invaluable prize

How would you make the Grid's power consumption more efficient?

What kind of performance counters would you check if there’s a suspected disk subsystem performance issue?
