Data Centre Operations Blog

Air-side free-cooling: direct or indirect systems?

When data centre free cooling is discussed, people often assume it means direct fresh air cooling, and that direct cooling is more efficient because it has fewer components. It’s as simple as opening a window!

However, in many cases this is incorrect. Indirect systems have underappreciated advantages.


Human Errors in Data Centres

Intrigued by human error in data centres, I came across David Smith’s book Reliability, Maintainability and Risk, where he describes TESEO (an empirical technique to estimate operator errors) by G.C. Bello and V. Colombari.

The principle is that the probability of operator failure is the product of one factor from each of the following five groups (a minimal sketch follows the list).

1) Activity Difficulty

2) Time Stress

3) Operator Experience

4) Task Related Anxiety

5) Ergonomic Design
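
As a minimal sketch of how the calculation works, the Python below multiplies one assumed factor from each group. The factor values are purely illustrative and are not taken from the published TESEO tables.

def teseo_probability(activity, time_stress, experience, anxiety, ergonomics):
    """TESEO estimate: the error probability is the product of the five factors."""
    return activity * time_stress * experience * anxiety * ergonomics

# Hypothetical mains power restoration (illustrative factor values only).
untrained = teseo_probability(activity=0.01, time_stress=3, experience=3, anxiety=2, ergonomics=1)
trained = teseo_probability(activity=0.01, time_stress=1, experience=1, anxiety=1, ergonomics=1)

print(f"Untrained operator: {untrained:.3f}")  # 0.180
print(f"Trained operator:   {trained:.3f}")    # 0.010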


In the table below I have estimated example failure probabilities for a UPS upgrade and a mains power restoration, with untrained and trained operators.

[Table: estimated TESEO failure probabilities for a UPS upgrade and a mains power restoration, untrained vs trained operators]

Whilst this tool is quite simplistic, it does provide some interesting conclusions for data centres:

1) Regular site tests reduce failures by a factor of 10

2) Expert operatives (compared to average ones) halve failures

3) Training reduces failures by a factor of 3

4) Regular site tests reduce anxiety-related failures by a factor of 1.5

5) Good visual displays (ergonomics) can reduce failures by a factor of 1.5

More is required. How can these reductions be achieved and monitored? How can people help each other? And what will this look like in the future?

Containment: The Future?

Cold aisle, hot aisle and rack exhaust containment are all effective air management methods for separating the hot and cold air streams in data centres. However, cold aisle and hot aisle containment both require working in hot areas (the general room or the hot aisle respectively). Rack exhaust containment has the potential for higher densities, since all aisles are cold, although this is restricted by the exhaust duct volume. With hot aisle and rack exhaust containment, the area available for overhead cable trays is reduced. With today’s low utilisation of IT equipment, the hot air is only around 5°C (9°F) to 10°C (18°F) warmer than the cold air supplied to the IT equipment. Given that the design delta T of IT equipment at full load is 25°C (45°F) to 30°C (54°F), and given the trend towards higher IT equipment utilisation and warmer supply air temperatures (to save energy), it is likely that in the future hot air streams (e.g. in hot aisles) will reach around 50°C (122°F), which is uncomfortably high.
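
As a rough illustration of that trend, the sketch below estimates exhaust temperature from the supply temperature and a delta T scaled by utilisation. The supply temperatures, utilisation figures and linear scaling are assumptions made purely for this example.

def exhaust_temp_c(supply_c, design_delta_t_c, utilisation):
    """Approximate IT exhaust temperature, assuming delta T scales linearly with load."""
    return supply_c + design_delta_t_c * utilisation

# Today: low utilisation and conservative supply air.
print(exhaust_temp_c(supply_c=20, design_delta_t_c=25, utilisation=0.3))  # 27.5 °C
# Future: higher utilisation and warmer supply air to save energy.
print(exhaust_temp_c(supply_c=27, design_delta_t_c=25, utilisation=0.9))  # 49.5 °C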

The UK Health and Safety Executive states: “Where the temperature ….would otherwise be uncomfortably high…. all reasonable steps should be taken to achieve a reasonably comfortable temperature, for example by: ….providing air-cooling plant.”

For future designs, and for energy efficiency reasons, we do not want to provide cooling in hot aisles; we should therefore be thinking about containing the hot air outside the working area, e.g. with rack exhaust containment systems or similar.

Is Unapplied Training Pointless?

This is the title of an excellent book by Duffey and Saull that analyses the space, nuclear, aviation, chemical and other industries and reports that 80% of all failures are down to human error. This correlates well with the Uptime Institute’s reports attributing approximately 70% of data centre failures to human error.

With time we ALL become complacent, so it is better to plan for the inevitable, if rare, failure.


(ASHRAE 2015 Publication) – $1.7 Energy Efficiency Case Study

In 2014, Operational Intelligence and the operations team of a global financial services firm achieved an exceptional result. Through energy assessments, data hall temperature measurements and operator education workshops, we implemented air management improvements, optimised fan control and gradually increased air and chilled water temperature set points, delivering significant financial savings through reduced energy costs. On top of that, we installed an indirect free cooling circuit. This reduced the PUE from 2.3 to 1.49, a remarkable outcome for a legacy data centre. To find out more about the technical details:
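
To put that PUE change in context, here is a back-of-envelope sketch; the 1 MW IT load is a hypothetical figure assumed only for the illustration.

it_load_kw = 1000                              # hypothetical average IT load
pue_before, pue_after = 2.3, 1.49

facility_before_kw = it_load_kw * pue_before   # total facility power before
facility_after_kw = it_load_kw * pue_after     # total facility power after

saving_kw = facility_before_kw - facility_after_kw
saving_pct = 100 * saving_kw / facility_before_kw
annual_mwh = saving_kw * 8760 / 1000

print(f"Saving: {saving_kw:.0f} kW ({saving_pct:.0f}% of total facility power)")
print(f"Roughly {annual_mwh:,.0f} MWh per year at this load")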

European Code of Conduct on Data Centre Energy Efficiency Participant Usage Guide

It can be difficult and frustrating to secure organisational funding and project approval. By analysing holistically which initiatives deliver the maximum benefit, operators can prioritise improvements and produce a strong business case for changes that reduce energy use and mitigate risk. Our team helps curate the European Code of Conduct on Data Centre Energy Efficiency Participant Usage Guide. For the full publication, please follow the link.

Improved Resilience Through Reduced Complexity and Increased Training

There is sufficient research into the causes of failure to assert that any system with a human interface will eventually fail. In the data centre, as in other industries, human error is believed to account for as much as 80% of downtime. Limiting these interfaces and the design complexity, and continually training the humans who operate them, is therefore imperative for resilient data centres.

The biggest single barrier to risk reduction is a lack of knowledge sharing and risk awareness. Many sites document risk analyses, but these are often not shared with all the operators, so their impact is limited.


Risks Become Tangible

The Gulf of Mexico oil spill has a tangible $100b liability for BP (and 32 million Google hits for “Gulf of Mexico oil”). The blowout preventer (BOP) failed despite having a low catastrophic failure rate (4 failures in around 90,000 tests). Even if we assume the MTBF (mean time between failures) to be around 100 years, this still represents a high risk: $100b per failure / 100 years per failure = $1b/year. A risk worth mitigating. There are reports suggesting that further redundancy should be considered (i.e. to remove the single points of failure and to minimise common cause failures). Assuming that this redundancy could have cost $10m, its simple payback is $10m / ($1b/year) ≈ 4 days. In hindsight, a compensating provision of $10m does not seem like a lot.
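
The annualised-risk arithmetic can be written out in a few lines; the MTBF and redundancy cost below are the assumptions quoted in the paragraph above, not established figures.

liability_usd = 100e9        # $100b liability per catastrophic failure
mtbf_years = 100             # assumed mean time between failures
redundancy_cost_usd = 10e6   # assumed cost of extra redundancy

annual_risk = liability_usd / mtbf_years              # $1b per year
payback_days = redundancy_cost_usd / annual_risk * 365

print(f"Annualised risk: ${annual_risk / 1e9:.1f}b per year")
print(f"Simple payback on redundancy: {payback_days:.0f} days")  # ~4 days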

Trigeneration or Free-Cooling

The data centre industry is immersed in an interesting innovation process in search of higher efficiencies.

One idea is trigeneration, where electricity is generated (via an engine or turbine) and the residual heat is used as the heat source for an absorption chiller. By analogy, a car engine provides mechanical energy to move the car and waste heat for heating (via the radiator). Trigeneration systems tend to be more expensive and complex than traditional systems, but can save energy provided the waste heat is utilised all year round.

A simplistic analysis says that a data centre always requires electrical energy and refrigeration, and that trigeneration is therefore an ideal application.

However, when this is examined in more detail, some problems are uncovered. Whilst we may need to cool IT equipment, this does not always require refrigeration (chillers), as there are many free cooling (FC) solutions available. The table below summarises these requirements and features (UK typical values).

[Table: data centre cooling requirements and features of the options discussed (UK typical values)]

Free cooling without chillers is the most cost-effective option and has the lowest CO2 emissions.

In order to maximise free cooling potential we need good air management (including physical containment of hot and cold air streams) and to increase the supply air temperature from around 12°C to 20°C or higher. Whilst this is technically achievable, we also need to overcome the understandable apprehension about such a paradigm change.