Data Centre Operations Blog

Improved Resilience Through Reduced Complexity and Increased Training

There is sufficient research into the causes of failure to assert that any system with a human interface will eventually fail. In the data centre, as in other industries, human error is believed to account for as much as 80% of downtime. Limiting these interfaces and the design complexity, and continually training the people who operate them, is therefore imperative for resilient data centres. The biggest single barrier to risk reduction is knowledge...

Risks Become Tangible

The Gulf of Mexico oil spill carries a tangible $100b liability for BP (and 32 million Google hits for “Gulf of Mexico oil”). The blowout preventer (BOP) failed despite having a low catastrophic failure rate (4 failures in around 90,000 tests). Even if we assume the MTBF (mean time between failures) to be around 100 years, the risk is still high: $100b per failure ÷ 100 years per failure = $1b per year…
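The arithmetic in this excerpt can be sketched in a few lines of Python. This is only an illustration of the figures quoted above: the function names `annualised_risk` and `observed_failure_rate` are hypothetical, and the 100-year MTBF is the excerpt's own assumption, not a measured value.

```python
def observed_failure_rate(failures: int, tests: int) -> float:
    """Fraction of tests that failed (failures / tests)."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return failures / tests


def annualised_risk(cost_per_failure: float, mtbf_years: float) -> float:
    """Expected loss per year: cost per failure divided by MTBF in years."""
    if mtbf_years <= 0:
        raise ValueError("mtbf_years must be positive")
    return cost_per_failure / mtbf_years


# Figures quoted in the excerpt: 4 failed tests in around 90,000.
bop_rate = observed_failure_rate(failures=4, tests=90_000)

# $100b liability per failure, assumed MTBF of 100 years (the excerpt's assumption).
yearly_risk = annualised_risk(cost_per_failure=100e9, mtbf_years=100)

print(f"BOP test failure rate: {bop_rate:.4%}")
print(f"Annualised risk: ${yearly_risk / 1e9:.0f}b per year")
```

Even a rare failure produces a large annual figure when the cost per failure is this high, which is the point the excerpt makes.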

“Managing Risk: The Human Element”

This is the title of an excellent book by Duffey and Saull, which analyses the space, nuclear, aviation, chemical and other industries and reports that 80% of all failures are down to human error. This correlates well with the Uptime Institute’s reports that approximately 70% of data centre failures are attributed to human error. Duffey and Saull construct a human failure rate bathtub curve and explain the Universal Learning Curve as an exponential...

Thermal runaway

... backup generators to start up and power to be restored to the cooling systems before they restart. To overcome the high air temperatures entering IT equipment during this event, many new designs put CRAH fans and secondary chilled water pumps on UPS (so they are immune to a mains interruption) and may also include large chilled water storage tanks to provide cooling media inertia. The consensus amongst IT equipment manufacturers is a recommended server...

Compromising Commissioning

There is always pressure at the end of a project to reduce commissioning time. However, the interdependent dimensions of any project are cost, time and quality, and each one affects the others. If you shorten the programme time, then either it will cost more or the quality (of tests / commissioning) will be compromised. If you try to reduce the cost (cutting corners), again the quality will be...

Complexity and The Human Element

Operational Intelligence was founded on the understanding that significant risk and energy reduction within the data centre environment can only be achieved through active engagement with operations teams across all disciplines. Risk and energy reduction may be the responsibility of an individual, but it can only be delivered if there is commitment from all stakeholders.

Integrated Systems Test (IST) & Missed Opportunities

Based on feedback from Operational Intelligence Ltd’s Optimisation Workshops, David Cameron comments on the missed opportunity for knowledge transfer at the completion of the construction phase of a project, and on how the traditional structure of project teams limits the transfer of essential knowledge to the operations team. He claims that the industry is aware of this problem but perceives that very little is being done to improve the situation due to the...

Data Centres: The Human Element

Human and management errors are the root cause of most failures and energy wastage in data centres. Learning curves for organisations and operators have been developed for several industries, such as nuclear power, space travel, chemical, aeronautical and medical. Operator depth of experience can be improved through effective training, thereby decreasing failure rates, optimising energy performance and reducing staff turnover. The main issues are management complacency, inter-team communication, air management ownership and metrics, risk awareness, ...

Compromising Commissioning

Reflecting on Experience

There is always pressure at the end of a project to reduce commissioning time. However, the interdependent dimensions of any project are cost, time and quality, and each one affects the others.

Compromise One

If you shorten the programme time, then either it will cost more or the quality (of tests / commissioning) will be compromised.

Compromise Two

If you try to reduce the cost (cutting...