Data Centre Operations Blog

Human Errors in Data Centres

Intrigued by human error in data centres, I came across David Smith’s book Reliability, Maintainability and Risk, where he describes TESEO (empirical technique to estimate operator errors) by G.C. Bello and V. Colombari. The principle is that the probability of failure is the product of five factors, one chosen from each of the following groups: 1) Activity Difficulty, 2) Time Stress, 3) Operator Experience, 4) Task Related Anxiety, 5) Ergonomic Design. In this table I...
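The multiplication described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the argument names and the example factor values are hypothetical and are not taken from the TESEO table, and capping the result at 1.0 is a convention assumed here so that the product can still be read as a probability.

```python
from math import prod


def teseo_failure_probability(activity, time_stress, operator, anxiety, ergonomics):
    """Multiply one factor from each of the five groups.

    All argument names are illustrative. The result is capped at 1.0
    (an assumption) because a raw product can exceed one.
    """
    factors = (activity, time_stress, operator, anxiety, ergonomics)
    if any(f <= 0 for f in factors):
        raise ValueError("each factor must be positive")
    return min(1.0, prod(factors))


# Hypothetical factor values, chosen only to show the arithmetic.
p = teseo_failure_probability(
    activity=0.01,    # difficulty of the activity
    time_stress=1.0,  # time available to act
    operator=0.5,     # operator experience
    anxiety=2.0,      # task related anxiety
    ergonomics=1.0,   # ergonomic design
)
print(f"Estimated probability of operator failure: {p:.3f}")
```

Because the factors multiply, improving any single group (for example, operator experience through training) scales the whole estimate by the same ratio.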

Is Unapplied Training Pointless?

With time we all become complacent, so it is better to plan for an inevitable rare failure.

Improved Resilience Through Reduced Complexity and Increased Training

There is sufficient research into the causes of failure to assert that any system with a human interface will eventually fail. In the data centre, as in other industries, human error is believed to account for as much as 80% of downtime. Limiting these interfaces and the design complexity, and continually training the people who operate them, is therefore imperative for resilient data centres. The biggest single barrier to risk reduction is knowledge...

“Human Unawareness” of Energy Saving Potential

While “human error” is responsible for most failures in mission critical facilities, “human unawareness” is responsible for easily avoidable energy wastage in data centres. For most data centres, energy savings of 10-30% can be achieved with low investment. In a typical 1,000 m² raised floor data centre, savings of hundreds of thousands of pounds, dollars or euros per year can be achieved, with a return on investment (ROI) in under a year. Air management is normally the fundamental first...

“Managing Risk: The Human Element”

This is the title of an excellent book by Duffey and Saull that analyses the space, nuclear, aviation, chemical and other industries and reports that 80% of all failures are down to human error. This correlates well with the Uptime Institute’s reports that approximately 70% of data centre failures are attributed to human error. Duffey and Saull construct a human failure rate bathtub curve and explain the Universal Learning Curve as an exponential...
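An exponential learning curve of the kind mentioned above can be sketched as follows. This is a generic exponential decay from an initial failure rate toward a minimum achievable rate as experience accumulates. The function name, parameter names and numbers are assumptions for illustration and are not taken from the book.

```python
import math


def learning_curve_rate(experience, initial_rate, minimum_rate, learning_constant):
    """Failure rate after a given amount of accumulated experience.

    Assumed form: the rate decays exponentially from initial_rate
    toward minimum_rate, which it approaches but never goes below.
    """
    if experience < 0:
        raise ValueError("experience must be non-negative")
    decay = math.exp(-learning_constant * experience)
    return minimum_rate + (initial_rate - minimum_rate) * decay


# Hypothetical numbers: the rate falls quickly at first, then flattens out.
for experience in (0, 1, 2, 5, 10):
    rate = learning_curve_rate(
        experience, initial_rate=0.10, minimum_rate=0.01, learning_constant=0.5
    )
    print(f"experience={experience:>2}  failure rate={rate:.4f}")
```

The flattening tail is the point of the sketch: in this assumed form, experience reduces the failure rate but never drives it to zero.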

Complexity and The Human Element

Operational Intelligence was founded on the understanding that significant risk and energy reduction within the data centre environment can only be achieved through active engagement with operations teams across all disciplines. Risk and energy reduction may be the responsibility of an individual, but it can only be delivered if there is commitment from all stakeholders.

Integrated Systems Test (IST) & Missed Opportunities

Drawing on feedback from Operational Intelligence Ltd’s Optimisation Workshops, David Cameron comments on the missed opportunity for knowledge transfer at the completion of the construction phase of a project, and on how the traditional structure of project teams limits the transfer of essential knowledge to the operations team. He claims that the industry is aware of this problem but that very little is being done to improve the situation due to the...

Data Centres: The Human Element

Human and management errors are the root cause of most failures and energy wastage in data centres. Learning curves for organisations and operators have been developed for several industries, such as nuclear power, space travel, chemical, aeronautical and medical. Operator depth of experience can be improved through effective training, thereby decreasing failure rates, optimising energy performance and reducing staff turnover. The main issues are management complacency, inter-team communication, air management ownership and metrics, risk awareness,...