Based on feedback from Operational Intelligence Ltd’s Optimisation Workshops, David Cameron discusses the missed opportunity for knowledge transfer at the completion of the construction phase of a project, and how the traditional structure of project teams limits the transfer of essential knowledge to the operations team. He argues that the industry is aware of this problem but that very little is being done to improve the situation, because doing so requires all stakeholders to work together.

Two modifications to the existing process are proposed:

  1. Prepare a Concept Design or Basis of Design document at the outset of the project and update it regularly throughout the design, construction and commissioning process. It should then form part of the handover documentation, giving the reader a clear overview of the purpose and limitations of the facility. It should also be used as the reference point for future upgrades, and there should be an obligation on the operations team to keep it up to date.
  2. Change the final milestone from ‘project handover/completion’ to ‘design and construction knowledge transfer’, which provides a better description and focus. There are contractual definitions for Practical and Project Completion; however, provided that knowledge transfer is stated as a key deliverable of practical completion, there should be no contradiction.

Background

The Integrated Systems Test (IST) is an accepted part of the data centre delivery model and very few people would argue against its inclusion in a new build or major refurbishment project delivery plan.

The definition of the IST varies from project to project. For the purposes of this paper we define it as the final demonstration of the critical infrastructure to determine that ‘the designed levels of resilience and redundancy have been achieved in practice’.

IST is also known as level 5 commissioning and follows on from level 4 (systems testing), level 3 (equipment testing), level 2 (dead testing, pressure testing and flushing) and level 1 (factory acceptance tests).

Although there are more formal definitions, in this context resilience is defined as the ability to withstand a set of pre-determined failure events. Redundancy (N+m; where ‘N’ is the number of capacity components required to support the critical design load and ‘m’ is the number of redundant capacity components) allows for certain failure events to occur without impacting on the system’s ability to support the critical design load or allow maintenance activities to be undertaken without the need for a shutdown of the critical infrastructure.
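The N+m notation above can be made concrete with a small worked example. The following is a minimal sketch only: the design load, unit capacity and function names are hypothetical values chosen for illustration and do not come from any particular project.

```python
import math


def required_units(design_load_kw: float, unit_capacity_kw: float) -> int:
    """Return N, the number of capacity components needed for the design load."""
    return math.ceil(design_load_kw / unit_capacity_kw)


def supports_load(installed: int, failed: int,
                  unit_capacity_kw: float, design_load_kw: float) -> bool:
    """Check whether the units still in service can carry the design load."""
    return (installed - failed) * unit_capacity_kw >= design_load_kw


# Hypothetical figures, for illustration only.
design_load_kw = 900.0
unit_capacity_kw = 300.0
m = 1  # number of redundant capacity components

n = required_units(design_load_kw, unit_capacity_kw)
installed = n + m

for failed in range(installed + 1):
    ok = supports_load(installed, failed, unit_capacity_kw, design_load_kw)
    print(f"{failed} unit(s) out of service: load supported = {ok}")
```

With these assumed figures the system tolerates the loss (or planned isolation) of one unit, but not two, which is the behaviour an N+1 arrangement is intended to provide.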

The fundamental requirement for an IST is independent of levels of resilience and redundancy. As noted above it is generally a demonstration that ‘the designed levels of resilience and redundancy have been achieved in practice’.

The project team structure below is a generalisation and there are many different arrangements. In principle, however, we generally have a client who engages a design team to produce a design that reflects the client’s needs. This design can be tendered and the successful contractor is engaged to deliver the project. The contractor may be able to select the specialist equipment based on the specification, or may have to use suppliers named by the client. At this stage of the project, however, it would be unusual for the operational interfaces between specialist equipment to be identified in any more detail than an overview description of operations. Similarly, the extent of systems commissioning (Level 4 commissioning) and IST (Level 5 commissioning) is only identified as an overview, i.e. all critical systems are to be tested, and at what load conditions.


[Figure: IST organisational structure (Operational Intelligence)]

 

The two figures below show the Kolb learning cycle, which is widely regarded as an effective model of learning. It states that in order to learn effectively we must touch all four quadrants, and that experience, reflection, conceptualisation (design) and experimentation are all equally important.

The difficulty in the construction industry is that these quadrants tend to be dominated by different business sectors and the interfaces become barriers to effective communication, learning and effective transfer of knowledge. In addition they tend to be contractual boundaries and as such the documentation tends to be ‘limiting’ as opposed to ‘all encompassing’.

 


[Figure: Learning cycle]
[Figure: Learning cycle mapped to project roles]

The interface between designer and installation contractor is ‘mature’ and is such that any aspect of the design that is not developed by the designer is to be designed by the installation contractor. Typically this would include all the system integration interfaces required to deliver a working system. It is for this reason that there is a trend towards contractor design and build projects where this interface is effectively eliminated.

The interface between client and designer (or design and build contractor) is particularly important. ASHRAE has proposed the development of an Owner’s Project Requirements (OPR) document, which identifies WHAT the client requires, and a Basis of Design (BOD) document, which identifies HOW it will work. The BOD approximates to a Concept Design document. However, whereas the Concept Design report tends to be superseded by the detailed design, the BOD is intended to be modified throughout the project so that the BOD at handover reflects the installed systems and the commissioning that has taken place.

The two remaining interfaces vary from project to project with little consistency.

Taking our earlier project team and adding it to the quadrants of the learning cycle we get:

[Figure: IST organisational structure with documentation added (Operational Intelligence)]

Traditional construction handover deliverables

Because of the way in which projects are traditionally procured, the entire handover process can become purely a method for demonstrating completion of contractual obligations. Generally the focus is on:

  1. The capacity performance of the equipment (normally but not exclusively at full design load).
  2. Automatic recovery from pre-defined failure events.
  3. Demonstration that pre-defined failure events are notified to a monitoring system.
  4. Demonstration that critical plant status (monitoring and measurement) is notified to a monitoring system.
  5. Collating a set of operation and maintenance manuals.
  6. Collating a set of record drawings.
  7. Arranging for the specialist suppliers to provide handover training.

 

There is no doubt that this is important. However, if we accept that 70-80% of all critical system failures are the result of human error (Uptime Institute and ‘Managing Risk: The Human Element’ by Duffey and Saull), and in particular of lack of awareness, then the potential value of the systems commissioning (Level 4 commissioning) and IST (Level 5 commissioning) is far greater than this.

The transfer of knowledge from the construction team to the operations team is always difficult, regardless of the type of project being delivered, and it affects both risk of downtime and energy efficiency. The latter has been identified by BSRIA as a problem area for the energy performance of commercial office buildings and led to the introduction of their concept of ‘Soft Landings’ and seasonal commissioning. In critical facilities, however, these difficulties also introduce an increased risk of downtime.

If we consider the universal learning curve we are aware that maximum risk occurs at minimal experience.


[Figure: Universal Learning Curve]

 

The curve shows risk falling as experience is gained, and in the context of critical facilities that experience must include a site-specific element.

This curve also indicates the two aspects to risk reduction. The first part is based on the accumulated knowledge and shared experience of the organisation while the second is based on the knowledge and experience of the individual. To reduce risk we must address both of these elements.

Taking each of the points (1-7) above in turn allows us to consider how much more effective the traditional handover process could be.

The capacity performance of the equipment (normally at full design load)

Demonstrating the capacity performance of equipment at full load conditions is important in showing that the supplier has delivered an appropriately sized item of plant. However, there is far more that can be gained from this period of testing.

Of particular relevance to the operations team is the energy and stability performance at no-load conditions. Fixed no-load losses are the starting point for any future PUE analysis (refer to OI paper Scalable Data Centre Efficiency at www.dc-oi.com).

Full load performance is important for contractual reasons; from an operations perspective, however, part load performance is far more relevant.

Through our site-based operation learning program we come across many facilities that are running inefficiently at part load purely because the facility was optimised for full load conditions during commissioning. The operations team have no reference point from which to make changes to optimise performance, and are concerned that any change they make may remove the contractual responsibility from the design/construction team. Clearly this is a barrier to effective energy optimisation.
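The influence of fixed no-load losses on part load efficiency can be illustrated with a simple calculation. PUE is total facility power divided by IT power; the sketch below assumes, purely for illustration, that the non-IT overhead is a fixed no-load loss plus a component proportional to IT load. That linear model, the function name and all of the figures are hypothetical assumptions and are not taken from the OI paper referenced above.

```python
def pue(it_load_kw: float, fixed_loss_kw: float,
        proportional_loss_factor: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("PUE is undefined at zero IT load")
    # Assumed overhead model: fixed no-load loss plus a load-proportional part.
    overhead_kw = fixed_loss_kw + proportional_loss_factor * it_load_kw
    return (it_load_kw + overhead_kw) / it_load_kw


# Hypothetical facility, for illustration only.
DESIGN_IT_LOAD_KW = 1000.0
FIXED_LOSS_KW = 200.0           # measured at no-load during commissioning
PROPORTIONAL_LOSS_FACTOR = 0.3  # extra overhead per kW of IT load

for fraction in (0.25, 0.5, 1.0):
    it_load_kw = fraction * DESIGN_IT_LOAD_KW
    value = pue(it_load_kw, FIXED_LOSS_KW, PROPORTIONAL_LOSS_FACTOR)
    print(f"{fraction:.0%} of design IT load: PUE = {value:.2f}")
```

Under these assumptions the PUE worsens from about 1.5 at full load to about 2.1 at a quarter of the design load, which is why the no-load and part load measurements taken during commissioning are a useful reference point for the operations team.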

Automatic recovery under pre-defined failure events

This is generally used to demonstrate the satisfaction of failure scenarios identified within the specification and the scope is dependent on the specified level of resilience.

For basic concurrent maintainability, a good demonstration would be the independent isolation and reinstatement of all component parts of the critical services infrastructure for an extended period. Satisfactory demonstration of this requirement would fulfil the contractual obligations; however, it does not demonstrate how the systems will operate under failure events.

This is an area of divergence between the contractual obligations on the installation contractor and the requirements of the operations team. Within a concurrently maintainable design, components can and will fail. Although such failures may be acceptable within the design, it is still important that the operations team understand the implications of such failure events and, more importantly, how they would recover from them.

Demonstration that pre-defined failure events and plant status are notified to a monitoring system.

One of the principal reasons for a total mains failure test is that it generates the most alarms and as a result bombards the monitoring station with hundreds of alarm reports all of differing priorities. Consultants, contractors and specialists will all have an idea of how these alarms should be prioritised and who should be notified by email/SMS etc. These priorities are based on experience and/or engineering discipline but the tendency is always to over prioritise on the basis that no one ever got criticised for categorising a priority 3 alarm (low priority alarm) as a priority 1 (high priority alarm).

 

[Figure: Typical alarm activation in a data centre (Operational Intelligence)]

Once the facility is handed over to the operations team their next opportunity to witness a full mains failure event is likely to be a real event. Following such an event the alarm priorities are often modified.

The alarm priority list developed by an operations team will be different from that developed by a consultant, contractor and specialist supplier as their terms of reference and experience are totally different. There are therefore significant benefits in engaging with the operations team during the development of the equipment references, graphical and alarm interfaces.
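One way to make this difference visible during commissioning is to compare the priorities proposed by the construction team with those preferred by the operations team. The sketch below is illustrative only: the alarm names, the priority values and the function name are hypothetical, with priority 1 taken as high and priority 3 as low, as in the text above.

```python
# Hypothetical alarm priorities (1 = high, 3 = low), for illustration only.
proposed_by_contractor = {
    "Mains failure": 1,
    "Generator running": 1,
    "UPS on battery": 1,
    "Chiller restart": 1,
    "AHU filter dirty": 2,
}
reviewed_by_operations = {
    "Mains failure": 1,
    "Generator running": 3,
    "UPS on battery": 1,
    "Chiller restart": 2,
    "AHU filter dirty": 3,
}


def downgraded_alarms(proposed: dict, reviewed: dict) -> dict:
    """Return alarms the reviewer rates as lower priority than proposed."""
    return {
        name: (proposed[name], reviewed[name])
        for name in proposed
        if name in reviewed and reviewed[name] > proposed[name]
    }


def count_priority_one(priorities: dict) -> int:
    """Count the alarms that would be raised as priority 1."""
    return sum(1 for priority in priorities.values() if priority == 1)


for name, (before, after) in downgraded_alarms(
        proposed_by_contractor, reviewed_by_operations).items():
    print(f"{name}: priority {before} -> {after}")
print("Priority 1 alarms as proposed:", count_priority_one(proposed_by_contractor))
print("Priority 1 alarms after review:", count_priority_one(reviewed_by_operations))
```

A comparison of this kind, carried out before the total mains failure test, gives both teams an agreed list to check against the alarms actually raised during the test.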

 

Collating a set of operation and maintenance manuals.

O&M documentation needs to be relevant and of use to the operations team yet they are rarely involved in the review process. Who better to review the documentation or better still, who better to specify the makeup of the manuals than the operations team?

In addition to the normal contents the manuals should also include the Basis of Design document and the Close Out Report from the level 5 commissioning.

 

Collating a set of record drawings

The record drawings are generally the initial reference point for routine maintenance and fault recovery events and it’s important that they contain the same references and notation as the equipment in the field.

Inaccuracies in equipment references between drawings and plant have been cited as a contributing factor in failure events where the mean time to recover (MTTR) has been extended because of uncertainty in correctly identifying equipment and circuits in the pressurised environment of a real-life scenario.
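A simple cross-check between the references on the record drawings and the labels found in the field can expose such inaccuracies before handover. The following is a minimal sketch; the equipment references and the function name are hypothetical, and in practice the two lists would come from the drawing register and a site survey.

```python
def reference_mismatches(drawing_refs: set, field_labels: set) -> tuple:
    """Return references found only on drawings and only in the field."""
    only_on_drawings = sorted(drawing_refs - field_labels)
    only_in_field = sorted(field_labels - drawing_refs)
    return only_on_drawings, only_in_field


# Hypothetical equipment references, for illustration only.
drawing_refs = {"UPS-A1", "UPS-B1", "CH-01", "CH-02", "GEN-01"}
field_labels = {"UPS-A1", "UPS-B1", "CH-1", "CH-02", "GEN-01"}

only_on_drawings, only_in_field = reference_mismatches(drawing_refs, field_labels)
print("On drawings but not in the field:", only_on_drawings)
print("In the field but not on drawings:", only_in_field)
```

In this example a chiller labelled differently on the drawing and on the plant is reported in both lists, which is exactly the kind of discrepancy that can delay recovery during a real event.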

Arranging for the specialist suppliers to provide handover training.

The responsibility for handover training is generally with the installation contractor but is passed down to the specialist supplier and is generally delivered by a commissioning or sales engineer. The focus is always on the specific equipment and often misses its relevance in the overall system and in particular the operational interfaces.

Training is better received when delivered in context because the practical experience reinforces the theory. The equipment in isolation is important; however, it does not operate on its own. It has operational interfaces which also need to be understood, maintained and tested. These interfaces tend to fall between suppliers and are seldom highlighted during supplier training.

The focus of handover training for critical facilities should be the transfer of all knowledge from the construction team to the operations team and this can only be conducted effectively by allowing the operations team time to get familiar with the systems prior to undertaking any training. To this end the level 4 and 5 testing are perfect introductions into the operational context of the equipment.

Handover training should include:

  • A review of the project brief (ideally the Basis of Design document).
  • Presentation of the overall schematics and layout drawings.
  • A review of the high level commissioning plan.
  • Scope and purpose of monitoring and control systems.
  • Program for level 4 and 5 commissioning (to allow attendance).
  • Detailed review of asset list.

 

It is important that the operations team are allowed time to review and comment on the failure scenarios covered under the level 4 and 5 commissioning as the tests need to be relevant and comprehensive if they are to satisfy the objective of maximum information transfer as opposed to satisfaction of a contractual milestone.

Operations team initial tasks following handover

One of the initial tasks for the operations team is the development of Standard and Emergency Operating Procedures (SOPs and EOPs). While it is accepted that SOPs can be based on good practice and experience, and as such can be similar from project to project, the EOPs are site specific. The order in which systems must be recovered is based on dynamic conditions and will vary depending on the failure event. The starting point for developing the EOPs should therefore be the level 4 (systems tests) and level 5 (integrated systems test) commissioning tests.

 

Conclusion

Operational Intelligence Ltd has facilitated many workshops on risk and energy applied to data centres and the transfer of knowledge from the design/construction team to the operations team is an evident weakness in both new and legacy data centres.

The process developed in the past is not addressing this issue, and it needs a significant change of mindset from all stakeholders, including operations teams, maintenance contractors, construction teams and consultants. When we talk to individual companies they invariably support this view; however, we don’t perceive any improvement. The problem does not sit with any one party and it needs to be addressed as an industry. The following are cited as typical reasons for failure to improve the process:

  1. ‘It’s the client’s fault for not appointing the maintenance team early enough.’
  2. ‘We gave the operations team an open invite to attend but they were too busy.’
  3. ‘The training and information provided at handover was inadequate but they were given practical completion anyway.’
  4. ‘We can’t rely on the record documentation so we’re not changing anything.’
  5. ‘The handover documentation gives us a record of what was built and the associated commissioning figures, but nothing to say what it was designed to achieve.’
  6. ‘We don’t need to see how it works. Just tell us how to start it, stop it and who to call if it breaks down.’
  7. ‘Everyone who knew anything about the project left once they achieved PC. They want to be paid to come back.’
  8. ‘The control system is too complicated so we intend to put it into manual and operate it like that.’

 

When we talk specifically about energy optimisation we get comments along the lines of ‘we’re not changing anything otherwise it becomes our responsibility.’ This is preventing facilities from realising significant energy savings. A conservative estimate based on our experience sets this figure at 10% of annual energy consumption and applies to new as well as legacy data centres.

We would propose two modifications to the existing process:

  1. Prepare a Concept Design or Basis of Design document at the outset of the project and update it regularly throughout the design, construction and commissioning process. It should then form part of the handover documentation, giving the reader a clear overview of the purpose and limitations of the facility. It should also be used as the reference point for future upgrades, and there should be an obligation on the operations team to keep it up to date.
  2. Change the final milestone from ‘project handover/completion’ to ‘design and construction knowledge transfer’, which provides a better description and focus. There are contractual definitions for Practical and Project Completion; however, provided that knowledge transfer is stated as a key deliverable of practical completion, there should be no contradiction.