Ed Batalla
NERC Monitoring and Situational Awareness Conference: Loss of Control Center Procedures and Testing Practices
Ed Batalla – Director of Technology
Florida Power & Light Company
September 19, 2013

Agenda
• Florida Power & Light Company (FPL) Overview
  – FPL EMS Overview
  – FPL Control Centers / Infrastructure Philosophy
• Problem Statement
  – EMS Availability
  – Viability of Backup Processes
• Solutions
  – PLAN: Solidify Backup Procedures
  – PREPARE: Failover Testing Processes
  – MITIGATE: Improve Alternative Real-Time Assessment Tools/Methods
• Q&A

Florida Power & Light (FPL) is the largest electric utility in Florida and one of the largest rate-regulated utilities in the United States
FPL Overview
• FPL is a subsidiary of NextEra Energy, Inc. (NEE)
• One of the largest U.S. electric utilities
• Vertically integrated, retail rate-regulated
• 4.6 MM customer accounts
• 24,653 MW in operation
• $10.1 B in operating revenues
• $36 B in total assets
NOTE: All data as of June 30, 2013, except operating revenue which is for the year ended December 31, 2012.

FPL’s Energy Management System (EMS) was upgraded on 11/10/2012 – it is a major component of the suite of mission-critical systems grouped as Grid Control Systems (GCS)
Energy Management System (EMS) Overview
• Vendor:
  – Commissioned: 11/10/2012
  – Version:
• Benefits of the upgrade
  – Improved redundancy and geographic diversity
  – Advanced reliability tools
  – Enhanced cyber security (user authentication)
• EMS Interfaces
  – Power Plants
  – Substations (T&D)
  – External Utilities
  – Distribution Control Centers
  – Performance Diagnostic Centers
  – Historian Systems
  – Other Corporate Systems
FPL’s EMS was upgraded to modernize system technology and infrastructure

FPL has geographically diverse and redundant control centers which contain the Grid Control Systems (GCS) – an integral part of our business continuity and recovery plans
Backup Control Center (BUCC) Facility
  – FPL EMS is also used by FRCC Reliability Coordinator (RC)
  – Distribution Control Centers (use and access same EMS)
Fully functional backup control center capable of being activated within the required standards; it has tools that closely replicate the primary control center and minimize activation confusion
[Diagram labels: System Control Center (SCC) Facility; Local Backup Control Center; Remote Backup Control Center]

The control room has emergency communication methods for loss of the primary communication tools; it also has access to the Backup Control Center

FPL’s EMS uses state-of-the-art infrastructure systems and technology
Infrastructure Design/Philosophy
• Geographically diverse and redundant control center facilities
  – Each facility equipped with redundant systems, but each facility backs up the other facility
  – Connected via dedicated and redundant communication links
• Diverse and redundant communications to all FPL substations
  – SCADA data from FPL substations are dual-scanned from both the primary and backup control center facilities
• Cybersecurity via defense-in-depth philosophy (logical separation of control center network from corporate network)

There are event categories directly related to the loss of monitoring or control functionality for control centers – procedures and testing practices need to be strengthened
Event Categories
• Category 1 Events
  – Unplanned evacuation from a control center facility with BPS SCADA functionality
  – Loss of monitoring or control, at a control center, such that it significantly affects the entity’s ability to make operating decisions
• Category 2 Events
  – Complete loss of all BPS control center voice communication systems
  – Complete loss of SCADA, control or monitoring functionality
Source: Electric Reliability Organization Event Analysis Process – Version 2 (July 2013)

FPL implemented solutions to improve control center procedures and testing practices
Solutions
• PLAN: Solidify recovery and business continuity plans
• PREPARE: Validate backup control center processes through actual technology viability validation
  – Failover processes
  – Track performance
• MITIGATE: Continuous improvements on alternative real-time assessment tools
Measuring performance is key to ensuring backup process viability

FPL has extensive business continuity / recovery plans for all electronic systems
Business Continuity Plan (BCP) / Recovery Plan (RP)
• Primary Control Center Facility
  – Manned with Operators (RC, TOP, BA, IA)
  – Technology Support: provided through Operational Technology Center (OTC) and callout support
• Backup Control Center
  – Full redundant system (with diverse communication paths)
  – Unmanned – no Operators
  – Technology Support: remotely supported (technical team located at primary facility)
BCP and RP specify criteria for evacuation
The recovery plan is reviewed and tested annually pursuant to NERC CIP Standards

BCP and RP have defined a set of criteria to evacuate the primary control facility
Criteria for Evacuation
• Evacuation Criteria
  – Incapacitated facility (fire, terrorist attack)
  – Total loss of building power supply
  – Critical function unavailability at the primary control center facility
    • With no ability to connect to backup control center servers
    • Redundant pair EMS or SCADA Front End not available
• Evacuation Process and Interim Provisions
  – Since the backup control center is unmanned, FPL implemented interim control centers to facilitate the evacuation process and meet the EOP Standards
    • Local Backup Control Center (adjacent building)
    • Remote Backup Control Center (approx. 5 miles away)

Health of the system is monitored constantly by 24/7 technology personnel
System and Application Health Check
FPL established the Operational Technology Center (OTC) organization to improve “operational certainty”

Improved monitoring process capabilities with alarming on CA failures
CA Solution Progress
Monitoring Dashboards were developed to improve situational awareness on real-time assessment tool viability
OTC and operators are instructed to be on high alert for CA yellow and red bar displays

A “CONS OPS” button has been added to access the Conservative Operations display, which shows contingency data at key interfaces (in case CA is not available)
Conservative Operations
[Screenshot label: “Cons Ops” Button]

FPL’s EMS/SCADA uses high-speed replication of data between two control center sites
MRS to Support Failover Schemes
• Memory Replication Services (MRS) is used to replicate data between systems
• Essential to support failover schemes
  – Critical operational data is replicated automatically
MRS provides a state-of-the-art, high availability configuration and failover scheme for FPL’s EMS/SCADA

FPL purposely performs a weekly scheduled failover of its EMS (and corresponding peripheral systems) to ensure viability of its critical systems (primary and backup systems)
Testing Practices
• Weekly Scheduled Failover Test
  – Cycle through different system configurations to ensure viability
  – Ensures that failover logic is fully functional
  – Coordinated with all users (done every Wednesday morning)
• Other
  – EOP processes and procedures reinforced during operator training
    • Operators cycle through familiarization of backup systems and control centers
  – CIP required test of recovery plan fulfilled (at least annually)

Unplanned EMS Unavailability: Cumulative Downtime YTD = 28.98 minutes
EMS Unplanned Unavailability
[Chart: cumulative EMS downtime in minutes by month, Jan through Dec, on a 0 to 50 minute axis; series “Planned and Unplanned” and “Unplanned”; a “Target (Max minutes downtime)” reference; a “GOOD” direction indicator]
FPL tracks cumulative EMS unavailability as part of its key performance indicators

Depending on the situation, FPL incorporated processes for operators to access alternative real-time assessment tools
Alternative Real-Time Assessment Tools
• Flowgate Display
  – Pre-calculated flowgate limits
  – Uses data that is not dependent on the EMS
  – Other displays were developed that use non-EMS data sources
• CA solution sharing between reliability entities within the FRCC
  – Currently being piloted

FPL has three major focus areas for its continuous improvement plan to continue to strengthen control center procedures and testing practices
Final Note
• Reliability entities need to continue to strengthen control center procedures and testing practices
• FPL has three major focus areas for its continuous improvement plan:
  – PLAN: Solidify recovery and business continuity plans
  – PREPARE: Validate backup control center processes through actual technology viability validation
  – MITIGATE: Continuous improvements on alternative real-time assessment tools
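
Editorial illustration: the cumulative unavailability KPI described on the EMS Unplanned Unavailability slide is a running year-to-date total of downtime minutes compared against a maximum-minutes target. The sketch below shows that arithmetic only. It is not FPL's tooling: the function name, the monthly figures, and the target value are all hypothetical, since the slides give only the YTD total (28.98 minutes) and do not state the target or the monthly breakdown.

```python
from dataclasses import dataclass
from typing import List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


@dataclass
class MonthStatus:
    month: str
    cumulative_minutes: float  # year-to-date unplanned downtime
    within_target: bool        # True while the YTD total is at or under the cap


def cumulative_unavailability(monthly_minutes: List[float],
                              target_max_minutes: float) -> List[MonthStatus]:
    """Accumulate unplanned downtime month by month and flag target breaches."""
    statuses = []
    running_total = 0.0
    for month, minutes in zip(MONTHS, monthly_minutes):
        if minutes < 0:
            raise ValueError(f"negative downtime for {month}: {minutes}")
        running_total += minutes
        statuses.append(MonthStatus(month,
                                    round(running_total, 2),
                                    running_total <= target_max_minutes))
    return statuses


# Hypothetical inputs for illustration only; these are NOT FPL's figures.
EXAMPLE_MONTHLY_MINUTES = [0.0, 4.5, 0.0, 10.0, 0.0, 2.5]
EXAMPLE_TARGET_MINUTES = 15.0

if __name__ == "__main__":
    for status in cumulative_unavailability(EXAMPLE_MONTHLY_MINUTES,
                                            EXAMPLE_TARGET_MINUTES):
        flag = "OK" if status.within_target else "OVER TARGET"
        print(f"{status.month}: {status.cumulative_minutes:6.2f} min  {flag}")
```

Tracking the running total rather than per-month values matches how the slide reports the KPI (a single YTD figure against a cap), so one long outage early in the year stays visible in every later month.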