by Kathryn Donelson
At the height of the storm, the main complex had been evacuated, except for essential staff who stayed to monitor the security of the facility and its critical equipment. Attention was fixed on the pumps that would be used to remove water from the basement should flooding occur.
Sitting adjacent to the swollen Addicks Reservoir and bordered by its southern earthen dam, the complex was in an area of increasing vulnerability due to the relentless rain and flooding.
On Tuesday, Aug. 29, due to worsening conditions, Real Estate & Facilities Services (REFS) contacted Information Technology (IT) to discuss pulling critical staff from the building. With no one on site to watch the water levels or pumps, the electrical system would be at risk, as would the data center it supplies.
“We couldn’t take chances,” said Derek Davis, IT Infrastructure & Operations manager. “It was clear we’d need to initiate our data center disaster recovery (DR) plan, which in effect relocates much of our Houston data center operation to our backup facility in our Bartlesville, Okla., offices.”
Davis’ teams had been huddling for days to discuss contingency plans and identify critical points of contact. That groundwork proved key when it came time to execute the DR plan.
Preparation also came into play in other important ways.
“Our DR plan makes provisions for the most critical set of systems to be recovered in Bartlesville, but it doesn’t cover everything,” said Darren McInturff, Unix-Linux Server Operations supervisor.
However, the team had recently installed new storage and had not yet retired the old storage, so extra capacity was available. The team had also recently wrapped up a two-year effort to modernize the data center and move some of the computing environment to the cloud, which freed up still more capacity.
“Combined, this gave us the ability to implement not only our DR plan, but also a full data center rescue — 658 servers housing 1.6 petabytes of data, to be exact,” said McInturff.
McInturff was tapped to lead the transition, pinch-hitting for Todd Fink, supervisor, Data Center & Backup Operations, who was out of commission due to mandatory evacuation of his neighborhood.
With minimal exceptions, the team planned to transition all data and computing to Bartlesville.
The full transition took about 10 hours — an impressive hustle given the vast amount of data moved.
Weeks later, following a complex partial restore of the systems that support Geology, Geophysics & Reservoir Engineering, the team began discussions with the business to determine a date for the full data center restore back to Houston, as well as the hours of testing required to ensure a successful transition.
On Saturday, Sept. 30, over a period of 12 hours and with all hands on deck, the data center transition back to Houston was completed.
“It helped that we were able to do much of this work remotely and from our offices in Bartlesville,” said Lee Roberts, director, Monitoring, Analysis & Production Services. “This relieved pressure on our colleagues in Houston who were attending to their families and homes. That we were able to move this massive amount of data in such a short time — with virtually no disruption — was truly remarkable.”
All told, approximately 100 people were involved in the Herculean effort of moving the Houston data center operation to Bartlesville and back again.
“Most of us in IT work our entire careers without ever having to execute a full DR plan — let alone a full data center rescue,” said Davis. “This was a first for me. We certainly prepare for and practice disaster recovery plans, but we’ve never had to execute an entire event. For it to have gone so smoothly is important confirmation of the team’s excellent preparation, collaboration and dedication to quality. I couldn’t be prouder of them.”