The IBM Cloud Delivery Services (CDS) SRE team has monitoring in place for all sites and infrastructure under our control. These monitors are designed to allow the CDS SRE team to respond proactively to service-impacting or service-threatening events or conditions. When a site is unavailable, or infrastructure issues trigger monitor alerts, an incident record is automatically generated within our Incident Management System.

At the same time, for production environments, a CDS SRE Incident Response Team (IRT) provides 24/7 critical outage support. The goal of the IRT is to ensure that our customers' applications are running when they should be, and to provide effective and timely customer communication during availability incidents or Severity 1 (Sev1) cases during off hours. The IRT is sometimes referred to as the "on call" team.

Please note IRT is not considered standard support. It is for emergency and Sev1 cases only. Please see our Support & Operations section for standard support details and hours of operation.

How is the IBM CDS SRE Incident Response Team (IRT) organized?

The IRT is organized into a 2-person rotating schedule on 8-hour cycles over 7 days. This means that there are two IRT members for each 8-hour period: a Client Communicator and a First Responder.
CDS uses a region-based “follow the sun” support model. The IRT schedule is maintained and updated by CDS on a regular basis.
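
The rotation described above can be pictured as a simple grid: three 8-hour slots per day across 7 days, each staffed by one First Responder and one Client Communicator. The Python sketch below is purely illustrative; the roster names, the `Shift` type, and the `build_week` helper are hypothetical and do not reflect how CDS actually maintains the IRT schedule.

```python
from dataclasses import dataclass
from itertools import cycle

SHIFT_HOURS = 8
SHIFTS_PER_DAY = 24 // SHIFT_HOURS  # three 8-hour slots per day
DAYS = 7


@dataclass(frozen=True)
class Shift:
    day: int                  # 0..6
    start_hour: int           # 0, 8 or 16
    first_responder: str
    client_communicator: str


def build_week(responders, communicators):
    """Return one Shift per 8-hour slot for a 7-day week, cycling through each roster."""
    r, c = cycle(responders), cycle(communicators)
    return [
        Shift(day, slot * SHIFT_HOURS, next(r), next(c))
        for day in range(DAYS)
        for slot in range(SHIFTS_PER_DAY)
    ]


# Hypothetical rosters: one person per role per region.
week = build_week(["fr_a", "fr_b", "fr_c"], ["cc_a", "cc_b", "cc_c"])
```

With three people per role, each person lands on the same 8-hour slot every day, which is one way a region-based “follow the sun” model could map onto the grid.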

...

  1. Determine the impact of the alert or case

  2. Determine the cause of the alert or case

  3. Initiate corrective action if appropriate

  4. Alert the Client Communicator if escalation is determined to be necessary

The IBM First Responder’s priority is to restore service. The IBM Client Communicator is notified if there are any challenges to restoring service. The IBM Client Communicator then leads the recovery activities and escalates to any personnel required to resolve the issue, while ensuring continuous communication with the customer for the duration of the incident.
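
As a rough illustration of the four First Responder steps and the hand-off to the Client Communicator, the flow can be sketched in Python. Everything here is hypothetical: the `Incident` type, the callback names, and in particular the escalation rule (escalate when service has not been restored) are assumptions made for the example, not the actual CDS tooling or policy.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    summary: str
    impact: str = "unknown"
    cause: str = "unknown"
    service_restored: bool = False
    log: list = field(default_factory=list)


def first_responder_triage(incident, assess_impact, find_cause,
                           corrective_action, notify_communicator):
    """Walk an incident through the four First Responder steps."""
    incident.impact = assess_impact(incident)      # 1. determine the impact
    incident.cause = find_cause(incident)          # 2. determine the cause
    if corrective_action is not None:              # 3. corrective action, if appropriate
        incident.service_restored = corrective_action(incident)
    if not incident.service_restored:              # 4. escalate (assumed criterion)
        notify_communicator(incident)
        incident.log.append("escalated to Client Communicator")
    return incident


# Hypothetical run: the corrective action does not restore service.
notified = []
incident = first_responder_triage(
    Incident("site unavailable"),
    assess_impact=lambda i: "production outage",
    find_cause=lambda i: "unknown",
    corrective_action=lambda i: False,
    notify_communicator=notified.append,
)
```

Passing the steps in as callbacks keeps the sketch independent of any particular monitoring or paging system.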

Escalation Manager / Discipline Team Members
Additional support for IRT members is provided by an Escalation Manager as well as dedicated Database and Network discipline team members. These IBM CDS SRE individuals are also assigned to the IRT schedule to provide coverage.

...