Recent Status

From 3kWiki
Revision as of 23:16, 19 May 2009 by Shadowspawn

VPSLink Subscriber -

Firstly, please accept our sincere apologies for the events of the past twenty-four hours - we have identified many issues with redundant systems which deserve an explanation and immediate action on our part. This incident represents a "perfect storm" of failures in our electrical failover systems and, as with any incident of this scope, we're taking every opportunity to improve our processes to prevent its recurrence.


To summarize the incidents of the past twenty-four hours:


5/18/2009 15:10 PST - A measurable loss in voltage from our datacenter's main feeds at Seattle City Light was recorded. This loss in voltage triggered our UPS failovers and all hardware nodes and routers began running on UPS failover power.


5/18/2009 16:05 PST - UPS systems' battery reserves began to deplete and hardware nodes began to shut down. We have confirmed that the UPS systems were working properly - the loss in voltage tripped their circuits and cut all of their connections to our datacenter's feeds. This action, unfortunately, prevented the UPS systems from recharging their batteries from available power following the brownout.


5/18/2009 19:00 PST - We did not have sufficient evidence to believe that the issue at Seattle City Light had been resolved. Our UPS service company and a technician arrived, an electrician from Titan Electric was dispatched, and our datacenter technicians switched from the Seattle City Light feed to diesel backup generators.

Upon confirming that all hardware nodes had sufficient power, our system administration team began bringing hardware nodes and key routing hardware back up (total potential outage time for all services at this point: three hours).

Our internal services (including VPSLink.com, our ticketing system, our billing system, the My.VPSLink.com control center, and the VPSLink forums) were unavailable for the duration of the outage. We were not able to send broadcasts or post to the forums regarding the cause of this issue or the action being taken on it.


5/18/2009 20:00 PST - All hardware nodes were restored to service and began fsck and VPS account procedures. Key routing hardware was restored to service.

Internal systems remained offline while our system administrators and developers worked to restore service to nodes hosting internal databases.


5/18/2009 21:30 PST - Our ticketing system was restored to service and broadcast replies were sent to all tickets received, encouraging subscribers to submit a ticket if their service remained offline. Several hardware nodes were still in the process of completing fsck procedures and service had not yet been restored to all individual VPS accounts.


5/19/2009 01:30 PST - All internal services were restored and a notification was posted to the VPSLink Forums approximately eight hours after the incident began.


We recognize that hours of downtime without explanation, or without a way to contact support and find out what has happened, are unacceptable, and we want to ensure that everyone affected by this outage has the opportunity to claim service credits in accordance with our Service Level Agreement.

If you have not already claimed service credits for this incident, please submit a ticket to the VPSLink Billing department.


We are presently investigating appropriate courses of action to prevent anything like this from occurring again at our datacenter and we will post updates on progress at the VPSLink Forums:

VPSLink Datacenter Incident 5/18/2009 approx 18:30 PST - ongoing: http://forums.vpslink.com/system-network-status/9231-vpslink-datacenter-incident-5-18-2009-approx-18-30-pst-ongoing.html


Finally, we have had the opportunity to review the very reasonable requests submitted by our subscribers in response to this incident. The communication blackout which occurred will not occur again. By June 1st, 2009 we will have a redundant page in place to address any issues which affect the VPSLink.com site and ticketing system by providing a direct e-mail address to contact our staff.

Additionally, we have created a twitter.com account for all significant events on our network and any news items which may affect your service (particularly for instances when our notification tools are unavailable):

Twitter.com: vpslink https://twitter.com/vpslink

Twitter.com VPSLink RSS Feed: http://twitter.com/statuses/user_timeline/41262067.rss


We look forward to providing you with service for years to come - and that means preventing incidents like this one from occurring again. We hope that you will accept our apologies and service credits for any difficulty or inconvenience this incident may have caused and our promise to reduce the potential points of failure for communication and your service.

- The VPSLink Team