Amazon.com Inc.’s web-based Simple Storage Service (S3) suffered widespread problems on Tuesday that disrupted websites, apps and smart devices across the country.
The cause of the problem that froze out so many customers has since been the subject of competing explanations. A tweet from Pinboard claimed that a migration of S3 from Google Cloud Storage to AWS was the root cause of the outage, and many people repeated it, calling it the best explanation so far compared with the lengthy account posted on the Amazon Web Services website. S3 is Amazon’s own service, however, and Amazon’s account describes a different cause.
According to Amazon, members of the Amazon Simple Storage Service team were debugging an issue on Tuesday morning that was causing the S3 billing system to progress more slowly than expected. The team attempted to take down a small number of servers for one of the subsystems used by the billing process.
However, Amazon said that one of the inputs to the command was entered incorrectly, so a larger set of servers was removed than intended. Amazon also said that the servers that were inadvertently removed supported two other S3 subsystems.
The error had a cascading effect that led to widespread problems across Amazon’s massive network of servers, which is a huge part of the internet’s infrastructure. The servers that were accidentally taken offline had to be restarted, a process that takes a while, as reported by The Verge.
Apps and websites adversely affected by the outage included those of the Securities and Exchange Commission, Business Insider, Quora and Slack. The outage originated in the AWS US-East-1 region in Northern Virginia, according to AWS’s Service Health Dashboard.
In a statement on Thursday, Amazon said that it was making several changes to prevent such incidents in the future. The company apologized for the impact the event had on its customers, adding that it is proud of its long track record of availability with Amazon S3 and that it knows how critical the service is to its customers, their applications and end users, and their businesses. It pledged to do everything it can to learn from the incident and to use it to improve availability even further.