Netflix suffers Christmas Eve outage. Amazon to blame again.


amenditman

Guest LilBambi

It has been a while, I guess. The first Facebook outage that came up in a search was from September 2010, but apparently there was one as recently as this month, on December 10, 2012.

 

But as you say, they have a pretty good reputation for uptime. They have to, because they would certainly hear about it otherwise, and I am sure they don't want to miss a single day of the aggregated data they get from all those billions of users. :w00tx100:

 

I bet Netflix uptime has been pretty good as well overall. And if there were other sites affected, which there were, it does sound like not so much an Amazon issue as a data center issue in northern VA.


I wouldn't call more than 27 million a few, though...

 

Netflix reported 25.1 million streaming subscribers in the U.S. as of the end of its third quarter. In other regions it reported 4.31 million, but that figure includes the U.K., Ireland and other countries that weren't affected.

 

These numbers do not add up. Something is wrong here. I wonder where the quoted articles are getting their numbers from...

 

Adam
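
For what it's worth, the arithmetic behind that mismatch can be laid out explicitly. This is only a sketch using the two subscriber figures quoted above and the "more than 27 million" figure; the variable and function names are made up for illustration, and working in thousands is just a convenience to avoid floating-point rounding.

```python
# All figures are in thousands of subscribers, taken from the post above.
US_STREAMING = 25_100       # 25.1 million U.S. streaming subscribers
OTHER_REGIONS = 4_310       # 4.31 million across all other regions combined
REPORTED_AFFECTED = 27_000  # the "more than 27 million" figure being questioned


def gap_beyond_us(reported, us):
    """Subscribers that would have to come from outside the U.S."""
    return reported - us


total_worldwide = US_STREAMING + OTHER_REGIONS
needed_outside_us = gap_beyond_us(REPORTED_AFFECTED, US_STREAMING)
share_of_other = needed_outside_us / OTHER_REGIONS

print(f"Worldwide streaming total: {total_worldwide / 1000:.2f} million")
print(f"Needed from outside the U.S.: {needed_outside_us / 1000:.2f} million")
print(f"That is {share_of_other:.0%} of the other-regions figure")
```

So 27 million is more than the U.S. figure alone but less than the worldwide total, which means it only works if a bit under half of the non-U.S. subscribers were in affected countries. The quoted numbers alone cannot confirm or rule that out.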


I bet Netflix uptime has been pretty good as well overall. And if there were other sites affected, which there were, it does sound like not so much an Amazon issue as a data center issue in northern VA.

 

Well, as I understand it, the inherent design of AWS means the services should automatically shift from one data center to another if there is a problem with one. That is not what happened at AWS when the power dropped. So, it was Amazon's glitch which caused the outage.

 

Adam


Guest LilBambi

That would depend on whether Amazon owns their own data center in Northern VA or if they lease space in someone else's data center.

 

If they don't already own one in Northern VA, it may be time to build their own. That particular data center has been much more painful for Amazon than others around the country.


And it looks like it is down again.

I got a DVD by mail and, for the first time ever (I've been a member since March 2007), the DVD is broken. I wanted to report it, but the site is down.

netflix_down.jpg

Edited by zlim

Guest LilBambi

Wow, that sucks! Yep, it's down again: http://downrightnow.com/netflix

 

They are not the only ones; LiveJournal too, apparently, right now. I was also in the middle of writing a Happy New Year post on Wordpress.com and decided to come back later, as it was intermittently unusable and was frustrating the heck out of me. And that's not even on the list. I also noticed that the first time I checked my GMail this morning it was missing in action too. BTW: The downrightnow.com website doesn't show either of those two outages for today.

 

It's nuts out there today!

Edited by LilBambi

Anonymous is working....... lol

 

Adam

 

PS- I've not noticed any outages today. Are you sure it is the service and not some issue with VZW routing?


Guest LilBambi

No, I am not totally sure, Adam. But things have been a bit weird several times today. I had trouble getting to GMail for just a minute or so around 10 AM EST, and although Wordpress.com is now back, as evidenced by my Happy New Year post, it was totally unusable for a frustrating 10 minutes or so when I first tried to post it earlier this afternoon.


A Closer Look At The Christmas Eve Outage

It is still early days for cloud innovation and there is certainly more to do in terms of building resiliency in the cloud. In 2012 we started to investigate running Netflix in more than one AWS region and got a better gauge on the complexity and investment needed to make these changes.

 

We have plans to work on this in 2013. It is an interesting and hard problem to solve, since there is a lot more data that will need to be replicated over a wide area and the systems involved in switching traffic between regions must be extremely reliable and capable of avoiding cascading overload failures. Naive approaches could have the downside of being more expensive, more complex and cause new problems that might make the service less reliable. Look for upcoming blog posts as we make progress in implementing regional resiliency.

 

 

Summary of the December 24, 2012 Amazon ELB Service Event in the US-East Region

 

We have made a number of changes to protect the ELB service from this sort of disruption in the future.

 

Last, but certainly not least, we want to apologize. We know how critical our services are to our customers’ businesses, and we know this disruption came at an inopportune time for some of our customers. We will do everything we can to learn from this event and use it to drive further improvement in the ELB service.

 

Adam


Hypothesis: when events such as these occur, more than just the customers suffer. If DNS lookups are not configured to give up after a bounded number of attempts, the packets looking for the destination keep traveling across the 'net, adding congestion to 'net traffic.
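
To illustrate what "knowing when to quit" could look like on the client side, here is a minimal sketch of a lookup wrapper that stops after a fixed number of attempts and waits longer between each one. It is not how any particular resolver is implemented; `resolve_with_limit` and its parameters are hypothetical names, and the `lookup` argument can be any callable that raises `OSError` on failure (the standard library's `socket.gethostbyname` behaves that way).

```python
import time


def resolve_with_limit(lookup, name, max_attempts=3, base_delay=0.1,
                       sleep=time.sleep):
    """Try lookup(name) at most max_attempts times, then give up.

    The wait between attempts doubles each time (exponential backoff),
    so a failing destination is not hammered with repeated queries.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return lookup(name)
        except OSError as err:
            last_error = err
            # Only wait if another attempt is still coming.
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise LookupError(
        f"gave up on {name!r} after {max_attempts} attempts"
    ) from last_error


if __name__ == "__main__":
    def always_down(name):
        # Stand-in for a lookup against an unreachable service.
        raise OSError("no answer")

    try:
        resolve_with_limit(always_down, "example.invalid")
    except LookupError as err:
        print(err)
```

In real use one might pass `socket.gethostbyname` as `lookup`. The point of the cap and the growing delay is that a client which fails stops adding traffic instead of retrying forever.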

Edited by crp

Guest LilBambi

ELB service - Elastic Load Balancing - Amazon AWS

 

Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. Customers can enable Elastic Load Balancing within a single Availability Zone or across multiple zones for even more consistent application performance. Elastic Load Balancing can also be used in an Amazon Virtual Private Cloud (“VPC”) to distribute traffic between application tiers.

 

There is much more about the features on that page.
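
The "detects unhealthy instances and reroutes traffic to healthy ones" idea from that description can be sketched in a few lines. This is a toy round-robin balancer under my own assumptions, not Amazon's actual ELB algorithm or API; the class and method names are hypothetical, and the health status is set by hand instead of by real health checks.

```python
class RoundRobinBalancer:
    """Toy load balancer that skips instances marked unhealthy."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to consider

    def mark_unhealthy(self, instance):
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        # Ignore names that were never part of the pool.
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance in round-robin order."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    lb.mark_unhealthy("web-2")
    print([lb.route() for _ in range(4)])
```

Note that a scheme like this only protects against individual instances failing. If the balancer itself is the component that breaks, as in the ELB event summarized earlier in the thread, nothing behind it gets traffic at all.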


Well, it was only one region that went down. If Netflix were running their own CDN, they would likely experience problems of their own too. Once you start working at this scale, downtime for some customers is inevitable. No system can have 100% uptime.

 

Netflix is huge, and delivers 1/3 of the internet traffic during peak hours. Any outage even at 1% of total users will be quite noticeable.

 

Adam

that is... incredible!

More like unbelievable, to me. I really doubt the veracity of the report. I'd like to see the measuring guidelines, etc.