Our site went down and we’re sorry

Bhavin on July 22, 2017

July 24, 9:45am PDT

We think we’ve stopped the attack, but we’re continuing to monitor the situation. The site should be working for most students, and we’ll keep looking into the issues.

July 24, 9:15am PDT

Unfortunately, we’re still experiencing issues and intermittent downtime. It looks like someone is flooding our servers with “fake” traffic, which is known as a denial-of-service (DoS) attack. Our team and partners are working to determine how to stop the attack.

July 22, 12:00pm PDT

We had an unexpected outage today, and I’m truly sorry. We know how important your studies are and that many of you have upcoming tests or blocked off time today to study, and we let you down. We don’t have all the details, and we’re not 100% sure we’re in the clear, but I wanted to share a little bit about what happened.

First, here’s the timeline (all times Pacific Daylight Time on Saturday, July 22nd):

  • 2:00am: our site became slow
  • 5:30am: the site was entirely down
  • 6:30am: the site was back up but very slow again and intermittently down
  • 11:00am: the site was back up

We host our servers with Amazon via Amazon Web Services. We also use a company in Australia to help us work with Amazon. They monitor our servers and try to make sure everything’s running smoothly.

So what happened?

There was a problem with the server our database is on. Our database is where we store all of your information, such as which questions you’ve answered and when your account expires. Every time you submit an answer, flip a flashcard, or take any other action on our site, we store that in our database so we can keep track of your progress and provide that information back to you. The database server had connectivity issues, which ultimately impacted your ability to use the site. We have automated backups that should kick in when something like this happens, but unfortunately they didn’t work either. We don’t yet know exactly why the issue happened or why the backups didn’t work, but we’re looking into it.

Our engineering team and our partners in Australia decided the best course of action would be to completely replace our database and database server. This meant we’d need to take the site down entirely for about 15 minutes while we set up the new server and copied the database over to it. We took this action around 10:45am, and the site was back up by 11:00am.

Ideally, the site would never have gone down, or if it did, we’d have identified the solution earlier. We’ll look into why this happened and see how we can prevent it in the future. I’m truly sorry for disrupting your studies. If you want to share your thoughts, need a little extra time on your account, or need anything else, let us know by emailing help@magoosh.com.

CEO, Magoosh