Using AWS Lambda to call and text you when your servers are down

Using AWS Lambda to call and text you when your servers are down (thisdata.com)

160 points by nhm 9 years ago | 53 comments

Dobbs 9 years ago |

From an engineering point of view this is really cool, but as an ex-sysadmin I feel that I need to reiterate and emphasize something that is alluded to in the second paragraph.

Too many things can go wrong, and you are all-around better off outsourcing this to something like Pingdom. You don't have sufficient levels of reliability, and you aren't dual-homed across Twilio and another phone system. Maybe the cause of your outage is that AWS is having issues. Now your site and your monitoring is down.

Much better to outsource to people who obsess over doing this right and making sure they are properly redundant.

nhm 9 years ago | |

> outsource to people who obsess over doing this right

Completely agree! I often have to fight that "I could just build that myself" mentality, which glosses over the points you made so well.

avitzurel 9 years ago | | |

This and that!

It's the same as a "Twitter clone" that just posts messages with a 140-char limit, or "build a blog in 15 minutes".

Alerting on a downed website is sorta like a glacier: there's so much under the surface that if you only see the surface, you're missing out.

1. Multiple locations
2. Multiple check intervals
3. SMS/email provider switch on fail
4. Auto recovery of your checkers
5. Multiple providers with a single storage
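To make item 3 concrete, here is a minimal sketch of "switch provider on fail". It is an illustration only: `send_with_fallback`, `broken_sms`, and `working_email` are hypothetical names, and the senders are stand-ins rather than real Twilio or email clients.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sender type: takes a message and raises an exception on failure.
Sender = Callable[[str], None]


def send_with_fallback(message: str, providers: Sequence[Tuple[str, Sender]]) -> str:
    """Try each provider in order; return the name of the first one that succeeds."""
    errors: List[str] = []
    for name, send in providers:
        try:
            send(message)
            return name
        except Exception as exc:  # any failure moves us on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in senders for demonstration; a real setup would wrap actual SMS/email APIs.
sent_log: List[str] = []


def broken_sms(message: str) -> None:
    raise ConnectionError("SMS gateway unreachable")


def working_email(message: str) -> None:
    sent_log.append(message)
```

Calling `send_with_fallback("site down", [("sms", broken_sms), ("email", working_email)])` tries SMS first and falls through to email. Even this toy version says nothing about retries, rate limits, or delivery confirmation, which is rather the point of the list above.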

melvinmt 9 years ago | |

> Now your site and your monitoring is down. Much better to outsource to people who obsess over doing this right and making sure they are properly redundant.

You make valid points about redundancy and levels of reliability but keep in mind that even Pingdom can go down: http://royal.pingdom.com/2016/10/24/ddos-attack-affects-ping...

user5994461 9 years ago | | |

Chances are that Pingdom won't be down at the same time that your site is down.

Diversify to avoid cascading failures ;)

imtringued 9 years ago | | |

With your own solution you will likely encounter the same problems that Pingdom faced, including this one. The benefit of a service like Pingdom is that they have already solved those problems for you, or if they haven't, you don't have to waste time solving them yourself. It's not very efficient if everyone solves the same problems over and over again.

dharma1 9 years ago | | |

Use 2 or more providers. Signing up takes a minute or two, and there are free alternatives.

IgorPartola 9 years ago | |

My favorite issue recently came up with a Django app of mine which was set up to email me when a request errors out. Turns out, when I switched which server it ran on, I misconfigured the email settings, and one of the errors was caused by the inability to send an email. Thankfully it only took a few days to figure this out.

teddyh 9 years ago | |

We’ve had issues with Pingdom at work. We don’t use them ourselves, but we host web sites, and some customer of ours used Pingdom to monitor their web site hosted on our servers. The customer would complain to us about downtime reported by Pingdom, but we would read the logs and find everything OK, with multiple successful accesses from other people during the time which Pingdom reported our customer’s site as being down. A huge pain.

snom380 9 years ago | | |

Don't services like Pingdom support multiple ping locations? If all of those fail, there's a very high chance there's an actual problem, if not with your server then with your (ISP's) connectivity.
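The "all locations must fail" idea can be sketched in a few lines. This is only an illustration of the reasoning, not how Pingdom works internally; the function name, the location labels, and the `min_failing` parameter are all made up for the example.

```python
from typing import Mapping, Optional


def is_real_outage(results: Mapping[str, bool], min_failing: Optional[int] = None) -> bool:
    """Decide whether a set of probe results looks like a genuine outage.

    `results` maps a probe location to True (check passed) or False (check failed).
    By default every location must fail before we call it an outage; pass
    `min_failing` to relax that to "at least N locations failed".
    """
    if not results:
        return False  # no data is not evidence of an outage
    failing = sum(1 for ok in results.values() if not ok)
    required = len(results) if min_failing is None else min_failing
    return failing >= required
```

With `{"us-east": False, "eu-west": True, "ap-south": True}` this returns `False`, so a single flaky probe location would not page anyone.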

tjholowaychuk 9 years ago | |

Yep, plus most engineering time is worth at least $60/h, which would pay for a year or more with most of these services.

vacri 9 years ago | | |

On the other hand, it's 'set up once and it just keeps chugging along', and isn't Yet Another SaaS To Manage.

Also, if you want a 'proper' ops alerting SaaS, you're looking at something along the lines of $50/user/mo or $15/server/mo, neither of which is trivial.

falcolas 9 years ago |

And I'm sure Lambda will never go down. Right? Right??

(It has. Completely and silently stopped processing against Kinesis queues for a few hours recently. Guess what AWS Step is built on?)

jlgaddis 9 years ago | |

Well, sure, of course it will, but I don't think Nick is advocating replacing a complete, full featured monitoring system with this.

It could be very useful to, for example, keep an eye on your monitoring system. At $work, we have a pretty extensive monitoring system that we've built out. We use an external service to watch over the monitoring system, though, to alert us of any issues with it that we haven't otherwise caught.

Besides, like he said, it's "fun" and kinda neat.

cddotdotslash 9 years ago | |

Of course it can go down, and you can have CloudWatch alerts to alert you about that. But so can your Nagios server sending pings go down or the fancy SaaS you signed up for.

robinson-wall 9 years ago | | |

Did you just suggest using a third AWS service to let you know if the second AWS service monitoring your first AWS service goes down?

thisone 9 years ago | |

Don't worry, the status was green the entire time...

falcolas 9 years ago | | |

Yes, it was indeed green the entire time. Of course, AWS is almost always green, so long as something is up...

tjholowaychuk 9 years ago |

I wrote Apex Ping (https://apex.sh/ping/) for those who want more features and/or don't want to waste the time to save a few bucks :D.

gingerlime 9 years ago | |

Apex Ping is great, but I'm still waiting for SMS / Twilio integration (hint, hint, nudge nudge) :)

postila 9 years ago | | |

I now use okmeter.io and am really happy with it (especially for nginx and Postgres monitoring). They improve it constantly, and installation took just a couple of minutes. SMS/email/Slack notifications work great (however, for Slack, I needed to set up a webhook).

tjholowaychuk 9 years ago | | |

:D I wish SMS wasn't so awkward; you're pretty much forced to have a credit system since it's so expensive. I'll probably still do it at some point. Makes it awkward for the customer as well if you have to babysit the credits.

paps 9 years ago | |

Thanks for this. I would use it but you only do HTTP requests, right? It would be great if you could also do a true ICMP ping (like the name suggests!)

tymm 9 years ago |

I wrote something similar in bash and put it into a docker image: https://hub.docker.com/r/simplepush/alerta

Just running this docker image on a server you want to monitor is enough.

Instead of Twilio it uses Simplepush (https://simplepush.io).

cyberferret 9 years ago | |

Simplepush looks like a cool service - thanks for the heads up. It seems that it accomplishes the author's main need - that for a constant buzzing which needs to be picked up and dealt with.

EDIT: Just seen that it is Android only! :-/

ubercow 9 years ago | | |

If you need something similar that works on iOS (and Android), take a look at https://pushover.net/

I use it for some personal automation scripts that might need to get my attention if something goes wrong.

dmourati 9 years ago |

Isn't the better plan to use Lambda instead of your servers?

justinc8687 9 years ago |

I use https://aremysitesup.com/ and I've found it really helpful, as it is one of the few inexpensive services I've found that will CALL me if things are down. SMS is nice, but I use the do-not-disturb feature on my phone in the evenings, and at least on iOS, the only way to punch through that is with a call from a number on my favorites list. This meets that need very well, and I've found the service to be quite spot on in alerting me (both when I had one instance of things hitting the fan and during scheduled maintenance). I'd highly recommend it.

intrasight 9 years ago |

The advantage of this type of cloud solution over a one-size-fits-all cloud service like Pingdom (which I use) is flexibility. You can configure cloud agents to perform nearly any task you can envision.

gravypod 9 years ago |

I don't know if everyone knows this, but you can send texts using email.

Most providers have SMTP gateways for SMS services. Verizon runs @vtext.com
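A rough sketch of what that looks like with Python's standard library. The `vtext.com` domain is the one mentioned above; the function names, the sender address, the 160-character truncation, and the local SMTP relay on port 25 are assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_sms_email(phone_number: str, gateway_domain: str, text: str, sender: str) -> EmailMessage:
    """Build an email addressed to <digits>@<gateway_domain>."""
    # Keep only the digits so "555-010-0123" becomes "5550100123".
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{gateway_domain}"
    # Assumption: keep the body within one 160-character SMS segment.
    msg.set_content(text[:160])
    return msg


def send_sms_email(msg: EmailMessage, smtp_host: str = "localhost", smtp_port: int = 25) -> None:
    """Hand the message to an SMTP relay (assumed here to be running locally)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

Usage would be something like `send_sms_email(build_sms_email("555-010-0123", "vtext.com", "web01 is down", "alerts@example.com"))`, assuming a relay is actually listening.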

gtaylor 9 years ago | |

Just keep in mind that these aren't incredibly reliable across the board. Some have very low or arbitrary autoban or blacklist thresholds. I eventually caved and paid Twilio to handle the hassle of SMS logistics for me, rather than deal with the weirdness.

tomascot 9 years ago | |

I did this for non-critical ops; it's just a forward to my phone's email address, really simple. The problem is that not every company is reliable or even has that email.

social_quotient 9 years ago |

Note: instead of DynamoDB for the lookup mentioned at the bottom, maybe consider https://aws.amazon.com/athena/ for an S3 query

illumin8 9 years ago | |

You could literally just store a .CSV file with the on-call schedule in S3 and run SQL queries against it with Athena. That would be cheap, since you'd be querying a few KB... but DynamoDB is probably better for this use case, honestly. Athena is great for scanning huge datasets very quickly.
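For a file that small, the lookup itself is tiny either way. Here is a sketch in plain Python rather than Athena SQL, assuming a made-up CSV layout (`start,end,name,phone`) and hypothetical names and numbers; fetching the object from S3 is left out.

```python
import csv
import io
from datetime import datetime
from typing import Optional

# Hypothetical schedule; in the setup described above this text would live in an S3 object.
SCHEDULE_CSV = """start,end,name,phone
2017-01-02T00:00,2017-01-09T00:00,alice,+15550100001
2017-01-09T00:00,2017-01-16T00:00,bob,+15550100002
"""


def on_call_at(schedule_csv: str, when: datetime) -> Optional[dict]:
    """Return the schedule row whose [start, end) window contains `when`, if any."""
    for row in csv.DictReader(io.StringIO(schedule_csv)):
        start = datetime.fromisoformat(row["start"])
        end = datetime.fromisoformat(row["end"])
        if start <= when < end:
            return row
    return None  # nobody is scheduled for that moment
```

`on_call_at(SCHEDULE_CSV, datetime(2017, 1, 10))` returns bob's row, and a date outside every window returns `None`, which a real alerting function would need to handle with some default contact.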

cagataygurturk 9 years ago |

Route53 Health Checks & SNS can send SMS messages without any Lambda involved.

theparanoid 9 years ago |

I've used montastic.com for years. 2min setup to fire and forget.