AWS Tips I Wish I'd Known Before I Started (wblinks.com)

606 points by richadams 12 years ago | 149 comments

rkalla 12 years ago |

Fantastic list with much more depth than I expected. Some surprises that others might be interested in from this article and comments below:

  [1] Keeping buckets locked down and allowing direct client -> S3 uploads
  [2] Using ALIAS records for easier redirection to core AWS resources instead of CNAMEs.
  [3] What's an ALIAS?
  [-] Using IAM Roles
  [4] Benefits of using a VPC
  [-] Use '-' instead of '.' in S3 bucket names that will be accessed via HTTPS.
  [-] Automatic security auditing (damn, entire section was eye-opening)
  [-] Disable SSH in security groups to force you to get automation right.

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...

[2] http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Cre...

[3] http://blog.dnsimple.com/2011/11/introducing-alias-record/

[4] http://www.youtube.com/watch?v=Zd5hsL-JNY4

jamiesonbecker 12 years ago | |

I like SSH. But I'm the founder of Userify ;) http://userify.com

Also, S3 buckets cannot scale infinitely. This is a huge myth http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tip...

mbreese 12 years ago |

I'd also add to the list - make sure that AWS is right for your workload.

If you don't have an elastic workload and are keeping all of your servers online 24/7, then you should investigate dedicated hardware from another provider. AWS really only makes sense ($$) when you can take advantage of the ability to spin up and spin down your instances as needed.

NathanKP 12 years ago | |

The startup I'm working for needs minimal scaling, but we still use AWS despite the higher cost of EC2, because the broad ecosystem of AWS products makes it easier to develop interesting things quickly and efficiently.

If we went with all of our own dedicated hardware, or cheaper instances from a different cloud provider, then we'd miss out on ELB and have slower and more expensive communication to and from S3, not to mention that services like Elastic Beanstalk make deploying to EC2 instances very easy compared with rolling your own deployment system. And for those who don't want to bother with administering databases and cache machines, RDS and ElastiCache are going to be cheapest and fastest if your instances are on EC2.

So yeah I agree that EC2 is expensive, but the benefits of living fully within the Amazon ecosystem are pretty large.

osteele 12 years ago | | |

I think of AWS as a tool for prototyping and early-stage outsourcing of your infrastructure. Use it when you're fighting past market and technology risk with a single-digit team; consider dropping it in order to optimize costs when you've got more people (including fractional people) to evaluate, configure, and operate alternatives.

neverminder 12 years ago | | |

Isn't relying so much on AWS like "putting all your eggs in one basket"?

berkay 12 years ago | |

AWS != EC2. Do not assume that AWS is only used as VMs. It's different for everyone but AWS provides massive savings for many companies. It makes sense for many use cases in addition to elastic workloads.

mbreese 12 years ago | | |

True, but outside of S3 and Route53, how much under the AWS umbrella is much use without using at least one EC2 instance?

I can see a lot of benefit to using S3 without EC2, but after that, I'm not sure what else would be possible. Care to elaborate more?

Can you use their queues and database tools w/o using EC2? (If you are using a VPC, maybe?)

AaronBBrown 12 years ago | |

AWS makes a lot of sense in a 24/7 environment, particularly when you are a new startup and don't have enough information (or capital!) to make educated server purchases.

dangrossman 12 years ago | | |

They don't have to make any purchasing decisions. They can rent servers from companies like Softlayer and Rackspace (#3 and #4 behind AWS for YC startups), or spin up much cheaper VPS's (Linode's #2). We're talking $120/month commitments, not buying hardware and driving to a data center to install it. Deploying to a freshly imaged physical server is the same as deploying to EC2, and they can be provisioned for you in an hour or two. Each of those servers gets you many times the performance of an EC2 instance in the same price class, which means much more time to figure out your capacity needs as you grow.

Wilya 12 years ago | | |

I don't see how that makes sense.

If all you need is a server that is up 24/7, rent it by the month. You don't need information to make an educated choice, since they are pretty much all cheaper than EC2.

banachtarski 12 years ago | | |

With the additional caveat that you have tons of money, sure. With 24/7 load, you'd probably pay a 10x premium to use AWS.

_zen 12 years ago | | |

> particularly when you are a new startup and don't have enough information (or capital!) to make educated server purchases.

I doubt there are many founders who are technically informed enough to know about Amazon Web Services, but don't know about the other big 3 (Digital Ocean, Linode, Rackspace). If you truly don't, then you must not be a tech company, and I have a hard time believing a non-tech company without any technical founders would even know about AWS.

richadams 12 years ago | |

Absolutely, great point! AWS isn't for everyone, and there can be lots of cases where it's cheaper to use dedicated hardware. Shop around before jumping in. I've added this as a new tip at the end of the article (crediting you of course). Thanks!

Judson 12 years ago |

One thing the article mentions is terminating SSL on your ELB. If you want more control over your SSL setup AND want to get remote IP information (e.g. X-Forwarded-For) ELB now supports PROXY protocol. I wrote a little introduction on how to set it up[0]. They haven't promoted it very much, but it is quite useful.

[0]: http://jud.me/post/65621015920/hardened-ssl-ciphers-using-aw...
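For anyone wondering what the backend actually sees: with PROXY protocol enabled, the load balancer prepends one text line to each TCP connection, carrying the original client address. A minimal sketch of parsing that version 1 header, assuming the backend reads the first line itself; `ProxyInfo` and `parse_proxy_v1` are hypothetical names, not part of any AWS library.

```python
from typing import NamedTuple


class ProxyInfo(NamedTuple):
    protocol: str   # "TCP4" or "TCP6"
    src_addr: str   # original client address
    dst_addr: str   # address the client connected to
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> ProxyInfo:
    # Expected shape: PROXY TCP4 <src> <dst> <sport> <dport> followed by CRLF.
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    # The spec also allows "PROXY UNKNOWN"; this sketch rejects it for brevity.
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a PROXY v1 TCP header")
    _, proto, src, dst, sport, dport = parts
    return ProxyInfo(proto, src, dst, int(sport), int(dport))


info = parse_proxy_v1(b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\n")
print(info.src_addr, info.src_port)
```

In practice most servers (nginx, HAProxy) can parse this for you; the point is only that the client IP arrives in-band, before the TLS handshake.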

richadams 12 years ago | |

Great post, I had no idea you could do this with ELB. I've added your link to the additional reading list in my post, thanks for sharing!

mslot 12 years ago |

Be very careful with assigning IAM roles to EC2 instances. Many web applications have some kind of implicit proxying, e.g. a function to download an image from a user-defined URL. You might have remembered to block 127.0.0.*, but did you remember 169.254.169.254? Are you aware why 169.254.169.254 is relevant to IAM roles? Did you consider hostnames pointed to 169.254.169.254? Did you consider that your HTTP client might do a separate DNS look-up? etc.
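To make the questions above concrete: 169.254.169.254 is the instance metadata endpoint, which hands out the role's temporary credentials to anything on the box that can issue an HTTP request. A rough sketch of the check, assuming a fetcher that resolves the hostname first and then connects only to the addresses it has vetted; `is_forbidden_ip` and `resolve_and_check` are hypothetical helper names.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_forbidden_ip(addr: str) -> bool:
    """True for addresses a server-side fetcher should refuse to contact."""
    ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 zone id
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:  # e.g. ::ffff:169.254.169.254
        ip = mapped
    return (ip.is_loopback or ip.is_link_local or ip.is_private
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def resolve_and_check(url: str) -> list:
    """Resolve the URL's host and raise if any resulting address is forbidden."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    addrs = sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
    for addr in addrs:
        if is_forbidden_ip(addr):
            raise ValueError(f"{host} resolves to forbidden address {addr}")
    # Assumption: the caller connects to one of these addresses directly.
    # Handing the hostname back to the HTTP client would trigger the
    # separate DNS look-up mentioned above and bypass the check.
    return addrs
```

This is not a complete defence (redirects need the same treatment on every hop), just the shape of the check.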

There are other subtleties which make roles hard to work with. The same policies can have different effects for roles and users (e.g., permission to copy from other buckets).

IAM Roles can be useful, especially for bootstrapping (e.g. retrieving an encrypted key store at start-up), but only use them if you know what you're doing.

Conversely, tips like disabling SSH have negligible security benefit if you're using the default EC2 setup (private key-based login). It's really quite useful to see what's going on in an individual server when you're developing a service.

Also, it does matter whether you put a CDN in front of S3. Even when requesting a file from EC2, CloudFront is typically an order of magnitude faster than S3. Even when using the website endpoint, S3 is not designed for web sites and will serve 500s relatively frequently, and does not scale instantly.

Fizzer 12 years ago |

> you pay the much cheaper CloudFront outbound bandwidth costs, instead of the S3 outbound bandwidth costs.

What? CloudFront bandwidth costs are, at best, the same as S3 outbound costs, and at worst much more expensive.

S3 outbound costs are 12 cents per GB worldwide. [1]

CloudFront outbound costs are 12-25 cents per GB, depending on the region. [2]

Not only that, but your cost-per-request on CloudFront is way more than on S3 ($0.004 per 10,000 requests on S3 vs $0.0075-$0.0160 per 10,000 requests on CloudFront).

[1] http://aws.amazon.com/s3/pricing/

[2] http://aws.amazon.com/cloudfront/pricing/
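A back-of-the-envelope version of that comparison, using only the per-GB and per-request prices quoted above. It assumes flat rates (no volume tiers) and ignores S3-to-CloudFront origin traffic; the workload of 1,000 GB and 10 million requests is a made-up example.

```python
def monthly_cost(gb_out, requests, price_per_gb, price_per_10k_requests):
    """Bandwidth cost plus request cost, in dollars."""
    return gb_out * price_per_gb + (requests / 10_000) * price_per_10k_requests


# Prices as quoted in the comment above (dollars).
S3 = {"price_per_gb": 0.12, "price_per_10k_requests": 0.004}
CLOUDFRONT_LOW = {"price_per_gb": 0.12, "price_per_10k_requests": 0.0075}
CLOUDFRONT_HIGH = {"price_per_gb": 0.25, "price_per_10k_requests": 0.0160}

# Hypothetical workload: 1,000 GB out and 10 million requests per month.
for name, prices in [("S3", S3),
                     ("CloudFront (cheapest region)", CLOUDFRONT_LOW),
                     ("CloudFront (priciest region)", CLOUDFRONT_HIGH)]:
    print(f"{name}: ${monthly_cost(1000, 10_000_000, **prices):.2f}")
```

For that workload it comes out to roughly $124 for S3 against $127.50 to $266 for CloudFront, so the request pricing alone is enough to tip the balance at the low end.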

richadams 12 years ago | |

Doh, I feel stupid now. I only looked at bandwidth costs, not the request prices. That's what I get for editing my post late at night based on reading, instead of based on personal experience.

For low bandwidth, you're absolutely right, the costs are at best the same. For high bandwidth however (once you get above 10TB), CloudFront works out cheaper (by about $0.010/GB, depending on region). But that wasn't taking into account the request cost, which as you point out, is more expensive on CloudFront, which can negate the savings from above depending on your usage pattern.

I'll update my post accordingly, thanks for pointing this error out!

_hyn3 12 years ago | | |

You do have to pay for S3 to CloudFront traffic, so really you're paying twice. (Although the S3 to CF traffic might be cheaper than S3 to Internet, according to the Origin Server section on the Cloudfront pricing page.) http://aws.amazon.com/cloudfront/pricing/

Also, S3 buckets cannot scale infinitely. They have to have their key names managed appropriately to do it. http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tip...

Finally :) I like SSH. But I'm the founder of Userify! http://userify.com

krallin 12 years ago |

Lots of very useful tips there!

There's one that I think could be improved on a little:

    Uploads should go direct to S3 (don't store on local filesystem and have another process move to S3 for example).

You could even use a temporary URL[0,1] and have the user upload directly to S3!

[0]: http://stackoverflow.com/questions/10044151/how-to-generate-...

[1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...
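Normally you'd let the AWS SDK generate the temporary URL, but it helps to see that it's just a signed query string. Below is a standard-library sketch of Signature Version 4 query signing for S3. The function name `presign_s3_url` is hypothetical, the credentials in the usage lines are the placeholder keys from AWS's documentation, and the sketch assumes a bucket name without dots and no session token. Treat it as an illustration of the mechanism, not a replacement for the SDK.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(method, bucket, key, access_key, secret_key, region,
                   expires=3600, now=None):
    """Return a time-limited URL that allows `method` on one object."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    # Only the Host header is signed; the body is left unsigned so the
    # client can upload any payload.
    canonical_request = "\n".join([
        method, canonical_uri, canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return (f"https://{host}{canonical_uri}?{canonical_query}"
            f"&X-Amz-Signature={signature}")


url = presign_s3_url("PUT", "examplebucket", "uploads/photo.jpg",
                     "AKIAIOSFODNN7EXAMPLE",
                     "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                     "us-east-1", expires=900)
print(url)
```

The server hands `url` to the client, the client does an HTTP PUT to it, and the bucket itself can stay fully locked down.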

jaibot 12 years ago | |

I have a desktop client that requests one-time upload URLs from my server via an API. Later they get downloaded and processed somewhere else - never actually touching my web server.

Jach 12 years ago | |

Even cooler I think if you need a lot of file uploads (and potentially organized into their own folders) is letting your customer connect to a WebDAV interface with their system file browser, then they can just drag and drop whatever. (https://code.google.com/r/1meref-sabredav-amazons3/)

richadams 12 years ago | |

Wow, didn't know about pre-signed URLs, very useful. I've added this info to my article, thanks!

toomuchtodo 12 years ago | |

I've always seen issues pushing objects directly to S3 from a browser using CORS. YMMV.

ceejayoz 12 years ago | | |

You can specify CORS headers for S3, or you can just use a standard form POST.

j-kidd 12 years ago |

Good article, but I think it touches too little about persistence. The trade-off of EBS vs ephemeral storage, for example, is not mentioned at all.

Getting your application server up and running is the easiest part in operation, whether you do it by hand via SSH, or automate and autoscale everything with ansible/chef/puppet/salt/whatever. Persistence is the hard part.

crescentfresh 12 years ago | |

Good point. We're struggling to see the benefits of EBS for Cassandra that has its own replication strategy (ie data is not lost if an instance is lost), voiding the "only store temporary data on ephemeral stores" argument.

lowbloodsugar 12 years ago | | |

You've probably read this, but in case anyone else is considering EBS and cassandra...

"EBS volumes are not recommended for Cassandra data volumes."

http://www.datastax.com/docs/1.1/cluster_architecture/cluste...

blakesmith 12 years ago | | |

How do you handle entire datacenter outages with an ephemeral-only setup? You can replicate to another datacenter, but if power is lost to both, do you just accept that you'll have to restore from a snapshotted backup?

objectivefs 12 years ago | |

Persistence for AWS can be relatively simple if you use a distributed filesystem, such as GlusterFS (http://gluster.org) or our ObjectiveFS (https://objectivefs.com). You get a shared namespace for all your instances and persisting your data becomes as simple as writing files.

PhilipA 12 years ago |

Really useful article, though I don't agree with the advice against putting a CDN in front of S3. There are multiple articles which show that the performance of S3 is quite bad, and not well suited to serving assets, compared to CloudFront.

kolev 12 years ago | |

The issue with CloudFront is the tremendous cost of $600/mo for a custom domain SSL certificate. You also need to apply and get approved. There are many not-so-obvious limits and blocks, and unless you pay for support (which is pretty cheap, by the way), it may take you a week to lift those - you need to request them one by one and various teams approve/disapprove the requests. It's totally ridiculous.

richadams 12 years ago | |

I'll admit I hadn't really looked at this in depth. Using S3 without a CDN solved a particular use case I had a while ago, and it just seemed unnecessary to add a CDN in front of it. I've been doing some reading today, and it seems I was wrong. Adding a CDN in front adds lots of benefits I didn't know about!

I'll update the article soon to add in the new information.

tyw 12 years ago | |

also the outbound bandwidth cost of S3 is very high. it would cost us several times what we're paying for s3+cloudfront to serve our content straight from s3.

tedivm 12 years ago | | |

Even cloudfront is ridiculously overpriced for a CDN. If you're pushing anything close to real bandwidth you could do a lot better elsewhere.

drob 12 years ago |

Along these lines, I recommend installing New Relic server monitoring on all your EC2 instances.

The server-level monitoring is free, and it's super simple to install. (The code we use to roll it out via ansible: https://gist.github.com/drob/8790246)

You get 24 hours of historical data and a nice webUI. Totally worth the effort.

match 12 years ago |

  > Use random strings at the start of your keys.
  > This seems like a strange idea, but one of the implementation details 
  > of S3 is that Amazon use the object key to determine where a file is physically 
  > placed in S3. So files with the same prefix might end up on the same hard disk 
  > for example. By randomising your key prefixes, you end up with a better distribution 
  > of your object files. (Source: S3 Performance Tips & Tricks)

This is great advice, but just a small conceptual correction. The prefix doesn't control where the file contents will be stored; it just controls where the index to that file's contents is stored.
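One way to get "random" prefixes without losing track of your objects is to derive the prefix from a hash of the original name, so the key can always be recomputed. A small sketch; the helper name `prefixed_key`, the 4-character prefix length and the `-` separator are arbitrary choices of mine, not anything S3 requires.

```python
import hashlib


def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix so keys spread across the index."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Sequential names no longer share a common leading prefix.
for name in ("2014-02-03/log-0001.gz", "2014-02-03/log-0002.gz"):
    print(prefixed_key(name))
```

Because the prefix is deterministic, you don't need a lookup table; the trade-off is that listing objects "by date" via a key prefix no longer works.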

lfuller 12 years ago |

Your body tag is set to "overflow: hidden;". I wasn't able to scroll until I tweaked it manually in the inspector.

richadams 12 years ago | |

Oops, sorry about that. Should be fixed now.

paulgb 12 years ago | | |

Also, if you change the first line of http://wblinks.com/css/style.css

    @import url(http://fonts.googleapis.com/css?family=Droid+Sans:400,700);

you should notice an improvement in the boldface font rendering.

Great article, btw.

SixSigma 12 years ago | | |

I can't zoom on Firefox mobile for Android

kolev 12 years ago |

One issue with AWS that is painful to learn is the service limits, some of which are not so obvious. Everything has a hard limit, and unless you have a support plan, it can take days or weeks to get those lifted. They are all handled by the respective departments and lifted (or rejected) one by one. Many times we've hit a Security Group limit right before a production push, or other similar things.

RDS and CloudFront are extremely painful to launch. I have had many incidents where RDS took nearly 2 hours to launch - a blank multi-AZ instance! CloudFront distributions take 30 minutes to complete. I hate those two taking so long, as my CloudFormation templates pretty much take in excess of an hour due to the blocking RDS and CloudFront.

Last, but not least - VPC is nice, I love it, but it takes time to get the difference between Network ACLs and Security Groups, and especially - why the heck do you need to run NATs?! Why isn't this part of the service?! They provide some outdated "high" availability scripts, which are, in fact, buggy, and support only 2 AZs.

Also, a CloudFront "flush" takes over 20 minutes - even for empty distributions! And you can't do a hot switch from one distribution to another, as it also takes 30 minutes to change a CNAME and you cannot have two distributions with the same CNAME (it's a weird edge case scenario, but anyway).

kolev 12 years ago | |

Just recalled another big annoyance! CloudFormation allows you to store JSON files in the user data, which is a bit similar to CloudInit, but... it turns your numbers into strings! So, imagine you need to put some JSON config file in there and the software expects an integer and craps out if there's a string value instead. I won't even bring up how limited and behind the API CloudFormation is... Even their AWS CLI is behind and doesn't support major services like CloudFront. They even removed the nice landing page of the CLI tool, which made it very obvious which services are NOT supported - I guess they just got embarrassed by having so many unsupported ones!
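If you control the consuming side, one workaround for the numbers-as-strings problem is to coerce values back after loading the config. This is only a sketch of that idea and takes the behaviour described above at face value; `coerce_numbers` is a hypothetical helper. It converts any string that looks exactly like a JSON number, so a value you meant to keep as a string (a version such as "1.0", say) would be converted too.

```python
import json
import re

_INT = re.compile(r"-?(0|[1-9]\d*)")
_FLOAT = re.compile(r"-?(0|[1-9]\d*)\.\d+")


def coerce_numbers(value):
    """Recursively turn strings that look like JSON numbers back into numbers."""
    if isinstance(value, dict):
        return {k: coerce_numbers(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_numbers(v) for v in value]
    if isinstance(value, str):
        if _INT.fullmatch(value):
            return int(value)
        if _FLOAT.fullmatch(value):
            return float(value)
    return value


raw = '{"port": "8080", "ratio": "0.5", "name": "web", "zip": "02134"}'
print(coerce_numbers(json.loads(raw)))
```

Strings with leading zeros, like the "02134" above, are left alone because they aren't valid JSON numbers.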

noelherrick 12 years ago |

> Have tools to view application logs.

Yes! Centralized logging is an absolute must: don't depend on the fact that you can log in and look at logs. This will grow so wearisome.

txttran 12 years ago | |

What tools do you recommend for centralized logging?

Mizza 12 years ago |

That '.' instead of '-' tip for SSL'd buckets just saved me a large future headache. Good stuff!

croddin 12 years ago | |

I think you reversed them.
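The reason the tip is '-' rather than '.': a bucket is served over HTTPS at `<bucket>.s3.amazonaws.com`, and a wildcard certificate for `*.s3.amazonaws.com` covers exactly one extra label. A tiny sketch of that matching rule; `wildcard_matches` is a hypothetical helper, simplified to a wildcard in the leftmost label only.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check a hostname against a certificate name such as *.s3.amazonaws.com."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # "*" stands for exactly one label, never several
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]


CERT_NAME = "*.s3.amazonaws.com"
for bucket in ("my-bucket", "my.bucket"):
    host = f"{bucket}.s3.amazonaws.com"
    print(host, wildcard_matches(CERT_NAME, host))
```

So `my-bucket` validates and `my.bucket` gets a certificate mismatch, because the dot adds a second label.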

michaelmior 12 years ago |

Disabling SSH is an interesting tip. I guess the OP doesn't do any automation via SSH.

richadams 12 years ago | |

Just disabling inbound SSH connections, the servers can still SSH out to other systems to pull in files, configurations, clone git repos, etc.

It's just a way to stop yourself from cheating and SSHing in just to fix that one thing, instead of automating it.

milkshakes 12 years ago | | |

except that some automation frameworks rely on inbound ssh access to the machines. ansible would be an example of such a framework, in its default configuration at least.

michaelmior 12 years ago | | |

It can also be useful to SSH into a system to check what's going on with a specific problem. Sometimes weird things happen that you can't always anticipate or automate away.

jamiesonbecker 12 years ago | |

Userify is awesome for this - disable SSH user accounts at any time and then re-enable when you realize you still need SSH to find out why your instance stopped sending logs!! ;)

michaelmior 12 years ago | | |

Thanks Userify CEO! :)

novaleaf 12 years ago |

i'm a devops noob. what tools should i use to log / monitor all my servers?

i don't want to learn some complex stuff like chef/puppet btw.... anything SIMPLE?

carlio 12 years ago | |

Though I haven't tried it, people tell me that ansible is pretty simple - http://www.ansible.com/home

For logging, try logstash? http://logstash.net/

Monitoring... well that's a large and complicated topic!

adenot 12 years ago | | |

+1 on Ansible, great tool and super simple to configure and use.

novaleaf 12 years ago | | |

thanks, i'll give ansible a try!

freerobby 12 years ago |

Can you (or somebody else) elaborate on disabling ssh access? Is this a dogma of "automation should do everything" or is there a specific security concern you are worried about? What is the downside of letting your ops people ssh into boxes, or for that matter of their needing to do so?

pavel_lishin 12 years ago | |

Based on the article, it seems it's there to make sure that you're automating everything, instead of logging in to do that one little thing by hand.

freerobby 12 years ago | | |

Thanks.

Does anybody else here agree with this mentality? This seems a major mispractice to me. I've worked at companies with as few as two people to as many as 50,000 people. None of them have had production systems that are entirely self-maintaining. Most startups are better off being pragmatic than investing man-years of time handling rare error cases like what to do if you get an S3 upload error while nearly out of disk space. There's a good reason why even highly automated companies like Facebook have dozens of sysadmins working around the clock.

I thought all of his other points were spot-on but this one rings very dissonant to my experience.

richadams 12 years ago | | |

This is correct. The tip about disabling SSH isn't about security, it's just about quickly highlighting areas where you're not automated.

When developing an application for example, it's often necessary to SSH in to play with some things. But once you're ready to go to production, you want as much automation as possible. Forcing yourself to not use SSH will quickly show you where you aren't automated.

jamiesonbecker 12 years ago | |

I like SSH personally.. how else do you log in to figure out why your production instances quit logging (or anything else). I do appreciate the logic behind this, though... what he really seems to be saying is "turn off SSH to see if you can live without it." Good call. (disclaimer: I developed Userify, which pushes out SSH keys and lets you disable SSH for any or all users anytime.. and then re-enable when you need it!)

tbarbugli 12 years ago | |

Pure bullshit

Estragon 12 years ago |

How hard is it to roll your own version of AWS's security groups? I want to set up a Storm cluster, but the methods I have come up with for firewalling it while preserving elasticity all seem a bit fragile.

jamiesonbecker 12 years ago | |

Check out Dome9. Amazing tool and I think they work with both AWS and elsewhere.

mblaney 12 years ago |

As an Australian developer, using an EC2 instance seems to be the cheapest option if you want a server based in this country. Anyone got any other recommendations?

kibibu 12 years ago | |

Ninefold aren't bad either

mootpointer 12 years ago | | |

As a Ninefold employee, I'd like to think we're pretty good. We do virtual servers and we have a solid Rails platform as well.

mblaney 12 years ago | | |

thanks will keep them in mind.

simonlebo 12 years ago |

Can anyone explain how disabling ssh has anything to do with automation? We automate all our deployments through ssh and I was not aware of another way of doing it.

ceejayoz 12 years ago | |

I believe the idea is that by preventing SSH the temptation to just pop in and tweak something manually isn't possible.

richadams 12 years ago | | |

Yup, this was the intention. You could still allow your automation processes SSH access, just disable it for your users.

The idea is that if a user can't SSH in (at least not without modifying the firewall rules to allow it again), it will force them to try and automate what they were going to do instead. It worked well for me, but it's probably not for everyone.

jessaustin 12 years ago | |

ssh is handy if you're creating instances and then setting them up. However, if you're doing that on a regular basis, you might ought to use custom AMIs instead. Then (with proper "user data" management) you can just roll out instances that are already set up how you want.

rdl 12 years ago |

I'd probably also say "avoid EBS where possible, especially for instance storage" and "avoid ELB, roll your own."

late2part 12 years ago |

Thing I wish I'd known before I started: Don't rely on proprietary AWS solutions when open source solutions work just as well.

jamiesonbecker 12 years ago |

With regards to managing ssh, keys, etc... userify. Disclaimer: founder.

gesman 12 years ago |

Someone needs to create such list for Azure as well.

And make it Wiki-ized.

ape4 12 years ago |

Wow looks like a big pain.

Fasebook 12 years ago |

What's the point of auditing security in the Cloud? Is there any point at which you can know that you're making any progress?

mscarborough 12 years ago | |

Just one example -- Amazon will sign a Business Associate's Agreement for HIPAA compliance. That doesn't absolve you of your application security responsibilities, but it does give you peace of mind on the PAAS EC2/S3 side of things.

tel 12 years ago | | |

For further note though, they won't unless you buy dedicated instances. This also disables RDS.

5ersi 12 years ago |

Aww man, my head hurts just looking at this list.

Just go with a PaaS, like Heroku or AppEngine, and forget about this sysadmin crap.

q3k 12 years ago | |

> sysadmin crap

Without this “sysadmin crap” you would not have your precious PaaS.

tburch 12 years ago | | |

>> sysadmin crap

>Without this “sysadmin crap” you would not have your precious PaaS.

The difference being that I don't have to deal with the “sysadmin crap”.