Amazon Web Services (AWS) is the world's leading cloud computing provider, with 16 data centers worldwide. This global footprint gives the cryo-EM community access to high-performance computing resources wherever they are located.

As discussed below, AWS provides pay-as-you-go computing and data storage services. This means that there is NO upfront cost to use their resources. That said, users need to be mindful of how they use AWS as charges do apply.

AWS allows us to solve cryo-EM structures without any waiting, queuing, or administration time, and it costs less than $300 per structure.

Further reading:


Introduction to Amazon Web Services (AWS)

Why use AWS?

AWS provides on-demand computing for users worldwide. This means that AWS can easily scale to as many users as you need, whenever you need them.

This makes it particularly well suited to cryo-EM facilities that help many users collect their data. By incorporating AWS into their workflow, facilities can support as many users as needed without any upfront cost.

Pay-as-you-go cloud computing

AWS bills differently from a traditional hardware purchase: it charges per hour (rounding up) for computing and per gigabyte per month (prorated, rounding up to the nearest hour) for storage.

Therefore, every time you turn a virtual machine on and off, AWS will charge your account for the time you used. The benefit of this system is that you only pay for what you use. The downside is that if you were to leave virtual machines running on AWS 24/7 for months, it would cost more than buying the hardware yourself (comparing against the hardware price alone; IT costs, power, etc. are not included).
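
The billing rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not AWS's actual billing logic: the function names are made up for this example, the hourly rate is a placeholder rather than a real AWS price, and the 730-hour month used to prorate storage is an assumption.

```python
import math

# Assumption: an average month of 730 hours is used to prorate storage charges.
HOURS_PER_MONTH = 730


def compute_cost(runtime_hours, hourly_rate_usd):
    """Cost of one instance session when every started hour is billed in full."""
    return math.ceil(runtime_hours) * hourly_rate_usd


def storage_cost(size_gb, stored_hours, rate_per_gb_month_usd):
    """Cost of storage billed per GB-month, prorated by the rounded-up hour."""
    billed_fraction = math.ceil(stored_hours) / HOURS_PER_MONTH
    return size_gb * rate_per_gb_month_usd * billed_fraction


def always_on_cost(hourly_rate_usd, months):
    """Cost of leaving one instance running around the clock for some months."""
    return hourly_rate_usd * HOURS_PER_MONTH * months


rate = 1.0  # hypothetical $/hour, not a real AWS price
print(compute_cost(2.25, rate))  # 2.25 hours of use is billed as 3 hours
print(always_on_cost(rate, 6))   # six months without ever shutting down
```

Comparing the output of always_on_cost against a hardware quote is a quick way to decide whether a workload is better suited to on-demand use or to a purchased machine.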

Terminology for AWS resources

AWS (Amazon Web Services): The branch of Amazon that runs and administers its cloud services.

Console: The web interface used to deploy AWS services.

EC2 (Elastic Compute Cloud): The name for Amazon's cloud computing infrastructure.

EBS (Elastic Block Store): Persistent storage volumes to which your data will be uploaded and backed up throughout your computations.

Instances: The name given to individual computing nodes within Amazon's EC2. Amazon instances are the computing nodes that are grouped together into clusters by the StarCluster program.

Key Pairs: A security measure to encrypt and decrypt login information on Amazon's cloud infrastructure. Practically, this means that both you and Amazon have files that are used to log into an Amazon instance. The key file on your computer must be supplied whenever you ssh into an Amazon instance or scp files to or from it, using the '-i' option.

AMI (Amazon Machine Image): A template used by Amazon when starting an instance. It provides the instance with the operating system and applications that were previously saved in the AMI.

S3 (Simple Storage Service): A 'static' storage service that is cheaper than EBS volumes ($0.0125-$0.03/GB/mo). However, the data are stored as objects, which means that S3 buckets do not behave like network file systems.

Bucket: A term used to describe a 'folder' of files on Amazon S3. Buckets can contain many different files and folders. If they choose, users can make their S3 buckets publicly available, since every bucket comes with a unique S3 URL.
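
To make the Key Pairs entry concrete, here is a minimal sketch of how the '-i' option fits into ssh and scp command lines. The helper names are invented for this example, and the key file name, login name ('ubuntu'), and host name are placeholders: the actual login name depends on the AMI, and the host is the public address of your own instance.

```python
import shlex


def ssh_command(key_path, user, host):
    """Build an ssh command that logs in using the given key pair file."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def scp_download_command(key_path, user, host, remote_path, local_path):
    """Build an scp command that copies a file from the instance to this computer."""
    return ["scp", "-i", key_path, f"{user}@{host}:{remote_path}", local_path]


# Placeholder values; substitute your own key file, login name, and instance address.
login = ssh_command("mike_oregon.pem", "ubuntu", "ec2-host.example.com")
print(shlex.join(login))
```

Each function returns an argument list that can be passed to subprocess.run, or printed with shlex.join and pasted into a terminal.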


AWS account setup

Creating an account ('root' user)

Before you can get access to EC2, you'll need to create an AWS account. Even if you already have an Amazon account that you use for other services from Amazon, you'll still have to create an 'Amazon Web Services' account. 

To sign up, go to the AWS website and click 'Sign Up': http://aws.amazon.com/

You can also watch a video tutorial here that will walk you through making an account.

Note: Make sure you select the 'Basic' plan (which is free). Congratulations - you just set up root access to your account.

Creating users and settings

Now that you've created your account, you should perform a few tasks that AWS recommends to keep your account secure. These extra steps matter because the account you just set up has root privileges, which means that anyone who logs in with these credentials can see credit card information, delete users, block access, etc. This is a scenario that really does happen.

  1. Set up multi-factor authentication for root access  
  2. Customize user sign-in link to AWS
  3. Create administrator account   
  4. Set up multi-factor authentication for user
  5. (Optional) Repeat for users from your lab

Set up multi-factor authentication for root access

Log into the AWS console (https://aws.amazon.com/console).

Navigate to the IAM (Identity and Access Management) interface:

 

On this page, select 'Activate MFA on your root account':

Follow the instructions to set up multi-factor authentication using a virtual device (a smart phone) with the Google Authenticator app (Android & iPhone). After setting this up, you will need your smart phone every time you log into AWS as root so that you can enter the code displayed in Google Authenticator.

Customize user sign-in link to AWS

At the top of the IAM web page, you will see a URL. In the above image, the URL has already been customized, but you might see something like this:

https://123456789012.signin.aws.amazon.com/console
  • The number shown (by default) is your account number

To make this link more memorable, you can replace the account number with a phrase of your choice. In the above example, we created 'leschzinerlab' as our sign-in portal.
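
The sign-in link follows the pattern shown above, so the default and customized URLs can be sketched as follows. The function name is made up for this example; the account number and the 'leschzinerlab' phrase are the ones used in this tutorial.

```python
def signin_url(account_id_or_alias):
    """Return the IAM user sign-in link for an account number or a custom phrase."""
    return f"https://{account_id_or_alias}.signin.aws.amazon.com/console"


print(signin_url("123456789012"))   # default link, built from the account number
print(signin_url("leschzinerlab"))  # link after customization
```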

Create administrator account

Within the IAM interface, you should now create the administrator account that will be used to manage AWS resources.

Navigate to the 'Users' tab on the left-hand side of the IAM interface:

On the next pages: 

  • Create a new user (e.g. admin)
  • Select both programmatic and console access
  • Download the security credentials
    • Important! These will be used in the configuration files for Cryo-EM Cloud Tools
  • Create a custom password for this account

Now go to the 'Groups' tab (above 'Users'): 

  • Create new group
  • Name it administrator
  • Select AdministratorAccess from the list of choices

After this group is created, go to Group Actions -> Add users to group, and then select your previously created user name.

As a last step, set up multi-factor authentication for this administrator account:

  • Go to the users tab (while still logged in as root)
  • Select the newly created user name
  • Scroll down to the bottom, and then select Manage MFA device
  • Follow instructions to sync Google Authenticator app for this account

Set up user accounts

Follow the same steps above that you used to create the administrator account, with the following exceptions: 

  • User names should be memorable for the user
  • Only allow programmatic access for users (not console)
    • Make sure to save the security credentials!!
  • Do NOT place these users into the administrator group
  • Instead:
    • Create new group called 'user'
    • Add to this group:
      • EC2FullAccess
      • S3FullAccess
      • GlacierFullAccess
      • Then, create a new 'inline' policy:
        • Go to: 'Inline Policies' > 'Create Group Policy'
        • Then select 'Custom Policy'
        • Name this policy: 'EC2CloudWatch' (the name itself doesn't matter; just choose something descriptive)
        • Copy the text policy from this link into the empty field
        • Then click 'Apply Policy'
    • Add user accounts into this group
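
For labs with many users, the same group setup could be scripted instead of clicked through. The sketch below is one possible approach, not part of the original workflow. It assumes that 'iam' is a boto3 IAM client (created with boto3.client("iam") under administrator credentials), and that the three policies listed above correspond to the AWS-managed policies AmazonEC2FullAccess, AmazonS3FullAccess, and AmazonGlacierFullAccess. The inline policy text is passed in as a parameter because it comes from the link above and is not reproduced here.

```python
# Assumption: these AWS-managed policies are the ones meant by the list above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonGlacierFullAccess",
]


def create_user_group(iam, user_names, inline_policy_json,
                      group_name="user", inline_policy_name="EC2CloudWatch"):
    """Create the 'user' group, attach its policies, and add users to it.

    iam: a boto3 IAM client, e.g. boto3.client("iam")
    inline_policy_json: the custom policy text (a JSON string) from the tutorial link
    """
    iam.create_group(GroupName=group_name)
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_group_policy(GroupName=group_name, PolicyArn=arn)
    iam.put_group_policy(
        GroupName=group_name,
        PolicyName=inline_policy_name,
        PolicyDocument=inline_policy_json,
    )
    for name in user_names:
        iam.add_user_to_group(GroupName=group_name, UserName=name)
```

Passing the client in as an argument keeps the function free of credentials and makes it easy to try out against a stand-in object before touching a real account.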

Create keypairs for users

In addition to a user's security credentials, every user needs a keypair. 

Still logged in as root, navigate to the EC2 interface: 

  • Top left corner: Click Services -> EC2

This is known as the EC2 console. 

Select Key Pairs on the bottom of the left-hand menu:

 

On the next screen: 

  • Select 'Create key pair'
  • Input name for keypair 
    • A systematic name is helpful here, such as {username}_{region}
    • So, for username 'Mike' in US-West-2 (Oregon):
      • mike_oregon
  • Download the keypair and keep it safe and alongside the security credentials from the previous step
  • IMPORTANT: In order to use the keypair, you have to change the permissions via the command line:
    • $ chmod 600 [keypairname].pem
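
The naming convention and the permission change above can also be done from Python, which is handy when preparing keypairs for several users at once. This is a small sketch with made-up helper names; the os.chmod call is the equivalent of 'chmod 600' on Linux and macOS.

```python
import os
import stat


def key_pair_name(username, region_label):
    """Build a systematic keypair name of the form {username}_{region}."""
    return f"{username.lower()}_{region_label.lower()}"


def restrict_key_permissions(pem_path):
    """Make the key readable and writable by its owner only (same as chmod 600)."""
    os.chmod(pem_path, stat.S_IRUSR | stat.S_IWUSR)


print(key_pair_name("Mike", "Oregon"))  # mike_oregon
```

After downloading the .pem file, call restrict_key_permissions on its path; ssh will refuse to use a key file that other users can read.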

Request instance limit increase

Now that you are ready for cloud computing, you will need to submit an initial limit-increase request for the p2 instance types (GPU computing with Relion) along with the m4 instance types (CPU computing with Rosetta).

Navigate to the Limits section of the left-hand menu in the EC2 console screen (still logged in as root):

Scroll down to p2.8xlarge and click 'Request limit increase':

  • This will open a new page where you will need to select a few options for the request:
    • Select the region where you will be doing your calculations. Keep in mind that p2 instances are only available in Oregon (US-West-2), N. Virginia (US-East-1), and Ireland (EU-West-1).
    • Then select the following instances by adding multiple requests together. For each instance, request a limit of 5.
      • p2.xlarge
      • p2.8xlarge
      • p2.16xlarge
      • m4.16xlarge
      • c4.8xlarge
    • You will need to justify why you need these instances. Your statement should include something like the following:
      • "We are scientists using AWS for computational analysis of protein structures. Given the large sizes of our datasets and intensive analyses, we need to use GPU-based instances (p2) as well as compute optimized instances (c4, m4)."
    • Then submit your request
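
Because the request form has to be filled in once per instance type, it can help to write the full list out first. The sketch below only generates a checklist to copy into the web form; it does not contact AWS. The function and field names are invented for this example, and the region restriction reflects the p2 availability stated above, which may change over time.

```python
# Regions where p2 instances are available according to this tutorial.
P2_REGIONS = {"us-west-2", "us-east-1", "eu-west-1"}

INSTANCE_TYPES = ["p2.xlarge", "p2.8xlarge", "p2.16xlarge", "m4.16xlarge", "c4.8xlarge"]


def build_limit_requests(region, limit=5):
    """Return one checklist entry per instance type for the given region."""
    if region not in P2_REGIONS:
        raise ValueError(f"p2 instances are not listed as available in {region}")
    return [
        {"region": region, "instance_type": instance_type, "limit": limit}
        for instance_type in INSTANCE_TYPES
    ]


for entry in build_limit_requests("us-west-2"):
    print(entry["region"], entry["instance_type"], entry["limit"])
```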

It typically takes 1-2 business days to get your limit increased, and AWS may not give you everything you ask for at first. They will likely say that, since you are a new user, they will grant only part of your request. Once you use the instances regularly, they can increase your limits further.

Activate cost explorer & allocation tagging in Billing Management

To monitor resource utilization per user and per project, log in with the administrator account and go to Billing Management > Cost allocation tags > Activate.