The following workflow will guide you through the basic steps involved in using the AWS command line tools. These steps will include:

  1. Start instance
  2. Upload data
  3. Run job
  4. Stop instance

 


Download & install cryoem-cloud-tools

Follow these instructions to download and install cryoem-cloud-tools.


Set up AWS environment 

At the end of the installation, you will be prompted to add your AWS credentials to the aws_init.sh script, along with the name of your research group.

  • Security credentials need to be obtained after you create your AWS account. Read more here.
  • Research group names should only include letters and numbers, with no spaces or special characters/symbols (e.g. leschziner)
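
For reference, the credentials typically end up as environment-variable exports inside aws_init.sh. The exact variable names depend on your copy of the script, so treat the lines below as an illustrative sketch only; the key values are placeholders, not real credentials:

export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_DEFAULT_REGION=us-west-2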

To test that everything is working, try running awsls to list your resources:

$ awsls

If you see any errors, such as /bin/sh: aws: command not found, the installation did not complete successfully.
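
If that happens, a quick way to check whether the underlying AWS command line interface is installed and on your PATH (the error message suggests awsls could not find the aws executable):

$ which aws
$ aws --version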


Create EBS volume

Before you can turn on an instance and upload data, you first need to create an EBS volume. As a refresher, EBS volumes are like external hard drives where your data will live; you plug them into instances on AWS to analyze your data.

Important: Make sure to overestimate the size required! These volumes are created at a fixed size, which makes them difficult to enlarge later.

Command:

$ aws_ebs_create 100 us-west-2a "Sample prep from 10/22/17"

Where:

  • 100 - 100 GB EBS volume
  • us-west-2a - availability zone for the data, which is in region us-west-2 (Oregon)
  • "Sample prep from 10/22/17" - a free-text label to help you identify the volume later

You will now be able to see this when you type:

$ awsls -v

Make a note of the volume ID, which will be used in the next step.
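
If you want to cross-check the volume outside of cryoem-cloud-tools, the standard AWS CLI can describe it directly; for example, using the volume ID reported by awsls -v:

$ aws ec2 describe-volumes --volume-ids vol-09d64fa31c27af83d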


Boot up instance

With your newly created EBS volume, we will now boot up an AWS instance for data analysis.

Here are recommendations for instance types based on the type of Relion-2.0 job:

  • 2D classification
    • <10,000 particles - p2.xlarge
    • 10,000 - 100,000 particles - p2.8xlarge
    • 100,000+ particles - p2.16xlarge
  • 3D classification
    • <10,000 particles - p2.xlarge
    • 10,000+ particles - p2.8xlarge
  • 3D refinement
    • p2.8xlarge for all particle counts (more than 1 GPU is needed for gold-standard refinement)

p2 instance specifications: 

  • p2.xlarge
    • $0.90/hr
    • 8 vCPUs
    • 1 GPU (NVIDIA K80)
  • p2.8xlarge
    • $7.20/hr
    • 32 vCPUs
    • 8 GPUs (NVIDIA K80)
  • p2.16xlarge
    • $14.40/hr
    • 64 vCPUs
    • 16 GPUs (NVIDIA K80)

For our test case, let's imagine that we will be performing 2D classification on 50,000 particles. This means that we will be booting up a p2.8xlarge instance.
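
As a rough cost estimate from the prices above: if that 2D classification run were to take about 3 hours on a p2.8xlarge, the compute charge would be roughly 3 × $7.20 = $21.60, plus EBS storage. The 3-hour figure is only an illustration; actual run times depend on box size and the number of iterations.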

NOTE: Make sure that you have increased your instance limits to allow you to request p2 instances. Read more here.

Job launching considerations:

  • When launching, make sure to include the volume ID of the EBS volume you created in the last step
  • The instance needs to be launched in the same availability zone as the EBS volume

$ awslaunch --instance=p2.8xlarge --availZone=us-west-2a --volume=vol-09d64fa31c27af83d

It will take 3 - 5 minutes to boot up and attach the EBS volume to the instance.

Once ready, it will display a message like this on the command line:

Instance is ready! To log in:

ssh -X -i /home/michaelc/.aws/mike_oregon.pem ubuntu@52.39.181.89

Copy and paste this command into your command line, and you will be able to log into your instance.

Transfer data to AWS

Now that your instance is started, you can transfer your data to AWS. To do so, here are commands that will move your Relion-2.0 workflow directories to AWS. These commands will transfer the entire contents of the particle stack directory (Extract/job003) to AWS, along with all hidden files and the default_pipeline.star file:

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" Extract/job003 ubuntu@52.39.181.89:/data/

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" .* ubuntu@52.39.181.89:/data/

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" default_pipeline.star ubuntu@52.39.181.89:/data/

This will put all of your data onto the EBS volume, which is mounted onto the instance at /data/. Typically, we will put all pieces of the Relion project directory onto /data/.
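
Before moving on, it can be worth confirming that the files landed where Relion expects them. For example, using the same key and IP address as above:

$ ssh -i /home/michaelc/.aws/mike_oregon.pem ubuntu@52.39.181.89 "ls /data/ && ls /data/Extract/job003 | head"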

Run Relion-2.0 job

With your data on AWS, you can now log in and run your job using the Relion-2.0 GUI.

Log into instance with ssh:

$ ssh -X -i /home/michaelc/.aws/mike_oregon.pem ubuntu@52.39.181.89

Navigate to your Relion project directory:

$ cd /data/

Then open the Relion-2.0 GUI:

$ relion &

If you've transferred all of the files shown above, then the GUI should show you all of the same information that you had on your local machine.

Go through the menus, filling in the options as you normally would.

Now, depending on the type of instance you started, you need to set the number of GPUs and MPI processes accordingly.

For all instance types, you will include the following options in the Compute tab:

  • Use parallel disc I/O?  Yes
  • Number of pooled particles:  100
  • Copy particles to scratch directory: [Leave blank]
  • Combine iterations through disc?  No
  • Use GPU acceleration?  Yes
  • Which GPUs to use: [Leave blank to use all GPUs]

For the instances listed below, include the following settings in the Running tab. In each case the number of MPI procs is one master process plus one worker per GPU (e.g. 1 + 8 = 9 for the 8 GPUs on a p2.8xlarge):

  • p2.xlarge
    • Number of MPI procs:  2
    • Number of threads:  2
    • Submit to queue?  No
  • p2.8xlarge
    • Number of MPI procs:  9
    • Number of threads:  3
    • Submit to queue?  No
  • p2.16xlarge 
    • Number of MPI procs:  17
    • Number of threads:  3
    • Submit to queue?  No

Then hit 'Run now!'

You will see the job information displayed in the stdout and stderr panels.
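
If you want to confirm that the GPUs are actually being used, open a second ssh session to the instance and watch the utilization with nvidia-smi (this assumes the NVIDIA driver utilities are on the image, which they should be since Relion's GPU acceleration requires the driver):

$ ssh -i /home/michaelc/.aws/mike_oregon.pem ubuntu@52.39.181.89
$ watch -n 5 nvidia-smi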

Transfer data back to local machine

When your job has finished, you will need to move your results back down to your local machine. To transfer the files:

From your local machine: 

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" ubuntu@52.39.181.89:/data/Class2D/job004/ .

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" ubuntu@52.39.181.89:/data/.* .

$ rsync -avzu -R -e "ssh -i /home/michaelc/.aws/mike_oregon.pem" ubuntu@52.39.181.89:/data/default_pipeline.star .
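
As a quick sanity check that the results arrived, list the downloaded job directory (the exact file names depend on the job type and number of iterations):

$ ls -lh Class2D/job004/ | head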

Turn off instance

Before you can successfully terminate an instance, ALL processes must be stopped. This means:

  • Kill all running jobs (check them with the 'top' command)
  • Log out of all shells on the AWS instance
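
One quick way to confirm both conditions from inside the instance, before you log out for the last time:

$ ps aux | grep -i relion
$ who

The first command should return nothing except the grep process itself, and the second should list only your current session.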

When these conditions are satisfied:

$ awskill i-09a6f5bf81b94a611

When this completes successfully, you will see the following message:

Removing instance ...

Success!

As always, the instance ID can be obtained from the command awsls.

Delete EBS volume

When you are finished with your data on the EBS volume, you need to delete the volume. This is because AWS charges you $0.10 per gigabyte per month (prorated down to the hour).

In order to delete a volume, the following condition must be met:

  • The volume must not be attached to any instance, which means it is in the 'available' state
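
You can check the volume with awsls -v, or query its state directly with the standard AWS CLI (the --query expression just pulls out the state field, which should read "available"):

$ aws ec2 describe-volumes --volume-ids vol-09d64fa31c27af83d --query "Volumes[0].State"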

Important: This cannot be undone - so be careful!

$ aws_ebs_delete vol-09d64fa31c27af83d