After months of tuning AWS instances, configurations, and job execution, we finally have a scalable movie alignment algorithm that can align 1500 movies in less than two hours using 80 GPUs. That works out to 3-4 minutes per movie per GPU, including the time it takes to move the data around.

Launch this routine from the Relion-2.0 GUI, as you would any other AWS command in Relion-2.0. See more about movie alignment here.

How did we do it?

Three big hurdles had to be surmounted:

  1. Enhanced networking on AWS
  2. Explicit assignment of tasks to CPUs
  3. High input/output operations per second (IOPS) SSD-backed EBS volumes

1. Enhanced networking on AWS

We run all movie alignments with MOTIONCORR and MOTIONCOR2 on GPUs, which means we exclusively use the p2 generation of instances. Fortunately, this means that we can use 10 Gbps and 20 Gbps networking into these instances, for p2.8xlarge and p2.16xlarge respectively.

It was easy to add this feature to our AMI, as we could just follow the instructions laid out here: Enhanced networking for AWS.

This allowed us to pull down 8 or 16 movies at a time using multiple CPU threads, letting us saturate the 10 Gbps or 20 Gbps networking. In practice, this means we can download 8 or 16 movies in ~3-4 minutes (an effective download rate of 400 MB/s).
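
The multi-threaded download can be sketched in Python as follows. This is a minimal illustration under assumptions, not the code behind the routine: `download_batch` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function downloads a single movie.

```python
from concurrent.futures import ThreadPoolExecutor


def download_batch(keys, fetch, n_threads=8):
    """Fetch every key concurrently and return the results in input order.

    `fetch` is a hypothetical callable that downloads one movie and
    returns something useful (for example, a local path).
    """
    # One worker thread per concurrent download: 8 or 16 movies at a time.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() preserves input order even if downloads finish out of order.
        return list(pool.map(fetch, keys))


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without network access.
    fake_fetch = lambda key: "/scratch/" + key
    print(download_batch(["movie_001.mrc", "movie_002.mrc"], fake_fetch))
```

Threads are a reasonable fit here because downloads spend most of their time waiting on the network rather than computing.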

2. Explicit assignment of tasks to CPUs

Given that each machine has more CPUs than GPUs, we could divide up the tasks of:

  1. Movie alignment
  2. Downloading the next movies
  3. Uploading aligned movies

We took advantage of the Linux taskset utility to pin each task to its own set of CPUs, so that these tasks never get co-assigned to the same CPUs.
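
As an illustration, the CPU pools and the matching taskset invocations could be built like this. It is a sketch under assumptions: the function names and the pool sizes are hypothetical, and only the `taskset -c <cpu-list> <command>` form is taken from the utility itself.

```python
def split_cpu_pools(cpu_ids, n_align, n_download):
    """Split CPU ids into three disjoint pools: align, download, upload.

    The first `n_align` CPUs go to alignment, the next `n_download` to
    downloading, and whatever remains to uploading.
    """
    cpu_ids = sorted(cpu_ids)
    if n_align < 1 or n_download < 1 or n_align + n_download >= len(cpu_ids):
        raise ValueError("each pool needs at least one CPU")
    return {
        "align": set(cpu_ids[:n_align]),
        "download": set(cpu_ids[n_align:n_align + n_download]),
        "upload": set(cpu_ids[n_align + n_download:]),
    }


def taskset_command(cpus, argv):
    """Prefix a command with `taskset -c <cpu-list>` to pin it to `cpus`."""
    cpu_list = ",".join(str(c) for c in sorted(cpus))
    return ["taskset", "-c", cpu_list] + list(argv)


if __name__ == "__main__":
    # Hypothetical 8-CPU machine: 4 CPUs align, 2 download, 2 upload.
    pools = split_cpu_pools(range(8), n_align=4, n_download=2)
    print(taskset_command(pools["download"], ["echo", "downloading"]))
```

The resulting argument list can be handed to `subprocess.Popen`, so each of the three tasks starts already confined to its own pool.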

3. High input/output operations per second (IOPS) SSD-backed EBS volumes

Finally, we realized that downloads and uploads were still delayed despite orchestrating our jobs over separate CPU pools. We tracked the problem down to input/output (I/O): the 'standard' gp2 SSD-backed EBS volumes did not give us enough I/O operations per second.

To solve this, we turned to the 'io1' EBS volume type and maxed out the number of provisioned IOPS for the volume. This costs more than gp2, but we only use it for an hour or two, so the cost remains minimal (~$3/hr).
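
To make "maxing out" concrete, here is a small sketch that computes the largest IOPS value allowed for a given volume size and packages it as a volume request. The helper name is hypothetical, and the default ratio (50 IOPS per GiB) and cap (64,000 IOPS) are assumptions that should be checked against the current EBS limits.

```python
def io1_volume_params(size_gib, zone, iops_per_gib=50, max_iops=64000):
    """Build the request parameters for an io1 volume with maximum IOPS.

    `iops_per_gib` and `max_iops` are assumed limits, not values taken
    from the original setup; verify them before use.
    """
    # Provisioned IOPS is bounded by both the size ratio and a hard cap.
    iops = min(size_gib * iops_per_gib, max_iops)
    return {
        "AvailabilityZone": zone,
        "Size": size_gib,
        "VolumeType": "io1",
        "Iops": iops,
    }


if __name__ == "__main__":
    # Hypothetical 100 GiB scratch volume in a placeholder zone.
    print(io1_volume_params(100, "us-east-1a"))
```

The keys follow the parameter names of the boto3 EC2 client's `create_volume` call, so the dictionary could be passed along as `create_volume(**params)`.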

Conclusion

With these three changes in place, we had complete separation of tasks on our machines, allowing us to download, upload, and run movie alignments without them interfering with each other.