Fixing slow AWS uploads

I usually store private datasets on my NAS. This lets me do a significant amount of prototyping locally; you can do a lot with a 10 Gbps network and a modern laptop[1]. But sometimes you really do need 48 or 96 CPUs chugging on a problem at the same time.

For non-GPU work I usually spin up a beefy box on AWS or GCP. But as I was rsyncing a particularly big set of files, I walked away for a coffee and came back to upload speeds around 2 MB/s. Sometimes it would drop to 500 kbps and sometimes jump to 10 MB/s, but it never pushed much higher. Let's see if you can spot the issue right off the bat:

rsync -av --progress \
  --exclude='.*' \
  -e "ssh -i ~/.ssh/primary-laptop.pem" \
  /Volumes/Common_Drive/dataset \
  [email protected]:~/dataset/

If you can, then congratulations! No need for this blog post. But if you can't, these results just don't make any sense:

  • Client has a 10 Gbps symmetric fiber connection (Sonic in SF)
  • Powerful EC2 instance: a c5d.12xlarge, so neither the network nor the CPU used by rsync should slow it down
  • This box has an SSD

I provisioned this box with large NVMe storage for fast local disk access right next to the compute. The d in c5d.12xlarge actually means "instance store volumes included." If I'm going to be paying for 48 CPUs, I want to be fully saturating them.[2]

But I made the mistake of copying my rsync command from a previous run against my local homelab, where the home directory is a perfectly reasonable place to dump a folder until you figure out its permanent location; it's all backed by the same SSD. On AWS, though, that home directory has dragons. By default it's backed by a slow, network-attached EBS volume.

Specifically:

/dev/root → Amazon Elastic Block Store (EBS)
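
You can verify this yourself. A minimal sketch; on Nitro instances the EBS root shows up as an NVMe device whose model string literally says so:

findmnt -n -o SOURCE /          # which device backs the root filesystem
lsblk -d -o NAME,MODEL,SIZE     # EBS devices report "Amazon Elastic Block Store"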

So my data path looked like:

Laptop → Network (Internet) → EC2 → Network (internal) → EBS

With TCP, the receiver controls the pace. So even though it looked like an overall network issue, it was really backpressure from the EBS "disk": slow writes on the EC2 instance stalled the receiving rsync, which in turn showed up as slow network speeds on my end.
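
If you want to confirm that disk backpressure, not the network, is the bottleneck, watching the disks on the instance while the transfer runs is enough. A minimal sketch, assuming the sysstat package (which provides iostat) is installed:

sudo apt-get install -y sysstat   # Ubuntu/Debian; skip if iostat is already there
iostat -xm 1
# If the EBS device (often nvme0n1 on Nitro instances) sits at a fixed MB/s with
# high await/%util while the CPUs and instance-store disks stay idle, the "slow
# network" is really the disk refusing to keep up.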

EBS volumes, for what it's worth, are pretty basic network disks with low default throughput caps and IOPS limits (a gp3 volume starts at 125 MB/s and 3,000 IOPS). You can pay to raise those limits, but you're almost always better off using the local NVMe instance-store disks if you're doing heavy data processing.
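
If you do need to stay on EBS, the gp3 limits can be raised after the fact. A hedged sketch with the AWS CLI; the volume ID is a placeholder:

# bump a gp3 volume to 500 MB/s and 6,000 IOPS (vol-0123456789abcdef0 is a placeholder)
aws ec2 modify-volume \
  --volume-id vol-0123456789abcdef0 \
  --volume-type gp3 \
  --throughput 500 \
  --iops 6000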

The Fix

If you suspect this might be happening to you, check lsblk (lsblk -o NAME,MODEL shows which devices are EBS and which are instance storage):

nvme1n1  Amazon EC2 NVMe Instance Storage
nvme2n1  Amazon EC2 NVMe Instance Storage

Format and mount one:

sudo mkfs.ext4 -F /dev/nvme1n1
sudo mkdir -p /mnt/nvme
sudo mount /dev/nvme1n1 /mnt/nvme
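
As an alternative to the single-disk setup above, you can stripe both instance-store disks with RAID 0 for even more throughput. A sketch, assuming mdadm is available (sudo apt install mdadm on Ubuntu):

# optional: stripe both instance-store disks into one faster volume
# note: instance-store data does not survive a stop or terminate
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.ext4 -F /dev/md0
sudo mkdir -p /mnt/nvme
sudo mount /dev/md0 /mnt/nvme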

Then change your rsync target to:

/mnt/nvme/
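
In full, that's the same command as before with only the destination changed. The host below is a placeholder for whatever you ssh into, and it assumes the remote user can write to /mnt/nvme (e.g. after a sudo chown $USER /mnt/nvme):

rsync -av --progress \
  --exclude='.*' \
  -e "ssh -i ~/.ssh/primary-laptop.pem" \
  /Volumes/Common_Drive/dataset \
  ubuntu@<ec2-host>:/mnt/nvme/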

And in my case I immediately saw speeds jump from ~2 MB/s to ~26 MB/s.

Conclusion

Beware the home directory! And make use of your local disks. Your pipelines will thank you.

Footnotes

  1. Especially for data processing when using an OLAP database.

  2. Disk speeds be damned!
