Instructions for Downloading 2023

Once your project data is ready to download, you will receive a download token via e-mail. In the examples below the token is "sKbcd69HNOIdoje9KKooo6969klnNIdA", which just leads to some small test files. Anyone who knows the token can access your data, so be careful who you share it with. You may view the file listing using a web browser at https://transfer.genomics.ed.ac.uk/<token>, but there is no good way to download multiple large files with a web browser, so you need to use one of the dedicated transfer clients described below.

 

Using Rclone on Eddie (for Edinburgh Datastore users)

The Eddie compute cluster is directly connected to the Edinburgh University datastore, so downloading via Eddie is a fast way to get your files to your datastore area without going via your local PC (if you are at home then transferring files via your home internet and back will be very slow).

1. Log in to a staging node (specific to Eddie)

The Eddie cluster requires that large transfers are done via the 'staging' nodes, not the login nodes, so after logging in run:

$ qlogin -q staging

All being well, you should quickly get a command prompt on a staging node. Datastore files are now accessible under /exports/csce/datastore.

2. Get the Rclone Client program

Rclone is not (currently) available as a module on Eddie, but it comes as a single large executable, so you do not need to compile or install it. Just download and unpack the binary file from the archive.

$ cd /tmp

$ wget https://downloads.rclone.org/rclone-current-linux-amd64.zip

$ unzip rclone-current-linux-amd64.zip

$ mkdir -p ~/bin

$ mv -t ~/bin rclone-*-linux-amd64/rclone

$ chmod +x ~/bin/rclone

3. Configure Rclone to know about our server

You only need to do this once, no matter how many projects you download. The command below will simply write a small config file to your home directory. You may run "~/bin/rclone config paths" to confirm where the file is saved.

$ ~/bin/rclone config create edge webdav url=https://transfer.genomics.ed.ac.uk:9443 user=anonymous pass=rclone pacer_min_sleep=0

4. Get your data

Rclone has many different modes, but our recommended way to run Rclone is like so:

$ RCLONE_ALIAS_REMOTE=edge:sKbcd69HNOIdoje9KKooo6969klnNIdA/ ~/bin/rclone copy -P --inplace :alias: /where/you/want/to/save

Notes on this:

  1. Replace sKbcd69HNOIdoje9KKooo6969klnNIdA with your own secret token.
  2. The use of :alias: as a placeholder in the above command avoids the secret token being revealed to any other users logged into that staging node (eg. by running "ps -wwAf").
  3. The punctuation is important. The token must be preceded by "edge:" and followed by "/" or you will see an error like "Failed to create file system for ":alias:": invalid response"
  4. If the directory "/where/you/want/to/save" does not exist, Rclone will create it for you. Use "." to save into the current working directory.
  5. You can specify individual files or subdirectories to copy by putting them directly after the :alias: with no spaces - eg ":alias:example/hello.txt"
  6. The -P flag shows you download speed and progress, and the --inplace option avoids moving files to the "/tmp" filesystem which is inefficient on Eddie.
  7. Rclone uses parallel fetching by default to optimise transfer speeds. You should not need to change the threading options.
  8. Run "~/bin/rclone copy --help" or see https://rclone.org/docs for advanced options.
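
The notes above can be folded into a small wrapper function. The sketch below is only an illustration: "edge_copy" and "RCLONE_BIN" are hypothetical names, not part of Rclone, and the check that a token contains no spaces or slashes is an assumption based on the example token.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the recommended "rclone copy" invocation.
# RCLONE_BIN is an illustrative variable; it defaults to the install
# location used in step 2.
RCLONE_BIN="${RCLONE_BIN:-$HOME/bin/rclone}"

# Usage: edge_copy TOKEN DEST [SUBPATH]
edge_copy() {
    local token="$1" dest="$2" subpath="${3:-}"
    local bad='[[:space:]/]'
    # Assumption: a valid token is non-empty and has no whitespace or "/".
    if [[ -z "$token" || "$token" =~ $bad ]]; then
        echo "edge_copy: token looks malformed" >&2
        return 2
    fi
    # The remote must start with "edge:" and end with "/" (note 3).
    # Passing it in the environment keeps it out of the process arguments.
    RCLONE_ALIAS_REMOTE="edge:${token}/" \
        "$RCLONE_BIN" copy -P --inplace ":alias:${subpath}" "$dest"
}
```

For example, edge_copy "$(< ~/.edge_token)" /where/you/want/to/save reads the token from a file (a hypothetical path, which should be readable only by you) so that the token is never typed on a command line. A third argument such as example/hello.txt restricts the copy to that file or subdirectory.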

5. Verify the download

Rclone does not apply robust verification and data can occasionally be corrupted in transit. After downloading, use the file checksums to be sure that all were fully and correctly transferred. For each md5sums.txt file included with the data, change to the directory containing that file and run: 

$ md5sum -c md5sums.txt
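
If the download contains several md5sums.txt files in different subdirectories, the check can be looped. The following is a minimal sketch, assuming bash and GNU md5sum (as found on most Linux systems); "verify_all" is a hypothetical helper name, not something supplied with the data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "md5sum -c" in every directory below TOP
# that contains an md5sums.txt file.
# Usage: verify_all [TOP]   (TOP defaults to the current directory)
verify_all() {
    local top="${1:-.}" failed=0 sumfile dir
    while IFS= read -r -d '' sumfile; do
        dir=$(dirname "$sumfile")
        echo "== Checking $dir"
        # --quiet prints only the files that fail; the subshell keeps
        # the caller's working directory unchanged.
        if ! (cd "$dir" && md5sum -c --quiet md5sums.txt); then
            failed=1
        fi
    done < <(find "$top" -type f -name md5sums.txt -print0)
    # Non-zero exit status if any directory reported a problem.
    return "$failed"
}
```

Running verify_all /where/you/want/to/save lists each directory as it is checked, followed by any files that are missing or corrupted.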

 

Other Linux, Mac Command Line

In most cases, using Rclone as documented above is the best way to transfer your data. The Rclone software is available for Intel and M-series Macs as well as Linux from https://rclone.org/downloads. Follow the instructions above to configure and run Rclone.

We used to recommend the lftp software, which is pre-installed on many Linux systems. This does work, but there is a known bug with this software where directories with thousands of files may only be partially downloaded, and the download speeds can also be slow. We have found that both of these can be mitigated by using the "sftp:" (as opposed to the default "ftp:") protocol, as shown below. However, you should always check the md5sums of everything you have downloaded to ensure you really got everything.

How to download with lftp software on the Linux command line:

$ cd /where/you/want/to/save

$ lftp sftp://transfer.genomics.ed.ac.uk:33001 <<<'login anonymous lftp@ ; mirror -vv sKbcd69HNOIdoje9KKooo6969klnNIdA .'

Obviously, substitute your own token for the one above. The <<< shell syntax avoids making the secret token visible to other users who are logged into the system.

It's also possible to use wget. This doesn't work on Eddie because their version of wget (as at October 2020) does not support encrypted FTP and will say "Unsupported scheme ‘ftps’", but on more up-to-date systems it should be fine.

$ wget -crnH --cut-dirs=1 -i - <<<'ftps://transfer.genomics.ed.ac.uk/sKbcd69HNOIdoje9KKooo6969klnNIdA/.'

Again, the <<< syntax avoids making the secret token visible to other users on your system (if you have any!). The other arguments to wget enable some sensible recursive downloading options.

Both the wget and lftp methods will resume partial downloads, but to be sure you have fetched all files intact you should verify the file checksums after downloading. For each md5sums.txt file included with the data, change to the directory containing that file and run:

$ md5sum -c md5sums.txt
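
Because lftp may fetch only part of a large directory, it can help to list exactly which files named in an md5sums.txt are absent before re-running the transfer. This is a rough sketch, assuming the usual md5sum line format (a checksum followed by a file name relative to the directory holding md5sums.txt); "list_missing" is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file named in a checksum list that
# does not exist locally. Run it from the directory containing the list.
# Usage: list_missing [SUMFILE]   (SUMFILE defaults to md5sums.txt)
list_missing() {
    local sumfile="${1:-md5sums.txt}" hash name
    while read -r hash name; do
        name="${name#\*}"          # drop md5sum's binary-mode "*" marker
        [ -n "$name" ] || continue # skip blank or malformed lines
        [ -e "$name" ] || printf '%s\n' "$name"
    done < "$sumfile"
}
```

An empty result only means that every listed file is present; it does not replace the "md5sum -c" check, which is still needed to confirm that the contents are intact.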

 

Windows and Mac GUI

On both platforms, the free Cyberduck client may be used to download the files via encrypted FTP. 

  1. Launch Cyberduck and click the "Open Connection" button.
  2. The top dropdown will default to "FTP (File Transfer Protocol)". Select the dropdown and choose "FTP-SSL".
  3. In the "Server:" field, enter "transfer.genomics.ed.ac.uk"
  4. Check the "Anonymous Login" checkbox.
  5. Click the "More Options" button and a few more options will appear. In the "Path:" field, enter the token.
  6. Click connect and the available files will appear in the browser pane. Files can be dragged and dropped from this pane. 

On Windows, WinSCP also works well, and is also free. To connect to our server you must do the following:

  1. Start a new session with the file protocol as FTP (not SFTP!)
  2. Set the encryption to "TLS/SSL Explicit Encryption"
  3. Set the host name to transfer.genomics.ed.ac.uk
  4. Enable anonymous login
  5. Click advanced, and under Environment > Directories > Remote directory put the download token (eg. sKbcd69HNOIdoje9KKooo6969klnNIdA)

If your connection to the server fails, double-check steps 1-4. If you are connected to the server but do not see any files available, check that you entered the correct token at step 5.

Unlike macOS and Linux, Windows does not have a built-in md5sum checker tool, or at least not one that can check multiple files at once. We suggest using the free one from http://getmd5checker.com/, or a MinGW command-line version like the one provided with MSYS2 - see https://www.msys2.org/

Still stuck?

Please contact the Edinburgh Genomics team (eg-bioinformatics@mlist.is.ed.ac.uk) if you are having any problems with downloading or checksumming your files.