Your Token
Once your project data is ready to download, you will receive a download token via e-mail. In the examples below the token is sKbcd69HNOIdoje9KKooo6969klnNIdA, which just leads to some small test files. Anyone who knows the token can access your data, so be careful who you share it with. You may view the file listing using a web browser at https://transfer.genomics.ed.ac.uk/sKbcd69HNOIdoje9KKooo6969klnNIdA, but there is no good way to download multiple large files with a web browser, so you need to use encrypted FTP.
Using Rclone on Eddie (for Edinburgh Datastore users)
The Eddie compute cluster is directly connected to the Edinburgh University datastore, so downloading via Eddie is a fast way to get your files to your datastore area without going via your local PC (if you are at home then transferring files via your home internet and back will be very slow).
1. Log in to a staging node (specific to Eddie)
Eddie requires that large transfers are done via the ‘staging’ nodes so after logging in, do this:
$ qlogin -q staging
All being well, you should get a command prompt on a staging node. Datastore files are now accessible under /exports/csce/datastore.
2. Get the Rclone Client program
This is not (currently) available as a module on Eddie, but does come as a single large executable so you do not need to compile or install it. Just download and unpack the binary file from the ZIP archive.
$ wget https://downloads.rclone.org/rclone-current-linux-amd64.zip
$ unzip rclone-current-linux-amd64.zip
$ mkdir -p ~/bin
$ mv -t ~/bin rclone-*-linux-amd64/rclone
$ chmod +x ~/bin/rclone
You do not need to keep the ZIP file or the other files unpacked from the archive, so it is safe to delete them.
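Before moving on, it may be worth confirming that the binary landed in the right place and actually runs. The sketch below assumes the ~/bin location used above; check_rclone is a hypothetical helper name and not part of Rclone itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that an rclone binary exists and is executable.
# With no argument it checks the ~/bin/rclone location used in the steps above.
check_rclone() {
    bin="${1:-$HOME/bin/rclone}"
    if [ ! -x "$bin" ]; then
        echo "Not found or not executable: $bin" >&2
        return 1
    fi
    # "rclone version" prints the version and build details.
    "$bin" version
}
```

Running check_rclone with no arguments should print the Rclone version details if the steps above succeeded.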
3. Configure Rclone to know about our server
You only need to do this once, no matter how many projects you download. The command below will simply write a small config file to your home directory. You may run “~/bin/rclone config paths” to confirm where the file is saved.
$ ~/bin/rclone config create edge webdav url=https://transfer.genomics.ed.ac.uk:9443 user=anonymous pass=rclone pacer_min_sleep=0
4. Get your data
Rclone has many different modes, but our recommended way to run Rclone is like so:
$ RCLONE_ALIAS_REMOTE=edge:sKbcd69HNOIdoje9KKooo6969klnNIdA/ ~/bin/rclone copy -P --inplace :alias: /where/you/want/to/save
Notes on this:
- Replace sKbcd69HNOIdoje9KKooo6969klnNIdA with your own secret token.
- The use of :alias: as a placeholder in the above command avoids the secret token being revealed to any other users logged into that staging node (eg. by running “ps -wwAf”).
- The punctuation is important. The token must be preceded by “edge:” and followed by “/” or you will see an error like “Failed to create file system for “:alias:”: invalid response”
- If the directory “/where/you/want/to/save” does not exist, Rclone will create it for you. Use “.” to save into the current working directory.
- You can specify individual files or subdirectories to copy by putting them directly after the :alias: with no spaces – eg “:alias:example/hello.txt”
- The -P flag shows you download speed and progress, and the --inplace option avoids moving files to the “/tmp” filesystem which is inefficient on Eddie.
- Rclone uses parallel fetching by default to optimise transfer speeds. You should not need to change the threading options. It will also resume partial downloads if some files are already there.
- Run “~/bin/rclone copy --help” or see https://rclone.org/docs for advanced options.
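The notes above can be wrapped in a small script so that the token is typed at a prompt rather than on the command line, which also keeps it out of your shell history. This is only a sketch: the function names build_remote and fetch_delivery are made up for illustration, and it assumes the “edge” remote configured in step 3 and Rclone installed at ~/bin/rclone.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of Rclone.

# Build the remote string in the required form: "edge:" + token + "/".
# Whitespace and trailing slashes pasted along with the token are dropped.
build_remote() {
    token=$(printf '%s' "$1" | tr -d '[:space:]')
    while [ "${token%/}" != "$token" ]; do
        token="${token%/}"
    done
    if [ -z "$token" ]; then
        echo "Empty token" >&2
        return 1
    fi
    printf 'edge:%s/\n' "$token"
}

# Prompt for the token without echoing it, then run the recommended copy.
# The optional argument is the destination directory (default: current one).
fetch_delivery() {
    dest="${1:-.}"
    read -r -s -p "Download token: " secret
    echo
    remote=$(build_remote "$secret") || return 1
    # The token is passed in the environment, so it does not appear in the
    # argument list of the rclone process.
    RCLONE_ALIAS_REMOTE="$remote" ~/bin/rclone copy -P --inplace :alias: "$dest"
}
```

After loading these functions into your shell, “fetch_delivery /where/you/want/to/save” would prompt for the token and then start the copy.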
5. Verify the download
Rclone does not apply robust verification and data can occasionally be corrupted in transit. After downloading, use the file checksums to be sure that all were fully and correctly transferred. For each md5sums.txt file included with the data, change to the directory containing that file and run:
$ md5sum -c md5sums.txt
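If the delivery contains several md5sums.txt files in different subdirectories, the check can be repeated for each one with a short loop. The following is a sketch only, assuming the GNU versions of find and md5sum as found on most Linux systems; verify_all is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "md5sum -c" for every md5sums.txt below a directory.
# Returns non-zero if any checksum file reports a mismatch or a missing file.
verify_all() {
    top="${1:-.}"
    find "$top" -type f -name md5sums.txt | (
        status=0
        while IFS= read -r sumfile; do
            echo "Checking $sumfile"
            # Change directory inside a subshell so the caller's working
            # directory is untouched. --quiet prints only the failures.
            ( cd "$(dirname "$sumfile")" && md5sum --quiet -c md5sums.txt ) || status=1
        done
        exit "$status"
    )
}
```

For example, “verify_all /where/you/want/to/save” checks everything under the download directory and returns a non-zero status if any file failed.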
Other Linux, Mac Command Line
In most cases, using Rclone as documented above is the best way to transfer your data. The Rclone software is available for Intel and M-series Macs as well as Linux from https://rclone.org/downloads. Follow the instructions above to configure and run Rclone.
We used to recommend the lftp software, which is pre-installed on many Linux systems. This does work, but there is a known bug where directories with thousands of files may only be partially downloaded, and download speeds can also be slow. We have found that both problems can be mitigated by using the “sftp:” protocol (as opposed to the default “ftp:”), as shown below. However, you should always check the md5sums of everything you have downloaded to ensure you really got everything.
How to download with lftp software on the Linux command line:
$ cd /where/you/want/to/save
$ lftp sftp://transfer.genomics.ed.ac.uk:33001 <<<'login anonymous lftp@ ; mirror -vv sKbcd69HNOIdoje9KKooo6969klnNIdA .'
Obviously, substitute your own token for the one above. The <<< shell syntax avoids making the secret token visible to other users who are logged into the system.
It’s also possible to use wget. This doesn’t work on Eddie because their version of wget (as at October 2020) does not support encrypted FTP and will say “Unsupported scheme ‘ftps’”, but on more up-to-date systems it should be fine.
$ wget -crnH -l 10 --cut-dirs=1 -i - <<<'ftps://transfer.genomics.ed.ac.uk/sKbcd69HNOIdoje9KKooo6969klnNIdA/.'
Again, the <<< syntax avoids making the secret token visible to other users on your system (if you have any!). The other arguments to wget enable some sensible recursive downloading options.
Both the wget and lftp methods will resume partial downloads, but to be sure you have fetched all files intact you should use the md5sum check as noted above.
Windows and Mac FTP Clients
On both platforms, the free Cyberduck client may be used to download the files via encrypted FTP, although the instructions differ slightly between the two. You will need the delivery token and the project ID.
- Launch Cyberduck and click the “Open Connection” button.
- The top dropdown will default to “FTP (File Transfer Protocol)”. Select the dropdown and choose “FTP-SSL”.
- In the “Server:” field, enter “transfer.genomics.ed.ac.uk”
- Check the “Anonymous Login” checkbox.
- Click the “More Options” button and a few more options will appear. In the “Path:” field, enter the token then a forward slash, then the project ID (for example: sKbcd69HNOIdoje9KKooo6969klnNIdA/20178_Bloggs_Joe)
- Click connect and the available files will appear in the browser pane. Files can be dragged and dropped from this pane.
- Launch Cyberduck and click the “Bookmark Menu” and select “New Bookmark”
- The top dropdown will default to “FTP (File Transfer Protocol)”. Select the dropdown and choose “FTP-SSL”.
- In the “Server:” field, enter “transfer.genomics.ed.ac.uk”
- Check the “Anonymous Login” checkbox.
- Click the “More Options” button and a few more options will appear. In the “Path:” field, enter the token, then a forward slash, then the project ID (for example: sKbcd69HNOIdoje9KKooo6969klnNIdA/20178_Bloggs_Joe)
- Close this dialogue box, and click on the bookmark you have created.
- The available files will appear in the browser pane. Files can be dragged and dropped from this pane.
On Windows, WinSCP also works well, and is also free. To connect to our server you must do the following:
- Start a new session with the file protocol as FTP (not SFTP!)
- Set the encryption to “TLS/SSL Explicit Encryption”
- Set the host name to transfer.genomics.ed.ac.uk
- Enable anonymous login
- Click advanced, and under Environment > Directories > Remote directory put the download token (eg. sKbcd69HNOIdoje9KKooo6969klnNIdA)