The problem
Rclone syncs on a directory-by-directory basis. If you have 10,000,000 directories with 1,000 files in each, rclone will sync fine using hardly any memory, but if you have a single directory with 100,000,000 files in it, you will need a lot of RAM to process it.
These situations are most common with the s3 backend.
Rclone uses approximately 1GB of RAM per 1,000,000 files in a directory when processing an s3 bucket.
Until the process is killed for using too much memory, the log fills with entries like this:
2023/07/06 15:30:35 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 1m0.0s
Although HTTP requests are made and answered with HTTP 200 responses (visible with the --dump-headers option), nothing is copied, because rclone waits until it has listed the entire directory before doing any transfers.
This problem exists up to at least rclone v1.67.0.
Workaround
We can get around the problem as follows.
- First read file or object names
rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst
- Now use comm to find what files/objects need to be transferred
comm -23 src dst > need-to-transfer
comm -13 src dst > need-to-delete
You now have a list of files you need to transfer from src to dst, and another list of files in dst that aren't in src and so should likely be deleted.
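To see what the two comm invocations produce, here is a small sketch run on made-up listings. The file names and the scratch directory are hypothetical stand-ins for the real rclone lsf output.

```shell
# Use the same collation for sort and comm so they agree on ordering.
export LC_ALL=C

# Scratch directory so the sample files do not clobber real listings.
workdir=$(mktemp -d)

# Hypothetical sorted listings, standing in for the rclone lsf output.
printf '%s\n' a.txt b.txt c.txt > "$workdir/src"
printf '%s\n' b.txt c.txt d.txt > "$workdir/dst"

# Lines only in src: present at the source, missing at the destination.
comm -23 "$workdir/src" "$workdir/dst" > "$workdir/need-to-transfer"
# Lines only in dst: present at the destination, missing at the source.
comm -13 "$workdir/src" "$workdir/dst" > "$workdir/need-to-delete"

cat "$workdir/need-to-transfer"   # a.txt
cat "$workdir/need-to-delete"     # d.txt
```

Note that comm expects both inputs to be sorted in the same order, which is why the listings are piped through sort first.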
Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like split -l 10000 need-to-transfer and run this on each chunk to transfer 10,000 files at a time. The --files-from and --no-traverse flags mean that this won't list the source or the destination:
rclone copy src:bucket dst:bucket --files-from need-to-transfer --no-traverse
rclone delete dst:bucket --files-from need-to-delete --no-traverse
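The chunked transfer can be scripted along the following lines. This is a minimal sketch, not a tested production script: the function name copy_in_chunks and the RCLONE and CHUNK_LINES variables are hypothetical names introduced here, and the remotes are the placeholder src:bucket and dst:bucket used above.

```shell
# Hypothetical override: set RCLONE=echo to print the commands instead of running them.
RCLONE=${RCLONE:-rclone}

# copy_in_chunks LIST SRC DST
# Splits LIST into chunks and runs one rclone copy per chunk.
copy_in_chunks() {
    list=$1
    src=$2
    dst=$3
    chunkdir=$(mktemp -d) || return 1

    # CHUNK_LINES (hypothetical) sets the chunk size; 10,000 lines by default.
    split -l "${CHUNK_LINES:-10000}" "$list" "$chunkdir/chunk-"

    for chunk in "$chunkdir"/chunk-*; do
        # An empty list produces no chunk files, so the glob stays unexpanded.
        [ -e "$chunk" ] || continue
        # Stop at the first failing chunk so it can be inspected and retried.
        "$RCLONE" copy "$src" "$dst" --files-from "$chunk" --no-traverse || return 1
    done

    rm -rf "$chunkdir"
}

# Usage (not run here):
#   copy_in_chunks need-to-transfer src:bucket dst:bucket
```

Running it first with RCLONE=echo is a cheap way to check which commands would be issued before anything is transferred.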
If you need to sync changes, you can include hash and/or size in the listing. For example, with hashes:
rclone lsf --files-only --format "ph" -R src:bucket | sort -t';' -k1 > src
rclone lsf --files-only --format "ph" -R dst:bucket | sort -t';' -k1 > dst
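The comm tool compares whole lines, so the path and hash are treated as a single field: a file whose hash differs between src and dst is reported as different.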
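With hashes in the listings, two extra steps seem to be needed before the lists can be used: the hash has to be stripped again before a list is passed to --files-from, and a changed file shows up on both sides of the comparison, so the delete list is safer when computed from paths alone. The sketch below illustrates this on made-up "path;hash" lines. It assumes no file name contains the ";" separator, and the sample names and hashes are hypothetical.

```shell
# Use the same collation for sort and comm so they agree on ordering.
export LC_ALL=C

workdir=$(mktemp -d)

# Hypothetical "path;hash" listings, as produced with --format "ph".
# b.txt has a different hash on each side, d.txt exists only in dst.
printf '%s\n' 'a.txt;111' 'b.txt;222' 'c.txt;333' > "$workdir/src"
printf '%s\n' 'b.txt;999' 'c.txt;333' 'd.txt;444' > "$workdir/dst"

# New or changed at the source: compare whole lines, then drop the hash field.
comm -23 "$workdir/src" "$workdir/dst" | cut -d';' -f1 > "$workdir/need-to-transfer"

# Missing at the source: compare paths only, so changed files are not listed.
# Re-sort after cut, since path order can differ from "path;hash" order.
cut -d';' -f1 "$workdir/src" | sort > "$workdir/src-paths"
cut -d';' -f1 "$workdir/dst" | sort > "$workdir/dst-paths"
comm -13 "$workdir/src-paths" "$workdir/dst-paths" > "$workdir/need-to-delete"

cat "$workdir/need-to-transfer"   # a.txt and b.txt, one per line
cat "$workdir/need-to-delete"     # d.txt
```

Without the path-only comparison, b.txt would appear in both lists and a delete pass run after the copy would remove the file that had just been updated.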