diff --git a/Big-syncs-with-millions-of-files.md b/Big-syncs-with-millions-of-files.md
index 46458e1..19bc4c8 100644
--- a/Big-syncs-with-millions-of-files.md
+++ b/Big-syncs-with-millions-of-files.md
@@ -2,7 +2,7 @@
 
 Rclone syncs on a directory by directory basis. If you have 10,000,000 directories with 1,000 files in and it will sync fine, but if you have a directory with 100,000,000 files in you will a lot of RAM to process it.
 
-The log is then filled by :
+The log is then filled by:
 
 ```
 2023/07/06 15:30:35 INFO : Transferred: 0 B / 0 B, -, 0 B/s, ETA -
@@ -33,15 +33,14 @@ comm -13 src dst > need-to-delete
 
 You now have a list of files you need to transfer from src to dst and another list of files in dst that aren't in src so should likely be deleted.
 
-Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer` and run this on each chunk to transfer 10,000 files at a time. The `--files-from` and the `--no-traverse` means that this won't list the source or the destination
+Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer` and run this on each chunk to transfer 10,000 files at a time. The `--files-from` and the `--no-traverse` means that this won't list the source or the destination:
 
 ```
-rclone copy src:bucket dst:bucket --files-from need-to-transfer-aa --no-traverse
+rclone copy src:bucket dst:bucket --files-from need-to-transfera --no-traverse
+rclone delete src:bucket dst:bucket --files-from need-to-delete --no-traverse
 ```
 
-It's the same for deletion.
-
-If you need to sync changes, you can include hash and/or size in the listing :
+If you need to sync changes, you can include hash and/or size in the listing. For example, with hashes:
 
 ```
 rclone lsf --files-only --format "ph" -R src:bucket | sort -t';' -k1 > src