Updated Big syncs with millions of files (markdown)

lc63
2023-07-21 17:14:37 +02:00
parent 16a3d3eb62
commit 14d18d213a

@@ -2,7 +2,7 @@
Rclone syncs on a directory by directory basis. If you have 10,000,000 directories with 1,000 files in each, it will sync fine, but if you have a single directory with 100,000,000 files in it, you will need a lot of RAM to process it.
The log is then filled by :
The log is then filled by:
```
2023/07/06 15:30:35 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
@@ -33,15 +33,14 @@ comm -13 src dst > need-to-delete
You now have a list of files you need to transfer from src to dst, and another list of files in dst that aren't in src and so should likely be deleted.
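For context, those lists come from comparing sorted recursive listings of the two remotes. A minimal sketch of that earlier step (the `comm -23` counterpart is implied by the `comm -13` shown in the hunk header above):
```
rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst
comm -23 src dst > need-to-transfer   # lines only in the src listing
comm -13 src dst > need-to-delete     # lines only in the dst listing
```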
Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer` and run this on each chunk to transfer 10,000 files at a time. The `--files-from` and `--no-traverse` flags mean that this won't list the source or the destination
Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer need-to-transfer` (producing chunks named `need-to-transferaa`, `need-to-transferab`, and so on) and run this on each chunk to transfer 10,000 files at a time. The `--files-from` and `--no-traverse` flags mean that this won't list the source or the destination:
```
rclone copy src:bucket dst:bucket --files-from need-to-transfer-aa --no-traverse
rclone copy src:bucket dst:bucket --files-from need-to-transferaa --no-traverse
rclone delete dst:bucket --files-from need-to-delete --no-traverse
```
The same chunking approach works for deletion.
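To drive all the chunks, a minimal loop like this works, assuming the `need-to-transfer` prefix passed to `split` above (the glob matches the two-letter suffixes `split` appends):
```
# one rclone run per 10,000-line chunk
for chunk in need-to-transfer??; do
  rclone copy src:bucket dst:bucket --files-from "$chunk" --no-traverse
done
```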
If you need to sync changes, you can include hash and/or size in the listing :
If you need to sync changes, you can include hash and/or size in the listing. For example, with hashes:
```
rclone lsf --files-only --format "ph" -R src:bucket | sort -t';' -k1 > src
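# A sketch of the matching steps, assuming the same "ph" (path;hash)
# format for the destination listing:
rclone lsf --files-only --format "ph" -R dst:bucket | sort -t';' -k1 > dst
# Lines unique to src are files that are new or changed there; strip the
# hash back off so the list can be fed to --files-from:
comm -23 src dst | cut -d';' -f1 > need-to-transfer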