mirror of
https://github.com/rclone/rclone.git
synced 2025-08-10 14:17:58 +02:00
Created Big syncs with millions of files (markdown)
51
Big-syncs-with-millions-of-files.md
Normal file
# The problem

Rclone syncs on a directory-by-directory basis. If you have 10,000,000 directories with 1,000 files in each, it will sync fine, but if you have a single directory with 100,000,000 files in it, you will need a lot of RAM to process it.

The log then fills with entries like this:

```
2023/07/06 15:30:35 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 1m0.0s
```

... although HTTP requests are made and answered with HTTP 200 responses (visible with the `--dump-headers` option), no files are copied.

This problem exists up to at least rclone v1.64.0-beta.7132.f1a842081.

# Workaround

We can get around the problem as follows.

- First, list the file or object names on both sides:

```
rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst
```

- Now use `comm` to find which files/objects need to be transferred or deleted:

```
comm -23 src dst > need-to-transfer
comm -13 src dst > need-to-delete
```

You now have a list of files you need to transfer from src to dst, and another list of files in dst that aren't in src and so should likely be deleted.
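
As a small illustration of what the two `comm` calls produce, here is a sketch that uses made-up listings in place of real `rclone lsf` output. The `comm-demo` directory and the file names in it are hypothetical, and `LC_ALL=C` is set only as an assumption to keep `sort` and `comm` using the same collation order.

```shell
# Hypothetical sample listings standing in for the output of `rclone lsf`.
export LC_ALL=C            # assumption: force one collation for sort and comm
mkdir -p comm-demo

printf '%s\n' a.txt dir/b.txt dir/c.txt | sort > comm-demo/src
printf '%s\n' dir/b.txt old/d.txt | sort > comm-demo/dst

# -23 keeps lines found only in the first file (present in src, missing in dst).
comm -23 comm-demo/src comm-demo/dst > comm-demo/need-to-transfer
# -13 keeps lines found only in the second file (present in dst, missing in src).
comm -13 comm-demo/src comm-demo/dst > comm-demo/need-to-delete

cat comm-demo/need-to-transfer
cat comm-demo/need-to-delete
```

With these sample listings, need-to-transfer holds `a.txt` and `dir/c.txt`, and need-to-delete holds `old/d.txt`.
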

Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer need-to-transfer-`, and run the command below on each chunk to transfer 10,000 files at a time. Because `--files-from` is combined with `--no-traverse`, rclone does not list the source or the destination, which avoids using too much memory.

```
rclone copy src:bucket dst:bucket --files-from need-to-transfer-aa --no-traverse
```
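
Running the copy once per chunk can be scripted. The following is only a dry-run sketch: it builds a hypothetical 25-line list in a `chunk-demo` directory, splits it into chunks of 10 lines (use 10000 for real data), and prints the rclone command for each chunk rather than executing it.

```shell
mkdir -p chunk-demo
# Hypothetical list of 25 names standing in for a real need-to-transfer file.
seq -f 'file-%g.txt' 1 25 > chunk-demo/need-to-transfer

# The last argument is the output prefix, so the chunks are named
# need-to-transfer-aa, need-to-transfer-ab, and so on.
split -l 10 chunk-demo/need-to-transfer chunk-demo/need-to-transfer-

for chunk in chunk-demo/need-to-transfer-??; do
    # Dry run: remove the leading "echo" to actually start the transfer.
    echo rclone copy src:bucket dst:bucket --files-from "$chunk" --no-traverse
done > chunk-demo/commands

cat chunk-demo/commands
```

Printing the commands first makes it easy to check the chunk names before any data is moved.
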

Deletion works the same way: split need-to-delete into chunks and pass each chunk to a delete command (for example `rclone delete dst:bucket` with `--files-from`).

If you also need to sync changed files, you can include the hash and/or size in the listing:

```
rclone lsf --files-only --format "ph" -R src:bucket | sort -t';' -k1 > src
rclone lsf --files-only --format "ph" -R dst:bucket | sort -t';' -k1 > dst
```
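
The `comm` tool then compares each line as a whole, treating the path and hash as a single string, so a file whose hash differs between src and dst is listed in need-to-transfer. The resulting lines still carry the `;hash` suffix, which has to be stripped before they are passed to `--files-from`.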
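
A sketch of that comparison with made-up `path;hash` lines is shown below. The `hash-demo` directory, the file names and the hash values are all hypothetical, and using `cut` assumes that no file name contains a `;`. Note that a changed file also shows up on the dst side with its old hash, so this sketch computes the deletion list from the paths alone, to avoid scheduling a changed file for deletion.

```shell
export LC_ALL=C            # assumption: force one collation for sort and comm
mkdir -p hash-demo
# Hypothetical "path;hash" lines standing in for `rclone lsf --format "ph"` output.
printf '%s\n' 'a.txt;1111' 'b.txt;2222' 'c.txt;3333' | sort > hash-demo/src
printf '%s\n' 'a.txt;1111' 'b.txt;9999' 'old.txt;4444' | sort > hash-demo/dst

# Whole lines are compared: b.txt (hash differs) and c.txt (missing) are both
# selected; cut then drops the hash so only the path remains.
comm -23 hash-demo/src hash-demo/dst | cut -d';' -f1 > hash-demo/need-to-transfer

# For deletions compare paths only, otherwise b.txt would be listed too.
cut -d';' -f1 hash-demo/src | sort > hash-demo/src-paths
cut -d';' -f1 hash-demo/dst | sort > hash-demo/dst-paths
comm -13 hash-demo/src-paths hash-demo/dst-paths > hash-demo/need-to-delete

cat hash-demo/need-to-transfer
cat hash-demo/need-to-delete
```

With these sample lines, need-to-transfer holds `b.txt` and `c.txt`, and need-to-delete holds only `old.txt`.
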