# The problem

Rclone syncs on a directory by directory basis. If you have 10,000,000 directories with 1,000 files in each, rclone will sync fine using hardly any memory, but if you have a single directory with 100,000,000 files in it, you will need a lot of RAM to process it.

These situations are most common with the s3 backend.

Rclone uses approximately 1GB of RAM per 1,000,000 files in a directory when processing an s3 bucket.
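At that rate a single directory containing 100,000,000 objects needs on the order of 100 GB of RAM just to hold the listing.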

Until the OOM killer kills the process, the log fills with entries like:

```
2023/07/06 15:30:35 INFO  :
Transferred:        0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       1m0.0s
```

Although HTTP requests are being made and come back with HTTP 200 responses (visible with the `--dump-headers` option), nothing is copied, as rclone waits until it has listed the entire directory before doing any transfers.

This problem exists until at least version `rclone v1.67.0`.

# Workaround

- First make a sorted listing of every file/object in the source and the destination with `rclone lsf`

```
rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst
```
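
`--files-only` keeps directories out of the listings and `-R` recurses through the whole bucket, so each output line is a single object path; sorting both listings the same way is what lets `comm` compare them line by line in the next step.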

- Now use `comm` to find what files/objects need to be transferred

```
comm -23 src dst > need-to-transfer
comm -13 src dst > need-to-delete
```
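
`comm -23` suppresses the lines unique to `dst` and the lines common to both files, leaving only the objects present in the source but not the destination, while `comm -13` leaves only the objects present in the destination but not the source.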

You now have a list of files you need to transfer from `src` to `dst` and another list of files in `dst` that aren't in `src` so should likely be deleted.

Then break the `need-to-transfer` file up into chunks of (say) 10,000 lines with something like `split -l 10000 need-to-transfer` and run rclone on each chunk to transfer 10,000 files at a time. The `--files-from` and `--no-traverse` flags mean that rclone won't list the source or the destination:
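
For example, assuming `split`'s default output names (`xaa`, `xab`, ...), each chunk could be copied with something like:

```
rclone copy src:bucket dst:bucket --files-from xaa --no-traverse
```

The `need-to-delete` list can be handled the same way, for example with `rclone delete dst:bucket --files-from need-to-delete`.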

If you need to compare hashes as well as file names, make the listings with `--format "ph"` so that each line contains the path and the hash separated by `;`:

```
rclone lsf --files-only --format "ph" -R src:bucket | sort -t';' -k1 > src
rclone lsf --files-only --format "ph" -R dst:bucket | sort -t';' -k1 > dst
```

The `comm` tool will then compare the path and the hash as one field, so objects whose hashes differ also show up as needing to be transferred.
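
Note that the lines produced this way contain `path;hash`, so (assuming the standard `cut` tool is available) the hash field needs stripping before the resulting list can be used with `--files-from`, something like:

```
cut -d';' -f1 need-to-transfer > need-to-transfer-paths
```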

# See Also

- [the forum post where this technique was proposed](https://forum.rclone.org/t/rclone-sync-s3-to-s3-runs-for-hours-and-copy-nothing/39687/23)
- [issue #7974](https://github.com/rclone/rclone/issues/7974)