sync: implement --list-cutoff to allow on disk sorting for reduced memory use

Before this change, rclone had to load an entire directory into RAM in
order to sort it so it could be synced.

With directories with millions of entries, this used too much memory.

This fixes the problem by using an on-disk sort when there are more
than --list-cutoff entries in a directory.

Fixes #7974
Nick Craig-Wood
2024-12-09 11:30:34 +00:00
parent 0148bd4668
commit 385465bfa9
9 changed files with 493 additions and 18 deletions

@@ -1448,6 +1448,19 @@ backends and the VFS. There are individual flags for just enabling it
for the VFS `--vfs-links` and the local backend `--local-links` if
required.
### --list-cutoff N {#list-cutoff}
When syncing, rclone needs to sort directory entries before comparing
them. Below this threshold (1,000,000 by default), rclone will store
the directory entries in memory. 1,000,000 entries will take
approximately 1 GB of RAM to store. Above this threshold, rclone will
store directory entries on disk and sort them without using a lot of
memory.
Doing this is slightly less efficient than sorting them in memory and
will only work well for the bucket-based backends (e.g. s3, b2,
azureblob, swift) but these are the only backends likely to have
millions of entries in a directory.
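
The cutoff behaviour described above can be illustrated with a minimal
sketch of an external merge sort. This is not rclone's implementation
(rclone is written in Go); the names `sorted_entries` and `_spill` are
hypothetical, and the sketch assumes entry names contain no newlines.

```python
import heapq
import os
import tempfile


def _spill(chunk):
    """Sort one chunk in memory and write it to a temporary file."""
    chunk.sort()
    with tempfile.NamedTemporaryFile(
        "w", suffix=".chunk", delete=False, encoding="utf-8", newline="\n"
    ) as f:
        f.writelines(name + "\n" for name in chunk)
    return f.name


def sorted_entries(names, list_cutoff=1_000_000):
    """Yield names in sorted order, using disk once list_cutoff is exceeded.

    Hypothetical helper: at most list_cutoff + 1 names are held in memory
    at a time. Larger inputs are written out as sorted chunk files which
    are then merged lazily.
    """
    chunk, paths = [], []
    try:
        for name in names:
            chunk.append(name)
            if len(chunk) > list_cutoff:
                paths.append(_spill(chunk))
                chunk = []
        if not paths:
            # Small directory: a plain in-memory sort is enough.
            yield from sorted(chunk)
            return
        if chunk:
            paths.append(_spill(chunk))
        files = [open(p, encoding="utf-8", newline="\n") for p in paths]
        try:
            # Strip the line terminator before comparing so the merge
            # order matches the order of the names themselves.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files once the caller is done.
        for p in paths:
            os.remove(p)
```

In this sketch a small cutoff, for example
`list(sorted_entries(names, list_cutoff=2))`, forces the on-disk path,
while the default keeps typical directories entirely in memory.
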
### --log-file=FILE ###
Log all of rclone's output to FILE. This is not active by default.

@@ -233,12 +233,18 @@ value, say `export GOGC=20`. This will make the garbage collector
work harder, reducing memory size at the expense of CPU usage.
The most common cause of rclone using lots of memory is a single
directory with millions of files in. Rclone has to load this entirely
into memory as rclone objects. Each rclone object takes 0.5k-1k of
memory. There is
directory with millions of files in.
Before rclone v1.70, rclone had to load this entirely into memory as
rclone objects. Each rclone object takes 0.5k-1k of memory. There is
[a workaround for this](https://github.com/rclone/rclone/wiki/Big-syncs-with-millions-of-files)
which involves a bit of scripting.
However, with rclone v1.70 and later, rclone will automatically save
directory entries to disk when a directory with more than
[`--list-cutoff`](/docs/#list-cutoff) (1,000,000 by default) entries
is detected.
From v1.70 rclone also has the [--max-buffer-memory](/docs/#max-buffer-memory)
flag which helps particularly when multi-thread transfers are using
too much memory.