Commit Graph

5 Commits

Author SHA1 Message Date
Nick Craig-Wood
07133b892d dirtree: fix performance with large directories of directories and --fast-list
Before this change if using --fast-list on a directory with more than
a few thousand directories in it DirTree.CheckParents became very slow
taking up to 24 hours for a directory with 1,000,000 directories in
it.

This is because it becomes an O(N²) operation as DirTree.Find has to
search each directory in a linear fashion as it is stored as a slice.

This patch fixes the problem by scanning the DirTree for directories
before starting the CheckParents process so it never has to call
DirTree.Find.

After the fix calling DirTree.CheckParents on a directory with
1,000,000 directories in it will take about 1 second.

Anything which calls DirTree.Find can potentially have bad performance
so in the future we should redesign the DirTree to use a different
underlying datastructure or have an index.

https://forum.rclone.org/t/almost-24-hours-cpu-compute-time-during-sync-between-two-large-s3-buckets/39375/
2023-07-03 14:09:21 +01:00
Nick Craig-Wood
7c1f2d7c84 dirtree: fix tests with -fast-list
In commit

da404dc0f2 sync,copy: Fix --fast-list --create-empty-src-dirs and --exclude

The fix caused DirTree.AddDir to be called with the root directory.
This in turn caused a spurious directory entry in the DirTree which
caused tests with the -fast-list flag to fail with directory not found
errors.
2022-06-09 14:26:43 +01:00
Nick Craig-Wood
57d5de6fba build: fix up package paths after repo move
git grep -l github.com/ncw/rclone | xargs -d'\n' perl -i~ -lpe 's|github.com/ncw/rclone|github.com/rclone/rclone|g'
goimports -w `find . -name \*.go`
2019-07-28 18:47:38 +01:00
Nick Craig-Wood
9cafeeb4b6 dirtree: make tests more reliable 2019-07-02 16:29:40 +01:00
Nick Craig-Wood
bc70bff125 fs/dirtree: factor DirTree out of fs/walk and add tests 2019-07-02 15:26:55 +01:00