Currently rclone check supports matching two file trees by sizes and hashes.
This change adds support for SUM files produced by GNU utilities like sha1sum.
Fixes#1005
Note: checksum by default checks, hashsum by default prints sums.
New flag is named "--checkfile" but carries hash name.
Summary of introduced command forms:
```
rclone check sums.sha1 remote:path --checkfile sha1
rclone checksum sha1 sums.sha1 remote:path
rclone hashsum sha1 remote:path --checkfile sums.sha1
rclone sha1sum remote:path --checkfile sums.sha1
rclone md5sum remote:path --checkfile sums.md5
```
- Unify all hash names as lowercase alphanumerics without punctuation.
- Legacy names continue to work but disappear from docs, they can be depreciated or dropped later.
- Make rclone hashsum print supported hash list in case of wrong spelling.
- Update documentation.
Fixes#5071Fixes#4841
Before this change when the context was cancelled (due to
--max-duration for example) this could deadlock when uploading
multipart uploads.
This change fixes the problem by introducing another go routine to
monitor the context and close the pipe with an error when the context
errors.
This change checks the context whenever rclone might retry, and
doesn't retry if the current context has an error.
This fixes the pathological behaviour of `--max-duration` refusing to
exit because all the context deadline exceeded errors were being
retried.
This unfortunately meant changing the shouldRetry logic in every
backend and doing a lot of context propagation.
See: https://forum.rclone.org/t/add-flag-to-exit-immediately-when-max-duration-reached/22723
This change makes dedupe recursively count elements in same-named directories
and make the largest one primary. This allows to minimize the amount of data
moved (or at least the amount of API calls) when dedupe merges them.
It also adds a new fs.Object interface `ParentIDer` with function `ParentID` and
implements it for the drive and opendrive backends. This function returns
parent directory ID for objects on filesystems that allow same-named dirs.
We use it to correctly count sizes of same-named directories.
Fixes#2568
Co-authored-by: Ivan Andreev <ivandeex@gmail.com>
This commit modifies the operations.hashSum function by adding an alternate code path. This code path is triggered by passing downloadFlag = True. When activated, rclone will download files from the remote and hash them locally. downloadFlag = False preserves the existing behavior of using the remote to retrieve the hash.
This commit modifies HashLister to support the new hashSum method as well as consolidating the roles of HashLister, HashListerBase64, Md5sum, and Sha1sum. The printing of hashes from the function defined in HashLister has been revised to work with --progress. There are light changes to operations.syncFprintf and cmd.startProgress for this.
The unit test operations_test.TestHashSums is modified to support this change and test the download functionality.
The command functions hashsum, md5sum, sha1sum, and dbhashsum are modified to support this change. A download flag has been added and an output-file flag has been added. The output-file flag writes hashes to a file instead of stdout to avoid the need to redirect stdout.
This is done by making fs.Config private and attaching it to the
context instead.
The Config should be obtained with fs.GetConfig and fs.AddConfig
should be used to get a new mutable config that can be changed.
This adds a context.Context parameter to NewFs and related calls.
This is necessary as part of reading config from the context -
backends need to be able to read the global config.
Fix the copy and move operations that broke in 127f0fc when copying directly
to a remote without a specific destination.
Signed-off-by: Anagh Kumar Baranwal <6824881+darthShadow@users.noreply.github.com>
Before this change we counted the final summary error as an error,
producing confusing log messages like:
Failed to check with 54 errors: last error was: 53 differences found
This change marks the summary error as already being counted, so the
error message becomes:
Failed to check with 53 errors: last error was: 53 differences found
This change also returns a listing failure in preference to a summary error.
See: https://forum.rclone.org/t/slow-checksum-validation/19763/22
Before this change rclone would emit the message
--checksum is in use but the source and destination have no hashes in common; falling back to --size-only
When the source or destination hash was missing as well as when the
source and destination had no hashes in common.
This first case is very confusing for users when the source and
destination do have a hash in common.
This change fixes that and makes sure the error message is not emitted
on missing hashes even when there is a hash in common.
See: https://forum.rclone.org/t/source-and-destination-have-no-hashes-in-common-for-unencrypted-drive-to-local-sync/19531
Before ths fix --cutoff-mode soft and cautious would emit a Fatal
error which stopped the sync immediately.
This fix introduces a new error which is checked in the sync error
processing which stops the sync gracefully.
Fixes#4576
- add a directory to the optional Purge interface
- fix up all the backends
- add an additional integration test to test for the feature
- use the new feature in operations.Purge
Many of the backends had been prepared in advance for this so the
change was trivial for them.
This is preparation for getting the Accounting to check the context,
buf first we need to get it in place. Since this is one of those
changes that makes lots of noise, this is in a seperate commit.
This adds expire and unlink fields to the PublicLink interface.
This fixes up the affected backends and removes unlink parameters
where they are present.
Before this change if you specified --hash MD5 in rclone lsf it would
calculate all the hashes and just return the MD5 hash which was very
slow on the local backend.
Likewise specifying --hash on rclone lsjson was equally slow.
This change introduces the --hash-type flag (and corresponding
internal parameter) so that the hashes required can be selected in
lsjson.
This is used internally in lsf when the --hash parameter is selected
to speed up the hashing by only hashing with the one hash specified.
Fixes#4181
Before this change if there were two files with the same name and the
same ID in the same directory, dedupe would delete one of them but
since these are actually the same file (with the same ID) then both
files would be deleted leading to data loss.
This should never actually happen, however it did happen as part of a
bug introduced in rclone which was fixed by
dfc7215bf9 drive: fix duplicate items when using --drive-shared-with-me #4018
This change checks to see if any of the duplicates have the same ID
and if they do it refuses to delete them.
Before this change backends which introduce overhead (eg crypt) were
failing to upload the first file.
This change increases the threshold to 2k to allow the first file to
go through even with some overhead but the next file to definitely
fail.
Before this change we checked the transfer was out of range only
before the Read call. This means that we returned all the data to the
reader before declaring an error. This means that some backends wrote
the file even though an error was returned.
This fix checks the transfer after the Read as well, and chops the
excess characters off the read data if we are over the limit so that
we don't ever deliver all the data.
This fixes the tests introduced as part of 6f1766dd9e and #2672
on backends other than local.
Before this change the exit code for transfer limit exceeded was
incorrect. This was because the `resolveExitCode` function unwraps the
error thus reading the underlying error which is not the same as the
error it was comparing to (`ErrorMaxTransferLimitReached`).
This change fixes it by splitting the error definition in two so that
when the Fatal error is unwrapped we match against
`ErrorMaxTransferLimitReached` however when we return the error we
return `ErrorMaxTransferLimitReachedFatal`.
In bde0334bd8 "operations: fix setting the timestamp on Windows
for multithread copy" the test for multithread copy failed to take
into account the modify window of the remote under test.
Before this fix we attempted to set the modification time on the file
when it was open. This works fine on Linux but not on Windows. The
test was also incorrect testing the source file rather than the
destination file.
This closes the file before setting the modification time and fixes
the tests.
Fixes#3994
Before this changed we unconditionally fetched the MimeType. On Some
backends like s3 and swift this takes an extra transaction which meant
that `lsf` on those backends was needlessly slow.
This adds an internal option so `lsf` can declare whether it wants
MimeTypes or not depending on whether the user asked for them and an
external flag `--no-mimetype` for `lsjson`.
See: https://forum.rclone.org/t/reliably-setup-incremental-updates/14006/8
Statistics of ransfers which were interrupted are not cleared before
retry iteration. These transfers completed with over 100 percentage.
This change clears transfer accounting before next retry iteration is
done in order to keep numbers in track.
Fixes#3861
For few commands, RClone counts a error multiple times. This was fixed by
creating a new error type which keeps a flag to remember if the error has
already been counted or not. The CountError function now wraps the original
error eith the above new error type and returns it.
Before this change --update would transfer any file which was newer
than the destination regardless of whether it had changed or not.
This is needlessly wasteful of bandwidth.
After this change --update will only transfer files if they are newer
**and** they are different (checked with checksum and size).
'Couldn't find file - Need to Transfer' changed to 'Need to transfer -
File Not Found at Destination' because while reading the debug logs, it
confuses with failure in operation.
Before this change if -u/--update was in effect we compared the size
of the files to see if the transfer should go ahead. This was
comparing -1 with an actual size so the transfer always proceeded.
After this change we use the existing `sizeDiffers` function which
does the correct comparison with -1 for files of unknown length.
See: https://forum.rclone.org/t/sync-with-google-photos-to-local-drive-will-result-in-recoping/11605
Before this change we didn't calculate or check hashes of transferred
files if --size-only mode was explicitly set.
This problem was introduced in 20da3e6352 which was released with v1.37
After this change hashes are checked for all transfers unless
--ignore-checksums is set.
Before this change we calculated the checkums when using
--ignore-checksum but ignored them at the end.
Now we don't calculate the checksums at all which is more efficient.
Before this change for a post copy Hash check we would run the hashes sequentially.
Now we run the hashes in parallel for a useful speedup.
Note that this refactors the hash check in Copy to use the standard
hash checking routine.
Before this fix rclone calculated all the hashes on transfer. This
was particularly slow for the local backend.
After the fix we just calculate one hash which is enough for data
integrity.
This was factored from fstest as we were including the testing
enviroment into the main binary because of it.
This was causing opening the browser to fail because of 8243ff8bc8.
Introduce stats groups that will isolate accounting for logically
different transferring operations. That way multiple accounting
operations can be done in parallel without interfering with each other
stats.
Using groups is optional. There is dedicated global stats that will be
used by default if no group is specified. This is operating mode for CLI
usage which is just fire and forget operation.
For running rclone as rc http server each request will create it's own
group. Also there is an option to specify your own group.
This is done to make clear ownership over accounting object and prepare
for removing global stats object.
Stats elapsed time calculation has been altered to account for actual
transfer time instead of stats creation time.
We now have a backend (fichier) which doesn't support 0 length
files. Therefore all 0 length files in the tests have been replaced
with length 1.
In a future commit we will implement a test for 0 length files.
- Change rclone/fs interfaces to accept context.Context
- Update interface implementations to use context.Context
- Change top level usage to propagate context to lover level functions
Context propagation is needed for stopping transfers and passing other
request-scoped values.
Before this change, when using rclone copy or move with --backup-dir
and the source was a single file, rclone would fail to use the backup
directory.
This change looks up the backup directory in the Fs cache and uses it
as appropriate.
This affects any commands which call operations.MoveFile or
operations.CopyFile which includes rclone move/moveto/copy/copyto
where the source is a single file.
Fixes#3219
Before this change we calculated the checksum which is potentially
time consuming and then ignored the result. After the change we don't
calculate the checksum if we are about to ignore it.
In c5ac96e9e7 we made --files-from only read the objects specified and
don't scan directories.
This caused problems with Google drive (very very slow) and B2
(excessive API consumption) so it was decided to make the old
behaviour (traversing the directories) the default with --files-from
and use the existing --no-traverse flag (which has exactly the right
semantics) to enable the new non scanning behaviour.
See: https://forum.rclone.org/t/using-files-from-with-drive-hammers-the-api/8726Fixes#3102Fixes#3095
Before the fix we were only de-duping the ListR batches.
Afterwards we dedupe everything.
This will have the consequence that rclone uses more memory as it will
build a map of all the directory names, not just the names in a given
directory.
This dramatically increases the speed (7x in my tests) of the de-dupe
as google drive supports ListR directly and dedupe did not work with
`--fast-list`.
Fixes#2902
This will increase speed for backends which support ListR and will not
have the memory overhead of using --fast-list.
It also means that errors are queued until the end so as much of the
remote will be listed as possible before returning an error.
Commands affected are:
- lsf
- ls
- lsl
- lsjson
- lsd
- md5sum/sha1sum/hashsum
- size
- delete
- cat
- settier
This brings it up to par with lsjson.
This commit also reworks the framework to use ListJSON internally
which removes duplicated code and makes testing easier.
This puts a shim on the reader opened by Copy so that if an error is
returned, the reader is re-opened at the correct seek point.
This should make downloading very large files more reliable.
Before this change TestPurge would remove a container and subsequent
tests would fail because the container was still being deleted so
couldn't be created.
This was fixed by introducing an fstest.NewRunIndividual() test runner
for TestPurge which causes the test to be run on a new container.
Before this change, Purge on the fallback path would try to delete
directories starting from the root rather than the dir passed in.
Rmdirs would also attempt to delete the root.
Before this change using --files-from would scan all the directories
that the files could possibly be in causing rclone to do more work
that was necessary.
After this change, rclone constructs an in memory tree using the
--fast-list mechanism but from all of the files in the --files-from
list and without scanning any directories.
Any objects that are not found in the --files-from list are ignored
silently.
This mechanism is used for sync/copy/move (march) and all of the
listing commands ls/lsf/md5sum/etc (walk).
Before this change remotes without server side Move (eg swift, s3,
gcs) would not be able to rename files.
After it means nearly all remotes will be able to rename files on
rclone mount with the notable exceptions of b2 and yandex.
This changes checks to see if the remote can do Move or Copy then
calls `operations.Move` to do the actual move. This will do a server
side Move or Copy but won't download and re-upload the file.
It also checks to see if the destination exists first which avoids
conflicts or duplicates.
Fixes#1965Fixes#2569
--one-way argument will check that all files on source matches the files on detination,
but not the other way. For example files present on destination but not on source will not
trigger an error.
Fixes: #1526
* Implement about for:
* local, crypt, cache, drive, swift, hubic, onedrive, pcloud, dropbox
* Implement `--json` and `---full` flag for `rclone about`
* change About interface to return a Usage structure
* Remove operations.About as it is too thin an interface
* Implement Integration test
Relates to #1138 and #1564
This problem was introduced with eca99b33c0. It seems Box is the only
remote which converts time zones, so if you give it a GMT time zone,
it returns a PST time zone which represents the same instant.