Commit Graph

4 Commits

Author SHA1 Message Date
nielash
810644e873 bisync: add --resync-mode for customizing --resync - fixes #5681
Before this change, the path1 version of a file always prevailed during
--resync, and many users requested options to automatically select the winner
based on characteristics such as newer, older, larger, and smaller. This change
adds support for such options.

Note that ideally this feature would have been implemented by allowing the
existing `--resync` flag to optionally accept string values such as `--resync
newer`. However, this would have been a breaking change, as the existing flag
is a `bool` and it does not seem to be possible to have a `string` flag that
accepts both `--resync newer` and `--resync` (with no argument.) (`NoOptDefVal`
does not work for this, as it would force an `=` like `--resync=newer`.) So
instead, the best compromise to avoid a breaking change was to add a new
`--resync-mode CHOICE` flag that implies `--resync`, while maintaining the
existing behavior of `--resync` (which implies `--resync-mode path1`. i.e. both
flags are now valid, and either can be used without the other.

--resync-mode CHOICE

In the event that a file differs on both sides during a `--resync`,
`--resync-mode` controls which version will overwrite the other. The supported
options are similar to `--conflict-resolve`. For all of the following options,
the version that is kept is referred to as the "winner", and the version that
is overwritten (deleted) is referred to as the "loser". The options are named
after the "winner":

- `path1` - (the default) - the version from Path1 is unconditionally
considered the winner (regardless of `modtime` and `size`, if any). This can be
useful if one side is more trusted or up-to-date than the other, at the time of
the `--resync`.
- `path2` - same as `path1`, except the path2 version is considered the winner.
- `newer` - the newer file (by `modtime`) is considered the winner, regardless
of which side it came from. This may result in having a mix of some winners
from Path1, and some winners from Path2. (The implementation is analagous to
running `rclone copy --update` in both directions.)
- `older` - same as `newer`, except the older file is considered the winner,
and the newer file is considered the loser.
- `larger` - the larger file (by `size`) is considered the winner (regardless
of `modtime`, if any). This can be a useful option for remotes without
`modtime` support, or with the kinds of files (such as logs) that tend to grow
but not shrink, over time.
- `smaller` - the smaller file (by `size`) is considered the winner (regardless
of `modtime`, if any).

For all of the above options, note the following:
- If either of the underlying remotes lacks support for the chosen method, it
will be ignored and will fall back to the default of `path1`. (For example, if
`--resync-mode newer` is set, but one of the paths uses a remote that doesn't
support `modtime`.)
- If a winner can't be determined because the chosen method's attribute is
missing or equal, it will be ignored, and bisync will instead try to determine
whether the files differ by looking at the other `--compare` methods in effect.
(For example, if `--resync-mode newer` is set, but the Path1 and Path2 modtimes
are identical, bisync will compare the sizes.) If bisync concludes that they
differ, preference is given to whichever is the "source" at that moment. (In
practice, this gives a slight advantage to Path2, as the 2to1 copy comes before
the 1to2 copy.) If the files _do not_ differ, nothing is copied (as both sides
are already correct).
- These options apply only to files that exist on both sides (with the same
name and relative path). Files that exist *only* on one side and not the other
are *always* copied to the other, during `--resync` (this is one of the main
differences between resync and non-resync runs.).
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not
apply during `--resync`, and unlike these flags, nothing is renamed during
`--resync`. When a file differs on both sides during `--resync`, one version
always overwrites the other (much like in `rclone copy`.) (Consider using
`--backup-dir` to retain a backup of the losing version.)
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option
(or rather, it will be interpreted as "no resync", unless `--resync` has also
been specified, in which case it will be ignored.)
- Winners and losers are decided at the individual file-level only (there is
not currently an option to pick an entire winning directory atomically,
although the `path1` and `path2` options typically produce a similar result.)
- To maintain backward-compatibility, the `--resync` flag implies
`--resync-mode path1` unless a different `--resync-mode` is explicitly
specified. Similarly, all `--resync-mode` options (except `none`) imply
`--resync`, so it is not necessary to use both the `--resync` and
`--resync-mode` flags simultaneously -- either one is sufficient without the
other.
2024-01-20 17:17:01 -05:00
nielash
b4216648e4 bisync: full support for comparing checksum, size, modtime - fixes #5679 fixes #5683 fixes #5684 fixes #5675
Before this change, bisync could only detect changes based on modtime, and
would refuse to run if either path lacked modtime support. This made bisync
unavailable for many of rclone's backends. Additionally, bisync did not account
for the Fs's precision when comparing modtimes, meaning that they could only be
reliably compared within the same side -- not against the opposite side. Size
and checksum (even when available) were ignored completely for deltas.

After this change, bisync now fully supports comparing based on any combination
of size, modtime, and checksum, lifting the prior restriction on backends
without modtime support. The comparison logic considers the backend's
precision, hash types, and other features as appropriate.

The comparison features optionally use a new --compare flag (which takes any
combination of size,modtime,checksum) and even supports some combinations not
otherwise supported in `sync` (like comparing all three at the same time.) By
default (without the --compare flag), bisync inherits the same comparison
options as `sync` (that is: size and modtime by default, unless modified with
flags such as --checksum or --size-only.) If the --compare flag is set, it will
override these defaults.

If --compare includes checksum and both remotes support checksums but have no
hash types in common with each other, checksums will be considered only for
comparisons within the same side (to determine what has changed since the prior
sync), but not for comparisons against the opposite side. If one side supports
checksums and the other does not, checksums will only be considered on the side
that supports them. When comparing with checksum and/or size without modtime,
bisync cannot determine whether a file is newer or older -- only whether it is
changed or unchanged. (If it is changed on both sides, bisync still does the
standard equality-check to avoid declaring a sync conflict unless it absolutely
has to.)

Also included are some new flags to customize the checksum comparison behavior
on backends where hashes are slow or unavailable. --no-slow-hash and
--slow-hash-sync-only allow selectively ignoring checksums on backends such as
local where they are slow. --download-hash allows computing them by downloading
when (and only when) they're otherwise not available. Of course, this option
probably won't be practical with large files, but may be a good option for
syncing small-but-important files with maximum accuracy (for example, a source
code repo on a crypt remote.) An additional advantage over methods like
cryptcheck is that the original file is not required for comparison (for
example, --download-hash can be used to bisync two different crypt remotes with
different passwords.)

Additionally, all of the above are now considered during the final --check-sync
for much-improved accuracy (before this change, it only compared filenames!)

Many other details are explained in the included docs.
2024-01-20 16:08:06 -05:00
nielash
57624629d6 bisync: account for differences in backend features on integration tests - see #5679
Before this change, integration tests often could not be run on backends with
differing features from the local system that goldenized them. In particular,
differences in modtime precision, checksum support, and encoding would cause
false positives. After this change, the tests more accurately account for the
features of the backend being tested, which allows us to see true positives
more clearly, and more meaningfully assess whether a backend is supported.
2024-01-20 14:50:08 -05:00
nielash
422b037087 bisync: fallback to cryptcheck or --download when can't check hash
Bisync checks file equality before renaming sync conflicts by comparing
checksums. Before this change, backends without checksum support (notably
Crypt) would fall back to --size-only for these checks, which is not a very
safe method (differing files can sometimes have the same size, especially if
they're small.) After this change, Crypt remotes fallback to using Cryptcheck
so that checksums can be compared. As a last resort when neither Check nor
Cryptcheck are available, files are compared using --download so that we can be
certain the files are identical regardless of checksum support.
2024-01-20 14:50:08 -05:00