Before this fix, if uploading to a union consisting of all bucket
based remotes (eg s3), uploads failed with:
Failed to copy: object not found
This was because the union backend was relying on parent directories
being created to work out which files to upload. If all the upstreams
were bucket based backends which can't hold empty directories, no
directories were created and the upload failed.
This fixes the problem by returning the upstreams used when creating
the directory for the upload, rather than searching for them again
after they've been created.
This will also make the union backend a little more efficient.
Fixes#6170
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The "relative" argument was missing when Put'ing a file. This
sets an incorrect object entry in the cache, leading to the file being
unreadable when using mount functionality.
Fixes#6151
Before this change rclone used presigned requests to upload single
part objects. This was because of a limitation in the SDK which didn't
allow non seekable io.Readers to be passed in.
This is incompatible with some S3 backends, and rclone wasn't adding
the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` header which was
incompatible with other S3 backends.
The SDK now allows for this so rclone can use PutObject directly.
This sets the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` flag on the PUT
request. However rclone will add a `Content-Md5` header if at all
possible so the body data is still protected.
Note that the old behaviour can still be configured if required with
the `use_presigned_request` config parameter.
Fixes#5422
Uses b2_list_file_versions to retrieve all file versions, and returns
the one that was active at the specified time
This is especially useful in combination with other backup tools, such
as restic, which may use rclone as a backend.
Jottacloud have several different apis and endpoints using a mix of different
timestamp formats. In existing code the list operation (after the recent liststream
implementation) uses format from golang's time.RFC3339 constant. Uploads (using the
allocate api) uses format from a hard coded constant with value identical to golang's
time.RFC3339. And then we have the classic JFS time format, which is similar to RFC3339
but not identical, using a different constant. Also the naming is a bit confusing,
since the term api is used both as a generic term and also as a reference to the
newer format used in the api subdomain where the allocate endpoint is located.
This commit refactors these things a bit.
Now using the utility function for deduplication that was newly implemented to
fix an issue with server-side copy. This function uses the original, and generic,
"jfs" api (and its "cphash" feature), instead of the newer "allocate" api dedicated
for uploads. Both apis support similar deduplication functionaly that we rely on for
the SetModTime operation. One advantage of using the jfs variant is that the allocate
api is specialized for uploads, an initial request performs modtime-only changes and
deduplication if possible but if not possible it creates an incomplete file revision
and returns a special url to be used with a following request to upload missing content.
In the SetModTime function we only sent the first request, using metadata from existing
remote file but different timestamps, which lead to a modtime-only change. If, for some
reason, this should fail it would leave the incomplete revision behind. Probably not
a problem, but the jfs implementation used with this commit is simpler and
a more "standalone" request which either succeeds or fails without expecting additional
requests.
A strange feature (probably bug) in the api used by the server-side copy implementation
in Jottacloud backend is that if the destination file is in trash, the copy request
succeeds but the destination will still be in trash! When this situation occurs in
rclone, the copy command will fail with "Failed to copy: object not found" because
rclone verifies that the file info in the response from the copy request is valid,
and since it is marked as deleted it is treated as invalid.
This commit works around this problem by looking for this situation in the response
from the copy operation, and send an additional request to a built-in deduplication
endpoint that will restore the file from trash.
Fixes#6112
The existing code in rclone set the value "offline_access+openid",
when encoded in body it will become "offline_access%2Bopenid". I think
this is wrong. Probably an artifact of "double urlencoding" mixup -
either in rclone or in the jottacloud cli tool version it was sniffed
from? It does work, though. The token received will have scopes "email
offline_access" in it, and the same is true if I change to only
sending "offline_access" as scope.
If a proper space delimited list of "offline_access openid"
is used in the request, the response also includes openid scope:
"openid email offline_access". I think this is more correct and this
patch implements this.
See: #6107
Adds a configuration option to the GCS backend to allow skipping the
check if a bucket exists before copying an object to it, much like
f406dbb added for S3.
Before this change the cache backend was passing -1 into
rate.NewLimiter to mean unlimited transactions per second.
In a recent update this immediately returns a rate limit error as
might be expected.
This patch uses rate.Inf as indicated by the docs to signal no limits
are required.
Before this change the 206 responses from putio Range requests were being
returned as errors.
This change checks for 200 and 206 in the GET response now.
Before this fix, rclone retries chunks of multipart uploads. However
if they had been partially received dropbox would reply with an
incorrect_offset error which rclone was ignoring.
This patch parses the new offset from the error response and uses it
to adjust the data that rclone sends so it is the same as what dropbox
is expecting.
See: https://forum.rclone.org/t/dropbox-rate-limiting-for-upload/29779
This commit switches Google Cloud Storage from the drive pacer to the
s3 pacer. The main difference between them is that the s3 pacer does
not limit transactions in the non-error case. This is appropriate for
a cloud storage backend where you pay for each transaction.
Before this fix `NewObject` could return a wrapped `fs.Object(nil)`
which caused a crash. This was caused by `wrapObject` returning a
`nil` `*Object` which was cast into an `fs.Object`.
This changes the interface of `wrapObject` so it returns an
`fs.Object` instead of a `*Object` and an error which must be checked.
This forces the callers to return a `nil` object rather than an
`fs.Object(nil)`.
See: https://forum.rclone.org/t/panic-in-hasher-when-mounting-with-vfs-cache-and-not-synced-data-in-the-cache/29697/11
Having a replace directive in go.mod causes "go get
github.com/rclone/rclone" to fail as it discussed in this Go issue:
https://github.com/golang/go/issues/44840
This is apparently how the Go team want go.mod to work, so this commit
hard forks github.com/jlaffaye/ftp into github.com/rclone/ftp so we
can remove the `replace` directive from the go.mod file.
Fixes#5810
Before this change rclone send pre-1970 timestamps as negative
numbers. pCloud ignores these and sets them as todays date.
This change sends the timestamps as unsigned 64 bit integers (which is
how the binary protocol sends them) and pCloud accepts the (actually
negative) timestamp like this.
Before this change the new multipart upload ETag checking code was
failing in the integration tests with Alibaba OSS.
Apparently Alibaba calculate the ETag in a different way to AWS.
This introduces a new provider quirk with a flag to disable the
checking of the ETag for multipart uploads.
Mulpart Etag checking has been enabled for all providers that we can
test for and work, and left disabled for the others.
Before this rclone ignored the ETag on multipart uploads which missed
an opportunity for a whole file integrity check.
This adds that check which means that we now check even harder that
multipart uploads have arrived properly.
See #5993
Before this change `rclone about swift:container` would show aggregate
info about all the containers, not just the one in use.
This causes a problem if container listing is disabled (for example in
the Blomp service).
This fix makes `rclone about swift:container` show only the info about
the given `container`. If aggregate info about all the containers is
required then use `rclone about swift:`.
See: https://forum.rclone.org/t/rclone-mount-blomp-problem/29151/18
Before this change, rclone supported authorizing for remote systems by
going to a URL and cutting and pasting a token from Google. This is
known as the OAuth out-of-band (oob) flow.
This, while very convenient for users, has been shown to be insecure
and has been deprecated by Google.
https://developers.googleblog.com/2022/02/making-oauth-flows-safer.html#disallowed-oob
> OAuth out-of-band (OOB) is a legacy flow developed to support native
> clients which do not have a redirect URI like web apps to accept the
> credentials after a user approves an OAuth consent request. The OOB
> flow poses a remote phishing risk and clients must migrate to an
> alternative method to protect against this vulnerability. New
> clients will be unable to use this flow starting on Feb 28, 2022.
This change disables that flow, and forces the user to use the
redirect URL flow. (This is the flow used already for local configs.)
In practice this will mean that instead of cutting and pasting a token
for remote config, it will be necessary to run "rclone authorize"
instead. This is how all the other OAuth backends work so it is a well
tested code path.
Fixes#6000
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Before this change a multipart upload with the --no-head flag returned
the MD5SUM as a base64 string rather than a Hex string as the rest of
rclone was expecting.
Before this change attempting NewObject on a SAS URL's root would
crash the Azure SDK.
This change detects that using the code from this previous fix
f7404f52e7 azureblob: fix crash when listing outside a SAS URL's root - fixes#4851
And returns not object not found instead.
It also prevents things being uploaded to the root of the SAS URL
which also crashes the Azure SDK.
Before this fix if a file was updated, but to the same length and
timestamp then the local backend would return the wrong (cached)
hashes for the object.
This happens regularly on a crypted local disk mount when the VFS
thinks files have been changed but actually their contents are
identical to that written previously. This is because when files are
uploaded their nonce changes so the contents of the file changes but
the timestamp and size remain the same because the file didn't
actually change.
This causes errors like this:
ERROR: file: Failed to copy: corrupted on transfer: md5 crypted
hash differ "X" vs "Y"
This turned out to be because the local backend wasn't clearing its
cache of hashes when the file was updated.
This fix clears the hash cache for Update and Remove.
It also puts a src and destination in the crypt message to make future
debugging easier.
Fixes#4031
Currently the B2 docs don't specify which format the download_url
setting should have, and if you input it wrong, there is nothing
in the verbose logs or anywhere else that can let you know that.
* Wasabi starts to provide AP Northeast 2 (Osaka) endpoint, so add it to the list
* Rename ap-northeast-1 as "AP Northeast 1 (Tokyo)" from "AP Northeast"
Signed-off-by: lindwurm <lindwurm.q@gmail.com>
After speed testing it was discovered that upload speed goes up pretty
much linearly with upload concurrency. This patch changes the default
from 4 to 16 which means that rclone will use 16 * 4M = 64M per
transfer which is OK even for low memory devices.
This adds a note that performance may be increased by increasing
upload concurrency.
See: https://forum.rclone.org/t/performance-of-rclone-vs-azcopy/27437/9
Previously only the fs being checked on gets passed to
GetModifyWindow(). However, in most tests, the test files are
generated in the local fs and transferred to the remote fs. So the
local fs time precision has to be taken into account.
This meant that on Windows the time tests failed because the
local fs has a time precision of 100ns. Checking remote items uploaded
from local fs on Windows also requires a modify window of 100ns.
This is possible now that we no longer support go1.12 and brings
rclone into line with standard practices in the Go world.
This also removes errors.New and errors.Errorf from lib/errors and
prefers the stdlib errors package over lib/errors.
This removes the checks against the provider throughout the code and
puts them into a single setQuirks function for easy maintenance when
adding a new provider.
It also updates the quirks with the results of testing against
backends we have access to.
This also adds a list_url_encode parameter so that quirk can be
manually set.
This implements a quirks system for providers and notes which
providers we have tested to support ListObjectsV2.
For those providers which don't support ListObjectsV2 we use the
original ListObjects call.
The API doesn't seem to accept a value of "0" any more for the root
directory ID, giving the error "Could not decode folder id".
However omitting it seems to work fine.
In this commit, released in 1.56.0 we started reading the size of the
object from the Content-Length header as returned by the GET request
to read the object.
4401d180aa s3: add --s3-no-head-object
However some object storage systems, notably Ceph, don't return a
Content-Length header.
The new code correctly calls the setMetaData function with a nil
pointer to the ContentLength.
However due to this commit from 2014, released in v1.18, the
setMetaData function was not ignoring the size as it should have done.
0da6f24221 s3: use official github.com/aws/aws-sdk-go including multipart upload #101
This commit correctly ignores the content length if not set.
Fixes#5732
Before this change the `shared_credentials_file` config option was
being ignored.
The correct value is passed into the SDK but it only sets the
credentials in the default provider. Unfortunately we wipe the default
provider in order to install our own chain if env_auth is true.
This patch restores the shared credentials file in the session
options, exactly the same as how we restore the profile.
Original fix:
1605f9e14d s3: Fix shared_credentials_file auth
This patch reverts this commit
1605f9e14d s3: Fix shared_credentials_file auth
It unfortunately had the side effect of making the s3 SDK ignore the
config in our custom chain and use the default provider. This means
that advanced auth was being ignored such as --s3-profile with
role_arn.
Fixes#5468Fixes#5762
The API has changed in the directory move call JSON response from
returning a TaskID as a string to returning it as an integer. In other
places it is still returned as a string though.
This patch allows the TaskID to be an integer or a string in the JSON
response and keeps it internally as a string like before.
Before this change the backoff for the error_background error was 6
seconds. This means that if it wasn't resolved in 60 seconds (with the
default 10 low level retries) then an error was reported.
This error was being reported frequently in the integration tests, so
is likely affecting real users too.
This patch changes the backoff into an exponential backoff
1,2,4,8...1024 seconds to make sure we wait long enough for the
background operation to complete.
See #5734
- setup correct path encoding (fixes backend test FsEncoding)
- ignore range option if file is empty (fixes VFS test TestFileReadAtZeroLength)
- cleanup stray files left after failed upload (fixes test FsPutError)
- rebase code on master, adapt backend for rclone context passing
- translate Siad errors to rclone native FS errors in sia errorHandler
- TestSia: return proper backend options from the script
- TestSia: use uptodate AntFarm image, nebulouslabs/siaantfarm is stale
In
05f128868f azureblob: add --azureblob-no-head-object
we incorrectly parsed the size of the object as the Content-Length of
the returned header. This is incorrect in the presense of Range
requests.
This fixes the problem by parsing the Content-Range header if
avaialble to read the correct length from if a Range request was
issued.
See: #5734
This reverts commit
dc06973796 Revert "s3: use rclone's low level retries instead of AWS SDK to fix listing retries"
Which in turn reverted
5470d34740 "backend/s3: use low-level-retries as the number of SDK retries"
So we are back where we started.
It then modifies it to set the AWS SDK to `--low-level-retries`
retries, but set the rclone retries to 2 so that directory listings
can be retried.
Before this change the cleanup routine exited on the first deletion
error.
This change counts any errors on deletion and exits when the iteration
is complete with an error showing the number of deletion failures.
Deletion failures will be logged.
Before this change we uses limit/offset paging for directories in the
main directory listing routine and in the trash cleanup listing.
This switches to the new scheme of limit/marker which is more reliable
on a directory which is continuously changing. It has the disadvantage
that it doesn't tell us the total number of items available, however
that wasn't information rclone uses.
This changes the interface to NewObject so that if NewObject is called
on a directory then it should return fs.ErrorIsDir if possible without
doing any extra work, otherwise fs.ErrorObjectNotFound.
Tested on integration test server with:
go run integration-test.go -tests backend -run TestIntegration/FsMkdir/FsPutFiles/FsNewObjectDir -branch fix-stat -maxtries 1
The egress charges while using a CloudFront CDN url is cheaper when
compared to accessing the file directly from S3. So added a download
URL advanced option, which when set downloads the file using it.
Before this patch the md5all option would skip creating metadata with
hashsum if base filesystem provided md5, in hope to pass it through.
However, if base hash is slow (for example on local fs), chunker passed
slow md5 but never reported this fact in features.
This patch makes chunker snapshot base hashsum in metadata when md5all is
set and base hashsum is slow since chunker was intended to provide only
instant hashsums from the start.
Fixes#5508
Before this change, when uploading to a crypt, the ObjectInfo
accidentally used the encrypted size, not the unencrypted size when
--crypt-no-data-encryption was set.
Fixes#5498
In presence of no_data_encryption the Crypt's Put method used to over-optimize
and returned base object. This patch makes it return Crypt-wrapped object now.
Fixes#5498
I discovered that `rclone` always upload in chunks of 16MiB whenever
uploading a file smaller than `--drive-upload-cutoff`. This is
undesirable since the purpose of the flag `--drive-upload-cutoff` is
to *prevent* chunking below a certain file size.
I realized that it wasn't `rclone` forcing the 16MiB chunks. The
`google-api-go-client` forces a chunk size default of
[`googleapi.DefaultUploadChunkSize`](32bf29c2e1/googleapi/googleapi.go (L55-L57))
bytes for resumable type uploads. This means that all requests that
use `*drive.Service` directly for upload without specifying a
`googleapi.ChunkSize` will be forced to use a *`resumable`*
`uploadType` (rather than `multipart`) for files less than
`googleapi.DefaultUploadChunkSize`. This is also noted directly in the
Drive API client documentation [here](https://pkg.go.dev/google.golang.org/api/drive/v3@v0.44.0#FilesUpdateCall.Media).
This fixes the problem by passing `googleapi.ChunkSize(0)` to
`Media()` method calls, which is the only way to disable chunking
completely. This is mentioned in the API docs
[here](https://pkg.go.dev/google.golang.org/api/googleapi@v0.44.0#ChunkSize).
The other alternative would be to pass
`googleapi.ChunkSize(f.opt.ChunkSize)` -- however, I'm *strongly* in
favor of *not* doing this for performance reasons. By not explicitly
passing a `googleapi.ChunkSize(0)`, we effectively allow
[`PrepareUpload()`](https://pkg.go.dev/google.golang.org/api/internal/gensupport@v0.44.0#PrepareUpload)
to create a
[`NewMediaBuffer`](https://pkg.go.dev/google.golang.org/api/internal/gensupport@v0.44.0#NewMediaBuffer)
that copies the original `io.Reader` passed to `Media()` in order to
check that its size is less than `ChunkSize`, which will unnecessarily
consume time and memory.
`minChunkSize` is also changed to be `googleapi.MinUploadChunkSize`,
as it is something specified we have no control over.
Google Drive API allows for clauses like "modifiedTime > '2012-06-04T12:00:00'"
in the query param, so the filter flags --max-age and --min-age can be applied
directly at the directory listing phase rather than in a filter.
This is extremely helpful when we want to do an incremental backup of a remote
drive with many files but the number of recently changed file is small.
Co-authored-by: fotile96 <fotile96@users.noreply.github.com>
This replaces built-in os.MkdirAll with a patched version that stops the recursion
when reaching the volume part of the path. The original version would continue recursion,
and for extended length paths end up with \\? as the top-level directory, and the error
message would then be something like:
mkdir \\?: The filename, directory name, or volume label syntax is incorrect.
Before this change the union's feature flags were a strict AND of the
underlying remotes. This means that a union of a local disk (which can
Move but not Copy) and a bucket based remote (which can Copy but not
Move) could neither Move nor Copy.
This fix advertises Move in the union if all the remotes can Move or
Copy. It also implements Move as Copy+Delete (like rclone does
normally) if the underlying union does not support Move.
This enables renames to work with unions of local disk and bucket
based remotes expected.
Fixes#5632
This was started in
3626f10f26 pcloud: add sha256 support - fixes#5496
But this support turned out to be incomplete and caused the
integration tests to fail.
After updating rclone's dependencies these tests started failing on
windows/386
- TestInternalDoubleWrittenContentMatches
- TestInternalMaxChunkSizeRespected
The failures look like this. The root cause is unknown. The `Wait(n=1)
would exceed context deadline` errors come from golang.org/x/time/rate
but it isn't clear what is calling them.
2021/08/20 21:57:16 ERROR : worker-0 <one>: object open failed 0: rate: Wait(n=1) would exceed context deadline
[snip ~10 duplicates]
2021/08/20 21:57:56 ERROR : tidwcm1629496636/one: (0/26) error (chunk not found 0) response
2021/08/20 21:58:02 ERROR : worker-0 <one>: object open failed 0: rate: Wait(n=1) would exceed context deadline
--- FAIL: TestInternalDoubleWrittenContentMatches (45.77s)
cache_internal_test.go:310:
Error Trace: cache_internal_test.go:310
Error: Not equal:
expected: "one content updated double"
actual : ""
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-one content updated double
+
Test: TestInternalDoubleWrittenContentMatches
2021/08/20 21:58:03 original size: 23592960
2021/08/20 21:58:03 updated size: 12
In this commit the config system was re-arranged
94dbfa4ea fs: change Config callback into state based callback #3455
This passed the password as a temporary config parameter but forgot to
reveal it in the API call.
At some point some google docs files started having sizes returned in
their listing information.
This then caused rclone to treat the docs as files which caused
downloads to fail.
The API docs now state that google docs may have sizes (whereas I'm
pretty sure it didn't earlier).
This fix removes the check for size, so google docs are identified
solely by not having an MD5 checksum.