Before this change, we would upload files as single part uploads even
if the source MD5SUM was not available.
AWS won't let you upload a file to a locket bucket without some sort
of hash protection of the upload which we don't have with no MD5SUM.
So we switch to multipart upload when the source does not have an
MD5SUM.
This means that if --s3-disable-checksum is set or we are copying from
a source with no MD5SUMs we will copy with multipart uploads.
This patch changes all uploads, not just those to locked buckets
because having no MD5SUM protection on uploads is undesirable.
Fixes#6846
Before this change if an --s3-profile was set which used AWS STS (eg
to assume a role) and --s3-endpoint was set then rclone would use the
value from --s3-endpoint to contact the STS server which did not work.
This fix implements an endpoint resolver which only overrides the "s3"
service if --s3-endpoint is set. It sends the "sts" service (and any
other service) to the default resolver.
Fixes#6443
See: https://forum.rclone.org/t/s3-profile-failing-when-explicit-s3-endpoint-is-present/36063/
Before this change, all types of checkers showed "checking" after the
file name despite the fact that not all of them were checking.
After this change, they can show
- checking
- deleting
- hashing
- importing
- listing
- merging
- moving
- renaming
See: https://forum.rclone.org/t/what-is-rclone-checking-during-a-purge/35931/
Before this change if --s3-no-head was in use rclone didn't check the
multipart upload ETag at all. However the ETag is returned in the
final POST request when completing the object.
This change uses that ETag from the final POST if --s3-no-head is in
use, otherwise it uses the ETag from a fresh HEAD request.
See: https://forum.rclone.org/t/in-some-cases-rclone-does-not-use-etag-to-verify-files/36095/
Storj switched to a single global s3 endpoint backed by a BGP routing.
We want to stop advertizing the former regional endpoints and have the
global one as the only option.
Before this change, when a new object was created s3 returns its
versionID (on a versioned bucket) and rclone recorded it in the
object.
This means that when rclone came to delete the object it would delete
it with the versionID.
However it is common to forbid actions with versionIDs on buckets so
as to preserve the historical record and these operations would fail
whereas they succeeded in pre-v1.60.0 versions.
This patch fixes the problem by not recording versions of objects
supplied by the S3 API on upload unless `--s3-versions` or
`--s3-version-at` is used. This makes rclone behave as it did before
v1.60.0 when version support was introduced.
See: https://forum.rclone.org/t/s3-and-intermittent-403-errors-with-file-renames-and-drag-and-drop-operations-in-windows-explorer/34773
Before this change, we were taking the version ID straight from the
XML blob returned by the SDK and thus pinning the XML into memory
which bulked up the average memory per object from about 400 bytes to
4k.
Copying the string fixes the excess memory usage.
This reverts commit 4f386a1ccd.
It turns out that Alibaba OSS does support list v2 and the detection
code was wrong.
This means that users of the gov version of Alibaba will have to add
`list_version 1` to their config files.
See #6600
In this commit
ab849b3613 s3: fix listing loop when using v2 listing on v1 server
The ContinuationToken was tested for existence, but it is the
NextContinuationToken that we are interested in.
See: #6600
This was caused by
a9bd0c8de6 s3: reduce memory consumption for s3 objects
Which assumed that the StorageClass would always be set, but it isn't
set for Versions.
Before this change, rclone would enter a listing loop if it used v2
listing on a v1 server and the list exceeded 1000 items.
This change detects the problem and gives the user a helpful message.
Fixes#6600
Copying the storageClass string instead of using a pointer to the original string.
This prevents the Go garbage collector from keeping large amounts of
XMLNode structs and references in memory, created by xmlutil.XMLToStruct()
from the aws-sdk-go.
Before this change, some files were giving this error when downloaded
from Cloudflare and other providers.
ERROR corrupted on transfer: sizes differ NNN vs MMM
This is because these providers auto gzips the object when rclone
wasn't expecting it to. (AWS does not gzip objects without their being
uploaded gzipped).
This patch adds a quirk to for fix the problem and a flag to control
it. The quirk `might_gzip` is set to `true` for all providers except
AWS.
See: https://forum.rclone.org/t/s3-error-corrupted-on-transfer-sizes-differ-nnn-vs-mmm/33694/Fixes: #6533
Before this fix it was impossible to stop rclone generating an
X-Amx-Acl: header which is incompatible with GCS with uniform access
control and is generally deprecated at AWS.
The API endpoint GetBucketLocation requires
top level permission.
If we do an authenticated head request to a bucket, the bucket location will be returned in the HTTP headers.
Fixes#5066
Before this change if --user-server-modtime was in use the ModTime
could change for an object as we receive it accurate to the nearest ms
in listings, but only accurate to the nearest second in HEAD and GET
requests.
Normally AWS returns the milliseconds as .000 in listings, but if
versions are in use it may not. Storj S3 also seems to return
milliseconds.
This patch tries to keep the maximum precision in the last modified
time, so it doesn't update a last modified time with a truncated
version if the times were the same to the nearest second.
See: https://forum.rclone.org/t/cache-fingerprint-miss-behavior-leading-to-false-positive-stalen-cache/33404/
Before this fix, the chunksize calculator was using the previous size
of the object, not the new size of the object to calculate the chunk
sizes.
This meant that uploading a replacement object which needed a new
chunk size would fail, using too many parts.
This fix fixes the calculator to take the size explicitly.
Before this change, if an object compressed with "Content-Encoding:
gzip" was downloaded, a length and hash mismatch would occur since the
go runtime automatically decompressed the object on download.
If --s3-decompress is set, this change erases the length and hash on
compressed objects so they can be downloaded successfully, at the cost
of not being able to check the length or the hash of the downloaded
object.
If --s3-decompress is not set the compressed files will be downloaded
as-is providing compressed objects with intact size and hash
information.
See #2658
In
22abd785eb s3: implement reading and writing of metadata #111
The reading information of objects was refactored to use the
s3.HeadObjectOutput structure.
Unfortunately the code branch with `--s3-no-head` was not tested
otherwise this panic would have been discovered.
This shows that this is path is not integration tested, so this adds a
new integration test.
Fixes#6322
The SDK doesn't wrap errors in a Go standard way so they can't be
unwrapped and tested for - eg fatal error.
The code looks for a Serialization or RequestError and returns the
unwrapped underlying error if possible.
This fixes the fs/operations integration tests checking for fatal
errors being returned.
In this commit
e5974ac4b0 s3: use PutObject from the aws SDK to upload single part objects
rclone was made to upload objects to s3 using PUT requests rather than
using signed uploads.
However this change missed the fact that there is a supported way to
do this in the SDK using the SetStreamingBody method on the Request.
This therefore reverts a lot of the previous commit to do with making
an unsigned connection and other complication and uses the SDK
facility.
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Before this change rclone used presigned requests to upload single
part objects. This was because of a limitation in the SDK which didn't
allow non seekable io.Readers to be passed in.
This is incompatible with some S3 backends, and rclone wasn't adding
the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` header which was
incompatible with other S3 backends.
The SDK now allows for this so rclone can use PutObject directly.
This sets the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` flag on the PUT
request. However rclone will add a `Content-Md5` header if at all
possible so the body data is still protected.
Note that the old behaviour can still be configured if required with
the `use_presigned_request` config parameter.
Fixes#5422
Before this change the new multipart upload ETag checking code was
failing in the integration tests with Alibaba OSS.
Apparently Alibaba calculate the ETag in a different way to AWS.
This introduces a new provider quirk with a flag to disable the
checking of the ETag for multipart uploads.
Mulpart Etag checking has been enabled for all providers that we can
test for and work, and left disabled for the others.
Before this rclone ignored the ETag on multipart uploads which missed
an opportunity for a whole file integrity check.
This adds that check which means that we now check even harder that
multipart uploads have arrived properly.
See #5993
Before this change a multipart upload with the --no-head flag returned
the MD5SUM as a base64 string rather than a Hex string as the rest of
rclone was expecting.
* Wasabi starts to provide AP Northeast 2 (Osaka) endpoint, so add it to the list
* Rename ap-northeast-1 as "AP Northeast 1 (Tokyo)" from "AP Northeast"
Signed-off-by: lindwurm <lindwurm.q@gmail.com>
This is possible now that we no longer support go1.12 and brings
rclone into line with standard practices in the Go world.
This also removes errors.New and errors.Errorf from lib/errors and
prefers the stdlib errors package over lib/errors.
This removes the checks against the provider throughout the code and
puts them into a single setQuirks function for easy maintenance when
adding a new provider.
It also updates the quirks with the results of testing against
backends we have access to.
This also adds a list_url_encode parameter so that quirk can be
manually set.
This implements a quirks system for providers and notes which
providers we have tested to support ListObjectsV2.
For those providers which don't support ListObjectsV2 we use the
original ListObjects call.
In this commit, released in 1.56.0 we started reading the size of the
object from the Content-Length header as returned by the GET request
to read the object.
4401d180aa s3: add --s3-no-head-object
However some object storage systems, notably Ceph, don't return a
Content-Length header.
The new code correctly calls the setMetaData function with a nil
pointer to the ContentLength.
However due to this commit from 2014, released in v1.18, the
setMetaData function was not ignoring the size as it should have done.
0da6f24221 s3: use official github.com/aws/aws-sdk-go including multipart upload #101
This commit correctly ignores the content length if not set.
Fixes#5732
Before this change the `shared_credentials_file` config option was
being ignored.
The correct value is passed into the SDK but it only sets the
credentials in the default provider. Unfortunately we wipe the default
provider in order to install our own chain if env_auth is true.
This patch restores the shared credentials file in the session
options, exactly the same as how we restore the profile.
Original fix:
1605f9e14d s3: Fix shared_credentials_file auth
This patch reverts this commit
1605f9e14d s3: Fix shared_credentials_file auth
It unfortunately had the side effect of making the s3 SDK ignore the
config in our custom chain and use the default provider. This means
that advanced auth was being ignored such as --s3-profile with
role_arn.
Fixes#5468Fixes#5762
This reverts commit
dc06973796 Revert "s3: use rclone's low level retries instead of AWS SDK to fix listing retries"
Which in turn reverted
5470d34740 "backend/s3: use low-level-retries as the number of SDK retries"
So we are back where we started.
It then modifies it to set the AWS SDK to `--low-level-retries`
retries, but set the rclone retries to 2 so that directory listings
can be retried.
The egress charges while using a CloudFront CDN url is cheaper when
compared to accessing the file directly from S3. So added a download
URL advanced option, which when set downloads the file using it.
Before this change, rclone would always check the root to see if it
was an object.
This change doesn't check to see if the root is an object if the path
ends with a /
This avoids a transaction where rclone HEADs the path to see if it
exists.
See #4990
Includes adding support for additional size input suffix Mi and MiB, treated equivalent to M.
Extends binary suffix output with letter i, e.g. Ki and Mi.
Centralizes creation of bit/byte unit strings.
This code removes the code added in
15d19131bd s3: use aws web identity role provider
This code no longer works because it doesn't initialise the
tokenFetcher - leading to a nil pointer crash.
The proper way to initialise this is with the
NewWebIdentityCredentials but it isn't clear where to get the other
parameters: roleARN, roleSessionName, path.
In the linked issue a user reports rclone working with EKS anyway, so
perhaps this code is no longer needed.
If it is needed, hopefully someone who knows AWS better will come
along and fix it!
See: https://forum.rclone.org/t/add-support-for-aws-sso/23569