When doing a multipart upload or copy, if a InvalidBlobOrBlock error
is received, it can mean that there are uncomitted blocks from a
previous failed attempt with a different length of ID.
This patch makes rclone attempt to clear the uncomitted blocks and
retry if it receives this error.
This implements multipart server side copy to improve copying from one
azure region to another by orders of magnitude (from 30s for a 100M
file to 10s for a 10G file with --azureblob-upload-concurrency 500).
- Add `--azureblob-copy-cutoff` to control the cutoff from single to multipart copy
- Add `--azureblob-copy-concurrency` to control the copy concurrency
- Add ServerSideAcrossConfigs flag as this now works properly
- Implement multipart copy using put block list API
- Shortcut multipart copy for same storage account
- Override with `--azureblob-use-copy-blob`
Fixes#8249
This speeds up server side copies for small files which need the check
the copy status by using an exponential ramp up of time to check the
copy status endpoint.
Before this change, if a multipart upload was aborted, then rclone
would leave uncommitted blocks lying around. Azure has a limit of
100,000 uncommitted blocks per storage account, so when you then try
to upload other stuff into that account, or simply the same file
again, you can run into this limit. This causes errors like the
following:
BlockCountExceedsLimit: The uncommitted block count cannot exceed the
maximum limit of 100,000 blocks.
This change removes the uncommitted blocks if a multipart upload is
aborted or fails.
If there was an existing destination file, it takes care not to
overwrite it by recomitting already comitted blocks.
This means that the scheme for allocating block IDs had to change to
make them different for each block and each upload.
Fixes#5583
`remote` has been converted ToStandardPath a few lines above, so `directory`
needs to be converted the same way in order to be compared properly. This was
spotted on `TestBisyncRemoteRemote/extended_filenames` for
`TestS3,directory_markers:` and `TestGoogleCloudStorage,directory_markers:`
which tripped over a directory name containing a Line Feed symbol.
It was reported that rclone copy occasionally uploaded corrupted data
to azure blob.
This turned out to be a race condition updating the block count which
caused blocks to be duplicated.
This bug was introduced in this commit in v1.64.0 and will be fixed in v1.65.2
0427177857 azureblob: implement OpenChunkWriter and multi-thread uploads #7056
This race only seems to happen if `--checksum` is used but can happen otherwise.
Unfortunately Azure blob does not check the MD5 that we send them so
despite sending incorrect data this corruption is not detected. The
corruption is detected when rclone tries to download the file, so
attempting to copy the files back to local disk will result in errors
such as:
ERROR : file.pokosuf5.partial: corrupted on transfer: md5 hash differ "XXX" vs "YYY"
This adds a check to test the blocklist we upload is as we expected
which would have caught the problem had it been in place earlier.
Before this change the concurrency used for an upload was rather
inconsistent.
- if size below `--backend-upload-cutoff` (default 200M) do single part upload.
- if size below `--multi-thread-cutoff` (default 256M) or using streaming
uploads (eg `rclone rcat) do multipart upload using
`--backend-upload-concurrency` to set the concurrency used by the uploader.
- otherwise do multipart upload using `--multi-thread-streams` to set the
concurrency.
This change makes the default for the concurrency used be the
`--backend-upload-concurrency`. If `--multi-thread-streams` is set and larger
than the `--backend-upload-concurrency` then that will be used instead.
This means that if the user sets `--backend-upload-concurrency` then it will be
obeyed for all multipart/multi-thread transfers and the user can override them
all with `--multi-thread-streams`.
See: #7056
This implements the OpenChunkWriter interface for azureblob which
enables multi-thread uploads.
This makes the memory controls of the s3 backend inoperative; they are
replaced with the global ones.
--azureblob-memory-pool-flush-time
--azureblob-memory-pool-use-mmap
By using the buffered reader this fixes excessive memory use when
uploading large files as it will share memory pages between all
readers.
This introduces a new fs.Option flag, Sensitive and uses this along
with IsPassword to redact the info in the config file for support
purposes.
It adds this flag into backends where appropriate. It was necessary to
add oauthutil.SharedOptions to some backends as they were missing
them.
Fixes#5209
Before this change we were incorrectly identifying the root directory
of the listing and adding it into the listing.
This caused higher layers of rclone to emit the error above.
See #7038
This fixes the azureblob backend so it builds again after the SDK
changes.
This doesn't update bazil.org/fuse because it doesn't build on FreeBSD
https://github.com/bazil/fuse/issues/295
This error was caused by rclone supplying an empty
`x-ms-blob-public-access:` header when creating a container for
private access, rather than omitting it completely.
This is a valid way of specifying containers should be private, but if
the storage account has the flag "Blob public access" unset then it
gives "409 Public access is not permitted on this storage account".
This patch fixes the problem by only supplying the header if the
access is set.
Fixes#6645
This patch implements --use-server-modtime for the Azureblob backend.
It does this by not reading the time from the metadata if the global
flag is set.
When the SDK was upgraded it started delivering metadata where the
keys were not in lower case as per the old SDK.
Rclone normalises the case of the keys for storage in the Object, but
the directory marker check was being done with the unnormalised keys
as it needs to be done before the Object is created.
This fixes the directory marker check to do a case insensitive compare
of the metadata keys.
The updates the authentication to include
- Auth from the environment
1. Environment Variables
2. Managed Service Identity Credentials
3. Azure CLI credentials (as used by the az tool)
- Account and Shared Key
- SAS URL
- Service principal with client secret
- Service principal with certificate
- User with username and password
- Managed Service Identity Credentials
And rationalises the auth order.
Normally rclone will check the container exists before uploading if it
hasn't listed the container yet.
Often rclone will be running with a limited set of permissions which
means rclone can't create the container anyway, so this stops the
check.
This will save a transaction.
This commit switches from using the old Azure go modules
github.com/Azure/azure-pipeline-go/pipeline
github.com/Azure/azure-storage-blob-go/azblob
github.com/Azure/go-autorest/autorest/adal
To the new SDK
github.com/Azure/azure-sdk-for-go/
This stops rclone using deprecated code and enables the full range of
authentication with Azure.
See #6132 and #5284
Before this fix, the chunksize calculator was using the previous size
of the object, not the new size of the object to calculate the chunk
sizes.
This meant that uploading a replacement object which needed a new
chunk size would fail, using too many parts.
This fix fixes the calculator to take the size explicitly.