Commit Graph

78 Commits

Author SHA1 Message Date
Nick Craig-Wood
7e6fac8b1e s3: re-implement multipart upload to fix memory issues
There have been quite a few reports of problems with the multipart
uploader using too much memory and not retrying possible errors.

Before this change the multipart uploader used the s3manager
abstraction in the AWS SDK.  There are numerous bug reports of this
using up too much memory.

This change re-implements a much simplified version of the s3manager
code specialized for rclone's purposes.

This should use much less memory and retry chunks properly.

See: https://forum.rclone.org/t/memory-usage-s3-alike-to-glacier-without-big-directories/13563
See: https://forum.rclone.org/t/copy-from-local-to-s3-has-high-memory-usage/13405
See: https://forum.rclone.org/t/big-file-upload-to-s3-fails/13575
2020-01-03 22:19:28 +00:00
Thomas Kriechbaumer
584e705c0c s3: introduce list_chunk option for bucket listing
The S3 ListObject API returns paginated bucket listings, with
"MaxKeys" items for each GET call.

The default value is 1000 entries, but for buckets with millions of
objects it might make sense to request more elements per request, if
the backend supports it. This commit adds a "list_chunk" option for
the user to specify a lower or higher value.

This commit does not add safe guards around this value - if a user
decides to request a too large list, it might result in connection
timeouts (on the server or client).

In AWS S3, there is a fixed limit of 1000, some other services might
have one too.  In Ceph, this can be configured in RadosGW.
2020-01-02 12:15:01 +00:00
Outvi V
db1c7f9ca8 s3: Add new region Asia Patific (Hong Kong) 2020-01-02 11:10:48 +00:00
Nick Craig-Wood
0ecb8bc2f9 s3: fix url decoding of NextMarker - fixes #3799
Before this patch we were failing to URL decode the NextMarker when
url encoding was used for the listing.

The result of this was duplicated listings entries for directories
with >1000 entries where the NextMarker was a file containing a space.
2019-12-12 13:33:30 +00:00
Nick Craig-Wood
0d10640aaa s3: add --s3-copy-cutoff for size to switch to multipart copy
Before this change we used the same (relatively low limits) for server
side copy as we did for multipart uploads.  It doesn't make sense to
use the same limits since no data is being downloaded or uploaded for
a server side copy.

This change introduces a new parameter --s3-copy-cutoff to control
when the switch from single to multipart server size copy happens and
defaults it to the maximum 5GB.

This makes server side copies much more efficient.

It also fixes the erroneous error when trying to set the modification
time of a file bigger than 5GB.

See #3778
2019-12-03 10:37:55 +00:00
Nick Craig-Wood
f4746f5064 s3: fix multipart copy - fixes #3778
Before this change multipart copies were giving the error

    Range specified is not valid for source object of size

This was due to an off by one error in the range source introduced in
7b1274e29a "s3: support for multipart copy"
2019-12-03 10:37:55 +00:00
Aleksandar Janković
c05bb63f96 s3: fix DisableChecksum condition 2019-12-02 15:15:59 +00:00
Nick Craig-Wood
9b5308144f s3: Reduce memory usage streaming files by reducing max stream upload size
Before this change rclone would allow the user to stream (eg with
rclone mount, rclone rcat or uploading google photos or docs) 5TB
files.  This meant that rclone allocated 4 * 525 MB buffers per
transfer which is way too much memory by default.

This change makes rclone use the configured chunk size for streamed
uploads.  This is 5MB by default which means that rclone can stream
upload files up to 48GB by default staying below the 10,000 chunks
limit.

This can be increased with --s3-chunk-size if necessary.

If rclone detects that a file is being streamed to s3 it will make a
single NOTICE level log stating the limitation.

This fixes the enormous memory usage.

Fixes #3568
See: https://forum.rclone.org/t/how-much-memory-does-rclone-need/12743
2019-11-09 15:55:19 +00:00
Aleksandar Jankovic
4b20afa94a backend/s3: fix ExpiryWindow value
ExpiryWindow accepts duration but it was set to value 3.
This changes it to 3 * time.Minute since default is 5 min.
2019-11-05 13:55:55 +00:00
Nick Craig-Wood
ab895390f4 s3: fix nil pointer reference if no metadata returned for object
Fixes #3651 Fixes #3652
2019-10-25 13:45:47 +01:00
庄天翼
7b1274e29a s3: support for multipart copy
Fixes #2375 Fixes #3579
2019-10-04 16:49:06 +01:00
Aleksandar Jankovic
6b55b8b133 s3: add option for multipart failiure behaviour
This is needed for resuming uploads across different sessions.
2019-10-02 16:49:16 +01:00
Nick Craig-Wood
6e053ecbd0 s3: only ask for URL encoded directory listings if we need them on Ceph
This works around a bug in Ceph which doesn't encode CommonPrefixes
when using URL encoded directory listings.

See: https://tracker.ceph.com/issues/41870
2019-09-30 22:00:24 +01:00
Fabian Möller
33f129fbbc s3: use lib/encoder
Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
2019-09-30 22:00:24 +01:00
Nick Craig-Wood
a8adce9c59 s3: fix encoding for control characters - Fixes #3345 2019-09-30 22:00:24 +01:00
Anthony Rusdi
899f285319 s3: fix signature v2_auth headers
When used with v2_auth = true, PresignRequest doesn't return
signed headers, so remote dest authentication would be fail.
This commit copying back HTTPRequest.Header to headers.

Tested with RiakCS v2.1.0.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
2019-09-21 14:38:51 +01:00
Nick Craig-Wood
25786cafd3 s3: fix SetModTime on GLACIER/ARCHIVE objects and implement set/get tier
- Read the storage class for each object
- Implement SetTier/GetTier
- Check the storage class on the **object** before using SetModTime

This updates the fix in 1a2fb52 so that SetModTime works when you are
using objects which have been migrated to GLACIER but you aren't using
GLACIER as a storage class.

Fixes #3522
2019-09-14 09:18:55 +01:00
Nick Craig-Wood
66c23723e3 Add context to all http.NewRequest #3257
When we drop support for go1.12 we can use http.NewRequestWithContext
2019-09-09 23:27:07 +01:00
Nick Craig-Wood
6f16588123 s3,b2,googlecloudstorage,swift,qingstor,azureblob: fixes after code review #3421
- change the interface of listBuckets() removing dir parameter and adding context
- add makeBucket() and use in place of Mkdir("")
    - this fixes some corner cases in Copy/Update
- mark all the listed buckets OK in ListR

Thanks to @yparitcher for the review.
2019-08-22 23:06:59 +01:00
Nick Craig-Wood
eaaf2ded94 s3: make all operations work from the root #3421 2019-08-17 10:30:41 +01:00
Nick Craig-Wood
e502be475a azureblob/b2/dropbox/gcs/koofr/qingstor/s3: fix 0 length files
In 0386d22cc9 we introduced a test for 0 length files read the
way mount does.

This test failed on these backends which we fix up here.
2019-08-06 15:18:08 +01:00
Nick Craig-Wood
57d5de6fba build: fix up package paths after repo move
git grep -l github.com/ncw/rclone | xargs -d'\n' perl -i~ -lpe 's|github.com/ncw/rclone|github.com/rclone/rclone|g'
goimports -w `find . -name \*.go`
2019-07-28 18:47:38 +01:00
Matti Niemenmaa
a6dca4c13f s3: Add INTELLIGENT_TIERING storage class
For Intelligent-Tiering:
https://aws.amazon.com/s3/storage-classes/#Unknown_or_changing_access
2019-07-01 18:17:48 +01:00
Aleksandar Jankovic
f78cd1e043 Add context propagation to rclone
- Change rclone/fs interfaces to accept context.Context
- Update interface implementations to use context.Context
- Change top level usage to propagate context to lover level functions

Context propagation is needed for stopping transfers and passing other
request-scoped values.
2019-06-19 11:59:46 +01:00
Philip Harvey
1a2fb52266 s3: make SetModTime work for GLACIER while syncing - Fixes #3224
Before this change rclone would fail with

    Failed to set modification time: InvalidObjectState: Operation is not valid for the source object's storage class

when attempting to set the modification time of an object in GLACIER.

After this change rclone will re-upload the object as part of a sync if it needs to change the modification time.

See: https://forum.rclone.org/t/suspected-bug-in-s3-or-compatible-sync-logic-to-glacier/10187
2019-06-03 15:28:19 +01:00
Robert Marko
5ccc2dcb8f s3: add config info for Wasabi's EU Central endpoint
Wasabi has a EU Central endpoint for a couple months now, so add it to the list.

Signed-off-by: Robert Marko <robimarko@gmail.com>
2019-05-15 13:35:55 +01:00
Nick Craig-Wood
b68c3ce74d s3: suppport S3 Accelerated endpoints with --s3-use-accelerate-endpoint
Fixes #3123
2019-05-02 14:00:00 +01:00
Manu
6e86526c9d s3: add support for "Glacier Deep Archive" storage class - fixes #3088 2019-04-11 10:21:41 +01:00
Fabian Möller
61616ba864 pacer: make pacer more flexible
Make the pacer package more flexible by extracting the pace calculation
functions into a separate interface. This also allows to move features
that require the fs package like logging and custom errors into the fs
package.

Also add a RetryAfterError sentinel error that can be used to signal a
desired retry time to the Calculator.
2019-02-16 14:38:07 +00:00
Nick Craig-Wood
73f0a67d98 s3: Update Dreamhost endpoint - fixes #2974 2019-02-13 21:10:43 +00:00
Fabian Möller
a0d4c04687
backend: fix misspellings 2019-02-07 19:51:03 +01:00
weetmuts
96f6708461 s3: add aws endpoint eu-north-1 2019-02-03 12:17:15 +00:00
Nick Craig-Wood
e31578e03c s3: Auto detect region for buckets on operation failure - fixes #2915
If an incorrect region error is returned while using a bucket then the
region is updated, the session is remade and the operation is retried.
2019-01-27 21:22:49 +00:00
Nick Craig-Wood
39f5059d48 s3: add --s3-bucket-acl to control bucket ACL - fixes #2918
Before this change buckets were created with the same ACL as objects.

After this change, the user can set just --s3-acl to set the ACL of
buckets and objects, or use --s3-bucket-acl as well to have a
different ACL used for bucket creation.

This also logs at INFO level the creation and deletion of buckets.
2019-01-18 15:12:11 +00:00
Nick Craig-Wood
1318c6aec8 s3: Add Alibaba OSS to integration tests and fix storage classes 2019-01-12 20:41:47 +00:00
Nick Craig-Wood
ff0b8e10af s3: Support Alibaba Cloud (Aliyun) OSS
The existing s3 backend passed all integration tests with OSS provided
`force_path_style = false`.

This makes sure that is so and adds documentation and configuration
for OSS.

Thanks to @luolibin for their work on the OSS backend which we ended
up not needing.

Fixes #1641
Fixes #1237
2019-01-12 17:28:04 +00:00
William Cocker
8575abf599 s3: add GLACIER storage class
Fixes #923
2018-12-06 21:53:05 +00:00
Nick Craig-Wood
d99ffde7c0 s3: change --s3-upload-concurrency default to 4 to increase perfomance #2772
Increasing the --s3-upload-concurrency to 4 (from 2) gives an
additional 45% throughput at the cost of 10MB extra memory per transfer.

After testing the upload perfoc
2018-12-02 17:58:34 +00:00
Nick Craig-Wood
198c34ce21 s3: implement --s3-upload-cutoff for single part uploads below this - fixes #2772
Before this change rclone would use multipart uploads for any size of
file.  However multipart uploads are less efficient for smaller files
and don't have MD5 checksums so it is advantageous to use single part
uploads if possible.

This implements single part uploads for all files smaller than the
upload_cutoff size.  Streamed files must be uploaded as multipart
files though.
2018-12-02 17:58:34 +00:00
Henry Ptasinski
f95c1c61dd s3: add config info for Wasabi's US-West endpoint
Wasabi has two location, US East and US West, with different endpoint URLs.
When configuring S3 to use Wasabi, provide the endpoint information for both
locations.
2018-11-19 13:33:42 +00:00
Erik Swanson
fa0a1e7261 s3: fix role_arn, credential_source, ...
When the env_auth option is enabled, the AWS SDK's session constructor
now loads configuration from ~/.aws/config and environment variables,
and credentials per the selected (or default) AWS_PROFILE's settings.

This is accomplished by **NOT** including any Credential provider in the
aws.Config passed to the session constructor: If the Config.Credentials
is non-nil, that will always be used and the user's configuration re
role_arn, credential_source, source_profile, etc... from the shared
config will be completely ignored.

(The conditional creation and configuration of the stscreds Credential
provider is complicated enough that it is not worth re-creating that
logic.)
2018-11-08 12:58:23 +00:00
Nick Craig-Wood
baba6d67e6 s3: set ACL for server side copies to that provided by the user - fixes #2691
Before this change the ACL for objects which were server side copied
was left at the default "private" settings. S3 doesn't copy the ACL
from the source when you copy an object, you have to set it afresh
which is what this does.
2018-11-02 16:22:31 +00:00
Nick Craig-Wood
dbedf33b9f s3: fix v2 signer on files with spaces - fixes #2438
Before this fix the v2 signer was failing for files with spaces in.
2018-10-14 00:10:29 +01:00
Nick Craig-Wood
0f02c9540c s3: make --s3-v2-auth flag
This is an alternative to setting the region to "other-v2-signature"
which is inconvenient for multi-region providers.
2018-10-14 00:10:29 +01:00
Nick Craig-Wood
06922674c8 drive, s3: review hidden config items 2018-10-13 23:30:13 +01:00
Fabian Möller
98e2746e31 backend: add fstests.ChunkedUploadConfig
- azureblob
- b2
- drive
- dropbox
- onedrive
- s3
- swift
2018-10-11 14:47:58 +01:00
Nick Craig-Wood
a9273c5da5 docs: move documentation for options from docs/content into backends
In the following commit, the documentation will be autogenerated.
2018-10-06 11:47:46 +01:00
Paul Kohout
7826e39fcf s3: use configured server-side-encryption and storace class options when calling CopyObject() - fixes #2610 2018-10-04 08:25:20 +01:00
Craig Miskell
2543278c3f S3: Use (custom) pacer, to retry operations when reasonable - fixes #2503 2018-09-11 07:57:03 +01:00
bsteiss
aaa3d7e63b s3: add support for KMS Key ID - fixes #2217
This code supports aws:kms and the kms key id for the s3 backend.
2018-08-30 17:08:27 +01:00