The S3 ListObject API returns paginated bucket listings, with
"MaxKeys" items for each GET call.
The default value is 1000 entries, but for buckets with millions of
objects it might make sense to request more elements per request, if
the backend supports it. This commit adds a "list_chunk" option for
the user to specify a lower or higher value.
This commit does not add safe guards around this value - if a user
decides to request a too large list, it might result in connection
timeouts (on the server or client).
In AWS S3, there is a fixed limit of 1000, some other services might
have one too. In Ceph, this can be configured in RadosGW.
Before this patch we were failing to URL decode the NextMarker when
url encoding was used for the listing.
The result of this was duplicated listings entries for directories
with >1000 entries where the NextMarker was a file containing a space.
Before this change we used the same (relatively low limits) for server
side copy as we did for multipart uploads. It doesn't make sense to
use the same limits since no data is being downloaded or uploaded for
a server side copy.
This change introduces a new parameter --s3-copy-cutoff to control
when the switch from single to multipart server size copy happens and
defaults it to the maximum 5GB.
This makes server side copies much more efficient.
It also fixes the erroneous error when trying to set the modification
time of a file bigger than 5GB.
See #3778
Before this change multipart copies were giving the error
Range specified is not valid for source object of size
This was due to an off by one error in the range source introduced in
7b1274e29a "s3: support for multipart copy"
Before this change rclone used "Authorization: BEARER token". However
according the the RFC this should be "Bearer"
https://tools.ietf.org/html/rfc6750#section-2.1
This changes it to "Authorization: Bearer token"
Fixes#3751 and interop with Salesforce Webdav server
When using nextcloud, before this change we only uploaded one of SHA1
or MD5 checksum in the OC-Checksum header with preference to SHA1 if
both were set.
This makes the MD5 checksums read as empty string which makes syncing
with checksums less useful than they should be as all the MD5
checksums are blank.
This change makes it so that we only upload the SHA1 to nextcloud.
The behaviour of owncloud is unchanged as owncloud uses the checksum
as an upload integrity check only and calculates its own checksums.
See: https://forum.rclone.org/t/how-to-specify-hash-method-to-checksum/13055
This also corrects the symlink detection logic to only check symlink
files. Previous to this it was checking all directories too which was
making it do more stat calls than was necessary.
Before this change we forgot to URL decode the X-Object-Manifest in a dynamic large object.
This problem was introduced by 2fe8285f89 "swift: reserve
segments of dynamic large object when delete objects in container what
was enabled versioning."
For few commands, RClone counts a error multiple times. This was fixed by
creating a new error type which keeps a flag to remember if the error has
already been counted or not. The CountError function now wraps the original
error eith the above new error type and returns it.
Before this change rclone used the team_drive ID as the root if set
even if the root_folder_id was set too.
This change uses the root_folder_id in preference over the team_drive
which restores the functionality.
This problem was introduced by ba7c2ac443Fixes#3742
We attempt to find the ID of the root folder by doing a GET on the
folder ID "root". With scope "drive.files" this fails with a 404
message.
After this change if we get the 404 message, we just carry on using
"root" as the root folder ID and we cache that for future lookups.
This means that changenotify messages will not work correctly in the
root folder but otherwise has minor consequences.
See: https://forum.rclone.org/t/fresh-raspberry-pi-build-google-drive-404-error-failed-to-ls-googleapi-error-404-file-not-found/12791
Before this change rclone would allow the user to stream (eg with
rclone mount, rclone rcat or uploading google photos or docs) 5TB
files. This meant that rclone allocated 4 * 525 MB buffers per
transfer which is way too much memory by default.
This change makes rclone use the configured chunk size for streamed
uploads. This is 5MB by default which means that rclone can stream
upload files up to 48GB by default staying below the 10,000 chunks
limit.
This can be increased with --s3-chunk-size if necessary.
If rclone detects that a file is being streamed to s3 it will make a
single NOTICE level log stating the limitation.
This fixes the enormous memory usage.
Fixes#3568
See: https://forum.rclone.org/t/how-much-memory-does-rclone-need/12743
Before this fix we neglected to add the shared drive ID to the request
when asking for an initial change notify token and this caused a lot
more results to be returned than was necessary.
When we changed recursive lists to use --fast-list by default this
broke listing with --drive-shared-with-me from the root.
This turned out to be an unwarranted assumption in the ListR code that
all items would have a parent folder that we had searched for - this
isn't true for shared with me items.
This was fixed when using --drive-shared-with-me to give items that
didn't have any parents a synthetic parent.
Fixes#3639
Before this change we used the id "root" as an alias for the root drive ID.
However this causes problems when we receive IDs back from drive which
are not in this format and have been expanded to their canonical ID.
This change looks up the ID "root" and stores it in the
"drive_folder_id" parameter in the config file.
This helps with
- Notifying changes at the root
- Files shared with me at the root
See #3639
Before this change when rclone was compiled with go1.13 it used HTTP/2
to contact drive by default.
This causes lockups and INTERNAL_ERRORs from the HTTP/2 code.
This is a workaround disabling the HTTP/2 code on an option.
It can be re-enabled with `--drive-disable-http2=false`
See #3631
Before this change we silently skipped uploads to dropbox of
disallowed file names. However this then caused "corrupted on
transfer" errors because the sizes were wrong.
After this change we return an no retry error which will mean that the
sync fails (as it should - not all files were uploaded) but no
unecessary retries happened.
This works around a bug in Ceph which doesn't encode CommonPrefixes
when using URL encoded directory listings.
See: https://tracker.ceph.com/issues/41870
changes:
- chunker: remove GetTier and SetTier
- remove wdmrcompat metaformat
- remove fastopen strategy
- make hash_type option non-advanced
- adverise hash support when possible
- add metadata field "ver", run strict checks
- describe internal behavior in comments
- improve documentation
note:
wdmrcompat used to write file name in the metadata, so maximum metadata
size was 1K; removing it allows to cap size by 200 bytes now.
Note: chunker implements many irrelevant methods (UserInfo, Disconnect etc),
but they are required by TestIntegration/FsCheckWrap and cannot be removed.
Dropped API methods: MergeDirs DirCacheFlush PublicLink UserInfo Disconnect OpenWriterAt
Meta formats:
- renamed old simplejson format to wdmrcompat.
- new simplejson format supports hash sums and verification of chunk size/count.
Change list:
- split-chunking overlay for mailru
- add to all
- fix linter errors
- fix integration tests
- support chunks without meta object
- fix package paths
- propagate context
- fix formatting
- implement new required wrapper interfaces
- also test large file uploads
- simplify options
- user friendly name pattern
- set default chunk size 2G
- fix building with golang 1.9
- fix ci/cd on a separate branch
- fix updated object name (SyncUTFNorm failed)
- fix panic in Box overlay
- workaround: Box rename failed if name taken
- enhance comments in unit test
- fix formatting
- embed wrapped remote rather than inherit
- require wrapped remote to support move (or copy)
- implement 3 (keep fstest)
- drop irrelevant file system interfaces
- factor out Object.mainChunk
- refactor TestLargeUpload as InternalTest
- add unit test for chunk name formats
- new improved simplejson meta format
- tricky case in test FsIsFile (fix+ignore)
- remove debugging print
- hide temporary objects from listings
- fix bugs in chunking reader:
- return EOF immediately when all data is sent
- handle case when wrapped remote puts by hash (bug detected by TestRcat)
- chunked file hashing (feature)
- server-side copy across configs (feature)
- robust cleanup of temporary chunks in Put
- linear download strategy (no read-ahead, feature)
- fix unexpected EOF in the box multipart uploader
- throw error if destination ignores data
When used with v2_auth = true, PresignRequest doesn't return
signed headers, so remote dest authentication would be fail.
This commit copying back HTTPRequest.Header to headers.
Tested with RiakCS v2.1.0.
Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
- Read the storage class for each object
- Implement SetTier/GetTier
- Check the storage class on the **object** before using SetModTime
This updates the fix in 1a2fb52 so that SetModTime works when you are
using objects which have been migrated to GLACIER but you aren't using
GLACIER as a storage class.
Fixes#3522