Pikpak can accelerate file uploads by leveraging existing content
in its storage (identified by a custom hash called gcid).
Previously, file transfer statistics were incorrect for uploads
without outbound traffic as the input stream remained unchanged.
This commit addresses the issue by:
* Removing unnecessary unwrapping/wrapping of accountings
before/after gcid calculation, leading immediate AccountRead() on buffering.
* Correctly tracking file transfer statistics for uploads
with no incoming/outgoing traffic by marking them as Server Side Copies.
This change ensures correct statistics tracking and improves overall user experience.
This commit optimizes the PikPak upload process by pre-fetching the Global
Content Identifier (gcid) from the API server before calculating it locally.
Previously, a gcid required for uploads was calculated locally. This process was
resource-intensive and time-consuming. By first checking for a cached gcid
on the server, we can potentially avoid the local calculation entirely.
This significantly improves upload speed especially for large files.
fix#7824
Statements like rclone copy <somewhere> . will spontaneously miss
if . expands to a path with a Full Width replacement character.
This is due to the incorrect order in which
relative paths and decoding were handled in the original implementation.
This also
- move in use options (Opt) from vfsflags to vfscommon
- change os.FileMode to vfscommon.FileMode in parameters
- rework vfscommon.FileMode and add tests
The SFTP protocol (and the golang sftp package) internally uses uint32 unix
time for expressing mtime. Hence it is a waste of memory to store it as 24-byte
time.Time data structure in long-lived data structures. So despite that the
golang sftp package uses time.Time as external interface, we can re-encode the
value back to the original format and save memory.
Co-authored-by: Tomasz Melcer <tomasz@melcer.pl>
Previously, the code relied on calling `readMetaData()` after every file move operation.
This introduced an unnecessary API call and potentially impacted performance.
This change removes the redundant `readMetaData()` call, improving efficiency.
Fixes an issue where copied files could not be renamed when using the
`copyto` command. This occurred because the object ID was empty
before calling `readMetaData`. The fix preemptively calls `readMetaData`
to ensure an object ID is available before attempting the rename operation.
Ceph's Swift API emulation does not fully confirm to the API spec.
As a result, it sometimes returns fewer items in a container than
the requested limit, which according to the spec should means
that there are no more objects left in the container. (Note that
python-swiftclient always fetches unless the current page is empty.)
This commit adds a pair of new Swift backend settings to handle this.
Set `fetch_until_empty_page` to true to always fetch another
page of the container listing unless there are no items left.
Alternatively, set `partial_page_fetch_threshold` to an integer
percentage. In this case rclone will fetch a new page only when
the current page is within this percentage of the limit.
Swift API reference: https://docs.openstack.org/swift/latest/api/pagination.html
PR against ncw/swift with research and discussion: https://github.com/ncw/swift/pull/167Fixes#7924
Similar to uploads implemented in commit ce5024bf33,
this change ensures most asynchronous file operations (copy, move, delete,
purge, and cleanup) complete before proceeding with subsequent actions.
This reduces the risk of data inconsistencies and improves overall reliability.
When getting an object by specifying a versionId in the request, if the
specified version is a delete marker, it returns 405 (Method Not Allowed),
instead of 404 (Not Found) which would be returned without a versionId. See
https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html
Before this change, we were only looking for 404 (and not 405) to determine
whether the object exists. This meant that in some circumstances (ex. when
Versioning is enabled for the bucket and we have a non-null X-Amz-Version-Id), we
deemed the object to exist when we should not have.
After this change, 405 (Method Not Allowed) is treated the same as 404 (Not
Found) for the purposes of headObject.
See https://forum.rclone.org/t/bisync-rename-failed-method-not-allowed/45723/13
Previously, the fixed 10MB chunk size could lead to exceeding the maximum
allowed number of parts for very large files. Similar to other backends, options for
chunk size and upload concurrency are now user-configurable. Additionally,
the internal library is used to automatically adjust chunk size to prevent exceeding
upload part limitations.
Fixes#7850
This lets you, for example, use shared folders without mounting them
into your home namespace first, as long as you know their namespace ID.
(The --dropbox-shared-folders flag could thus be changed to not need to
mount the shared folder first, but I'm not doing that here as it's a
behavior change, who knows, maybe somebody relies on it.)
`remote` has been converted ToStandardPath a few lines above, so `directory`
needs to be converted the same way in order to be compared properly. This was
spotted on `TestBisyncRemoteRemote/extended_filenames` for
`TestS3,directory_markers:` and `TestGoogleCloudStorage,directory_markers:`
which tripped over a directory name containing a Line Feed symbol.
This attempts to resolve upload conflicts by implementing cancel/cleanup on failed
uploads
* fix panic error on defer cancel upload
* increase pacer min sleep from 10 to 100 ms
* stop using uploadByForm()
* introduce force sleep before and after async tasks
* use pacer's retry scheme instead of manual implementation
Fixes#7787
Before this change some of the integration tests were producing this error
panic: runtime error: invalid memory address or nil pointer dereference
This was caused by an `fs.Object` of which the type (`*Object`) was
not `nil`, but the value within was `nil`. These do not compare as
`nil` leading to the panic.
This is a classic Go gotcha: https://go.dev/doc/faq#nil_error
This was easily fixed by changing the type of one function to return
fs.Object instead of *Object.
Before this change we waited a minimum of 10ms between API calls for
mailru.
The tests no longer pass at this rate, so this increases the time to
100ms.
See #7768
For example using
--onedrive-metadata-permissions read,write,failok
Will allow permissions to be read and written but if the writing
fails, then only an ERROR will be written in the log and the transfer
won't fail.
For example using
--drive-metadata-permissions read,write,failok
Will allow metadata to be read and written but if the writing fails,
then only an ERROR will be written in the log and the transfer won't
fail.
Before this change, cache.PinUntilFinalized was called twice if the root pointed
to a composite multi-chunk file without metadata, resulting in a fatal "finalizer
already set" error. This change fixes the issue.
This change adds support for "group" identities, and SharePoint variants
"siteUser" and "siteGroup". It also adds support for using any identity type
(including "application" and "device") as a recipient source when adding
permissions.
Before this change, metadata permissions used the `grantedTo` and
`grantedToIdentities` properties, which are deprecated on OneDrive Business in
favor of `grantedToV2` and `grantedToIdentitiesV2`. After this change, OneDrive
Business uses the new V2 versions, while OneDrive Personal still uses the
originals, as the V2 versions are not available for OneDrive Personal. (see
https://learn.microsoft.com/en-us/answers/questions/1079737/inconsistency-between-grantedtov2-and-grantedto-re)
Previously, `getFile()` was called indiscriminately during uploads, moves,
and download link generation. This could lead to users with download limit
causing subsequent operations like uploads and moves to fail.
This PR optimizes the use of getFile(), by only calling it
when it's strictly necessary.
This switches between storing chunks in a separate container suffixed
with `_segments` (the default) and a directory in the root
`.file-segments`)
By default the `.file-segments` mode will be auto selected if
`auth_url`s that require it are detected.
If the `.file-segments` mode is in use then rclone will omit that
directory from listings.
See: https://forum.rclone.org/t/blomp-unable-to-upload-5gb-files/42498/
Before this change when setting permissions from the metadata rclone
would stop on the first error.
This change causes rclone to attempt to set all the permissions and
return an error summary at the end.
Before this change, chunker would erroneously consider two different paths to be
equal if, due to special characters, they normalized to equal-folding strings in
Standard Encoding, but not otherwise. This caused base objects to get moved when
they should not have been. This change fixes the issue, which was discovered on
the bisync integration tests.
Ideally it should also be fixed when the base Fs is non-local, but there's not an
easy way at the moment to reference the wrapped Fs's encoding, at least without
breaking encapsulation.
Before this change, calling NewFs on a composite multi-chunk file with
--chunker-meta-format "none"
would fail due to f.base pointing to the wrong Fs. This change fixes the issue,
which was discovered on the bisync integration tests.
Before this change when setting permissions from the metadata rclone
would stop on the first error.
This change causes rclone to attempt to set all the permissions and
return an error summary at the end.
Before this change, calling SetModTime on owncloud and nextcloud would
inadvertently erase the object's stored hashes. This change fixes the issue,
which was discovered by the bisync integration tests.
In this commit we merged an unreliable test
e053c8a1c0 copy: fix nil pointer dereference when corrupted on transfer with nil dst
It is a good idea but very hard to implement so it always works.
Hence this disables it for the moment.
Before this change, the --metadata-mapper was called twice if an object was
uploaded via multipart upload with --metadata and --onedrive-metadata-permissions
"write" or "read,write". This change fixes the issue.
Before this change, List would return incorrect directory paths (relative to the
wrong root) if the Fs root pointed to a subdirectory. For example, listing dir
"a/b/c/d" of remote :memory: would work correctly, but listing dir "c/d" of
remote :memory:a/b would not, and would result in "Entry doesn't belong in
directory %q (contains subdir)" errors.
This change fixes the issue and adds a test to detect any other backends that
might have the same issue.
Before this change, the Memory backend had the potential to deadlock under
certain conditions, if the ListR callback required locking the b.mu mutex. This
was the case with operations.Purge, because Memory has no Purge method, and the
fallback option does:
err = DeleteFiles(ctx, listToChan(ctx, f, dir))
which potentially starts removing objects before the listing has completed.
This change fixes the issue by batching all the entries before calling the
callback on them.
Before this change, the Memory backend's Copy method created a dst object that
referenced the src's objectData by pointer instead of making a copy. While this
minimized memory usage, an unintended consequence was that subsequently mutating
the src (such as changing the modtime) would inadvertently also mutate the dst,
and vice versa.
This change fixes the issue and adds a test.
Before this change trying to server side copy an object from a my
drive to a shared drive using --metadata caused this error:
Sharing restrictions cannot be set on a shared drive item., teamDrivesSharingRestrictionNotAllowed
This was because we were setting the "writers-can-share" metadata
which isn't allowed on shared drives
1. The maximum number of objects on a page should be no more than
1000. Currently it is 1024, for this reason the listing always ends on
the first page with the error “object not found”, rclone tries to
upload the file again, Linkbox stores it with the name “filename(N)”,
and so the storage fills up indefinitely.
2. A hyphen is added to the list of allowed characters, that makes
queries more optimized (no need to load all files in a directory for
an entity with a hyphen).
The LinkBox API does not allow searching by more than 25 Unicode
characters in the name, for this reason it is currently impossible to
work with files and folders named longer than 8 Unicode chars (if
encoded in base32).
This fix queries all files in a directory for long names and checks
their names one by one, thus solving the issue.
Fixes#7542
This command executes a list query in Google Drive’s native query
language and returns a JSON dump of matches. It’s useful for locating
files quickly in folders with a large number of files, where rclone’s
normal list command is slow due to client-side filtering.
Before this change, Hasher did not check whether a "passed hash" (hashtype
natively supported by the wrapped backend) returned from a backend was blank,
and would sometimes return a blank hash to the caller even when a non-blank hash
was already stored in the db. This caused issues with, for example, Google
Drive, which has SHA1 / SHA256 hashes for some files but not others
(https://rclone.org/drive/#sha1-or-sha256-hashes-may-be-missing) and sometimes also
does not have hashes for very recently modified files.
After this change, Hasher will check if the received "passed hash" is
unexpectedly blank, and if so, it will continue to try other enabled methods,
such as retrieving a value from the database, or possibly regenerating it.
https://forum.rclone.org/t/hasher-with-gdrive-backend-does-not-return-sha1-sha256-for-old-files/44680/9?u=nielash
This change adds support for metadata on OneDrive. Metadata (including
permissions) is supported for both files and directories.
OneDrive supports System Metadata (not User Metadata, as of this writing.) Much
of the metadata is read-only, and there are some differences between OneDrive
Personal and Business (see table in OneDrive backend docs for details).
Permissions are also supported, if --onedrive-metadata-permissions is set. The
accepted values for --onedrive-metadata-permissions are read, write, read,write, and
off (the default). write supports adding new permissions, updating the "role" of
existing permissions, and removing permissions. Updating and removing require
the Permission ID to be known, so it is recommended to use read,write instead of
write if you wish to update/remove permissions.
Permissions are read/written in JSON format using the same schema as the
OneDrive API, which differs slightly between OneDrive Personal and Business.
(See OneDrive backend docs for examples.)
To write permissions, pass in a "permissions" metadata key using this same
format. The --metadata-mapper tool can be very helpful for this.
When adding permissions, an email address can be provided in the User.ID or
DisplayName properties of grantedTo or grantedToIdentities. Alternatively, an
ObjectID can be provided in User.ID. At least one valid recipient must be
provided in order to add a permission for a user. Creating a Public Link is also
supported, if Link.Scope is set to "anonymous".
Note that adding a permission can fail if a conflicting permission already
exists for the file/folder.
To update an existing permission, include both the Permission ID and the new
roles to be assigned. roles is the only property that can be changed.
To remove permissions, pass in a blob containing only the permissions you wish
to keep (which can be empty, to remove all.)
Note that both reading and writing permissions requires extra API calls, so if
you don't need to read or write permissions it is recommended to omit --onedrive-
metadata-permissions.
Metadata and permissions are supported for Folders (directories) as well as
Files. Note that setting the mtime or btime on a Folder requires one extra API
call on OneDrive Business only.
OneDrive does not currently support User Metadata. When writing metadata, only
writeable system properties will be written -- any read-only or unrecognized keys
passed in will be ignored.
TIP: to see the metadata and permissions for any file or folder, run:
rclone lsjson remote:path --stat -M --onedrive-metadata-permissions read
See the OneDrive backend docs for a table of all the supported metadata
properties.
Before this change, operations.CopyDirMetadata would fail with: `internal error:
expecting directory string from combine root '' to have SetMetadata method:
optional feature not implemented` if the dst was the root directory of a combine
upstream. This is because combine was returning a *fs.Dir, which does not
satisfy the fs.SetMetadataer interface.
While it is true that combine cannot set metadata on the root of an upstream
(see also #7652), this should not be considered an error that causes sync to do
high-level retries, abort without doing deletes, etc.
This change addresses the issue by creating a new type of DirWrapper that is
allowed to fail silently, for exceptional cases such as this where certain
special directories have more limited abilities than what the Fs usually
supports.
It is possible that other similar wrapping backends (Union?) may need this same
fix.
In this commit (2014 for v1.02) Purge was implemented for the local
backend:
1527e64ee7 local: Implement Purger interface
This appeared to be implemented just to make a Purge and doesn't
appear to do anything useful.
It is in fact significatly worse than the rclone fallback purge since
it doesn't operate in parallel or update stats.
This patch removes the Purge routine for a consequent speed up and
showing of stats.
See: https://forum.rclone.org/t/progress-flag-for-rclone-purge/44416
Before this change, undecryptable file names would be skipped very quietly
(there was a log warning, but only at DEBUG level),
failing to alert users of a potentially serious issue that needs attention.
After this change, the log level is raised to NOTICE by default and a new
--crypt-strict-names flag allows raising an error, for users who may prefer not
to proceed if such an issue is detected.
See https://forum.rclone.org/t/skipping-undecryptable-file-name-should-be-an-error/27115https://github.com/rclone/rclone/issues/5787
Before this change this would give errors like this
failed to set metadata on directory: failed to set birth (creation) time: Access is denied.
This was caused by opening the directory in the wrong mode.
A consequence of this is that fs.Directory returned by the local
backend will now have a correct size in (rather than -1). Some tests
depended on this and have been fixed by this commit too.
Before this change, moving (renaming) a file or folder to a different name
within the same parent directory would fail, due to using the wrong API
operation ("/file/move_copy.json" and "/folder/move_copy.json", instead of the
separate "/file/rename.json" and "/folder/rename.json" that opendrive has for
this purpose.)
After this change, Move and DirMove check whether the move is within the same
parent dir. If so, "rename" is used. If not, "move_copy" is used, like before.
GCS gives NotImplemented errors for multi-part server side copies. The
threshold for these is currently set just below 5G so any files bigger
than 5G that rclone attempts to server side copy will fail.
This patch works around the problem by adding a quirk for GCS raising
--s3-copy-cutoff to the maximum. This means that rclone will never use
multi-part copies for files in GCS. This includes files bigger than
5GB which (according to AWS documentation) must be copied with
multi-part copy. However this seems to work with GCS.
See: https://forum.rclone.org/t/chunker-uploads-to-gcs-s3-fail-if-the-chunk-size-is-greater-than-the-max-part-size/44349/
See: https://issuetracker.google.com/issues/323465186
A seafile server can be configured to use a relative URL as
FILE_SERVER_ROOT in order to support more than one hostname/ip. (see
https://github.com/haiwen/seahub/issues/3398#issuecomment-506920360 )
The previous backend implementation always expected an absolute
download/upload URL, resulting in an "unsupported protocol scheme"
error.
With this commit it supports both absolute and relative.
It was reported that rclone copy occasionally uploaded corrupted data
to azure blob.
This turned out to be a race condition updating the block count which
caused blocks to be duplicated.
This bug was introduced in this commit in v1.64.0 and will be fixed in v1.65.2
0427177857 azureblob: implement OpenChunkWriter and multi-thread uploads #7056
This race only seems to happen if `--checksum` is used but can happen otherwise.
Unfortunately Azure blob does not check the MD5 that we send them so
despite sending incorrect data this corruption is not detected. The
corruption is detected when rclone tries to download the file, so
attempting to copy the files back to local disk will result in errors
such as:
ERROR : file.pokosuf5.partial: corrupted on transfer: md5 hash differ "XXX" vs "YYY"
This adds a check to test the blocklist we upload is as we expected
which would have caught the problem had it been in place earlier.
Similar to
acf1e2df84,
go1.21.4 appears to have broken sync.MoveDir on Windows because
filepath.VolumeName() returns `\\?` instead of `\\?\C:` in cleanRootPath. It
looks like the Go team is aware of the issue and planning a fix, so this may
only be needed temporarily.