swift: add workarounds for bad listings in Ceph RGW

Ceph's Swift API emulation does not fully confirm to the API spec.
As a result, it sometimes returns fewer items in a container than
the requested limit, which according to the spec should means
that there are no more objects left in the container.  (Note that
python-swiftclient always fetches unless the current page is empty.)

This commit adds a pair of new Swift backend settings to handle this.

Set `fetch_until_empty_page` to true to always fetch another
page of the container listing unless there are no items left.

Alternatively, set `partial_page_fetch_threshold` to an integer
percentage.  In this case rclone will fetch a new page only when
the current page is within this percentage of the limit.

Swift API reference: https://docs.openstack.org/swift/latest/api/pagination.html

PR against ncw/swift with research and discussion: https://github.com/ncw/swift/pull/167

Fixes #7924
This commit is contained in:
Paul Collins 2021-04-09 11:07:49 +12:00 committed by Nick Craig-Wood
parent 754e53dbcc
commit 8d80df2c85

View File

@ -278,6 +278,36 @@ provider.`,
Value: "pca", Value: "pca",
Help: "OVH Public Cloud Archive", Help: "OVH Public Cloud Archive",
}}, }},
}, {
Name: "fetch_until_empty_page",
Help: `When paginating, always fetch unless we received an empty page.
Consider using this option if rclone listings show fewer objects
than expected, or if repeated syncs copy unchanged objects.
It is safe to enable this, but rclone may make more API calls than
necessary.
This is one of a pair of workarounds to handle implementations
of the Swift API that do not implement pagination as expected. See
also "partial_page_fetch_threshold".`,
Default: false,
Advanced: true,
}, {
Name: "partial_page_fetch_threshold",
Help: `When paginating, fetch if the current page is within this percentage of the limit.
Consider using this option if rclone listings show fewer objects
than expected, or if repeated syncs copy unchanged objects.
It is safe to enable this, but rclone may make more API calls than
necessary.
This is one of a pair of workarounds to handle implementations
of the Swift API that do not implement pagination as expected. See
also "fetch_until_empty_page".`,
Default: 0,
Advanced: true,
}}, SharedOptions...), }}, SharedOptions...),
}) })
} }
@ -308,6 +338,8 @@ type Options struct {
NoLargeObjects bool `config:"no_large_objects"` NoLargeObjects bool `config:"no_large_objects"`
UseSegmentsContainer fs.Tristate `config:"use_segments_container"` UseSegmentsContainer fs.Tristate `config:"use_segments_container"`
Enc encoder.MultiEncoder `config:"encoding"` Enc encoder.MultiEncoder `config:"encoding"`
FetchUntilEmptyPage bool `config:"fetch_until_empty_page"`
PartialPageFetchThreshold int `config:"partial_page_fetch_threshold"`
} }
// Fs represents a remote swift server // Fs represents a remote swift server
@ -462,6 +494,8 @@ func swiftConnection(ctx context.Context, opt *Options, name string) (*swift.Con
ConnectTimeout: 10 * ci.ConnectTimeout, // Use the timeouts in the transport ConnectTimeout: 10 * ci.ConnectTimeout, // Use the timeouts in the transport
Timeout: 10 * ci.Timeout, // Use the timeouts in the transport Timeout: 10 * ci.Timeout, // Use the timeouts in the transport
Transport: fshttp.NewTransport(ctx), Transport: fshttp.NewTransport(ctx),
FetchUntilEmptyPage: opt.FetchUntilEmptyPage,
PartialPageFetchThreshold: opt.PartialPageFetchThreshold,
} }
if opt.EnvAuth { if opt.EnvAuth {
err := c.ApplyEnvironment() err := c.ApplyEnvironment()