2018-08-22 00:52:46 +02:00
// Package endpoint implements replication endpoints for use with package replication.
2018-08-22 00:43:58 +02:00
package endpoint
2017-05-20 17:08:18 +02:00
import (
2020-04-10 23:24:14 +02:00
"bytes"
2018-07-08 23:31:46 +02:00
"context"
2018-08-30 11:51:47 +02:00
"fmt"
2020-04-10 21:58:28 +02:00
"io"
2018-12-11 22:01:50 +01:00
"path"
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
"sync"
2018-12-11 22:01:50 +01:00
2018-08-25 21:30:25 +02:00
"github.com/pkg/errors"
2020-04-11 15:49:41 +02:00
"github.com/zrepl/zrepl/daemon/logging/trace"
2019-03-22 19:41:12 +01:00
2019-02-22 11:40:27 +01:00
"github.com/zrepl/zrepl/replication/logic/pdu"
2020-04-10 23:24:14 +02:00
"github.com/zrepl/zrepl/util/chainedio"
2019-03-20 18:52:40 +01:00
"github.com/zrepl/zrepl/util/chainlock"
2019-03-28 21:22:22 +01:00
"github.com/zrepl/zrepl/util/envconst"
"github.com/zrepl/zrepl/util/semaphore"
2018-08-25 21:30:25 +02:00
"github.com/zrepl/zrepl/zfs"
2017-05-20 17:08:18 +02:00
)
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
type SenderConfig struct {
FSF zfs . DatasetFilter
Encrypt * zfs . NilBool
JobID JobID
}
func ( c * SenderConfig ) Validate ( ) error {
c . JobID . MustValidate ( )
if err := c . Encrypt . Validate ( ) ; err != nil {
return errors . Wrap ( err , "`Encrypt` field invalid" )
}
if _ , err := StepHoldTag ( c . JobID ) ; err != nil {
return fmt . Errorf ( "JobID cannot be used for hold tag: %s" , err )
}
return nil
}
2018-08-22 00:52:46 +02:00
// Sender implements replication.ReplicationEndpoint for a sending side
type Sender struct {
2018-12-11 22:01:50 +01:00
FSFilter zfs . DatasetFilter
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
encrypt * zfs . NilBool
jobId JobID
2017-12-26 21:37:48 +01:00
}
2017-05-20 17:08:18 +02:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func NewSender ( conf SenderConfig ) * Sender {
if err := conf . Validate ( ) ; err != nil {
panic ( "invalid config" + err . Error ( ) )
}
return & Sender {
FSFilter : conf . FSF ,
encrypt : conf . Encrypt ,
jobId : conf . JobID ,
}
2018-09-06 04:49:26 +02:00
}
func ( s * Sender ) filterCheckFS ( fs string ) ( * zfs . DatasetPath , error ) {
dp , err := zfs . NewDatasetPath ( fs )
if err != nil {
return nil , err
}
if dp . Length ( ) == 0 {
return nil , errors . New ( "empty filesystem not allowed" )
}
pass , err := s . FSFilter . Filter ( dp )
if err != nil {
return nil , err
}
if ! pass {
2019-02-22 11:40:27 +01:00
return nil , fmt . Errorf ( "endpoint does not allow access to filesystem %s" , fs )
2018-09-06 04:49:26 +02:00
}
return dp , nil
2017-12-26 21:37:48 +01:00
}
2017-05-20 17:08:18 +02:00
2018-12-11 22:01:50 +01:00
func ( s * Sender ) ListFilesystems ( ctx context . Context , r * pdu . ListFilesystemReq ) ( * pdu . ListFilesystemRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
fss , err := zfs . ZFSListMapping ( ctx , s . FSFilter )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
}
2018-08-16 21:05:21 +02:00
rfss := make ( [ ] * pdu . Filesystem , len ( fss ) )
2018-06-20 20:20:37 +02:00
for i := range fss {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
encEnabled , err := zfs . ZFSGetEncryptionEnabled ( ctx , fss [ i ] . ToString ( ) )
if err != nil {
return nil , errors . Wrap ( err , "cannot get filesystem encryption status" )
}
2018-08-16 21:05:21 +02:00
rfss [ i ] = & pdu . Filesystem {
2018-06-20 20:20:37 +02:00
Path : fss [ i ] . ToString ( ) ,
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// ResumeToken does not make sense from Sender
2019-03-13 18:33:20 +01:00
IsPlaceholder : false , // sender FSs are never placeholders
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
IsEncrypted : encEnabled ,
2017-05-20 17:08:18 +02:00
}
}
2019-03-13 18:33:20 +01:00
res := & pdu . ListFilesystemRes { Filesystems : rfss }
2018-12-11 22:01:50 +01:00
return res , nil
2017-12-26 21:37:48 +01:00
}
2017-05-20 17:08:18 +02:00
2018-12-11 22:01:50 +01:00
func ( s * Sender ) ListFilesystemVersions ( ctx context . Context , r * pdu . ListFilesystemVersionsReq ) ( * pdu . ListFilesystemVersionsRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
lp , err := s . filterCheckFS ( r . GetFilesystem ( ) )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
2017-07-30 14:56:16 +02:00
}
2020-04-15 16:11:16 +02:00
fsvs , err := zfs . ZFSListFilesystemVersions ( ctx , lp , zfs . ListFilesystemVersionsOptions { } )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
}
2018-08-16 21:05:21 +02:00
rfsvs := make ( [ ] * pdu . FilesystemVersion , len ( fsvs ) )
2018-06-20 20:20:37 +02:00
for i := range fsvs {
2018-08-30 11:51:47 +02:00
rfsvs [ i ] = pdu . FilesystemVersionFromZFS ( & fsvs [ i ] )
2017-12-26 21:37:48 +01:00
}
2018-12-11 22:01:50 +01:00
res := & pdu . ListFilesystemVersionsRes { Versions : rfsvs }
return res , nil
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func ( p * Sender ) HintMostRecentCommonAncestor ( ctx context . Context , r * pdu . HintMostRecentCommonAncestorReq ) ( * pdu . HintMostRecentCommonAncestorRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
fsp , err := p . filterCheckFS ( r . GetFilesystem ( ) )
if err != nil {
return nil , err
}
fs := fsp . ToString ( )
log := getLogger ( ctx ) . WithField ( "fs" , fs ) . WithField ( "hinted_most_recent" , fmt . Sprintf ( "%#v" , r . GetSenderVersion ( ) ) )
log . WithField ( "full_hint" , r ) . Debug ( "full hint" )
if r . GetSenderVersion ( ) == nil {
// no common ancestor found, likely due to failed prior replication attempt
// => release stale step holds to prevent them from accumulating
// (they can accumulate on initial replication because each inital replication step might hold a different `to`)
// => replication cursors cannot accumulate because we always _move_ the replication cursor
log . Debug ( "releasing all step holds on the filesystem" )
TryReleaseStepStaleFS ( ctx , fs , p . jobId )
return & pdu . HintMostRecentCommonAncestorRes { } , nil
}
// we were hinted a specific common ancestor
mostRecentVersion , err := sendArgsFromPDUAndValidateExistsAndGetVersion ( ctx , fs , r . GetSenderVersion ( ) )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err != nil {
msg := "HintMostRecentCommonAncestor rpc with nonexistent most recent version"
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
log . Warn ( msg )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , errors . Wrap ( err , msg )
}
// move replication cursor to this position
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
destroyedCursors , err := MoveReplicationCursor ( ctx , fs , mostRecentVersion , p . jobId )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err == zfs . ErrBookmarkCloningNotSupported {
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
log . Debug ( "not creating replication cursor from bookmark because ZFS does not support it" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// fallthrough
} else if err != nil {
return nil , errors . Wrap ( err , "cannot set replication cursor to hinted version" )
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
// take care of stale step holds
log . WithField ( "step-holds-cleanup-mode" , senderHintMostRecentCommonAncestorStepCleanupMode ) .
Debug ( "taking care of possibly stale step holds" )
doStepCleanup := false
var stepCleanupSince * CreateTXGRangeBound
switch senderHintMostRecentCommonAncestorStepCleanupMode {
case StepCleanupNoCleanup :
doStepCleanup = false
case StepCleanupRangeSinceUnbounded :
doStepCleanup = true
stepCleanupSince = nil
case StepCleanupRangeSinceReplicationCursor :
doStepCleanup = true
// Use the destroyed replication cursors as indicator how far the previous replication got.
// To be precise: We limit the amount of visisted snapshots to exactly those snapshots
// created since the last successful replication cursor movement (i.e. last successful replication step)
//
// If we crash now, we'll leak the step we are about to release, but the performance gain
// of limiting the amount of snapshots we visit makes up for that.
// Users have the `zrepl holds release-stale` command to cleanup leaked step holds.
for _ , destroyed := range destroyedCursors {
if stepCleanupSince == nil {
stepCleanupSince = & CreateTXGRangeBound {
CreateTXG : destroyed . GetCreateTXG ( ) ,
Inclusive : & zfs . NilBool { B : true } ,
}
} else if destroyed . GetCreateTXG ( ) < stepCleanupSince . CreateTXG {
stepCleanupSince . CreateTXG = destroyed . GetCreateTXG ( )
}
}
default :
panic ( senderHintMostRecentCommonAncestorStepCleanupMode )
}
if ! doStepCleanup {
log . Info ( "skipping cleanup of prior invocations' step holds due to environment variable setting" )
} else {
if err := ReleaseStepCummulativeInclusive ( ctx , fs , stepCleanupSince , mostRecentVersion , p . jobId ) ; err != nil {
return nil , errors . Wrap ( err , "cannot cleanup prior invocation's step holds and bookmarks" )
} else {
log . Info ( "step hold cleanup done" )
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
return & pdu . HintMostRecentCommonAncestorRes { } , nil
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
type HintMostRecentCommonAncestorStepCleanupMode struct { string }
var (
StepCleanupRangeSinceReplicationCursor = HintMostRecentCommonAncestorStepCleanupMode { "range-since-replication-cursor" }
StepCleanupRangeSinceUnbounded = HintMostRecentCommonAncestorStepCleanupMode { "range-since-unbounded" }
StepCleanupNoCleanup = HintMostRecentCommonAncestorStepCleanupMode { "no-cleanup" }
)
func ( m HintMostRecentCommonAncestorStepCleanupMode ) String ( ) string { return string ( m . string ) }
func ( m * HintMostRecentCommonAncestorStepCleanupMode ) Set ( s string ) error {
switch s {
case StepCleanupRangeSinceReplicationCursor . String ( ) :
* m = StepCleanupRangeSinceReplicationCursor
case StepCleanupRangeSinceUnbounded . String ( ) :
* m = StepCleanupRangeSinceUnbounded
case StepCleanupNoCleanup . String ( ) :
* m = StepCleanupNoCleanup
default :
return fmt . Errorf ( "unknown step cleanup mode %q" , s )
}
return nil
}
var senderHintMostRecentCommonAncestorStepCleanupMode = * envconst . Var ( "ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE" , & StepCleanupRangeSinceReplicationCursor ) . ( * HintMostRecentCommonAncestorStepCleanupMode )
2019-03-28 21:22:22 +01:00
var maxConcurrentZFSSendSemaphore = semaphore . New ( envconst . Int64 ( "ZREPL_ENDPOINT_MAX_CONCURRENT_SEND" , 10 ) )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func uncheckedSendArgsFromPDU ( fsv * pdu . FilesystemVersion ) * zfs . ZFSSendArgVersion {
if fsv == nil {
return nil
}
return & zfs . ZFSSendArgVersion { RelName : fsv . GetRelName ( ) , GUID : fsv . Guid }
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
func sendArgsFromPDUAndValidateExistsAndGetVersion ( ctx context . Context , fs string , fsv * pdu . FilesystemVersion ) ( v zfs . FilesystemVersion , err error ) {
sendArgs := uncheckedSendArgsFromPDU ( fsv )
if sendArgs == nil {
return v , errors . New ( "must not be nil" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
version , err := sendArgs . ValidateExistsAndGetVersion ( ctx , fs )
if err != nil {
return v , err
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
return version , nil
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
2020-04-10 21:58:28 +02:00
func ( s * Sender ) Send ( ctx context . Context , r * pdu . SendReq ) ( * pdu . SendRes , io . ReadCloser , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
2018-12-11 22:01:50 +01:00
_ , err := s . filterCheckFS ( r . Filesystem )
if err != nil {
return nil , nil , err
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
switch r . Encrypted {
case pdu . Tri_DontCare :
// use s.encrypt setting
// ok, fallthrough outer
case pdu . Tri_False :
if s . encrypt . B {
2020-02-23 23:24:12 +01:00
return nil , nil , errors . New ( "only encrypted sends allowed (send -w + encryption!= off), but unencrypted send requested" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
// fallthrough outer
case pdu . Tri_True :
if ! s . encrypt . B {
return nil , nil , errors . New ( "only unencrypted sends allowed, but encrypted send requested" )
}
// fallthrough outer
default :
return nil , nil , fmt . Errorf ( "unknown pdu.Tri variant %q" , r . Encrypted )
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
sendArgsUnvalidated := zfs . ZFSSendArgsUnvalidated {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
FS : r . Filesystem ,
From : uncheckedSendArgsFromPDU ( r . GetFrom ( ) ) , // validated by zfs.ZFSSendDry / zfs.ZFSSend
To : uncheckedSendArgsFromPDU ( r . GetTo ( ) ) , // validated by zfs.ZFSSendDry / zfs.ZFSSend
Encrypted : s . encrypt ,
ResumeToken : r . ResumeToken , // nil or not nil, depending on decoding success
}
2018-12-11 22:01:50 +01:00
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
sendArgs , err := sendArgsUnvalidated . Validate ( ctx )
if err != nil {
return nil , nil , errors . Wrap ( err , "validate send arguments" )
}
2019-03-28 21:22:22 +01:00
getLogger ( ctx ) . Debug ( "acquire concurrent send semaphore" )
// TODO use try-acquire and fail with resource-exhaustion rpc status
// => would require handling on the client-side
// => this is a dataconn endpoint, doesn't have the status code semantics of gRPC
guard , err := maxConcurrentZFSSendSemaphore . Acquire ( ctx )
if err != nil {
return nil , nil , err
}
defer guard . Release ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
si , err := zfs . ZFSSendDry ( ctx , sendArgs )
2018-06-20 20:20:37 +02:00
if err != nil {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , nil , errors . Wrap ( err , "zfs send dry failed" )
2018-06-20 20:20:37 +02:00
}
2019-03-28 21:22:22 +01:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// From now on, assume that sendArgs has been validated by ZFSSendDry
2020-02-23 23:24:12 +01:00
// (because validation involves shelling out, it's actually a little expensive)
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
2018-12-11 22:01:50 +01:00
var expSize int64 = 0 // protocol says 0 means no estimate
if si . SizeEstimate != - 1 { // but si returns -1 for no size estimate
expSize = si . SizeEstimate
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
res := & pdu . SendRes {
ExpectedSize : expSize ,
UsedResumeToken : r . ResumeToken != "" ,
}
2018-08-29 23:29:45 +02:00
2018-08-30 12:52:08 +02:00
if r . DryRun {
2018-12-11 22:01:50 +01:00
return res , nil , nil
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// update replication cursor
if sendArgs . From != nil {
// For all but the first replication, this should always be a no-op because SendCompleted already moved the cursor
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
_ , err = MoveReplicationCursor ( ctx , sendArgs . FS , sendArgs . FromVersion , s . jobId )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err == zfs . ErrBookmarkCloningNotSupported {
getLogger ( ctx ) . Debug ( "not creating replication cursor from bookmark because ZFS does not support it" )
// fallthrough
} else if err != nil {
return nil , nil , errors . Wrap ( err , "cannot set replication cursor to `from` version before starting send" )
}
}
// make sure `From` doesn't go away in order to make this step resumable
if sendArgs . From != nil {
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
_ , err := HoldStep ( ctx , sendArgs . FS , * sendArgs . FromVersion , s . jobId )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err == zfs . ErrBookmarkCloningNotSupported {
getLogger ( ctx ) . Debug ( "not creating step bookmark because ZFS does not support it" )
// fallthrough
} else if err != nil {
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
return nil , nil , errors . Wrapf ( err , "cannot hold `from` version %q before starting send" , * sendArgs . FromVersion )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
}
// make sure `To` doesn't go away in order to make this step resumable
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
_ , err = HoldStep ( ctx , sendArgs . FS , sendArgs . ToVersion , s . jobId )
2018-12-11 22:01:50 +01:00
if err != nil {
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
return nil , nil , errors . Wrapf ( err , "cannot hold `to` version %q before starting send" , sendArgs . ToVersion )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
// step holds & replication cursor released / moved forward in s.SendCompleted => s.moveCursorAndReleaseSendHolds
2020-04-10 21:58:28 +02:00
sendStream , err := zfs . ZFSSend ( ctx , sendArgs )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err != nil {
return nil , nil , errors . Wrap ( err , "zfs send failed" )
2018-06-20 20:20:37 +02:00
}
2020-04-10 21:58:28 +02:00
return res , sendStream , nil
2018-06-20 20:20:37 +02:00
}
2017-12-26 21:37:48 +01:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func ( p * Sender ) SendCompleted ( ctx context . Context , r * pdu . SendCompletedReq ) ( * pdu . SendCompletedRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
orig := r . GetOriginalReq ( ) // may be nil, always use proto getters
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
fsp , err := p . filterCheckFS ( orig . GetFilesystem ( ) )
if err != nil {
return nil , err
}
fs := fsp . ToString ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
var from * zfs . FilesystemVersion
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if orig . GetFrom ( ) != nil {
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
f , err := sendArgsFromPDUAndValidateExistsAndGetVersion ( ctx , fs , orig . GetFrom ( ) ) // no shadow
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err != nil {
return nil , errors . Wrap ( err , "validate `from` exists" )
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
from = & f
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
to , err := sendArgsFromPDUAndValidateExistsAndGetVersion ( ctx , fs , orig . GetTo ( ) )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err != nil {
return nil , errors . Wrap ( err , "validate `to` exists" )
}
2020-04-11 15:49:41 +02:00
log := func ( ctx context . Context ) Logger {
log := getLogger ( ctx ) . WithField ( "to_guid" , to . Guid ) .
WithField ( "fs" , fs ) .
WithField ( "to" , to . RelName )
if from != nil {
log = log . WithField ( "from" , from . RelName ) . WithField ( "from_guid" , from . Guid )
}
return log
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
2020-04-11 15:49:41 +02:00
log ( ctx ) . Debug ( "move replication cursor to most recent common version" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
destroyedCursors , err := MoveReplicationCursor ( ctx , fs , to , p . jobId )
if err != nil {
if err == zfs . ErrBookmarkCloningNotSupported {
2020-04-11 15:49:41 +02:00
log ( ctx ) . Debug ( "not setting replication cursor, bookmark cloning not supported" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
} else {
msg := "cannot move replication cursor, keeping hold on `to` until successful"
2020-04-11 15:49:41 +02:00
log ( ctx ) . WithError ( err ) . Error ( msg )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
err = errors . Wrap ( err , msg )
// it is correct to not release the hold if we can't move the cursor!
return & pdu . SendCompletedRes { } , err
}
} else {
2020-04-11 15:49:41 +02:00
log ( ctx ) . Info ( "successfully moved replication cursor" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
// kick off releasing of step holds / bookmarks
// if we fail to release them, don't bother the caller:
// they are merely an implementation detail on the sender for better resumability
var wg sync . WaitGroup
wg . Add ( 2 )
go func ( ) {
defer wg . Done ( )
2020-04-11 15:49:41 +02:00
ctx , endTask := trace . WithTask ( ctx , "release-step-hold-to" )
defer endTask ( )
log ( ctx ) . Debug ( "release step-hold of or step-bookmark on `to`" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
err = ReleaseStep ( ctx , fs , to , p . jobId )
if err != nil {
2020-04-11 15:49:41 +02:00
log ( ctx ) . WithError ( err ) . Error ( "cannot release step-holds on or destroy step-bookmark of `to`" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
} else {
2020-04-11 15:49:41 +02:00
log ( ctx ) . Info ( "successfully released step-holds on or destroyed step-bookmark of `to`" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
} ( )
go func ( ) {
defer wg . Done ( )
2020-04-11 15:49:41 +02:00
ctx , endTask := trace . WithTask ( ctx , "release-step-hold-from" )
defer endTask ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if from == nil {
return
}
2020-04-11 15:49:41 +02:00
log ( ctx ) . Debug ( "release step-hold of or step-bookmark on `from`" )
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
err := ReleaseStep ( ctx , fs , * from , p . jobId )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if err != nil {
if dne , ok := err . ( * zfs . DatasetDoesNotExist ) ; ok {
// If bookmark cloning is not supported, `from` might be the old replication cursor
// and thus have already been destroyed by MoveReplicationCursor above
// In that case, nonexistence of `from` is not an error, otherwise it is.
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
for _ , c := range destroyedCursors {
if c . GetFullPath ( ) == dne . Path {
2020-04-11 15:49:41 +02:00
log ( ctx ) . Info ( "`from` was a replication cursor and has already been destroyed" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return
}
}
// fallthrough
}
2020-04-11 15:49:41 +02:00
log ( ctx ) . WithError ( err ) . Error ( "cannot release step-holds on or destroy step-bookmark of `from`" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
} else {
2020-04-11 15:49:41 +02:00
log ( ctx ) . Info ( "successfully released step-holds on or destroyed step-bookmark of `from`" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
}
} ( )
wg . Wait ( )
return & pdu . SendCompletedRes { } , nil
}
2018-08-30 11:51:47 +02:00
func ( p * Sender ) DestroySnapshots ( ctx context . Context , req * pdu . DestroySnapshotsReq ) ( * pdu . DestroySnapshotsRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-09-06 04:49:26 +02:00
dp , err := p . filterCheckFS ( req . Filesystem )
2018-08-30 11:51:47 +02:00
if err != nil {
return nil , err
}
return doDestroySnapshots ( ctx , dp , req . Snapshots )
}
2019-03-11 13:46:36 +01:00
func ( p * Sender ) Ping ( ctx context . Context , req * pdu . PingReq ) ( * pdu . PingRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
res := pdu . PingRes {
Echo : req . GetMessage ( ) ,
}
return & res , nil
}
func ( p * Sender ) PingDataconn ( ctx context . Context , req * pdu . PingReq ) ( * pdu . PingRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
return p . Ping ( ctx , req )
}
2019-03-13 18:33:20 +01:00
func ( p * Sender ) WaitForConnectivity ( ctx context . Context ) error {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
return nil
}
2018-09-06 03:24:15 +02:00
func ( p * Sender ) ReplicationCursor ( ctx context . Context , req * pdu . ReplicationCursorReq ) ( * pdu . ReplicationCursorRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-09-06 04:49:26 +02:00
dp , err := p . filterCheckFS ( req . Filesystem )
2018-08-30 11:51:47 +02:00
if err != nil {
return nil , err
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
cursor , err := GetMostRecentReplicationCursorOfJob ( ctx , dp . ToString ( ) , p . jobId )
if err != nil {
return nil , err
2018-08-30 11:51:47 +02:00
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if cursor == nil {
return & pdu . ReplicationCursorRes { Result : & pdu . ReplicationCursorRes_Notexist { Notexist : true } } , nil
}
return & pdu . ReplicationCursorRes { Result : & pdu . ReplicationCursorRes_Guid { Guid : cursor . Guid } } , nil
2018-08-30 11:51:47 +02:00
}
2020-04-10 21:58:28 +02:00
func ( p * Sender ) Receive ( ctx context . Context , r * pdu . ReceiveReq , _ io . ReadCloser ) ( * pdu . ReceiveRes , error ) {
2018-12-11 22:01:50 +01:00
return nil , fmt . Errorf ( "sender does not implement Receive()" )
}
2018-09-06 04:49:26 +02:00
type FSFilter interface { // FIXME unused
2018-08-22 00:43:58 +02:00
Filter ( path * zfs . DatasetPath ) ( pass bool , err error )
}
// FIXME: can we get away without error types here?
2018-09-06 04:49:26 +02:00
type FSMap interface { // FIXME unused
2018-08-22 00:43:58 +02:00
FSFilter
2018-08-25 21:30:25 +02:00
Map ( path * zfs . DatasetPath ) ( * zfs . DatasetPath , error )
Invert ( ) ( FSMap , error )
AsFilter ( ) FSFilter
2018-08-22 00:43:58 +02:00
}
2017-05-20 17:08:18 +02:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
type ReceiverConfig struct {
JobID JobID
RootWithoutClientComponent * zfs . DatasetPath // TODO use
AppendClientIdentity bool
UpdateLastReceivedHold bool
}
func ( c * ReceiverConfig ) copyIn ( ) {
c . RootWithoutClientComponent = c . RootWithoutClientComponent . Copy ( )
}
func ( c * ReceiverConfig ) Validate ( ) error {
c . JobID . MustValidate ( )
if c . RootWithoutClientComponent . Length ( ) <= 0 {
return errors . New ( "RootWithoutClientComponent must not be an empty dataset path" )
}
return nil
}
2018-08-22 00:52:46 +02:00
// Receiver implements replication.ReplicationEndpoint for a receiving side
type Receiver struct {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
conf ReceiverConfig // validated
2019-03-20 18:52:40 +01:00
recvParentCreationMtx * chainlock . L
2018-09-06 04:49:26 +02:00
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func NewReceiver ( config ReceiverConfig ) * Receiver {
config . copyIn ( )
if err := config . Validate ( ) ; err != nil {
panic ( err )
2018-09-06 04:49:26 +02:00
}
2019-03-20 18:52:40 +01:00
return & Receiver {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
conf : config ,
recvParentCreationMtx : chainlock . New ( ) ,
2019-03-20 18:52:40 +01:00
}
2018-12-11 22:01:50 +01:00
}
func TestClientIdentity ( rootFS * zfs . DatasetPath , clientIdentity string ) error {
_ , err := clientRoot ( rootFS , clientIdentity )
return err
}
func clientRoot ( rootFS * zfs . DatasetPath , clientIdentity string ) ( * zfs . DatasetPath , error ) {
rootFSLen := rootFS . Length ( )
clientRootStr := path . Join ( rootFS . ToString ( ) , clientIdentity )
clientRoot , err := zfs . NewDatasetPath ( clientRootStr )
if err != nil {
return nil , err
}
if rootFSLen + 1 != clientRoot . Length ( ) {
return nil , fmt . Errorf ( "client identity must be a single ZFS filesystem path component" )
}
return clientRoot , nil
}
func ( s * Receiver ) clientRootFromCtx ( ctx context . Context ) * zfs . DatasetPath {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if ! s . conf . AppendClientIdentity {
return s . conf . RootWithoutClientComponent . Copy ( )
2018-12-11 22:01:50 +01:00
}
clientIdentity , ok := ctx . Value ( ClientIdentityKey ) . ( string )
if ! ok {
panic ( fmt . Sprintf ( "ClientIdentityKey context value must be set" ) )
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
clientRoot , err := clientRoot ( s . conf . RootWithoutClientComponent , clientIdentity )
2018-12-11 22:01:50 +01:00
if err != nil {
panic ( fmt . Sprintf ( "ClientIdentityContextKey must have been validated before invoking Receiver: %s" , err ) )
}
return clientRoot
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
2018-09-06 04:49:26 +02:00
type subroot struct {
localRoot * zfs . DatasetPath
}
var _ zfs . DatasetFilter = subroot { }
// Filters local p
func ( f subroot ) Filter ( p * zfs . DatasetPath ) ( pass bool , err error ) {
return p . HasPrefix ( f . localRoot ) && ! p . Equal ( f . localRoot ) , nil
}
func ( f subroot ) MapToLocal ( fs string ) ( * zfs . DatasetPath , error ) {
p , err := zfs . NewDatasetPath ( fs )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
}
2018-09-06 04:49:26 +02:00
if p . Length ( ) == 0 {
return nil , errors . Errorf ( "cannot map empty filesystem" )
}
c := f . localRoot . Copy ( )
c . Extend ( p )
return c , nil
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
2018-12-11 22:01:50 +01:00
func ( s * Receiver ) ListFilesystems ( ctx context . Context , req * pdu . ListFilesystemReq ) ( * pdu . ListFilesystemRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
root := s . clientRootFromCtx ( ctx )
filtered , err := zfs . ZFSListMapping ( ctx , subroot { root } )
2018-06-20 20:20:37 +02:00
if err != nil {
2018-09-06 04:49:26 +02:00
return nil , err
2018-06-20 20:20:37 +02:00
}
2019-03-13 18:33:20 +01:00
// present filesystem without the root_fs prefix
2018-09-06 04:49:26 +02:00
fss := make ( [ ] * pdu . Filesystem , 0 , len ( filtered ) )
for _ , a := range filtered {
2019-03-19 17:43:28 +01:00
l := getLogger ( ctx ) . WithField ( "fs" , a )
2020-03-27 00:57:33 +01:00
ph , err := zfs . ZFSGetFilesystemPlaceholderState ( ctx , a )
2018-06-20 20:20:37 +02:00
if err != nil {
2019-03-19 17:43:28 +01:00
l . WithError ( err ) . Error ( "error getting placeholder state" )
return nil , errors . Wrapf ( err , "cannot get placeholder state for fs %q" , a )
}
l . WithField ( "placeholder_state" , fmt . Sprintf ( "%#v" , ph ) ) . Debug ( "placeholder state" )
if ! ph . FSExists {
l . Error ( "inconsistent placeholder state: filesystem must exists" )
err := errors . Errorf ( "inconsistent placeholder state: filesystem %q must exist in this context" , a . ToString ( ) )
return nil , err
2018-09-06 04:49:26 +02:00
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
token , err := zfs . ZFSGetReceiveResumeTokenOrEmptyStringIfNotSupported ( ctx , a )
if err != nil {
l . WithError ( err ) . Error ( "cannot get receive resume token" )
return nil , err
}
encEnabled , err := zfs . ZFSGetEncryptionEnabled ( ctx , a . ToString ( ) )
if err != nil {
l . WithError ( err ) . Error ( "cannot get encryption enabled status" )
return nil , err
}
l . WithField ( "receive_resume_token" , token ) . Debug ( "receive resume token" )
2018-12-11 22:01:50 +01:00
a . TrimPrefix ( root )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
fs := & pdu . Filesystem {
Path : a . ToString ( ) ,
IsPlaceholder : ph . IsPlaceholder ,
ResumeToken : token ,
IsEncrypted : encEnabled ,
}
fss = append ( fss , fs )
2018-06-20 20:20:37 +02:00
}
2018-12-11 22:01:50 +01:00
if len ( fss ) == 0 {
2019-03-13 18:33:20 +01:00
getLogger ( ctx ) . Debug ( "no filesystems found" )
return & pdu . ListFilesystemRes { } , nil
2018-12-11 22:01:50 +01:00
}
return & pdu . ListFilesystemRes { Filesystems : fss } , nil
2018-06-20 20:20:37 +02:00
}
2017-12-26 21:37:48 +01:00
2018-12-11 22:01:50 +01:00
func ( s * Receiver ) ListFilesystemVersions ( ctx context . Context , req * pdu . ListFilesystemVersionsReq ) ( * pdu . ListFilesystemVersionsRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
root := s . clientRootFromCtx ( ctx )
lp , err := subroot { root } . MapToLocal ( req . GetFilesystem ( ) )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
}
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
// TODO share following code with sender
2017-05-20 17:08:18 +02:00
2020-04-15 16:11:16 +02:00
fsvs , err := zfs . ZFSListFilesystemVersions ( ctx , lp , zfs . ListFilesystemVersionsOptions { } )
2018-06-20 20:20:37 +02:00
if err != nil {
return nil , err
}
2017-12-26 21:37:48 +01:00
2018-08-16 21:05:21 +02:00
rfsvs := make ( [ ] * pdu . FilesystemVersion , len ( fsvs ) )
2018-06-20 20:20:37 +02:00
for i := range fsvs {
2018-08-30 11:51:47 +02:00
rfsvs [ i ] = pdu . FilesystemVersionFromZFS ( & fsvs [ i ] )
2018-06-20 20:20:37 +02:00
}
2017-07-30 14:56:16 +02:00
2018-12-11 22:01:50 +01:00
return & pdu . ListFilesystemVersionsRes { Versions : rfsvs } , nil
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
2019-03-11 13:46:36 +01:00
func ( s * Receiver ) Ping ( ctx context . Context , req * pdu . PingReq ) ( * pdu . PingRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
res := pdu . PingRes {
Echo : req . GetMessage ( ) ,
}
return & res , nil
}
func ( s * Receiver ) PingDataconn ( ctx context . Context , req * pdu . PingReq ) ( * pdu . PingRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
return s . Ping ( ctx , req )
}
2019-03-13 18:33:20 +01:00
func ( s * Receiver ) WaitForConnectivity ( ctx context . Context ) error {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2019-03-11 13:46:36 +01:00
return nil
}
2020-04-11 15:49:41 +02:00
func ( s * Receiver ) ReplicationCursor ( ctx context . Context , _ * pdu . ReplicationCursorReq ) ( * pdu . ReplicationCursorRes , error ) {
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
return nil , fmt . Errorf ( "ReplicationCursor not implemented for Receiver" )
}
2018-07-08 23:31:46 +02:00
2020-04-10 21:58:28 +02:00
func ( s * Receiver ) Send ( ctx context . Context , req * pdu . SendReq ) ( * pdu . SendRes , io . ReadCloser , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
return nil , nil , fmt . Errorf ( "receiver does not implement Send()" )
}
2017-05-20 17:08:18 +02:00
2019-03-28 21:22:22 +01:00
var maxConcurrentZFSRecvSemaphore = semaphore . New ( envconst . Int64 ( "ZREPL_ENDPOINT_MAX_CONCURRENT_RECV" , 10 ) )
2020-04-10 21:58:28 +02:00
func ( s * Receiver ) Receive ( ctx context . Context , req * pdu . ReceiveReq , receive io . ReadCloser ) ( * pdu . ReceiveRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-08-31 16:26:11 +02:00
getLogger ( ctx ) . Debug ( "incoming Receive" )
2018-12-11 22:01:50 +01:00
defer receive . Close ( )
root := s . clientRootFromCtx ( ctx )
lp , err := subroot { root } . MapToLocal ( req . Filesystem )
if err != nil {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , errors . Wrap ( err , "`Filesystem` invalid" )
}
to := uncheckedSendArgsFromPDU ( req . GetTo ( ) )
if to == nil {
return nil , errors . New ( "`To` must not be nil" )
}
if ! to . IsSnapshot ( ) {
return nil , errors . New ( "`To` must be a snapshot" )
2018-12-11 22:01:50 +01:00
}
2018-08-31 16:26:11 +02:00
2018-06-20 20:20:37 +02:00
// create placeholder parent filesystems as appropriate
2019-03-20 18:52:40 +01:00
//
// Manipulating the ZFS dataset hierarchy must happen exclusively.
// TODO: Use fine-grained locking to allow separate clients / requests to pass
// through the following section concurrently when operating on disjoint
// ZFS dataset hierarchy subtrees.
2018-06-20 20:20:37 +02:00
var visitErr error
2019-03-20 18:52:40 +01:00
func ( ) {
2020-02-23 23:24:12 +01:00
getLogger ( ctx ) . Debug ( "begin acquire recvParentCreationMtx" )
2019-03-20 18:52:40 +01:00
defer s . recvParentCreationMtx . Lock ( ) . Unlock ( )
2020-02-23 23:24:12 +01:00
getLogger ( ctx ) . Debug ( "end acquire recvParentCreationMtx" )
2019-03-20 18:52:40 +01:00
defer getLogger ( ctx ) . Debug ( "release recvParentCreationMtx" )
f := zfs . NewDatasetPathForest ( )
f . Add ( lp )
getLogger ( ctx ) . Debug ( "begin tree-walk" )
f . WalkTopDown ( func ( v zfs . DatasetPathVisit ) ( visitChildTree bool ) {
if v . Path . Equal ( lp ) {
return false
}
2020-03-27 00:57:33 +01:00
ph , err := zfs . ZFSGetFilesystemPlaceholderState ( ctx , v . Path )
2019-03-20 18:52:40 +01:00
getLogger ( ctx ) .
WithField ( "fs" , v . Path . ToString ( ) ) .
WithField ( "placeholder_state" , fmt . Sprintf ( "%#v" , ph ) ) .
WithField ( "err" , fmt . Sprintf ( "%s" , err ) ) .
WithField ( "errType" , fmt . Sprintf ( "%T" , err ) ) .
Debug ( "placeholder state for filesystem" )
2019-03-19 17:43:28 +01:00
if err != nil {
2018-06-20 20:20:37 +02:00
visitErr = err
2017-12-26 21:37:48 +01:00
return false
2017-05-20 17:08:18 +02:00
}
2017-07-08 13:13:16 +02:00
2019-03-20 18:52:40 +01:00
if ! ph . FSExists {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if s . conf . RootWithoutClientComponent . HasPrefix ( v . Path ) {
2019-09-07 20:01:15 +02:00
if v . Path . Length ( ) == 1 {
visitErr = fmt . Errorf ( "pool %q not imported" , v . Path . ToString ( ) )
} else {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
visitErr = fmt . Errorf ( "root_fs %q does not exist" , s . conf . RootWithoutClientComponent . ToString ( ) )
2019-09-07 20:01:15 +02:00
}
getLogger ( ctx ) . WithError ( visitErr ) . Error ( "placeholders are only created automatically below root_fs" )
return false
}
2019-03-20 18:52:40 +01:00
l := getLogger ( ctx ) . WithField ( "placeholder_fs" , v . Path )
l . Debug ( "create placeholder filesystem" )
2020-03-27 00:57:33 +01:00
err := zfs . ZFSCreatePlaceholderFilesystem ( ctx , v . Path )
2019-03-20 18:52:40 +01:00
if err != nil {
l . WithError ( err ) . Error ( "cannot create placeholder filesystem" )
visitErr = err
return false
}
return true
}
getLogger ( ctx ) . WithField ( "filesystem" , v . Path . ToString ( ) ) . Debug ( "exists" )
return true // leave this fs as is
} )
} ( )
getLogger ( ctx ) . WithField ( "visitErr" , visitErr ) . Debug ( "complete tree-walk" )
2018-06-20 20:20:37 +02:00
if visitErr != nil {
2019-03-16 14:46:07 +01:00
return nil , visitErr
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
2020-04-10 22:55:45 +02:00
log := getLogger ( ctx ) . WithField ( "proto_fs" , req . GetFilesystem ( ) ) . WithField ( "local_fs" , lp . ToString ( ) )
2019-03-19 17:43:28 +01:00
// determine whether we need to rollback the filesystem / change its placeholder state
2019-03-13 18:33:20 +01:00
var clearPlaceholderProperty bool
var recvOpts zfs . RecvOptions
2020-03-27 00:57:33 +01:00
ph , err := zfs . ZFSGetFilesystemPlaceholderState ( ctx , lp )
2020-04-10 22:55:45 +02:00
if err != nil {
return nil , errors . Wrap ( err , "cannot get placeholder state" )
}
log . WithField ( "placeholder_state" , fmt . Sprintf ( "%#v" , ph ) ) . Debug ( "placeholder state" )
if ph . FSExists && ph . IsPlaceholder {
2019-03-19 17:43:28 +01:00
recvOpts . RollbackAndForceRecv = true
clearPlaceholderProperty = true
2018-06-20 20:20:37 +02:00
}
2020-04-10 22:55:45 +02:00
2019-03-13 18:33:20 +01:00
if clearPlaceholderProperty {
2020-04-10 22:55:45 +02:00
log . Info ( "clearing placeholder property" )
2020-03-27 00:57:33 +01:00
if err := zfs . ZFSSetPlaceholder ( ctx , lp , false ) ; err != nil {
2019-03-13 18:33:20 +01:00
return nil , fmt . Errorf ( "cannot clear placeholder property for forced receive: %s" , err )
}
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
if req . ClearResumeToken && ph . FSExists {
2020-04-10 22:55:45 +02:00
log . Info ( "clearing resume token" )
2020-03-27 00:57:33 +01:00
if err := zfs . ZFSRecvClearResumeToken ( ctx , lp . ToString ( ) ) ; err != nil {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , errors . Wrap ( err , "cannot clear resume token" )
}
}
recvOpts . SavePartialRecvState , err = zfs . ResumeRecvSupported ( ctx , lp )
if err != nil {
return nil , errors . Wrap ( err , "cannot determine whether we can use resumable send & recv" )
}
2020-04-10 22:55:45 +02:00
log . Debug ( "acquire concurrent recv semaphore" )
2019-03-28 21:22:22 +01:00
// TODO use try-acquire and fail with resource-exhaustion rpc status
// => would require handling on the client-side
// => this is a dataconn endpoint, doesn't have the status code semantics of gRPC
guard , err := maxConcurrentZFSRecvSemaphore . Acquire ( ctx )
if err != nil {
return nil , err
}
defer guard . Release ( )
2020-04-10 23:24:14 +02:00
var peek bytes . Buffer
var MaxPeek = envconst . Int64 ( "ZREPL_ENDPOINT_RECV_PEEK_SIZE" , 1 << 20 )
log . WithField ( "max_peek_bytes" , MaxPeek ) . Info ( "peeking incoming stream" )
if _ , err := io . Copy ( & peek , io . LimitReader ( receive , MaxPeek ) ) ; err != nil {
log . WithError ( err ) . Error ( "cannot read peek-buffer from send stream" )
}
var peekCopy bytes . Buffer
if n , err := peekCopy . Write ( peek . Bytes ( ) ) ; err != nil || n != peek . Len ( ) {
panic ( peek . Len ( ) )
}
2020-04-10 22:55:45 +02:00
log . WithField ( "opts" , fmt . Sprintf ( "%#v" , recvOpts ) ) . Debug ( "start receive command" )
2017-05-20 17:08:18 +02:00
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
snapFullPath := to . FullPath ( lp . ToString ( ) )
2020-04-10 23:24:14 +02:00
if err := zfs . ZFSRecv ( ctx , lp . ToString ( ) , to , chainedio . NewChainedReader ( & peek , receive ) , recvOpts ) ; err != nil {
2020-04-10 22:55:45 +02:00
// best-effort rollback of placeholder state if the recv didn't start
_ , resumableStatePresent := err . ( * zfs . RecvFailedWithResumeTokenErr )
disablePlaceholderRestoration := envconst . Bool ( "ZREPL_ENDPOINT_DISABLE_PLACEHOLDER_RESTORATION" , false )
2020-04-10 23:24:14 +02:00
placeholderRestored := ! ph . IsPlaceholder
2020-04-10 22:55:45 +02:00
if ! disablePlaceholderRestoration && ! resumableStatePresent && recvOpts . RollbackAndForceRecv && ph . FSExists && ph . IsPlaceholder && clearPlaceholderProperty {
log . Info ( "restoring placeholder property" )
if phErr := zfs . ZFSSetPlaceholder ( ctx , lp , true ) ; phErr != nil {
log . WithError ( phErr ) . Error ( "cannot restore placeholder property after failed receive, subsequent replications will likely fail with a different error" )
// fallthrough
2020-04-10 23:24:14 +02:00
} else {
placeholderRestored = true
2020-04-10 22:55:45 +02:00
}
// fallthrough
}
2020-04-10 23:24:14 +02:00
// deal with failing initial encrypted send & recv
if _ , ok := err . ( * zfs . RecvDestroyOrOverwriteEncryptedErr ) ; ok && ph . IsPlaceholder && placeholderRestored {
msg := ` cannot automatically replace placeholder filesystem with incoming send stream - please see receive-side log for details `
err := errors . New ( msg )
log . Error ( msg )
log . Error ( ` zrepl creates placeholder filesystems on the receiving side of a replication to match the sending side's dataset hierarchy ` )
log . Error ( ` zrepl uses zfs receive -F to replace those placeholders with incoming full sends ` )
log . Error ( ` OpenZFS native encryption prohibits zfs receive -F for encrypted filesystems ` )
log . Error ( ` the current zrepl placeholder filesystem concept is thus incompatible with OpenZFS native encryption ` )
tempStartFullRecvFS := lp . Copy ( ) . ToString ( ) + ".zrepl.initial-recv"
tempStartFullRecvFSDP , dpErr := zfs . NewDatasetPath ( tempStartFullRecvFS )
if dpErr != nil {
log . WithError ( dpErr ) . Error ( "cannot determine temporary filesystem name for initial encrypted recv workaround" )
return nil , err // yes, err, not dpErr
}
log := log . WithField ( "temp_recv_fs" , tempStartFullRecvFS )
log . Error ( ` as a workaround, zrepl will now attempt to re-receive the beginning of the stream into a temporary filesystem temp_recv_fs ` )
log . Error ( ` if that step succeeds: shut down zrepl and use 'zfs rename' to swap temp_recv_fs with local_fs, then restart zrepl ` )
log . Error ( ` replication will then resume using resumable send+recv ` )
tempPH , phErr := zfs . ZFSGetFilesystemPlaceholderState ( ctx , tempStartFullRecvFSDP )
if phErr != nil {
log . WithError ( phErr ) . Error ( "cannot determine placeholder state of temp_recv_fs" )
return nil , err // yes, err, not dpErr
}
if tempPH . FSExists {
log . Error ( "temp_recv_fs already exists, assuming a (partial) initial recv to that filesystem has already been done" )
return nil , err
}
recvOpts . RollbackAndForceRecv = false
recvOpts . SavePartialRecvState = true
rerecvErr := zfs . ZFSRecv ( ctx , tempStartFullRecvFS , to , chainedio . NewChainedReader ( & peekCopy ) , recvOpts )
if _ , isResumable := rerecvErr . ( * zfs . RecvFailedWithResumeTokenErr ) ; rerecvErr == nil || isResumable {
log . Error ( "completed re-receive into temporary filesystem temp_recv_fs, now shut down zrepl and use zfs rename to swap temp_recv_fs with local_fs" )
} else {
log . WithError ( rerecvErr ) . Error ( "failed to receive the beginning of the stream into temporary filesystem temp_recv_fs" )
log . Error ( "we advise you to collect the error log and current configuration, open an issue on GitHub, and revert to your previous configuration in the meantime" )
}
log . Error ( ` if you would like to see improvements to this situation, please open an issue on GitHub ` )
return nil , err
}
log .
WithError ( err ) .
WithField ( "opts" , fmt . Sprintf ( "%#v" , recvOpts ) ) .
Error ( "zfs receive failed" )
2018-12-11 22:01:50 +01:00
return nil , err
2018-06-20 20:20:37 +02:00
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// validate that we actually received what the sender claimed
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
toRecvd , err := to . ValidateExistsAndGetVersion ( ctx , lp . ToString ( ) )
if err != nil {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
msg := "receive request's `To` version does not match what we received in the stream"
2020-04-10 22:55:45 +02:00
log . WithError ( err ) . WithField ( "snap" , snapFullPath ) . Error ( msg )
log . Error ( "aborting recv request, but keeping received snapshot for inspection" )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , errors . Wrap ( err , msg )
}
if s . conf . UpdateLastReceivedHold {
2020-04-10 22:55:45 +02:00
log . Debug ( "move last-received-hold" )
endpoint: refactor, fix stale holds on initial replication failure, zfs-abstractions subcmd, more efficient ZFS queries
The motivation for this recatoring are based on two independent issues:
- @JMoVS found that the changes merged as part of #259 slowed his OS X
based installation down significantly.
Analysis of the zfs command logging introduced in #296 showed that
`zfs holds` took most of the execution time, and they pointed out
that not all of those `zfs holds` invocations were actually necessary.
I.e.: zrepl was inefficient about retrieving information from ZFS.
- @InsanePrawn found that failures on initial replication would lead
to step holds accumulating on the sending side, i.e. they would never
be cleaned up in the HintMostRecentCommonAncestor RPC handler.
That was because we only sent that RPC if there was a most recent
common ancestor detected during replication planning.
@InsanePrawn prototyped an implementation of a `zrepl zfs-abstractions release`
command to mitigate the situation.
As part of that development work and back-and-forth with @problame,
it became evident that the abstractions that #259 built on top of
zfs in package endpoint (step holds, replication cursor,
last-received-hold), were not well-represented for re-use in the
`zrepl zfs-abstractions release` subocommand prototype.
This commit refactors package endpoint to address both of these issues:
- endpoint abstractions now share an interface `Abstraction` that, among
other things, provides a uniform `Destroy()` method.
However, that method should not be destroyed directly but instead
the package-level `BatchDestroy` function should be used in order
to allow for a migration to zfs channel programs in the future.
- endpoint now has a query facitilty (`ListAbstractions`) which is
used to find on-disk
- step holds and bookmarks
- replication cursors (v1, v2)
- last-received-holds
By describing the query in a struct, we can centralized the retrieval
of information via the ZFS CLI and only have to be clever once.
We are "clever" in the following ways:
- When asking for hold-based abstractions, we only run `zfs holds` on
snapshot that have `userrefs` > 0
- To support this functionality, add field `UserRefs` to zfs.FilesystemVersion
and retrieve it anywhere we retrieve zfs.FilesystemVersion from ZFS.
- When asking only for bookmark-based abstractions, we only run
`zfs list -t bookmark`, not with snapshots.
- Currently unused (except for CLI) per-filesystem concurrent lookup
- Option to only include abstractions with CreateTXG in a specified range
- refactor `endpoint`'s various ZFS info retrieval methods to use
`ListAbstractions`
- rename the `zrepl holds list` command to `zrepl zfs-abstractions list`
- make `zrepl zfs-abstractions list` consume endpoint.ListAbstractions
- Add a `ListStale` method which, given a query template,
lists stale holds and bookmarks.
- it uses replication cursor has different modes
- the new `zrepl zfs-abstractions release-{all,stale}` commands can be used
to remove abstractions of package endpoint
- Adjust HintMostRecentCommonAncestor RPC for stale-holds cleanup:
- send it also if no most recent common ancestor exists between sender and receiver
- have the sender clean up its abstractions when it receives the RPC
with no most recent common ancestor, using `ListStale`
- Due to changed semantics, bump the protocol version.
- Adjust HintMostRecentCommonAncestor RPC for performance problems
encountered by @JMoVS
- by default, per (job,fs)-combination, only consider cleaning
step holds in the createtxg range
`[last replication cursor,conservatively-estimated-receive-side-version)`
- this behavior ensures resumability at cost proportional to the
time that replication was donw
- however, as explained in a comment, we might leak holds if
the zrepl daemon stops running
- that trade-off is acceptable because in the presumably rare
this might happen the user has two tools at their hand:
- Tool 1: run `zrepl zfs-abstractions release-stale`
- Tool 2: use env var `ZREPL_ENDPOINT_SENDER_HINT_MOST_RECENT_STEP_HOLD_CLEANUP_MODE`
to adjust the lower bound of the createtxg range (search for it in the code).
The env var can also be used to disable hold-cleanup on the
send-side entirely.
supersedes closes #293
supersedes closes #282
fixes #280
fixes #278
Additionaly, we fixed a couple of bugs:
- zfs: fix half-nil error reporting of dataset-does-not-exist for ZFSListChan and ZFSBookmark
- endpoint: Sender's `HintMostRecentCommonAncestor` handler would not
check whether access to the specified filesystem was allowed.
2020-03-26 23:43:17 +01:00
if err := MoveLastReceivedHold ( ctx , lp . ToString ( ) , toRecvd , s . conf . JobID ) ; err != nil {
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return nil , errors . Wrap ( err , "cannot move last-received-hold" )
}
}
2018-12-11 22:01:50 +01:00
return & pdu . ReceiveRes { } , nil
2018-06-20 20:20:37 +02:00
}
2017-05-20 17:08:18 +02:00
2018-12-11 22:01:50 +01:00
func ( s * Receiver ) DestroySnapshots ( ctx context . Context , req * pdu . DestroySnapshotsReq ) ( * pdu . DestroySnapshotsRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
2018-12-11 22:01:50 +01:00
root := s . clientRootFromCtx ( ctx )
lp , err := subroot { root } . MapToLocal ( req . Filesystem )
2018-08-30 11:51:47 +02:00
if err != nil {
return nil , err
}
return doDestroySnapshots ( ctx , lp , req . Snapshots )
}
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
func ( p * Receiver ) HintMostRecentCommonAncestor ( ctx context . Context , r * pdu . HintMostRecentCommonAncestorReq ) ( * pdu . HintMostRecentCommonAncestorRes , error ) {
2020-04-11 15:49:41 +02:00
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
// we don't move last-received-hold as part of this hint
// because that wouldn't give us any benefit wrt resumability.
//
// Other reason: the replication logic that issues this RPC would require refactoring
// to include the receiver's FilesystemVersion in the request)
return & pdu . HintMostRecentCommonAncestorRes { } , nil
}
2020-04-11 15:49:41 +02:00
func ( p * Receiver ) SendCompleted ( ctx context . Context , _ * pdu . SendCompletedReq ) ( * pdu . SendCompletedRes , error ) {
defer trace . WithSpanFromStackUpdateCtx ( & ctx ) ( )
new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
The counterpart to the replication cursor bookmark on the send-side.
Ensures that incremental replication will always be possible between a sender and receiver.
Design Doc
----------
`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.
The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.
Protocol Version Bump
---------------------
This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.
Replication Cursor Format Change
--------------------------------
The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
Changes in This Commit
----------------------
- package zfs
- infrastructure for holds
- infrastructure for resume token decoding
- implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
- ZFSSendArgs to specify a ZFS send operation
- validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
- RecvOptions support for `recv -s` flag
- convert a bunch of ZFS operations to be idempotent
- achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
- add field for encryption to send request messages
- add fields for resume handling to send & recv request messages
- receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
- can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
- used to set `last-received-hold`
- package replication/logic
- introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
- integrate encryption and resume token support into `Step` struct
- package endpoint
- move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
- step-holds + step-bookmarks
- last-received-hold
- new replication cursor + old replication cursor compat code
- adjust `endpoint/endpoint.go` handlers for
- encryption
- resumability
- new replication cursor
- last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2019-09-11 17:19:17 +02:00
return & pdu . SendCompletedRes { } , nil
}
2018-08-30 11:51:47 +02:00
func doDestroySnapshots ( ctx context . Context , lp * zfs . DatasetPath , snaps [ ] * pdu . FilesystemVersion ) ( * pdu . DestroySnapshotsRes , error ) {
2019-08-12 01:19:08 +02:00
reqs := make ( [ ] * zfs . DestroySnapOp , len ( snaps ) )
ress := make ( [ ] * pdu . DestroySnapshotRes , len ( snaps ) )
errs := make ( [ ] error , len ( snaps ) )
2018-08-30 11:51:47 +02:00
for i , fsv := range snaps {
if fsv . Type != pdu . FilesystemVersion_Snapshot {
return nil , fmt . Errorf ( "version %q is not a snapshot" , fsv . Name )
}
2019-08-12 01:19:08 +02:00
ress [ i ] = & pdu . DestroySnapshotRes {
Snapshot : fsv ,
// Error set after batch operation
2018-08-30 11:51:47 +02:00
}
2019-08-12 01:19:08 +02:00
reqs [ i ] = & zfs . DestroySnapOp {
Filesystem : lp . ToString ( ) ,
Name : fsv . Name ,
ErrOut : & errs [ i ] ,
2018-08-30 11:51:47 +02:00
}
2019-08-12 01:19:08 +02:00
}
2020-04-11 15:49:41 +02:00
zfs . ZFSDestroyFilesystemVersions ( ctx , reqs )
2019-08-12 01:19:08 +02:00
for i := range reqs {
if errs [ i ] != nil {
if de , ok := errs [ i ] . ( * zfs . DestroySnapshotsError ) ; ok && len ( de . Reason ) == 1 {
ress [ i ] . Error = de . Reason [ 0 ]
} else {
ress [ i ] . Error = errs [ i ] . Error ( )
}
2018-08-30 11:51:47 +02:00
}
}
2019-08-12 01:19:08 +02:00
return & pdu . DestroySnapshotsRes {
Results : ress ,
} , nil
2018-08-30 11:51:47 +02:00
}