Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Code review comment for lp://staging/~gl-az/percona-xtrabackup/BT-23557-2.1-encrypted_stream

BT-23557-2.1-encrypted_stream
Merge into 2.1

Revision history for this message

Alexey Kopytov (akopytov) wrote on 2013-01-23:

Hi George,

Looks great in general. A bunch of minor comments and questions below:

   - I don't like that we use xb_a() (i.e. release build assertions)
     unnecessarily in some cases. The rule of thumb followed in the
     InnoDB code is: assertions that verify code correctness should be
     debug-only, and assertions that verify data correctness should be
     release ones. Some of xb_a() assertions in the existing code also
     verify code correctness, but that's only because we had neither
     debug builds nor Jenkins at the time they were implemented.

     For example, archive_write() checks that archive_ctxt->dest_file is
     not NULL with a release assertion. And then the same check is in
     the callback function called by archive_write_date(). That wouldn't
     be a concern for debug builds, but do we really want those checks
     in release ones?

- many lines have trailing whitespace. see the output of:

: bzr diff -c-1 | egrep "^\+.*[[:space:]]+$"

- a couple of error messages use '\r' instead of '\n'

   - I wonder if we can avoid passing encrypt_chunk_size /
     compress_chunk_size from innobackupex to xtrabackup if they were
     not specified on the innobackupex command line, i.e. if the default
     values should be used by xtrabackup. Nothing is wrong
     with passing them unconditionally, but makes the logs a bit more
     verbose unnecessarily.

- are the defaults for *_chunk_size reasonable? AFAIR you did some
benchmarks, so perhaps we know they are suboptimal?

   - assigning default values for *_chunk_size in innobackupex (and
     then passing them to xtrabackup) also makes checks like the
     following pointless:

> + if ($option_encrypt_chunk_size) {
> + $encrypt_cmd = $encrypt_cmd . " --encrypt-chunk-size=$option_encrypt_chunk_size";
> + }

- introducing a stream_encrypt_file(basedir, relpath) function would
help to avoid multiple variations of the following code:

> + if ($encrypt_cmd) {
> + $ret = system("cd $source_dir; $stream_cmd $database/$file_name | $encrypt_cmd") >> 8;
> + } else {
> + $ret = system("cd $source_dir; $stream_cmd $database/$file_name") >> 8;
> + }

   - in xtrabackup *_chunk_size can be defined as GET_ULL rather than
     GET_UINT. in which case it will also be possible to use
     e.g. --encrypt-chunk-size=64K. And the corresponding macros can be
     defined as follows:

#define XB_CRYPT_CHUNK_SIZE ((size_t) (xtrabackup_encrypt_chunk_size))

If you implement it like this, you probably also want to adjust
minimum values and block size in the option declaration.

- the following extern declarations in ds_encrypt.c should go to
xtrabackup.h:

> extern char *xtrabackup_encrypt_algo;
> extern char *xtrabackup_encrypt_key;
> extern char *xtrabackup_encrypt_key_file;
> extern uint xtrabackup_encrypt_threads;
> extern uint xtrabackup_encrypt_chunk_size;

(the same applies to extern declarations in ds_compress.c, it's
just that that code was written before xtrabackup.h was introduced)

   - the BP, MP and revision comments say "Similar to compression,
     encryption is not supported on streamed tar backups." But why do
     we need callback in ds_archive then?

   - the --encrypt option can be defined as GET_ENUM rather than
     GET_STR, which would remove some trivial code from encrypt_init()
     and xbcrypt.c

- encrypt_mode is a constant, so should be either 'const' or
a #define?

   - if the key file contains some trailing newlines,
     those will also be read and considered a part of the key. And the
     newlines are different on Unix/Windows, so even identically
     looking keyfiles may be different, depending on where they were
     created.

   - in case xb_crypt_read_key_file() is used, memory for the key is
     allocated but never freed. this can be fixed by declaring
     xtrabackup_encrypt_key as GET_STR_ALLOC and then freeing it
     regardless of how it was allocated.

   - we can fix the above 2 issues by having a fixed-size buffer
     corresponding to the maximum possible key length, and then reading
     one line from the key file (and making sure it's not longer than
     that).

- should we also document the key requirements for different
algorithms, i.e. minimum and maximum length?

- why the ".xx" suffix? would something like ".xbcrypt" be better?

- does libgrypt impose any limits on the minimum block size being
encrypted?

- why do we need gcry_cipher_reset()/gcry_cipher_setiv() in
encrypt_worker_thread_func()?

- If we really need those, can the first initial call to
gcry_cipher_setiv() in encrypt_init() be removed?

   - would be nice to have assert that the total number of datasinks
     does not exceed XTRABACKUP_MAX_DATASINKS in
     XTRABACKUP_ADD_DATASINK()

Hi George,

Looks great in general. A bunch of minor comments and questions below:

- many lines have trailing whitespace. see the output of:

:     bzr diff -c-1 | egrep "^\+.*[[:space:]]+$" 
   
   - a couple of error messages use '\r' instead of '\n'

- are the defaults for *_chunk_size reasonable? AFAIR you did some
     benchmarks, so perhaps we know they are suboptimal?

- assigning default values for *_chunk_size in innobackupex (and
     then passing them to xtrabackup) also makes checks like the
     following pointless:

> +        if ($option_encrypt_chunk_size) {
> +          $encrypt_cmd = $encrypt_cmd . " --encrypt-chunk-size=$option_encrypt_chunk_size";
> +        }

- introducing a stream_encrypt_file(basedir, relpath) function would
     help to avoid multiple variations of the following code:

> +                if ($encrypt_cmd) {
> +                    $ret = system("cd $source_dir; $stream_cmd $database/$file_name | $encrypt_cmd") >> 8;
> +                } else {
> +                    $ret = system("cd $source_dir; $stream_cmd $database/$file_name") >> 8;
> +                }

#define XB_CRYPT_CHUNK_SIZE ((size_t) (xtrabackup_encrypt_chunk_size))

If you implement it like this, you probably also want to adjust
     minimum values and block size in the option declaration.

- the following extern declarations in ds_encrypt.c should go to
     xtrabackup.h:

> extern char	*xtrabackup_encrypt_algo;
> extern char	*xtrabackup_encrypt_key;
> extern char	*xtrabackup_encrypt_key_file;
> extern uint	xtrabackup_encrypt_threads;
> extern uint	xtrabackup_encrypt_chunk_size;

(the same applies to extern declarations in ds_compress.c, it's
     just that that code was written before xtrabackup.h was introduced)

- the BP, MP and revision comments say "Similar to compression,
     encryption is not supported on streamed tar backups." But why do
     we need callback in ds_archive then?

- the --encrypt option can be defined as GET_ENUM rather than
     GET_STR, which would remove some trivial code from encrypt_init()
     and xbcrypt.c

- encrypt_mode is a constant, so should be either 'const' or
     a #define?

- should we also document the key requirements for different
     algorithms, i.e. minimum and maximum length?

- why the ".xx" suffix? would something like ".xbcrypt" be better?

- does libgrypt impose any limits on the minimum block size being
     encrypted?

- why do we need  gcry_cipher_reset()/gcry_cipher_setiv() in
     encrypt_worker_thread_func()?

- If we really need those, can the first initial call to
     gcry_cipher_setiv() in encrypt_init() be removed?

- I'm not sure fsync()ing the output file at the end of xbcrypt.c is
     a good idea. If a crash happens during the backup, it will be
     invalid anyway, what's the point of ensuring data is on disk after
     each file? And fsync()ing the input file doesn't make sense.
   
   - would be nice to have assert that the total number of datasinks
     does not exceed XTRABACKUP_MAX_DATASINKS in
     XTRABACKUP_ADD_DATASINK()

review: Needs Fixing

« Back to merge proposal