Merge lp://staging/~gl-az/percona-xtrabackup/BT-23557-2.1-encrypted_stream into lp://staging/percona-xtrabackup/2.1

Proposed by George Ormond Lorch III
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 520
Proposed branch: lp://staging/~gl-az/percona-xtrabackup/BT-23557-2.1-encrypted_stream
Merge into: lp://staging/percona-xtrabackup/2.1
Diff against target: 3979 lines (+2797/-329)
42 files modified
innobackupex (+116/-23)
src/Makefile (+22/-9)
src/datasink.c (+15/-3)
src/datasink.h (+4/-1)
src/ds_archive.c (+268/-0)
src/ds_archive.h (+28/-0)
src/ds_compress.c (+7/-6)
src/ds_encrypt.c (+583/-0)
src/ds_encrypt.h (+28/-0)
src/ds_stdout.c (+121/-0)
src/ds_stdout.h (+28/-0)
src/ds_xbstream.c (+88/-144)
src/ds_xbstream.h (+3/-3)
src/xbcrypt.c (+659/-0)
src/xbcrypt.h (+77/-0)
src/xbcrypt_common.c (+46/-0)
src/xbcrypt_read.c (+180/-0)
src/xbcrypt_write.c (+94/-0)
src/xbstream.c (+5/-2)
src/xbstream.h (+6/-1)
src/xbstream_read.c (+2/-2)
src/xbstream_write.c (+35/-19)
src/xtrabackup.cc (+159/-79)
test/inc/xb_local.sh (+50/-0)
test/t/bug972169.sh (+1/-1)
test/t/ib_stream_compress.sh (+1/-1)
test/t/ib_stream_compress_encrypt.sh (+18/-0)
test/t/ib_stream_encrypt.sh (+11/-0)
test/t/ib_stream_parallel_encrypt.sh (+12/-0)
test/t/xb_basic.sh (+5/-32)
test/t/xb_compress.sh (+14/-0)
test/t/xb_compress_encrypt.sh (+26/-0)
test/t/xb_encrypt.sh (+18/-0)
test/t/xb_parallel_compress.sh (+14/-0)
test/t/xb_parallel_compress_encrypt.sh (+25/-0)
test/t/xb_parallel_encrypt.sh (+17/-0)
utils/build-binary.sh (+3/-1)
utils/build.sh (+2/-0)
utils/debian-dummy-rules.patch (+1/-0)
utils/debian/percona-xtrabackup.install (+1/-0)
utils/debian/rules (+1/-1)
utils/xtrabackup.spec (+3/-1)
To merge this branch: bzr merge lp://staging/~gl-az/percona-xtrabackup/BT-23557-2.1-encrypted_stream
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Approve
Sergei Glushchenko g2 Pending
Review via email: mp+152784@code.staging.launchpad.net

This proposal supersedes a proposal from 2013-02-27.

Description of the change

Introducing xtrabackup with encryption.

Encryption will be done through the libgcrypt library which can be found documented here: http://www.gnupg.org/documentation/manuals/gcrypt

Addition of libgcrypt requires now that libgcrypt and libgpgerror packages be installed.

Many package repositories still contain older versions of libgcrypt. The current stable version of 1.5.0 is reccomended and will take advantage of the AES-NI instruction set if available.

Similar to compression, encryption is not supported on streamed tar backups.

New options:
    --encrypt=<algorithm> : algorithm may be 'NONE', 'AES128', 'AES192' or 'AES256'. 'NONE' is used mainly for testing and is simply pass-through. Please refer to the libgcrypt manual for more information on these ciphers.

    --encrypt_key=<key> : a proper length encryption key to use. It is not reccomended to use this option where there is uncontrolled access to the machine as the command line and thus the key can be viewed as part of the process info. See option --encrypt_key_file.

    --encrypt_key_file=<keyfile> : the name of a file where the raw key of the appropriate length can be read from. The file must be a simple binary (or text) file that contains exactly the key to be used.

    --encrypt_threads=<threadcount> : number of threads used to encrypt in parallel (default=1).

    --encrypt_chunk_size=<size> : size (in bytes) of the working encryption buffer size for each encryption thread (default=64K).

    --compress_chunk_size=<size> : size (in bytes) of the working compression buffer size for each encryption thread (default=64K).

encrypt_key and encrypt_key_file are mutually exclusive. If --encrypt is specified, one of these MUST be specified else an error will occur.

encryption key lengths must be precise:
  AES128=128 bits or 16 bytes
  AES192=192 bits or 24 bytes
  AES256=256 bits or 32 bytes

When using the encrypt_key_file option, the file specified may have binary or textual content but must be _exactly_ the correct size for the algorithm being used. If using text, caution must be taken so that there are no extra spaces, tabs, carriage returns or line feeds within the file. These will cause the key to be read as an incorrect size and an error will be generated.

The coding task was fairly straight forward with some restructure/refactoring approved by Alexey K:
    - Update/set copyright notice in new and touched files to Copyright (c) 2009-2013 Percona Ireland Ltd.
    - Implemented new ds_stdout data sink.
    - Split the ds_stream data sink into two new, more specific data sinks, ds_archive and ds_xbstream.
    - Implement write callback model for ds_xbstream similar to libarchive.
    - Change ds_archive and ds_xbstream over to using callback write models.
    - Add new option to compression --compress_chunk_size to allow user specified chunk size management.
    - Implement xbcrypt format reader/writer. Format encapsulated as follows:
        8 bytes - magic string "XBCRYP01"
        8 bytes - reserved
        8 bytes - original size
        8 bytes - encrypted size
        4 bytes - checksum
        'encrypted size' bytes - encrypted data
    - Implement a new ds_encrypt datasync modeled after the existing compression datasync which will use libgcrypt for actual encryption task and write callbacks.
    - Add new options, options validations and pass through innobackupex.
    - Implement new utility xbcrypt modeled after xbstream to perform encryption outside of xtrabackup for metadata and to offer a means of decrypting an encrypted backup.
    - Implement new tests and refactor test script structure a little to make adding new, similar test cases a little easier.

To post a comment you must log in.
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote : Posted in a previous version of this proposal

Hi George,
First of all I'd like to say that it is really great work. I've been looking at your branch some time ago when I was going to implement decompression, so now I decided to review it again. Here are my comments:
1. We mostly use space for indentation in innobackupex script. Some lines of this patch use tabs (for example 54-65 or 138-143).
2. As I can see encryption doesn't work for tar streamed backups. However blueprint keep silence about this fact. It should be properly documented (probably note for Hrvoje).
3. (c) in file headers are bit outdated (happy 2013!).
4. I found that blueprint lacks some low-level implementation details (for example encrypted file format is not documented).
5. Are the default values for number of threads and chunk size suite well for majority of users? If they are not optimal "out of the box", would be nice to document some simple formula to approximate them.
6. As I can see xbcrypt doesn't support parallel encryption/decyption. Is there any reason why this
been done?
7. Please modify testcase to use --no-timestamp instead of grep backup directory name.

review: Needs Fixing (g2)
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Download full text (3.8 KiB)

Thanks Sergei, all good input and questions...

> Hi George,
> First of all I'd like to say that it is really great work. I've been looking
> at your branch some time ago when I was going to implement decompression, so
> now I decided to review it again. Here are my comments:
> 1. We mostly use space for indentation in innobackupex script. Some lines of
> this patch use tabs (for example 54-65 or 138-143).

Fixed. There seem to be several places though in innobackupex where this was not detected during review. I fixed some where I saw them adjacent to any of my changes.

> 2. As I can see encryption doesn't work for tar streamed backups. However
> blueprint keep silence about this fact. It should be properly documented
> (probably note for Hrvoje).

Duly noted in blueprint and commit note.

> 3. (c) in file headers are bit outdated (happy 2013!).

Fixed in all new and touched files using new Percona copyright tag "Copyright (c) 2009-2013 Percona Ireland Ltd."

> 4. I found that blueprint lacks some low-level implementation details (for
> example encrypted file format is not documented).

Added some detail on format, if you can identify any other are where you think I should be more specific, please say so and I will gladly expand. This is quite a large change set and could definitely use some deeper explanation but I worry about getting too verbose in the blueprint if there is such a thing.

> 5. Are the default values for number of threads and chunk size suite well for
> majority of users? If they are not optimal "out of the box", would be nice to
> document some simple formula to approximate them.

Since the encryption was based on the compression, I followed the same defaults for thread counts (1) and simply made the original working buffer size which was fixed at 64K into program options, defaulting at 64K. Added a little clarity on that into the blueprint and commit note. They are very likely not perfectly ideal for every use since hardware and resource availability and desired load is going to be different for every deployment. This exactly thought is what lead me to come up with my PLMCE 13 talk, how to balance all of these new options (and newer ones coming) to get your desired performance curve. I think as my research progresses on PLMCE talk I can offer some guidelines or blog post on tuning everything.

> 6. As I can see xbcrypt doesn't support parallel encryption/decyption. Is
> there any reason why this
> been done?

xbcrypt for encryption us usually just going to be encrypting meta-data files and other small-ish bits outside of xtrabackup. For now I just found it easier to keep it simple and single threaded. If the need arises it can be made to use the ds_xbcrypt and support parallel encryption. For decompression, well, I really didn't think it was necessary. Parallel decryption can become quite difficult to keep the incoming and outgoing blocks in the same order. Since decryption measures to be quite a bit faster than encryption and again, trying to keep some things simple for now, I just left it single threaded. If the need comes up, I can look at it again and try to come up with some scheme to read multiple blocks at a time, ...

Read more...

Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote : Posted in a previous version of this proposal

Hi George,

All fixes you've done are look good to me. As for the test case, you can usage of "--no-timespamp" switch for example in bug483827.sh. By default innobackupex create timestamp named directory in the backup target directory. When --no-timestamp is specified backup is placed exactly where target directory points to, so no grep is needed. This is strongly recommended to use in all modern test cases.

Thanks,
Sergei

Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal

> Hi George,
>
> All fixes you've done are look good to me. As for the test case, you can usage
> of "--no-timespamp" switch for example in bug483827.sh. By default
> innobackupex create timestamp named directory in the backup target directory.
> When --no-timestamp is specified backup is placed exactly where target
> directory points to, so no grep is needed. This is strongly recommended to use
> in all modern test cases.
>
> Thanks,
> Sergei

OK, I am aware of --no-timestamp and what it does, I was unaware of any directive that we should be using it in this way and allowing the backup to just go to the specified backupdir. IMHO it does make much more sense to use it if it is safe to assume that there would be no collisions from tests executing in parallel and proper cleanup before and after each test run. I will go ahead and change this test and maybe even scan other tests and post minor bugs to fix them.

Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal

For some reason, lp is not updating the merge diff with the actual changes in the branch when I 'resubmit proposal'. I wonder if I need to just delete the MP and start over or is this OK?

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote : Posted in a previous version of this proposal

George,

I don't agree that xbcrypt will deal only with tiny files as it also will crypt MyISAM tables. But I agree that it's unnecessary to implement parallel encryption as eventually our goal is to replace innobackupex script with xtrabackup. Only thing that sounds not very good in your arguments is that "decryption measures to be quite a bit faster than encryption". We are using AES, which is a symmetric encryption, and is the same speed whether encrypting or decrypting. So decryption shouldn't be *BIT* faster then encryption.

Approved.

Thanks,
Sergei

review: Approve (g2)
Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
Download full text (5.1 KiB)

Hi George,

Looks great in general. A bunch of minor comments and questions below:

   - I don't like that we use xb_a() (i.e. release build assertions)
     unnecessarily in some cases. The rule of thumb followed in the
     InnoDB code is: assertions that verify code correctness should be
     debug-only, and assertions that verify data correctness should be
     release ones. Some of xb_a() assertions in the existing code also
     verify code correctness, but that's only because we had neither
     debug builds nor Jenkins at the time they were implemented.

     For example, archive_write() checks that archive_ctxt->dest_file is
     not NULL with a release assertion. And then the same check is in
     the callback function called by archive_write_date(). That wouldn't
     be a concern for debug builds, but do we really want those checks
     in release ones?

   - many lines have trailing whitespace. see the output of:

: bzr diff -c-1 | egrep "^\+.*[[:space:]]+$"

   - a couple of error messages use '\r' instead of '\n'

   - I wonder if we can avoid passing encrypt_chunk_size /
     compress_chunk_size from innobackupex to xtrabackup if they were
     not specified on the innobackupex command line, i.e. if the default
     values should be used by xtrabackup. Nothing is wrong
     with passing them unconditionally, but makes the logs a bit more
     verbose unnecessarily.

   - are the defaults for *_chunk_size reasonable? AFAIR you did some
     benchmarks, so perhaps we know they are suboptimal?

   - assigning default values for *_chunk_size in innobackupex (and
     then passing them to xtrabackup) also makes checks like the
     following pointless:

> + if ($option_encrypt_chunk_size) {
> + $encrypt_cmd = $encrypt_cmd . " --encrypt-chunk-size=$option_encrypt_chunk_size";
> + }

   - introducing a stream_encrypt_file(basedir, relpath) function would
     help to avoid multiple variations of the following code:

> + if ($encrypt_cmd) {
> + $ret = system("cd $source_dir; $stream_cmd $database/$file_name | $encrypt_cmd") >> 8;
> + } else {
> + $ret = system("cd $source_dir; $stream_cmd $database/$file_name") >> 8;
> + }

   - in xtrabackup *_chunk_size can be defined as GET_ULL rather than
     GET_UINT. in which case it will also be possible to use
     e.g. --encrypt-chunk-size=64K. And the corresponding macros can be
     defined as follows:

#define XB_CRYPT_CHUNK_SIZE ((size_t) (xtrabackup_encrypt_chunk_size))

     If you implement it like this, you probably also want to adjust
     minimum values and block size in the option declaration.

   - the following extern declarations in ds_encrypt.c should go to
     xtrabackup.h:

> extern char *xtrabackup_encrypt_algo;
> extern char *xtrabackup_encrypt_key;
> extern char *xtrabackup_encrypt_key_file;
> extern uint xtrabackup_encrypt_threads;
> extern uint xtrabackup_encrypt_chunk_size;

     (the same applies to extern declarations in ds_compress.c, it's
     just that that code was written before xtrabackup.h was introduced)

   - the BP, MP and revision comments s...

Read more...

review: Needs Fixing
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Download full text (9.5 KiB)

> - I don't like that we use xb_a() (i.e. release build assertions)

Good point, I think a lot of that was just a result of copy/paste without really looking at what was going on within the code. I changed all of the instances in all of the files that I have touched to be xb_ad since I can find no compelling reason for any of these to be xb_a, they are all pretty much testing code correctness not spot testing data correctness.

> - many lines have trailing whitespace. see the output of:

Fixed.

> - a couple of error messages use '\r' instead of '\n'

Fixed.

> - I wonder if we can avoid passing encrypt_chunk_size /
> compress_chunk_size

Sure, I was just following the precedent set by the compress_threads but I guess I can see that the difference is that compress_threads really _should_ be specified (even if it is not enforced that way) where as the chunk sizes really are optional and the defaults in xtrabackup should be just fine. Removed the defaults in innobackupex and added conditional tests when building the various tool call command lines.

> - are the defaults for *_chunk_size reasonable? AFAIR you did some
> benchmarks, so perhaps we know they are suboptimal?

The current defaults are the same as the old hard coded compression chunk size (64K). They are not 100% optimal. They optimal values really depend on what other options are being specified to xtrabackup (such as compress/encrypt threads, parallel), the server load, hardware configuration, etc. It has been a while since I last ran my benchmarks but I don't recall there being any compelling reason to change the defaults. In fact, this is core to my PLMCE '13 talk, specifically, how to tune these options to achieve your desired result (speed vs load). I will be re-starting my benchmarks again very shortly as I prepare this talk. If I discover anything that makes me believe that these defaults are not ideal, I will issue a bug and change them.

> - introducing a stream_encrypt_file(basedir, relpath) function would
> help to avoid multiple variations of the following code:

Good idea! Done, but you may want to closely check my syntax as I am no Perl expert. It seems to work but that could just be an accident :-\

> - in xtrabackup *_chunk_size can be defined as GET_ULL rather than GET_UINT.

Another good idea, done and updated blueprint as well.

> - the following extern declarations in ds_encrypt.c should go to
> xtrabackup.h:

Done and also moved compression options over there as well since I already have some other mods in the ds_compress.* files.

> - the BP, MP and revision comments say "Similar to compression,
> encryption is not supported on streamed tar backups." But why do
> we need callback in ds_archive then?

Actually, that is a good point. By introducing the callback, we in fact _can_ do compression and encryption to streamed tar backups now as long as they are not parallel. I will leave this up to you. If you want, I can go ahead and add this functionality and new test cases for it or we can save this for later in a new blueprint. IIRC the reason I used the callback was from when ds_archive and ds_stream were jo...

Read more...

Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

George,

Everything looks good except that:

   - the following change in percona-xtrabackup.install does not have a
     traling slash like other binaries. No idea if that's important as
     packaging is not my area, but better safe than sorry:

> --- utils/debian/percona-xtrabackup.install 2012-04-18 02:19:57 +0000
> +++ utils/debian/percona-xtrabackup.install 2013-02-21 19:01:25 +0000
> @@ -2,4 +2,5 @@
> xtrabackup_51 usr/bin/
> xtrabackup_55 usr/bin/
> xbstream usr/bin/
> +xbcrypt usr/bin
> innobackupex usr/bin/
>

   - encrypt_chunk_size in xtrabackup.c is now GET_LL so it accepts K/M
     suffixes, but the same option in xbcrypt is GET_UINT, so it
     doesn't. Also why is it GET_LL rather than GET_ULL in xtrabackup.c?

   - please either wrap the body of XTRABACKUP_ADD_DATASINK() into "do
     ... while(0)" or better yet make it a static inline function. In
     the current form it is impossible to use it in the "if ... else"
     constructs.

review: Needs Fixing
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal

> - the following change in percona-xtrabackup.install does not have a
> traling slash like other binaries. No idea if that's important as
> packaging is not my area, but better safe than sorry:

Good catch. I know nothing about packaging either.

> - encrypt_chunk_size in xtrabackup.c is now GET_LL so it accepts K/M
> suffixes, but the same option in xbcrypt is GET_UINT, so it
> doesn't. Also why is it GET_LL rather than GET_ULL in xtrabackup.c?

Well, I was following the --use-memory option as a guide and it uses a longlong and GET_LL. It really doesn't make sense for these to be signed as negative values are completely invalid and even prevented by the min/max specifications. I went ahead and changed all of these to ulonglong and GET_ULL and fixed the xbcrypt.c, I knew I needed to change that too but forgot about it when doing the last fix.

> - please either wrap the body of XTRABACKUP_ADD_DATASINK() into "do
> ... while(0)" or better yet make it a static inline function. In
> the current form it is impossible to use it in the "if ... else"
> constructs.

Just made it a static inline void xtrabackup_add_datasink(ds_ctxt_t *ds)

Have you given any more thought on what you would like done re: compression and encryption to streamed tar backups?

Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal
Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi George,

On Wed, 27 Feb 2013 21:01:20 -0000, George Ormond Lorch III wrote:

> Have you given any more thought on what you would like done re: compression and encryption to streamed tar backups?
>

Compression will never work with tar, as we don't know the size of the
resulting file in advance.

Though yes, it can be fixed to work with encryption. Which as I
understand is a trivial thing to implement?

Revision history for this message
Alexey Kopytov (akopytov) : Posted in a previous version of this proposal
review: Needs Information
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal

Ahh, yeah. I forgot that libarchive needs the final size when creating a new archive (ds_open).

Hmm, now that I look at it, encryption can't work and for a similar reason as compression, i.e. the absolute final output size can not be predetermined at the ds_open call.

The reason is that during the encryption ds_open, I just pass the my_stat straight through to the ds_open of the next data sink in the chain without accounting for the extra overhead added by the xbcrypt formatting. Since this formatting gets introduced during the ds_write, and not knowing how many of and what shape/size these writes will be coming into the ds_encrypt data sink, I can't account for/add any extra to the my_stat before passing it along to the next data sink. So there you have it, compression and encryption both have the same reason for not being compatible with the tar stream.

Revision history for this message
Alexey Kopytov (akopytov) : Posted in a previous version of this proposal
review: Approve
Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

This should be rebased on the latest trunk. It doesn't build with the "xtrabackup.c -> xtrabackup.cc" conversion by Laurynas.

review: Needs Fixing
Revision history for this message
George Ormond Lorch III (gl-az) wrote : Posted in a previous version of this proposal

Yeah, I didn't think they would play well together from the review I did of the change. I'll get it fixed up, retested and a resubmitted MP.

Revision history for this message
George Ormond Lorch III (gl-az) wrote :
Revision history for this message
Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches