Thanks Sergei, all good input and questions... > Hi George, > First of all I'd like to say that it is really great work. I've been looking > at your branch some time ago when I was going to implement decompression, so > now I decided to review it again. Here are my comments: > 1. We mostly use space for indentation in innobackupex script. Some lines of > this patch use tabs (for example 54-65 or 138-143). Fixed. There seem to be several places though in innobackupex where this was not detected during review. I fixed some where I saw them adjacent to any of my changes. > 2. As I can see encryption doesn't work for tar streamed backups. However > blueprint keep silence about this fact. It should be properly documented > (probably note for Hrvoje). Duly noted in blueprint and commit note. > 3. (c) in file headers are bit outdated (happy 2013!). Fixed in all new and touched files using new Percona copyright tag "Copyright (c) 2009-2013 Percona Ireland Ltd." > 4. I found that blueprint lacks some low-level implementation details (for > example encrypted file format is not documented). Added some detail on format, if you can identify any other are where you think I should be more specific, please say so and I will gladly expand. This is quite a large change set and could definitely use some deeper explanation but I worry about getting too verbose in the blueprint if there is such a thing. > 5. Are the default values for number of threads and chunk size suite well for > majority of users? If they are not optimal "out of the box", would be nice to > document some simple formula to approximate them. Since the encryption was based on the compression, I followed the same defaults for thread counts (1) and simply made the original working buffer size which was fixed at 64K into program options, defaulting at 64K. Added a little clarity on that into the blueprint and commit note. They are very likely not perfectly ideal for every use since hardware and resource availability and desired load is going to be different for every deployment. This exactly thought is what lead me to come up with my PLMCE 13 talk, how to balance all of these new options (and newer ones coming) to get your desired performance curve. I think as my research progresses on PLMCE talk I can offer some guidelines or blog post on tuning everything. > 6. As I can see xbcrypt doesn't support parallel encryption/decyption. Is > there any reason why this > been done? xbcrypt for encryption us usually just going to be encrypting meta-data files and other small-ish bits outside of xtrabackup. For now I just found it easier to keep it simple and single threaded. If the need arises it can be made to use the ds_xbcrypt and support parallel encryption. For decompression, well, I really didn't think it was necessary. Parallel decryption can become quite difficult to keep the incoming and outgoing blocks in the same order. Since decryption measures to be quite a bit faster than encryption and again, trying to keep some things simple for now, I just left it single threaded. If the need comes up, I can look at it again and try to come up with some scheme to read multiple blocks at a time, decrypt in parallel and serialize their output. > 7. Please modify testcase to use --no-timestamp instead of grep backup > directory name. I am not sure what you mean here. The test cases were modeled after existing test cases (xb_basic.sh) which uses `grep "innobackupex: Backup created in directory" $OUTFILE | awk -F\' '{ print $2}'` to determine what the actual backup path was. I agree that it is kind of hackish but it is what was there before. Now, if we have a new preference from Alexey to use another method such as explicitly targeting a location then I would gladly adopt the new paradigm if an example can be shown. Thanks again for looking and for the feedback.