Merge lp://staging/~anotherjesse/nova/zero-volumes into lp://staging/~hudson-openstack/nova/trunk

Proposed by Jesse Andrews
Status: Merged
Approved by: Jesse Andrews
Approved revision: 975
Merged at revision: 986
Proposed branch: lp://staging/~anotherjesse/nova/zero-volumes
Merge into: lp://staging/~hudson-openstack/nova/trunk
Diff against target: 28 lines (+7/-0)
2 files modified
.mailmap (+1/-0)
nova/volume/driver.py (+6/-0)
To merge this branch: bzr merge lp://staging/~anotherjesse/nova/zero-volumes
Reviewer Review Type Date Requested Status
Devin Carlen (community) Approve
Soren Hansen (community) Needs Fixing
Jesse Andrews (community) Approve
Sandy Walsh (community) Needs Information
Review via email: mp+57025@code.staging.launchpad.net

Description of the change

Zero out volumes during deletion to prevent data leaking between users

To post a comment you must log in.
Revision history for this message
Jesse Andrews (anotherjesse) wrote :

dd will return 1 even if it succeeds - since we don't give dd a count and dd reports:

dd: writing to `/dev/mapper/nova--volumes-vol-123124': No space left on device

review: Needs Fixing
Revision history for this message
Erica Windisch (ewindisch) wrote :

I'm not sure that try_execute should be used here. It will use a global
value for retrying the command. By default, this is 3 times. This is a
high-impact event from a performance standpoint. If dd partially succeeds
and then fails, it will do so in duplicate.

It may be better to execute this directly via utils.execute than to use
try_execute. Also, a bit of a side-note... utils.execute now has a retry
feature built-in (kwarg 'attempts'). It might be worthwhile to modify
try_execute to curry utils.execute, changing the default. Then,
delete_volume could use try_execute, passing its own setting for the
attempts kwarg.

Finally, note that when GrokCloud used LVM, the decision was made to wipe at
creation, rather than deletion. This resulted in delayed creation, a
significant caveat, but it assured that a volume could NEVER be assigned to
a customer without first being wiped.

--
Regards,
Eric Windisch

Revision history for this message
Erica Windisch (ewindisch) wrote :

| feature built-in (kwarg 'attempts'). It might be worthwhile to modify

> try_execute to curry utils.execute, changing the default. Then,--
>

Wrap, not curry. Lets just say it was a lack of coffee ;-)

Regards,
Eric Windisch

Revision history for this message
Jesse Andrews (anotherjesse) wrote :

Eric - for Cactus do you think we should switch to zeroing on create?

I agree we should switch to: utils.execute

Revision history for this message
Erica Windisch (ewindisch) wrote :

On Sun, Apr 10, 2011 at 6:45 PM, anotherjesse <email address hidden>wrote:

> Eric - for Cactus do you think we should switch to zeroing on create?
>

I think it warrants further discussion, or a configurable option.

The best option would be to create all volumes as a snapshot of dm-zero. You
can see an example of this in Linux's device-mapper/zero.txt. This would
have some performance overhead, but will be fast and effective. There could
be backwards compatibility issues with earlier versions of Nova unless we
device a naming-scheme/workaround to support older volumes.

As far as zeroing volumes, if it is decided to stay with this, then I
suggest either wiping on create or using a hybrid approach (wipe on delete,
verify on create). Note that wiping on delete is largely incompatible with
LVM snapshots.

--
Regards,
Eric Windisch

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Is the problem with wipe-on-delete that a new instance could use the volume before the wipe has completed? If so, shouldn't we be marking the volume as "unavailable" until the wipe is completed?

There should be no real advantage to doing it before or after.

Also, if the sudo prompts for a password we've just locked up. I don't know if there is a way to check sudo permissions proactively (sudo -l prompts as well).

review: Needs Information
Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Note: pvo had an interesting idea of using sudo -n which will error out of a password prompt is required.

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

s/of/is

Revision history for this message
Erica Windisch (ewindisch) wrote :

On Mon, Apr 11, 2011 at 9:45 AM, Sandy Walsh <email address hidden>wrote:

> Review: Needs Information
> Is the problem with wipe-on-delete that a new instance could use the volume
> before the wipe has completed? If so, shouldn't we be marking the volume as
> "unavailable" until the wipe is completed?
>

No, it is not. Until the wipe is completed, the volume would not be deleted.
(Note though that it appears the currently proposed patch may not do error
handling correctly)

The problem is that to do things this way we would have to:
1) Not use LVM snapshots and make it clear to those deploying Nova that they
cannot use this feature.
2) Make sure the code is tight. It needs to error where it should, fail
where it should, and not leave dangling volumes out there.

This adds complexity and room for bugs to surface. Even if we get it right,
it will not be difficult for someone to make a change that opens a security
hole. That bug might not be in this method.

In my opinion, the core issue is that newly created volumes should be
initialized with zero'ed data. If we need to make sure that new volumes have
zero data, you can only assure this is the case if you're starting from the
the point of creation.

> There should be no real advantage to doing it before or after.
>

Yes, in a perfect world where only Nova touches the volume manager, where
Nova decides to never use snapshots, where Nova has no bugs, and where Nova
developers program with a security-focused mindset. We don't live in a
perfect world.

I say that for Cactus, we put deletion on create and seek to investigate &
implement dm-zero snapshots for the next release.

--
Regards,
Eric Windisch

Revision history for this message
Erica Windisch (ewindisch) wrote :

It is worth considering, however, that there will be deployments of Nova
where people will want wipe-on-delete for legal/SLA purposes. In that case,
I suggest using dm-crypt. Rather than wiping the volumes, we can just zero
& delete the keyfile.

I wouldn't advocate using dm-crypt as a default due to the performance
impact, whereas dm-zero/dm-snapshot is sufficient for most cases.

I'm particularly interested in these and can write the blueprints.

--
Regards,
Eric Windisch

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Please do write some blueprints for these. I'm good with the simple zero-on-delete for cactus.

Vish

On Apr 11, 2011, at 8:00 AM, Eric Windisch wrote:

> It is worth considering, however, that there will be deployments of Nova
> where people will want wipe-on-delete for legal/SLA purposes. In that case,
> I suggest using dm-crypt. Rather than wiping the volumes, we can just zero
> & delete the keyfile.
>
> I wouldn't advocate using dm-crypt as a default due to the performance
> impact, whereas dm-zero/dm-snapshot is sufficient for most cases.
>
> I'm particularly interested in these and can write the blueprints.
>
> --
> Regards,
> Eric Windisch
>
> https://code.launchpad.net/~anotherjesse/nova/zero-volumes/+merge/57025
> You are subscribed to branch lp:nova.

Revision history for this message
Jesse Andrews (anotherjesse) wrote :

fixed requested issues

review: Approve
Revision history for this message
Soren Hansen (soren) wrote :

I think we should keep the changes to the meaning of a 0 volume size out of this patch and focus on the problem at hand. Other than that, I'm fine with the change.

review: Needs Fixing
Revision history for this message
Devin Carlen (devcamcar) wrote :

sudo -n is good suggestion in general for our of use of sudo throughout nova. We should probably have both an execute and execute_sudo methods to abstract this.

That said, this patch solves the original problem.

@soren: I'm sorry - I don't follow what you mean about 0 volume size. Can you clarify?

review: Approve
Revision history for this message
Vish Ishaya (vishvananda) wrote :

devin: there is a hack for 0 size volumes to make them 100M. This currently isn't used by anyone, so jesse had removed it. Soren was suggesting to revert those changes and do it in a different patch.

Revision history for this message
Devin Carlen (devcamcar) wrote :

Ok, Soren's concerns are addressed. Going to merge this.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :
Download full text (46.7 KiB)

The attempt to merge lp:~anotherjesse/nova/zero-volumes into lp:nova failed. Below is the output from the failed tests.

AccountsTest
    test_account_create OK
    test_account_delete OK
    test_account_update OK
    test_get_account OK
AdminAPITest
    test_admin_disabled OK
    test_admin_enabled OK
APITest
    test_exceptions_are_converted_to_faults OK
Test
    test_authorize_token OK
    test_authorize_user OK
    test_bad_token OK
    test_bad_user_bad_key OK
    test_bad_user_good_key OK
    test_no_user OK
    test_token_expiry OK
TestFunctional
    test_token_doesnotexist OK
    test_token_expiry OK
TestLimiter
    test_authorize_token OK
LimiterTest
    test_limiter_custom_max_limit OK
    test_limiter_limit_and_offset OK
    test_limiter_limit_medium OK
    test_limiter_limit_over_max OK
    test_limiter_limit_zero OK
    test_limiter_negative_limit OK
    test_limiter_negative_offset OK
    test_limiter_nothing OK
    test_limiter_offset_bad OK
    test_limiter_offset_blank OK
    test_limiter_offset_medium OK
    test_limiter_offset_over_max OK
    test_limiter_offset_zero OK
ActionExtensionTest
    test_extended_action OK
    test_invalid_action OK
    test_invalid_action_body OK
ExtensionControllerTest
    test_get_by_alias OK
    test_index OK
ExtensionManagerTest
    test_get_resources OK
ResourceExtensionTest
    test_get_resources OK
    test_get_resources_with_controller OK
    test_no_extension_present OK
ResponseExtensionTest
    test_get_resources_with_mgr OK
    test_get_resources_with_stub_mgr OK
TestFaults
    test_400_fault_json OK
    test_400_fault_xml OK
...

975. By Jesse Andrews <email address hidden>

<email address hidden> to mailmap

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.