Merge into trunk : tpc-support : Code : Storm

Reviewer	Date Requested	Status
Thomas Herve (community)	2012-03-01	Approve on 2012-03-16
Stuart Bishop (community)	2012-03-01	Approve on 2012-03-05
Review via email: mp+95357@code.staging.launchpad.net

Revision history for this message

Stuart Bishop (stub) wrote on 2012-03-02:

#

=== modified file 'NEWS'
+- Add support for two-phase commits, using the DB API version 2.0, which
+ for now is only supported by psycopg2.

'for now is only supported by psycopg2' is better written as 'is only supported by PostgreSQL'. It isn't so much a driver issue, but that PostgreSQL is the only backend that provides the TPC feature... and not our fault :-)

=== modified file 'storm/store.py'
+ def begin(self, xid):
+ """Start a new two-phase transaction.

Should xid be an optional parameter, and we generate one using a UUID if it isn't provided? If we do make xid optional, the xid needs to be returned from this method. Optional parameter means people can be lazy, but this may not be a good thing as ideally your global transaction id should be shared between all the TPC participants, making it easier to resolve a failed TPC dance.

=== added file 'storm/xid.py'
+class Xid(object):
+ """
+ Represent a transaction identifier compliant with the XA specification.
+ """
+
+ def __init__(self, format_id, global_transaction_id, branch_qualifier):
+ self.format_id = format_id
+ self.global_transaction_id = global_transaction_id
+ self.branch_qualifier = branch_qualifier

Should any or all of the constructor's parameters be optional?

=== modified file 'storm/zope/zstorm.py'

+from random import randint

+ if global_transaction_id is None:
+ # The the global transaction doesn't have an ID yet, let's create
+ # one in a way that it will be unique
+ global_transaction_id = "_storm_%032x" % randint(0, 2 ** 128)
+ txn.__storm_transaction_id__ = global_transaction_id

You should be using a UUID here rather than randint. Random methods try to provide a random distribution. UUIDs guarantee (as much as possible) to provide unique value. We are after the UUID behavior here.

+ xid = Xid(0, global_transaction_id, zstorm.get_name(store))
+ store.begin(xid)

This seems good. We get a unique global transaction id each transaction that is shared between stores, helping us result transactions lost during TPC failure. I think store names will always be valid XID components, but we might want to check this. In any case, psycopg2 should raise an exception if it is given invalid input and it is nice we have a meaningful name.

The commit/tpc_* method changes look correct for PostgreSQL. However, for backends that don't support TPC we are moving the actual commit from happening in the commit() method to the tpc_finish() method. This is not ideal, but maybe it is good enough.

The use case here is if your application is mixing PostgreSQL and SQLLite. Provided you have only a single SQLLite Store, you can actually get TPC reliability. You first need to prepare() all TPC capable backends, then commit() the single TPC incapable backend, then tpc_commit() the TPC capable backends.

I think we can do this easily enough if we added an attribute to the Store that lets us check if that Store is TPC capable. We then change the sortKey in zstorm.py to list TPC capable Stores first in the commit order, and then adjust the logic in the commit/tpc_* methods to only c...

=== modified file 'NEWS'
+- Add support for two-phase commits, using the DB API version 2.0, which
+  for now is only supported by psycopg2.

'for now is only supported by psycopg2' is better written as 'is only supported by PostgreSQL'. It isn't so much a driver issue, but that PostgreSQL is the only backend that provides the TPC feature... and not our fault :-)

=== modified file 'storm/store.py'
+    def begin(self, xid):
+        """Start a new two-phase transaction.

Should xid be an optional parameter, and we generate one using a UUID if it isn't provided? If we do make xid optional, the xid needs to be returned from this method. Optional parameter means people can be lazy, but this may not be a good thing as ideally your global transaction id should be shared between all the TPC participants, making it easier to resolve a failed TPC dance.

=== added file 'storm/xid.py'
+class Xid(object):
+    """
+    Represent a transaction identifier compliant with the XA specification.
+    """
+
+    def __init__(self, format_id, global_transaction_id, branch_qualifier):
+        self.format_id = format_id
+        self.global_transaction_id = global_transaction_id
+        self.branch_qualifier = branch_qualifier

Should any or all of the constructor's parameters be optional?

=== modified file 'storm/zope/zstorm.py'
 
+from random import randint

+        if global_transaction_id is None:
+            # The the global transaction doesn't have an ID yet, let's create
+            # one in a way that it will be unique
+            global_transaction_id = "_storm_%032x" % randint(0, 2 ** 128)
+            txn.__storm_transaction_id__ = global_transaction_id

You should be using a UUID here rather than randint. Random methods try to provide a random distribution. UUIDs guarantee (as much as possible) to provide unique value. We are after the UUID behavior here.

+        xid = Xid(0, global_transaction_id, zstorm.get_name(store))
+        store.begin(xid)

This seems good. We get a unique global transaction id each transaction that is shared between stores, helping us result transactions lost during TPC failure. I think store names will always be valid XID components, but we might want to check this. In any case, psycopg2 should raise an exception if it is given invalid input and it is nice we have a meaningful name.

The commit/tpc_* method changes look correct for PostgreSQL. However, for backends that don't support TPC we are moving the actual commit from happening in the commit() method to the tpc_finish() method. This is not ideal, but maybe it is good enough.

The use case here is if your application is mixing PostgreSQL and SQLLite. Provided you have only a single SQLLite Store, you can actually get TPC reliability. You first need to prepare() all TPC capable backends, then commit() the single TPC incapable backend, then tpc_commit() the TPC capable backends.

I think we can do this easily enough if we added an attribute to the Store that lets us check if that Store is TPC capable. We then change the sortKey in zstorm.py to list TPC capable Stores first in the commit order, and then adjust the logic in the commit/tpc_* methods to only call the TPC methods if the backend is TPC capable.The sortOrder was created for this, back with Zope2, where the ZODB supported TPC but the none of the relational databases did. I think the rdb's all got 'zz_' prefixes to ensure they got committed last.

(and ignore the case where there are multiple TPC incapable stores - there are no guarantees there)

=== modified file 'tests/databases/base.py'
 
+class TwoPhaseCommitTest(object):
+
+    def test_begin(self):
+        """
+        begin() starts a transaction that can be ended with a two-phase commit.
+        """
+        xid = Xid(0, "foo", "bar")
+        self.connection.begin(xid)
+        self.connection.execute("INSERT INTO test VALUES (30, 'Title 30')")
+        self.connection.prepare()
+        self.connection.commit()
+        self.connection.rollback()
+        result = self.connection.execute("SELECT id FROM test WHERE id=30")
+        self.assertTrue(result.get_one())

We are missing a test for rollback after prepare (begin, execute, prepare, rollback, confirm changes were rolled back).

We are missing tests for mixing tpc and standard modes:
    execute, commit, begin, execute, prepare, commit, execute
    execute, rollback, begin, execute, prepare, rollback, execute

+    def test_begin_inside_a_two_phase_transaction(self):
+        """
+        begin() can't be used if a two-phase transaction has already started.
+        """
+        xid = Xid(0, "foo", "bar")
+        self.connection.begin(xid)
+        self.assertRaises(ProgrammingError, self.connection.begin, xid)

You should generate a fresh xid before the second begin() call. There is a chance this failed because you are attempting to use a non-unique xid. Both cases are probably worth testing.

+class TwoPhaseCommitDisconnectionTest(object):
+
+    def test_begin_after_rollback_with_disconnection_error(self):
+        """
+        If a rollback fails because of a disconnection error, the two-phase
+        transaction should be properly reset.
+        """
+        xid = Xid(0, "foo", "bar")
+        self.connection.begin(xid)
+        self.connection.execute("SELECT 1")
+        self.proxy.stop()
+        self.connection.rollback()
+        self.proxy.start()
+        self.connection.begin(xid)
+        result = self.connection.execute("SELECT 1")
+        self.assertTrue(result.get_one())

You seem to be relying on the fact that self.connection.begin(xid) will fail if the previous transaction using that xid was not rolled back. This is fragile, because that transaction may not have been rolled back yet; at least in the case of PostgreSQL it might take some time for the server to notice that the connection was dropped and roll it back. This might only be working because the only statement issued was a SELECT. I think you should generate a fresh xid for the second begin() to avoid this potential problem.

+    def test_begin_after_with_statement_disconnection_error_and_rollback(self):
+        """
+        The two-phase transaction state is properly reset if a disconnection
+        happens before the rollback.
+        """
+        xid = Xid(0, "foo", "bar")
+        self.connection.begin(xid)
+        self.proxy.close()
+        self.assertRaises(DisconnectionError,
+                          self.connection.execute, "SELECT 1")
+        self.connection.rollback()
+        self.proxy.start()
+        self.connection.begin(xid)
+        result = self.connection.execute("SELECT 1")
+        self.assertTrue(result.get_one())

Here too.

And a test to demonstrate how things fail if you try to reuse an uncommitted xid could be beneficial. It does fail, doesn't it?

=== modified file 'tests/databases/postgres.py'
 
+class PostgresTest(DatabaseTest, TwoPhaseCommitTest, TestHelper):

We are running the TwoPhaseCommitTests with PostgreSQL. Shouldn't we also be running them under the other backends? I think all the previous tests should pass with the other backends despite the fact they don't actually do TPC.

We should have a test that TPC is actually happening with PostgreSQL. I think this is important as I don't think we have anything else ensuring we actually get around to invoking the real connection.tpc_* methods. To do this we need to 'begin, update, prepare'. Then open a new psycopg2 connection and make the connection.tpc_commit(xid) call. That should succeed, and if you force the original connection to reconnect the changes should be visible.

=== modified file 'tests/zope/zstorm.py'

+    def test_transaction_with_two_phase_commit(self):
+        """
+        Two stores in two-phase-commit mode joining the same transaction share
+        the same global transaction ID.
+        """

This docstring is incorrect; it is a copy of the previous test's docstring.

Revision history for this message

Stuart Bishop (stub) on 2012-03-02:

#

review: Needs Fixing

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-03-02

438. By Free Ekanayaka on 2012-03-02: Address review comments

Revision history for this message

Free Ekanayaka (free.ekanayaka) wrote on 2012-03-02:

#

Download full text (11.3 KiB)

Hi Stuart, thanks a lot for the insightful review. I think I have addressed everything except the single-non-tpc-store use case for which I need some clarification, see below.

=== modified file 'NEWS'

> 'for now is only supported by psycopg2' is better written as
> 'is only supported by PostgreSQL'

Fixed.

=== modified file 'storm/store.py'

> Should xid be an optional parameter, and we generate one using a
> UUID if it isn't provided?

I don't think this is a good idea for the reason you
mention. Note that in the DB API it's *not* optional as well.

Doing a two-phase commit when you have only one store doesn't
make sense, I think, and if you have to writing something like:

xid = store1.begin()
xid.branch_qualifier = "store2"
store2.begin(xid)

doesn't feel good nor convenient.

=== added file 'storm/xid.py'

+class Xid(object):
+ """
+ Represent a transaction identifier compliant with the XA specification.
+ """
+
+ def __init__(self, format_id, global_transaction_id, branch_qualifier):
+ self.format_id = format_id
+ self.global_transaction_id = global_transaction_id
+ self.branch_qualifier = branch_qualifier

> Should any or all of the constructor's parameters be optional?

I don't really see a way that we could really provide sane
defaults or generate them if they are not provided, but I'm open
to ideas. In particular, global_transaction_id is expected to be
passed to all the Xid objects that you build for the various
store that you have (each store should have a different
branch_qualifier though).

=== modified file 'storm/zope/zstorm.py'

+from random import randint

+ if global_transaction_id is None:
+ # The the global transaction doesn't have an ID yet, let's create
+ # one in a way that it will be unique
+ global_transaction_id = "_storm_%032x" % randint(0, 2 ** 128)
+ txn.__storm_transaction_id__ = global_transaction_id

> You should be using a UUID here rather than randint. Random
> methods try to provide a random distribution. UUIDs guarantee (as
> much as possible) to provide unique value. We are after the UUID
> behavior here.

Good point, fixed.

+ xid = Xid(0, global_transaction_id, zstorm.get_name(store))
+ store.begin(xid)

> This seems good. We get a unique global transaction id each
> transaction that is shared between stores, helping us result
> transactions lost during TPC failure. I think store names will
> always be valid XID components, but we might want to check
> this.

They should, according to the XA specification the are just
strings. Note that PostgreSQL doesn't really follow the XA
specification, it just have the notion of a "transaction_id" that
must be less than 200 bytes long. The psycopg2 driver will just
compose the global_transaction_id and branch_qualifier to build
the transaction_id.

> In any case, psycopg2 should raise an exception if it is
> given invalid input and it is nice we have a meaningful name.

Right.

> However, for backends that don't support TPC we are
> moving the actual commit from happening in the commit() method
> to the tpc_finish() method.

Uhm, the branch is *not* moving that. The StoreDataManager.commit
method reads:

    def commit(self, txn):
        if self._store._tpc:
   ...

Hi Stuart, thanks a lot for the insightful review. I think I have addressed everything except the single-non-tpc-store use case for which I need some clarification, see below.

=== modified file 'NEWS'

> 'for now is only supported by psycopg2' is better written as
> 'is only supported by PostgreSQL'

Fixed.

=== modified file 'storm/store.py'

> Should xid be an optional parameter, and we generate one using a
> UUID if it isn't provided?

I don't think this is a good idea for the reason you
mention. Note that in the DB API it's *not* optional as well.

Doing a two-phase commit when you have only one store doesn't
make sense, I think, and if you have to writing something like:

xid = store1.begin()
xid.branch_qualifier = "store2"
store2.begin(xid)

doesn't feel good nor convenient.

=== added file 'storm/xid.py'

+class Xid(object):
+ """
+ Represent a transaction identifier compliant with the XA specification.
+ """
+
+ def __init__(self, format_id, global_transaction_id, branch_qualifier):
+ self.format_id = format_id
+ self.global_transaction_id = global_transaction_id
+ self.branch_qualifier = branch_qualifier

> Should any or all of the constructor's parameters be optional?

I don't really see a way that we could really provide sane
defaults or generate them if they are not provided, but I'm open
to ideas. In particular, global_transaction_id is expected to be
passed to all the Xid objects that you build for the various
store that you have (each store should have a different
branch_qualifier though).

=== modified file 'storm/zope/zstorm.py'

+from random import randint

+ if global_transaction_id is None:
+ # The the global transaction doesn't have an ID yet, let's create
+ # one in a way that it will be unique
+ global_transaction_id = "_storm_%032x" % randint(0, 2 ** 128)
+ txn.__storm_transaction_id__ = global_transaction_id

> You should be using a UUID here rather than randint. Random
> methods try to provide a random distribution. UUIDs guarantee (as
> much as possible) to provide unique value. We are after the UUID
> behavior here.

Good point, fixed.

+ xid = Xid(0, global_transaction_id, zstorm.get_name(store))
+ store.begin(xid)

> This seems good. We get a unique global transaction id each
> transaction that is shared between stores, helping us result
> transactions lost during TPC failure. I think store names will
> always be valid XID components, but we might want to check
> this.

They should, according to the XA specification the are just
strings. Note that PostgreSQL doesn't really follow the XA
specification, it just have the notion of a "transaction_id" that
must be less than 200 bytes long. The psycopg2 driver will just
compose the global_transaction_id and branch_qualifier to build
the transaction_id.

> In any case, psycopg2 should raise an exception if it is
> given invalid input and it is nice we have a meaningful name.

Right.

> However, for backends that don't support TPC we are
> moving the actual commit from happening in the commit() method
> to the tpc_finish() method.

Uhm, the branch is *not* moving that. The StoreDataManager.commit
method reads:

def commit(self, txn):
        if self._store._tpc:
            self._store.prepare()
        else:
            try:
                self._store.commit()
            finally:
                self._hook_register_transaction_event()

so it explicitly keeps the old behavior and we *do* commit there
stores which are not set for TPC.

Perhaps you mean that the branch *should* change this? That would
indeed support the use case that you point about a single store
not capable of TPC and all others set for TPC.

By the way, note that according to the documention of
IDataManager commiting the changes in tpc_finish() should be
actually the correct behavior, whereas the one we have
now for stores not set for TPC (running Store.commit in
StoreDataManager.commit) is not:

def commit(transaction):
        """Commit modifications to registered objects.

Save changes to be made persistent if the transaction commits (if
        tpc_finish is called later).  If tpc_abort is called later, changes
        must not persist.

This includes conflict detection and handling.  If no conflicts or
        errors occur, the data manager should be prepared to make the
        changes persist when tpc_finish is called.
        """

def tpc_finish(transaction):
        """Indicate confirmation that the transaction is done.

Make all changes to objects modified by this transaction persist.

transaction is the ITransaction instance associated with the
        transaction being committed.

This should never fail.  If this raises an exception, the
        database is not expected to maintain consistency; it's a
        serious error.
        """

So if we want to support the use case you mention, I can change
the current behavior and run store.commit() in tpc_finish() for
stores not set for TPC, and change sortKey() to have non-TPC
stores commit first.

This would change the current behavior (though in a way more
compliant with the IDataManager contract), but I can't think of
bad consequences.

=== modified file 'tests/databases/base.py'

+class TwoPhaseCommitTest(object):
+
+ def test_begin(self):
+ """
+ begin() starts a transaction that can be ended with a two-phase commit.
+ """
+ xid = Xid(0, "foo", "bar")
+ self.connection.begin(xid)
+ self.connection.execute("INSERT INTO test VALUES (30, 'Title 30')")
+ self.connection.prepare()
+ self.connection.commit()
+ self.connection.rollback()
+ result = self.connection.execute("SELECT id FROM test WHERE id=30")
+ self.assertTrue(result.get_one())

> We are missing a test for rollback after prepare (begin, execute,
> prepare, rollback, confirm changes were rolled back).

Yeah, I'd consider it a test at the edge between unit-test and
functional-test, as Storm or my branch don't have (or need) any
specific logic for that to work, and we rely on the backend. But
I agree it's nice to have such a test, so I added it.

> We are missing tests for mixing tpc and standard modes:
>    execute, commit, begin, execute, prepare, commit, execute
>    execute, rollback, begin, execute, prepare, rollback, execute

Added them as well.

+ def test_begin_inside_a_two_phase_transaction(self):
+ """
+ begin() can't be used if a two-phase transaction has already started.
+ """
+ xid = Xid(0, "foo", "bar")
+ self.connection.begin(xid)
+ self.assertRaises(ProgrammingError, self.connection.begin, xid)

> You should generate a fresh xid before the second begin()
> call. There is a chance this failed because you are attempting to
> use a non-unique xid. Both cases are probably worth testing.

Good point, fixed.

I just didn't add a test for re-using the same xid as PostgreSQL
seems just happy with passing the same transaction_id string to
two different PREPARE TRANSACTION commands in two different
transactions. It feels like a questionable behavior, but
apparently that's the way it is. It's surely not something we
want to rely on or suggest users to rely on.

+class TwoPhaseCommitDisconnectionTest(object):
+
+ def test_begin_after_rollback_with_disconnection_error(self):
+ """
+ If a rollback fails because of a disconnection error, the two-phase
+ transaction should be properly reset.
+ """
+ xid = Xid(0, "foo", "bar")
+ self.connection.begin(xid)
+ self.connection.execute("SELECT 1")
+ self.proxy.stop()
+ self.connection.rollback()
+ self.proxy.start()
+ self.connection.begin(xid)
+ result = self.connection.execute("SELECT 1")
+ self.assertTrue(result.get_one())

> You seem to be relying on the fact that self.connection.begin(xid)
> will fail if the previous transaction using that xid was not
> rolled back. This is fragile, because that transaction may not
> have been rolled back yet;

Yes, I'm relying on it, but I think it's safe because here we are
not testing backend-related behavior, but rather Storm behavior.

In particular, if you comment the self._two_phase_transaction = False
line here (in Connection.rollback):

except Error, exc:
                if self.is_disconnection_error(exc):
                    self._raw_connection = None
                    self._state = STATE_RECONNECT
                    self._two_phase_transaction = False

then this test will fail because the second begin() will fail,
regardless of the xid and of what the backend does. Hope
I'm not missing something.

> at least in the case of PostgreSQL
> it might take some time for the server to notice that the connection
> was dropped and roll it back. This might only be working because
> the only statement issued was a SELECT. I think you should generate
> a fresh xid for the second begin() to avoid this potential problem.

That's nevertheless a good idea, so fixed it.

+ def test_begin_after_with_statement_disconnection_error_and_rollback(self):
+ """
+ The two-phase transaction state is properly reset if a disconnection
+ happens before the rollback.
+ """
+ xid = Xid(0, "foo", "bar")
+ self.connection.begin(xid)
+ self.proxy.close()
+ self.assertRaises(DisconnectionError,
+ self.connection.execute, "SELECT 1")
+ self.connection.rollback()
+ self.proxy.start()
+ self.connection.begin(xid)
+ result = self.connection.execute("SELECT 1")
+ self.assertTrue(result.get_one())

> Here too.

Fixed.

> And a test to demonstrate how things fail if you try to reuse an
> uncommitted xid could be beneficial. It does fail, doesn't it?

It doesn't fail for the reason above, afaict PostgreSQL seems happy
to reuse any transaction_id. I think Storm should discourage doing
so and consider the resulting behavior undefined. Nobody should rely
rely on the fact that it doesn't fail, so I'm a bit reluctant adding
such a test.

=== modified file 'tests/databases/postgres.py'

+class PostgresTest(DatabaseTest, TwoPhaseCommitTest, TestHelper):

> We are running the TwoPhaseCommitTests with PostgreSQL. Shouldn't we
> also be running them under the other backends? I think all the previous
> tests should pass with the other backends despite the fact they don't
> actually do TPC.

So the reason why I didn't add "dummy" implementations of begin/prepare
for the backend that don't support TPC, is basically this one (from
the second paragraph on):

https://bugs.launchpad.net/storm/+bug/132485/comments/4

That being said, we could add an UnsupportedTwoPhaseCommitTests mixin
to add the the non-TPC backend test classes and raise a NotSupportedError
error for those backends in begin/prepare.

> We should have a test that TPC is actually happening with PostgreSQL.
> I think this is important as I don't think we have anything else
> ensuring we actually get around to invoking the real connection.tpc_*
> methods. To do this we need to 'begin, update, prepare'. Then open
> a new psycopg2 connection and make the connection.tpc_commit(xid)
> call. That should succeed, and if you force the original connection
> to reconnect the changes should be visible.

Sounds good. I ended up adding a Connection.recover() method, and optional
xid parameters to Connection.commit() and Connection.rollback() in order
to support the full story at the Connection level, and test it in unit-tests.

We might possibly want to expose this at the Store level too.

=== modified file 'tests/zope/zstorm.py'

+ def test_transaction_with_two_phase_commit(self):
+ """
+ Two stores in two-phase-commit mode joining the same transaction share
+ the same global transaction ID.
+ """

> This docstring is incorrect; it is a copy of the previous test's docstring.

Fixed.

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-03-02

439. By Free Ekanayaka on 2012-03-02: More assertions
440. By Free Ekanayaka on 2012-03-02: Comments
441. By Free Ekanayaka on 2012-03-02: Use _raw_xid

Revision history for this message

Stuart Bishop (stub) wrote on 2012-03-05:

#

Download full text (7.3 KiB)

On Sat, Mar 3, 2012 at 2:56 AM, Free Ekanayaka
<email address hidden> wrote:

>> However, for backends that don't support TPC we are
>> moving the actual commit from happening in the commit() method
>> to the tpc_finish() method.
>
> Uhm, the branch is *not* moving that. The StoreDataManager.commit
> method reads:
>
> def commit(self, txn):
> if self._store._tpc:
> self._store.prepare()
> else:
> try:
> self._store.commit()
> finally:
> self._hook_register_transaction_event()
>
> so it explicitly keeps the old behavior and we *do* commit there
> stores which are not set for TPC.
>
> Perhaps you mean that the branch *should* change this? That would
> indeed support the use case that you point about a single store
> not capable of TPC and all others set for TPC.

What I meant was that if you enable TPC on the store, the commit
happens at a different stage even if your backend does not support
TPC.

If you are mixing TPC capable and non-TPC capable backends, the 'best'
ordering seems to be:

transaction.commit
  1) prepare TPC capable backends for commit
transaction.tpc_finish
  1) commit non-TPC capable backends
  2) commit TPC capable backends

> By the way, note that according to the documention of
> IDataManager commiting the changes in tpc_finish() should be
> actually the correct behavior, whereas the one we have
> now for stores not set for TPC (running Store.commit in
> StoreDataManager.commit) is not:

Agreed.

> So if we want to support the use case you mention, I can change
> the current behavior and run store.commit() in tpc_finish() for
> stores not set for TPC, and change sortKey() to have non-TPC
> stores commit first.
>
> This would change the current behavior (though in a way more
> compliant with the IDataManager contract), but I can't think of
> bad consequences.

Right. It is an edge case that may never actually apply to anyone, so
I'm not going to mandate it for landing. It does seem correct to me.
We do need to ensure that the flush happens earlier, but you are
already doing that.

<update>See below, changing this seems pointless as ZStorm will not
support mixing TPC and non-TPC stores without further
extension</update>

> Good point, fixed.
>
> I just didn't add a test for re-using the same xid as PostgreSQL
> seems just happy with passing the same transaction_id string to
> two different PREPARE TRANSACTION commands in two different
> transactions. It feels like a questionable behavior, but
> apparently that's the way it is. It's surely not something we
> want to rely on or suggest users to rely on.

So PostgreSQL does raise an exception if you attempt to reuse an xid:

postgres=# begin;
BEGIN
postgres=# prepare transaction 'tid';
PREPARE TRANSACTION
postgres=#
postgres=# begin;
BEGIN
postgres=# prepare transaction 'tid';
ERROR: transaction identifier "tid" is already in use
postgres=#

What is happening is apparent from Python - the exception does not get
raised about a duplicate xid until the tpc_prepare().

Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more...

On Sat, Mar 3, 2012 at 2:56 AM, Free Ekanayaka
<free.ekanayaka@canonical.com> wrote:

>> However, for backends that don't support TPC we are
>> moving the actual commit from happening in the commit() method
>> to the tpc_finish() method.
>
> Uhm, the branch is *not* moving that. The StoreDataManager.commit
> method reads:
>
>    def commit(self, txn):
>        if self._store._tpc:
>            self._store.prepare()
>        else:
>            try:
>                self._store.commit()
>            finally:
>                self._hook_register_transaction_event()
>
> so it explicitly keeps the old behavior and we *do* commit there
> stores which are not set for TPC.
>
> Perhaps you mean that the branch *should* change this? That would
> indeed support the use case that you point about a single store
> not capable of TPC and all others set for TPC.

What I meant was that if you enable TPC on the store, the commit
happens at a different stage even if your backend does not support
TPC.

If you are mixing TPC capable and non-TPC capable backends, the 'best'
ordering seems to be:

transaction.commit
  1) prepare TPC capable backends for commit
transaction.tpc_finish
  1) commit non-TPC capable backends
  2) commit TPC capable backends

> By the way, note that according to the documention of
> IDataManager commiting the changes in tpc_finish() should be
> actually the correct behavior, whereas the one we have
> now for stores not set for TPC (running Store.commit in
> StoreDataManager.commit) is not:

Agreed.

> So if we want to support the use case you mention, I can change
> the current behavior and run store.commit() in tpc_finish() for
> stores not set for TPC, and change sortKey() to have non-TPC
> stores commit first.
>
> This would change the current behavior (though in a way more
> compliant with the IDataManager contract), but I can't think of
> bad consequences.

Right. It is an edge case that may never actually apply to anyone, so
I'm not going to mandate it for landing. It does seem correct to me.
We do need to ensure that the flush happens earlier, but you are
already doing that.

<update>See below, changing this seems pointless as ZStorm will not
support mixing TPC and non-TPC stores without further
extension</update>

> Good point, fixed.
>
> I just didn't add a test for re-using the same xid as PostgreSQL
> seems just happy with passing the same transaction_id string to
> two different PREPARE TRANSACTION commands in two different
> transactions. It feels like a questionable behavior, but
> apparently that's the way it is. It's surely not something we
> want to rely on or suggest users to rely on.

So PostgreSQL does raise an exception if you attempt to reuse an xid:

postgres=# begin;
BEGIN
postgres=# prepare transaction 'tid';
PREPARE TRANSACTION
postgres=#
postgres=# begin;
BEGIN
postgres=# prepare transaction 'tid';
ERROR:  transaction identifier "tid" is already in use
postgres=#

What is happening is apparent from Python - the exception does not get
raised about a duplicate xid until the tpc_prepare().

Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycopg2
>>> con = psycopg2.connect('')
>>> xid = con.xid(0,'foo','bar')
>>> con.tpc_begin(xid)
>>> con.tpc_prepare()
>>> con.tpc_begin(xid)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
psycopg2.ProgrammingError: tpc_begin cannot be used inside a transaction
>>> con.rollback()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
psycopg2.ProgrammingError: rollback cannot be used during a two-phase
transaction
>>> con = psycopg2.connect('')
>>> con.tpc_begin(xid)
>>> con.tpc_prepare()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
psycopg2.ProgrammingError: transaction identifier "0_Zm9v_YmFy" is
already in use

One other issue that I just noticed - PostgreSQL defaults to having
two phase commit transactions disabled (max_prepared_transactions ==
0). store.execute('show max_prepared_transactions') will return the
current setting - is it worth skipping tests if
max_prepared_transactions is 0? Or just adding this setting to the
docs on running the test suite?

> +class TwoPhaseCommitDisconnectionTest(object):
> +
> + def test_begin_after_rollback_with_disconnection_error(self):
> + """
> + If a rollback fails because of a disconnection error, the two-phase
> + transaction should be properly reset.
> + """
> + xid = Xid(0, "foo", "bar")
> + self.connection.begin(xid)
> + self.connection.execute("SELECT 1")
> + self.proxy.stop()
> + self.connection.rollback()
> + self.proxy.start()
> + self.connection.begin(xid)
> + result = self.connection.execute("SELECT 1")
> + self.assertTrue(result.get_one())
>
>> You seem to be relying on the fact that self.connection.begin(xid)
>> will fail if the previous transaction using that xid was not
>> rolled back. This is fragile, because that transaction may not
>> have been rolled back yet;
>
> Yes, I'm relying on it, but I think it's safe because here we are
> not testing backend-related behavior, but rather Storm behavior.

So my mistake was thinking that a TPC transaction had been started,
but it hasn't as no PREPARE has been issued. In fact, this test
doesn't create any TPC transactions at all at the PostgreSQL level -
it is just exercising the Python level which all seems fine.

> === modified file 'tests/databases/postgres.py'
>
> +class PostgresTest(DatabaseTest, TwoPhaseCommitTest, TestHelper):
>
>> We are running the TwoPhaseCommitTests with PostgreSQL. Shouldn't we
>> also be running them under the other backends? I think all the previous
>> tests should pass with the other backends despite the fact they don't
>> actually do TPC.
>
> So the reason why I didn't add "dummy" implementations of begin/prepare
> for the backend that don't support TPC, is basically this one (from
> the second paragraph on):
>
> https://bugs.launchpad.net/storm/+bug/132485/comments/4

Ok. I understand this reasoning. It does have the side effect though
of not allowing you to manage both TPC capable and TPC incapable
backends with a single ZStorm instance. It also makes some of my
earlier comments invalid - no need to ensure order of TPC and non-TPC
stores as we can't mix them.

So for now I'd suggest that TPC support in ZStorm requires all stores
to support TPC, which gets this landed sooner. Mix & match can be
added later if anyone actually wants to do that.

>> We should have a test that TPC is actually happening with PostgreSQL.
>> I think this is important as I don't think we have anything else
>> ensuring we actually get around to invoking the real connection.tpc_*
>> methods. To do this we need to 'begin, update, prepare'. Then open
>> a new psycopg2 connection and make the connection.tpc_commit(xid)
>> call. That should succeed, and if you force the original connection
>> to reconnect the changes should be visible.
>
> Sounds good. I ended up adding a Connection.recover() method, and optional
> xid parameters to Connection.commit() and Connection.rollback() in order
> to support the full story at the Connection level, and test it in unit-tests.
>
> We might possibly want to expose this at the Store level too.

Meh. Adding them to the Connection class is already overkill IMO :-)

-- 
Stuart Bishop <stuart.bishop@canonical.com>

Revision history for this message

Stuart Bishop (stub) wrote on 2012-03-05:

#

The new Connection.rollback(xid) has some broken behavior - if the rollback fails because of a disconnection error, the method succeeds.In this case, it should instead re-raise the exception.

Apart from this problem, I see nothing else blocking this from landing (beyond a second review :) )

review: Approve

Revision history for this message

Free Ekanayaka (free-ekanayaka) wrote on 2012-03-06:

#

Download full text (6.9 KiB)

  >>>However, for backends that don't support TPC we are
  >>>moving the actual commit from happening in the commit() method
  >>>to the tpc_finish() method.
  >>
  >>Uhm, the branch is *not* moving that. The StoreDataManager.commit
  >>method reads:
  >>
  >> def commit(self, txn):
  >> if self._store._tpc:
  >> self._store.prepare()
  >> else:
  >> try:
  >> self._store.commit()
  >> finally:
  >> self._hook_register_transaction_event()
  >>
  >>so it explicitly keeps the old behavior and we *do* commit there
  >>stores which are not set for TPC.
  >>
  >>Perhaps you mean that the branch *should* change this? That would
  >>indeed support the use case that you point about a single store
  >>not capable of TPC and all others set for TPC.

  > What I meant was that if you enable TPC on the store, the commit
  > happens at a different stage even if your backend does not support
  > TPC.

I see, but if your backend does not support TPC, then
transaction.commit() will fail, because if you enable TPC on the store
with zstorm.set_default_tpc('my-store', False), then the data manager
will try to call begin()/prepare(), which would fail if the backend does
not support that.

> If you are mixing TPC capable and non-TPC capable backends, the 'best'
> ordering seems to be:

  > transaction.commit
  > 1) prepare TPC capable backends for commit
  > transaction.tpc_finish
  > 1) commit non-TPC capable backends
  > 2) commit TPC capable backends

Agreed (see also the following comment).

  >>So if we want to support the use case you mention, I can change
  >>the current behavior and run store.commit() in tpc_finish() for
  >>stores not set for TPC, and change sortKey() to have non-TPC
  >>stores commit first.
  >>
  >>This would change the current behavior (though in a way more
  >>compliant with the IDataManager contract), but I can't think of
  >>bad consequences.

  > Right. It is an edge case that may never actually apply to anyone, so
  > I'm not going to mandate it for landing. It does seem correct to me.
  > We do need to ensure that the flush happens earlier, but you are
  > already doing that.

Okay, as it seems more correct to move commit() to tpc_finish() also for
the non-TPC case, and it simplifies code too, I went for it. I also
modified sortKey() to make non-TPC stores commit first and added a test
for this. So the use case is supported now.

  > <update>See below, changing this seems pointless as ZStorm will not
  > support mixing TPC and non-TPC stores without further
  > extension</update>

Why? ZStorm is already supporting mixing TPC and non-TPC stores via the
set_default_tpc() method? And that's from the very first version of the
branch in this MP.

  >>Good point, fixed.
  >>
  >>I just didn't add a test for re-using the same xid as PostgreSQL
  >>seems just happy with passing the same transaction_id string to
  >>two different PREPARE TRANSACTION commands in two different
  >>transactions. It feels like a questionable behavior, but
  >>apparently that's the way it is. It's surely not something we
  >>want to rely on or suggest users to rely on.

> So P...

>>>However, for backends that don't support TPC we are
  >>>moving the actual commit from happening in the commit() method
  >>>to the tpc_finish() method.
  >>
  >>Uhm, the branch is *not* moving that. The StoreDataManager.commit
  >>method reads:
  >>
  >>   def commit(self, txn):
  >>       if self._store._tpc:
  >>           self._store.prepare()
  >>       else:
  >>           try:
  >>               self._store.commit()
  >>           finally:
  >>               self._hook_register_transaction_event()
  >>
  >>so it explicitly keeps the old behavior and we *do* commit there
  >>stores which are not set for TPC.
  >>
  >>Perhaps you mean that the branch *should* change this? That would
  >>indeed support the use case that you point about a single store
  >>not capable of TPC and all others set for TPC.

> What I meant was that if you enable TPC on the store, the commit
  > happens at a different stage even if your backend does not support
  > TPC.

I see, but if your backend does not support TPC, then
transaction.commit() will fail, because if you enable TPC on the store
with zstorm.set_default_tpc('my-store', False), then the data manager
will try to call begin()/prepare(), which would fail if the backend does
not support that.

> If you are mixing TPC capable and non-TPC capable backends, the 'best'
  > ordering seems to be:

> transaction.commit
  >   1) prepare TPC capable backends for commit
  > transaction.tpc_finish
  >   1) commit non-TPC capable backends
  >   2) commit TPC capable backends

Agreed (see also the following comment).

>>So if we want to support the use case you mention, I can change
  >>the current behavior and run store.commit() in tpc_finish() for
  >>stores not set for TPC, and change sortKey() to have non-TPC
  >>stores commit first.
  >>
  >>This would change the current behavior (though in a way more
  >>compliant with the IDataManager contract), but I can't think of
  >>bad consequences.

> Right. It is an edge case that may never actually apply to anyone, so
  > I'm not going to mandate it for landing. It does seem correct to me.
  > We do need to ensure that the flush happens earlier, but you are
  > already doing that.

Okay, as it seems more correct to move commit() to tpc_finish() also for
the non-TPC case, and it simplifies code too, I went for it. I also
modified sortKey() to make non-TPC stores commit first and added a test
for this. So the use case is supported now.

> <update>See below, changing this seems pointless as ZStorm will not
  > support mixing TPC and non-TPC stores without further
  > extension</update>

Why? ZStorm is already supporting mixing TPC and non-TPC stores via the
set_default_tpc() method? And that's from the very first version of the
branch in this MP.

>>Good point, fixed.
  >>
  >>I just didn't add a test for re-using the same xid as PostgreSQL
  >>seems just happy with passing the same transaction_id string to
  >>two different PREPARE TRANSACTION commands in two different
  >>transactions. It feels like a questionable behavior, but
  >>apparently that's the way it is. It's surely not something we
  >>want to rely on or suggest users to rely on.

> So PostgreSQL does raise an exception if you attempt to reuse an xid:

> postgres=# begin;
  > BEGIN
  > postgres=# prepare transaction 'tid';
  > PREPARE TRANSACTION
  > postgres=#
  > postgres=# begin;
  > BEGIN
  > postgres=# prepare transaction 'tid';
  > ERROR:  transaction identifier "tid" is already in use
  > postgres=#

> What is happening is apparent from Python - the exception does not get
  > raised about a duplicate xid until the tpc_prepare().

> Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
  > [GCC 4.6.1] on linux2
  > Type "help", "copyright", "credits" or "license" for more information.
  >>>>import psycopg2
  >>>>con = psycopg2.connect('')
  >>>>xid = con.xid(0,'foo','bar')
  >>>>con.tpc_begin(xid)
  >>>>con.tpc_prepare()
  >>>>con.tpc_begin(xid)
  > Traceback (most recent call last):
  >   File "<stdin>", line 1, in <module>
  > psycopg2.ProgrammingError: tpc_begin cannot be used inside a transaction
  >>>>con.rollback()
  > Traceback (most recent call last):
  >   File "<stdin>", line 1, in <module>
  > psycopg2.ProgrammingError: rollback cannot be used during a two-phase
  > transaction
  >>>>con = psycopg2.connect('')
  >>>>con.tpc_begin(xid)
  >>>>con.tpc_prepare()
  > Traceback (most recent call last):
  >   File "<stdin>", line 1, in <module>
  > psycopg2.ProgrammingError: transaction identifier "0_Zm9v_YmFy" is
  > already in use

Yeah, sorry, I actually meant that postgresql seems happy to re-use a
transaction ID once a prepared transaction with that ID has been
committed or rolled back. You can't do that concurrently, as you show.

> One other issue that I just noticed - PostgreSQL defaults to having
  > two phase commit transactions disabled (max_prepared_transactions ==
  > 0). store.execute('show max_prepared_transactions') will return the
  > current setting - is it worth skipping tests if
  > max_prepared_transactions is 0? Or just adding this setting to the
  > docs on running the test suite?

Sure, I did both.

>>=== modified file 'tests/databases/postgres.py'
  >>
  >>+class PostgresTest(DatabaseTest, TwoPhaseCommitTest, TestHelper):
  >>
  >>>We are running the TwoPhaseCommitTests with PostgreSQL. Shouldn't we
  >>>also be running them under the other backends? I think all the previous
  >>>tests should pass with the other backends despite the fact they don't
  >>>actually do TPC.
  >>
  >>So the reason why I didn't add "dummy" implementations of begin/prepare
  >>for the backend that don't support TPC, is basically this one (from
  >>the second paragraph on):
  >>
  >>https://bugs.launchpad.net/storm/+bug/132485/comments/4

> Ok. I understand this reasoning. It does have the side effect though
  > of not allowing you to manage both TPC capable and TPC incapable
  > backends with a single ZStorm instance.

I still I don't understand why you are saying this, as mentioned above
this possible with the set_default_tpc() method, which works per-store,
unless I'm missing something

>>>We should have a test that TPC is actually happening with PostgreSQL.
  >>>I think this is important as I don't think we have anything else
  >>>ensuring we actually get around to invoking the real connection.tpc_*
  >>>methods. To do this we need to 'begin, update, prepare'. Then open
  >>>a new psycopg2 connection and make the connection.tpc_commit(xid)
  >>>call. That should succeed, and if you force the original connection
  >>>to reconnect the changes should be visible.
  >>
  >>Sounds good. I ended up adding a Connection.recover() method, and optional
  >>xid parameters to Connection.commit() and Connection.rollback() in order
  >>to support the full story at the Connection level, and test it in unit-tests.
  >>
  >>We might possibly want to expose this at the Store level too.

> Meh. Adding them to the Connection class is already overkill IMO :-)

:)

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-03-06

442. By Free Ekanayaka on 2012-03-06: Commit non-TPC stores first

Revision history for this message

Stuart Bishop (stub) wrote on 2012-03-09:

#

On Tue, Mar 6, 2012 at 5:28 PM, Pippo Paperino <email address hidden> wrote:

> I see, but if your backend does not support TPC, then
> transaction.commit() will fail, because if you enable TPC on the store
> with zstorm.set_default_tpc('my-store', False), then the data manager
> will try to call begin()/prepare(), which would fail if the backend does
> not support that.

I see. You are correct.

> Why? ZStorm is already supporting mixing TPC and non-TPC stores via the
> set_default_tpc() method? And that's from the very first version of the
> branch in this MP.

> I still I don't understand why you are saying this, as mentioned above
> this possible with the set_default_tpc() method, which works per-store,
> unless I'm missing something

The confusion is mine sorry. In my mind I had set_default_tpc() as a
method on the ZStorm utility.

I am happy with this branch. It is bigger than I thought it would be,
but it all seems necessary and the test coverage is great. Thanks!

The only additional thing I think we want is in the tests' setUp or
tearDown (or both), is to use your recover() method to discover and
rollback any prepared transactions left by failed tests. It doesn't
need to be on this branch though.

--
Stuart Bishop <email address hidden>

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-03-12

443. By Free Ekanayaka on 2012-03-12: Merge from trunk, fix conflicts

Revision history for this message

Thomas Herve (therve) wrote on 2012-03-16:

#

[1]
+ global_transaction_id = getattr(txn, "__storm_transaction_id__", None)

Ideally, we would store this on an external object, but on something that we control instead. Maybe on the ZStorm instance, using a WeakKeyDictionary?

[2]
+ try:
+ self._store.commit()
+ except:
+ # We let the exception propagate, as the transaction manager
+ # will then call tcp_abort, and we will register the hook there
+ raise
+ else:
+ self._hook_register_transaction_event()

Looks equivalent to the following:

+ self._store.commit()
+ self._hook_register_transaction_event()

[3]
+ prepare() can't be used if a two-phase transaction has not began yet.

Typo: begun.

[4]
+ It's possible recover and commit pending transactions that were
+ prepared but not committed or rolled back.

+ It's possible recover and rollback pending transactions that were
+ prepared but not committed or rolled back.

Sounds confusing, it's missing a "to"?

It looks great! I'd be surprised if we get everything right in the first try, but the changes don't seem to impact the current behavior, which is nice. If possible, it'd be good to ask James H. to have a look, he probably have some interesting comments.

Thanks!

review: Approve

Revision history for this message

Thomas Herve (therve) wrote on 2012-03-16:

#

> [1]
> + global_transaction_id = getattr(txn, "__storm_transaction_id__",
> None)
>
> Ideally, we would store this on an external object, but on something that we
> control instead. Maybe on the ZStorm instance, using a WeakKeyDictionary?

We would *not*, sorry.

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-03-28

444. By Free Ekanayaka on 2012-03-28: Merge from trunk, fix conflicts

Revision history for this message

Free Ekanayaka (free.ekanayaka) wrote on 2012-03-28:

#

Thanks Thomas! It should be all fixed, I'll ping James and ask if he can have a look.

[1]

Yeah, I did that initially, but than scratched everything because it seems unnecessarily complicated, like adding a _transaction_ids property plus a couple of get_transaction_id(txn) and set_transaction_id(txn) methods which clutter a bit the API.

This really seems like a lacking feature of the transaction module, it should offer an API to annotate a Transaction object.

Anyway, I think your point is valid so I went for a compromise an added just a private _transaction_ids property with WeakKeyDictionary and no public methods on ZStorm (as this is purely internal machinery).

[2]

Fixed.

[3]

Fixed.

[4]
Fixed.

lp://staging/~free.ekanayaka/storm/tpc-support updated on 2012-06-12

445. By Free Ekanayaka on 2012-03-28: Fix Thomas' review points
446. By Free Ekanayaka on 2012-06-12: Merge from trunk, solve conflicts

Storm

Merge lp://staging/~free.ekanayaka/storm/tpc-support into lp://staging/storm

Commit message

Description of the change

Preview Diff

Subscribers