Merge lp://staging/~epics-core/epics-base/fix-async-dns into lp://staging/~epics-core/epics-base/3.14

Proposed by mdavidsaver
Status: Merged
Merge reported by: Andrew Johnson
Merged at revision: not available
Proposed branch: lp://staging/~epics-core/epics-base/fix-async-dns
Merge into: lp://staging/~epics-core/epics-base/3.14
Diff against target: 603 lines (+339/-81)
4 files modified
src/libCom/misc/ipAddrToAsciiAsynchronous.cpp (+171/-81)
src/libCom/misc/ipAddrToAsciiAsynchronous.h (+4/-0)
src/libCom/test/Makefile (+3/-0)
src/libCom/test/ipAddrToAsciiTest.cpp (+161/-0)
To merge this branch: bzr merge lp://staging/~epics-core/epics-base/fix-async-dns
Reviewer Review Type Date Requested Status
Andrew Johnson Approve
Ralph Lange Approve
Review via email: mp+318385@code.staging.launchpad.net

Description of the change

An attempt to properly address the regressions exposed/uncovered by lp:1527636. Previous to that change the ipAddrToAscii worker thread was stopped by the last call to ipAddrToAsciiEngine::release(), preventing further transactions from executing. The change (by myself) for lp:1527636 made release() a no-op, which did not stop pending transactions. This lead to races involving msgForMultiply, and perhaps others.

The original code was still open to races when more than one call to ipAddrToAsciiEngine::allocate() was made. The shared singleton ipAddrToAsciiEngine would not stop the worker until the last release(). So the race with msgForMultiply could happen if, for example, more than one CA context exists.

This branch attempt to achieve the best of both worlds. The worker thread is not stopped, but the semantics of release() are restored, and improved. The semantics of the API should be otherwise unchanged.

Instead of being a singleton ipAddrToAsciiEngine becomes an empty struct to trace references. The singleton is renamed ipAddrToAsciiGlobal and is fully hidden from user code.

Each all to ipAddrToAsciiEngine::allocate() returns a new, unique, ipAddrToAsciiEngine instance. release() must be called to "destroy". ipAddrToAsciiEnginePrivate carries a reference counter to ensure it outlives any transactions which reference it.

Calling ipAddrToAsciiEngine::release() marks the engine as "released" which will any transactions from being pended. It cancels and already pending transactions with engine==this and wait()s if the currently executing callback is the same. Canceled transactions are not executed.

To post a comment you must log in.
Revision history for this message
Andrew Johnson (anj) wrote :

Subject to merge testing, accepted in principle at 3/14/2017 F2F meeting.

Revision history for this message
Ralph Lange (ralph-lange) wrote :

Code looks good to me.

I am not able to create an original setup that allows reproducing the error.
However, running the added test against an unpatched base I see occasional failures (test 4 fails in about 20% of the cases), while running it against this branch does not fail.

review: Approve
Revision history for this message
Andrew Johnson (anj) wrote :

Is there any reason to not run the new ipAddrToAsciiTest code on RTEMS and/or VxWorks? It has not been added to epicsRunLibComTests.c so it isn't included in the test harness at the moment.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

No reason, an oversight on my part. I'll add this evening if you haven't already.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Done w/ the git branch. I'm going to try to see if I can switch this MR to the git branches w/o restarting it. Be prepared for some email...

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

FYI, "Resubmit proposal" errors when trying to switch to git branches. I'm not going to manually start a new MR unless a larger change is needed.

Revision history for this message
Andrew Johnson (anj) wrote :

This passes its self-test fine on VxWorks.

However a second run inside the harness (without rebooting first) caused this:

***** ipAddrToAsciiTest *****
1..5
0x475798 (tShell0): Unhandled C++ exception resulted in call to terminate

0x00548424 epicsRunLibComTests+0x250: runTestFunc ()
0x00559584 runTestFunc +0x4c : ipAddrToAsciiTest ()
0x00547500 ipAddrToAsciiTest+0x7c : 0x00557b90 ()
0x00557b90 ipAddrToAsciiEngine::allocate()+0x47c: __cxa_throw ()
0x0027c578 __cxa_throw +0x70 : std::terminate() ()
0x00279e64 __cxxabiv1::__unexpected(void (*)())+0x0 : __cxxabiv1::__terminate(void (*)()) ()
0x00279e30 __cxxabiv1::__terminate(void (*)())+0x18 : cplusTerminate ()
0x00133c70 cplusTerminate+0x4c : taskSuspend ()

Is the ipAddrToAsciiEngine::cleanup() routine necessary (this test is the only place it's called at the moment)? Could this become an epicsAtExit() routine, or would that just bring back the shutdown delays that caused these changes in the first place? Cleaning up things which only get initialized in an epicsThreadOnce() routine makes them impossible to use afterwards, so you couldn't call ipAddrToAsciiEngine::cleanup() in dbUnitTest.c; I'm not really sure where it could be used.

I'm not really expecting to be able to run the self-tests more than once without rebooting, but it is nice if they don't crash since many of the tests /can/ be run more than once.

There doesn't appear to be any documentation for this subsystem that could be updated, but the previous fix was mentioned in the 3.14.12.6 Release Notes so it would be good to have this fix covered there too.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> Is the ipAddrToAsciiEngine::cleanup() routine necessary ...

This exists to help valgrind to tell me if I'm doing something stupid. It can be skipped with no ill effects. Maybe #ifdef to omit on rtems/vxworks?

> Release Notes

ok

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

@anj done

Revision history for this message
Andrew Johnson (anj) wrote :

Thanks. This branch is intended for merging into 3.14, which doesn't have valgrind/valgrind.h (nor does the 3.15 branch), so my build died:

../ipAddrToAsciiTest.cpp:10:31: fatal error: valgrind/valgrind.h: No such file or directory
 #include <valgrind/valgrind.h>
                               ^
compilation terminated.

However the failure didn't happen until it got to my windows-x64-mingw cross-compile; I got warnings from the VxWorks and RTEMS builds about the missing file, but they still succeeded because it isn't actually necessary to pull in that include file at all. The cleanup() routine is only useful for testing under valgrind, which only runs on Linux, so I just changed the conditional around the call to #ifdef __linux__ and now everything works.

Merging.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches