Mir

Merge lp://staging/~robertcarr/mir/session-transactions into lp://staging/~mir-team/mir/trunk

Proposed by Robert Carr
Status: Rejected
Rejected by: Robert Carr
Proposed branch: lp://staging/~robertcarr/mir/session-transactions
Merge into: lp://staging/~mir-team/mir/trunk
Diff against target: 214 lines (+51/-17)
6 files modified
include/server/mir/shell/application_session.h (+3/-1)
include/server/mir/shell/session.h (+7/-0)
include/test/mir_test_doubles/mock_session.h (+1/-1)
src/server/shell/application_session.cpp (+15/-9)
src/server/shell/single_visibility_focus_mechanism.cpp (+11/-6)
tests/unit-tests/shell/test_single_visibility_focus_mechanism.cpp (+14/-0)
To merge this branch: bzr merge lp://staging/~robertcarr/mir/session-transactions
Reviewer Review Type Date Requested Status
Alan Griffiths Needs Fixing
PS Jenkins bot (community) continuous-integration Approve
Robert Ancell Approve
Review via email: mp+175669@code.staging.launchpad.net

Commit message

Perform focus setting transactionally, enabling the stress test suite to run.

Description of the change

This branch is a stab at fixing:

https://bugs.launchpad.net/mir/+bug/1195089

First the back story:

The way the crash from mir_stress presented itself from me was a throw in SessionManager::close_session "Invalid Session" when called from ~SessionMediator. That is to say, somehow we still have the msh::Session shared_ptr around but it has already been removed from the session manager. A bit of digging revealed the problem presents from SessionMediator::disconnect. SessionManager::close_session as called from SessionMediator::disconnect can throw in a few different ways, and when it does so the SessionMediator will be destroyed, but fail to reset the session pointer (which it does immediately following the call to SessionManager::close_session).

I tried logging this exception, and found that all the exceptions which could be thrown originated from setting the focus when closing a surface. I found the following two:

1. SingleVisibilityFocusMechanism requests the default_surface of a session, getting std::shared_ptr<msh::Session>. Now say, we are prempted by a frontend thread which closes the surface we have just grabbed. So now our msh::Session pointer is still valid, but the call to lock the internal ms::Surface will fail and throw an exception "Invalid surface" (when we call ->set_input_focus).
2. Things start the same way, but the surface is destroyed later, say after we enter AndroidInputTargeter::focus_changed(std::shared_ptr<mi::InputChannel>) we are preempted by a frontend thread destroying the surface. Now we try and find the input window handle for InputChannel (back in the focus_changed thread), but it has already been unregistered, and so we throw "Window handle requested for unregistered surface."

In order to solve this, I've introduced an API, msh::Session::transaction(std::function). The semantics of this function are easy to describe:

TEST(Session, create_and_destroy_surface_from_other_threads_blocks_during_transaction)

Unfortunately this test is quite difficult to write deterministic-ally so the function is essentially untested. I came up with one solution using real-time scheduling attributes (SCHED_FIFO), but this requires running the tests as root or using root to set a system capability on the test binary. If we want to do this/can think of a good way I can write the test.

With this branch I was able to run the stress test for several minutes and drag the cursor around. Eventually I got bored.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Robert Ancell (robert-ancell) wrote :

+ // from a second thread before input focus is set.xs
Use emacs much?

Revision history for this message
Robert Ancell (robert-ancell) wrote :

I'd prefer a second opinion on using the transaction function over just using a lock for consistency with the other code, but otherwise it seems to make sense.

review: Approve
858. By Robert Carr

Remove secret emacs signet ring

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I see the need, but don't like the solution. Maybe my next article should be "recursive_mutex considered evil"?

What we are lacking is a clear synchronization strategy in our data model. (C.f. "The case for a "SceneGraph"" on mir-devel.)

I'm tempted to hold my nose and "Approve" but will try to think of a cleaner solution first.

review: Abstain
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

...having thought about it: doesn't the problem stem from not holding SessionManager::mutex when these actions occur?

void msh::SessionManager::close_session(std::shared_ptr<mf::Session> const& session)
{
    auto shell_session = std::dynamic_pointer_cast<Session>(session);

    session_listener->stopping(shell_session);

    app_container->remove_session(shell_session);

    std::unique_lock<std::mutex> lock(mutex);
    set_focus_to_locked(lock, focus_sequence->default_focus());
}

Note that the SessionManager::mutex isn't seized until after the remove_session() call.

inline void msh::SessionManager::set_focus_to_locked(std::unique_lock<std::mutex> const&, std::shared_ptr<Session> const& shell_session)
{
    auto old_focus = focus_application.lock();

    focus_application = shell_session;

    focus_setter->set_focus_to(shell_session);
    if (shell_session)
    {
        session_listener->focused(shell_session);
    }
    else
    {
        session_listener->unfocused();
    }
}

Note that the SessionManager::mutex isn't seized.

review: Needs Fixing
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> inline void
> msh::SessionManager::set_focus_to_locked(std::unique_lock<std::mutex> const&,
> std::shared_ptr<Session> const& shell_session)
> {
> auto old_focus = focus_application.lock();
>
> focus_application = shell_session;
>
> focus_setter->set_focus_to(shell_session);
> if (shell_session)
> {
> session_listener->focused(shell_session);
> }
> else
> {
> session_listener->unfocused();
> }
> }
>
> Note that the SessionManager::mutex isn't seized.

Sorry, that last bit is a thinko.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Also, is there any need for DefaultSessionContainer::remove_session() to throw? It can (trivially) meet its post condition.

Revision history for this message
Robert Carr (robertcarr) wrote :

Holding the SessionManager mutex isn't enough. Not that close_surface isn't routed through the SessionManager which is where this race is triggered. The existing locking around close_session is asctually enough to prevent it.

DefaultSessionContainer::remove_session perhaps need not throw (though I think this usage is indicative of programmer error). Still though, that would leave the race around WindowHandleRepository.

More than those two, there is a race in every portion of the shell which has code like:

Step 1: Get surface from session
Step 2: Query properties from surface.

Step 2 can always throw as it stands and there is no reasonable way to get around it.

Revision history for this message
Robert Carr (robertcarr) wrote :

Does anyone have opinions?

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

OK, I'm beginning to see that the code is broken worse than I thought.

In any case, the problems arise from a poorly synchronized model - and part of that is that some locks are in the wrong place (i.e. in the controller).

Putting recursive locks into the controller seems like the wrong answer.

Sadly, fixing the model ("scenegraph") is a larger conversation.

review: Needs Fixing

Unmerged revisions

858. By Robert Carr

Remove secret emacs signet ring

857. By Robert Carr

Set focus transactionally

856. By Robert Carr

Hack

855. By Daniel van Vugt

buffer_swapper_spin.h: Remove dead code: initialize_queues.

Approved by PS Jenkins bot, Alan Griffiths.

854. By Stephen M. Webb

disable running the integration test suite on arm architecture during the packaging builds (lp: #1195265). Fixes: https://bugs.launchpad.net/bugs/1195260, https://bugs.launchpad.net/bugs/1195265.

Approved by PS Jenkins bot, Kevin DuBois.

853. By Alan Griffiths

doc: use house font and colors for coding guidelines.

Approved by PS Jenkins bot, Kevin DuBois, Alexandros Frantzis.

852. By Robert Carr

Implement a connection authorization mechanism.

Approved by Gerry Boland, Alan Griffiths, Robert Ancell, PS Jenkins bot.

851. By Robert Ancell

Releasing 0.0.7

850. By Robert Carr

Extract DefaultServerConfiguration::the_cursor_listener from DefaultServerConfiguration::the_input_configuration. Fixes: https://bugs.launchpad.net/bugs/1192916.

Approved by PS Jenkins bot, Alan Griffiths, Alexandros Frantzis, Robert Ancell.

849. By Alan Griffiths

geometry: make geometry compound types easier to construct. Fixes: https://bugs.launchpad.net/bugs/1199756.

Approved by PS Jenkins bot, Alexandros Frantzis.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches