Merge lp://staging/~thomas-voss/net-cpp/fix-1423765 into lp://staging/net-cpp

Proposed by Thomas Voß
Status: Merged
Approved by: Michi Henning
Approved revision: 54
Merged at revision: 46
Proposed branch: lp://staging/~thomas-voss/net-cpp/fix-1423765
Merge into: lp://staging/net-cpp
Diff against target: 484 lines (+149/-91)
4 files modified
debian/control (+0/-1)
src/core/net/http/impl/curl/easy.cpp (+19/-9)
src/core/net/http/impl/curl/easy.h (+38/-12)
src/core/net/http/impl/curl/multi.cpp (+92/-69)
To merge this branch: bzr merge lp://staging/~thomas-voss/net-cpp/fix-1423765
Reviewer Review Type Date Requested Status
Michi Henning (community) Approve
PS Jenkins bot continuous-integration Needs Fixing
Review via email: mp+250588@code.staging.launchpad.net

Commit message

Make sure that Multi::Private instances are correctly cleaned up by only handing out weak_ptr's to it.

Description of the change

Make sure that Multi::Private instances are correctly cleaned up by only handing out weak_ptr's to it.

To post a comment you must log in.
Revision history for this message
Michi Henning (michihenning) wrote :

I just pulled the branch, ran cmake and successfully compiled it. Then, when running the tests, I got the famous "A requested feature, protocol or option was not found built-in in this libcurl due to a build-time decision." again.

net-cpp should not build if the modules it depends on are not installed. Could you please add whatever run-time dependencies there are to the build dependencies. I know we've been down this road once before, but I've forgotten what the missing packages are. And the error message that comes out of curl is singularly unhelpful. It would be sooo nice if the error message actually state *what* the thing is that couldn't be found :-(

review: Needs Fixing
Revision history for this message
Michi Henning (michihenning) wrote :

I'm still seeing some leaks. valgrind reports about 18 kB leaked for the http_client_test, and well over a megabyte for the load test.

review: Needs Fixing
47. By Thomas Voß

Disable sharing to avoid memory leaks.

Revision history for this message
Thomas Voß (thomas-voss) wrote :
Download full text (3.8 KiB)

valgrind --leak-check=full for http_client_test now reports:

==30442==
==30442== HEAP SUMMARY:
==30442== in use at exit: 304 bytes in 10 blocks
==30442== total heap usage: 54,976 allocs, 54,966 frees, 4,334,461 bytes allocated
==30442==
==30442== LEAK SUMMARY:
==30442== definitely lost: 0 bytes in 0 blocks
==30442== indirectly lost: 0 bytes in 0 blocks
==30442== possibly lost: 0 bytes in 0 blocks
==30442== still reachable: 304 bytes in 10 blocks
==30442== suppressed: 0 bytes in 0 blocks
==30442== Reachable blocks (those to which a pointer was found) are not shown.
==30442== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==30442==
==30442== For counts of detected and suppressed errors, rerun with: -v
==30442== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

valgrind --leak-check=full for http_client_load_test now reports:

==30468==
==30468== HEAP SUMMARY:
==30468== in use at exit: 139,615 bytes in 1,125 blocks
==30468== total heap usage: 315,910 allocs, 314,785 frees, 54,715,917 bytes allocated
==30468==
==30468== 139,311 (152 direct, 139,159 indirect) bytes in 1 blocks are definitely lost in loss record 97 of 97
==30468== at 0x4C2B100: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30468== by 0x4E8FDDD: boost::asio::detail::epoll_reactor::descriptor_state* boost::asio::detail::object_pool_access::create<boost::asio::detail::epoll_reactor::descriptor_state>() (object_pool.hpp:35)
==30468== by 0x4E8D6AA: boost::asio::detail::object_pool<boost::asio::detail::epoll_reactor::descriptor_state>::alloc() (object_pool.hpp:89)
==30468== by 0x4E8985A: boost::asio::detail::epoll_reactor::allocate_descriptor_state() (epoll_reactor.ipp:512)
==30468== by 0x4E88D99: boost::asio::detail::epoll_reactor::register_descriptor(int, boost::asio::detail::epoll_reactor::descriptor_state*&) (epoll_reactor.ipp:151)
==30468== by 0x4E8BE6C: boost::asio::detail::reactive_descriptor_service::assign(boost::asio::detail::reactive_descriptor_service::implementation_type&, int const&, boost::system::error_code&) (reactive_descriptor_service.ipp:108)
==30468== by 0x4E8C1B6: boost::asio::posix::stream_descriptor_service::assign(boost::asio::detail::reactive_descriptor_service::implementation_type&, int const&, boost::system::error_code&) (stream_descriptor_service.hpp:116)
==30468== by 0x4E922D2: boost::asio::posix::basic_descriptor<boost::asio::posix::stream_descriptor_service>::basic_descriptor(boost::asio::io_service&, int const&) (basic_descriptor.hpp:90)
==30468== by 0x4E8F84E: boost::asio::posix::basic_stream_descriptor<boost::asio::posix::stream_descriptor_service>::basic_stream_descriptor(boost::asio::io_service&, int const&) (basic_stream_descriptor.hpp:91)
==30468== by 0x4E83FC9: curl::multi::Handle::Private::Socket::Private::Private(boost::asio::io_service&, int) (multi.cpp:463)
==30468== by 0x4E83E4E: curl::multi::Handle::Private::Socket::Socket(boost::asio::io_service&, int) (multi.cpp:441)
==30468== by 0x4E84741: curl::multi::Handle::Private::socket_callback(void*, int, int, void*, void*) (multi.cpp:577)
==30468==
=...

Read more...

48. By Thomas Voß

Adjust build- and runtime deps for curl.

Revision history for this message
Michi Henning (michihenning) wrote :
Download full text (5.2 KiB)

Looks like most of the leaks are gone, thanks! But I'm still seeing problems. About one in every 10-15 runs of the load test on my 8-core Vivid machine, I get the output below from valgrind. Looks like there is a race condition somewhere.

==13111==
==13111== HEAP SUMMARY:
==13111== in use at exit: 174,592 bytes in 1,968 blocks
==13111== total heap usage: 317,581 allocs, 315,613 frees, 54,709,429 bytes allocated
==13111==
==13111== 174,528 (81 direct, 174,447 indirect) bytes in 1 blocks are definitely lost in loss record 65 of 65
==13111== at 0x4C2B100: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13111== by 0x4E878E7: boost::asio::detail::thread_info_base::allocate(boost::asio::detail::thread_info_base*, unsigned long) (thread_info_base.hpp:60)
==13111== by 0x4E879F6: boost::asio::asio_handler_allocate(unsigned long, ...) (handler_alloc_hook.ipp:50)
==13111== by 0x4E855F0: void* boost_asio_handler_alloc_helpers::allocate<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(unsigned long, curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}&) (handler_alloc_helpers.hpp:37)
==13111== by 0x4E8513C: void boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::implementation_type&, curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}&) (deadline_timer_service.hpp:185)
==13111== by 0x4E84E71: boost::asio::async_result<boost::asio::handler_type<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}, void (boost::system::error_code)>::type>::type boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::implementation_type&, boost::asio::handler_type&&) (deadline_timer_service.hpp:149)
==13111== by 0x4E84D63: boost::asio::async_result<boost::asio::handler_type<curl::multi::...

Read more...

review: Needs Fixing
Revision history for this message
Michi Henning (michihenning) wrote :

Regarding the issue with the dependencies, things work fine for me on my Vivid machine. On my Utopic machine, I can build the code but then still get the error when running the tests about a missing feature. I've checked that I have all the build deps from debian/control installed.

Either there is still something missing, or there is an issue with running the tests on Utopic?

49. By Thomas Voß

Make sure that we do not keep instances of Private::Socket or Private::Timeout alive.
Clean up error reporting, and allow for transporting detailed error reports generated by curl via an exception.

50. By Thomas Voß

Allow for querying the current error description.

51. By Thomas Voß

Remove dep on libcurl3-gnutls to make tests work out of the box.

52. By Thomas Voß

Ensure that async writes are not keeping objects alive.

53. By Thomas Voß

Make curl's openssl dialect the default.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Michi Henning (michihenning) wrote :
Download full text (5.0 KiB)

Running the load test in a loop, after maybe a hundred iterations or so, I got a valgrind complaint:

==29504==
==29504== HEAP SUMMARY:
==29504== in use at exit: 161,360 bytes in 1,769 blocks
==29504== total heap usage: 325,601 allocs, 323,832 frees, 54,917,164 bytes allocated
==29504==
==29504== 161,296 (73 direct, 161,223 indirect) bytes in 1 blocks are definitely lost in loss record 64 of 64
==29504== at 0x4C2B100: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29504== by 0x4E8A699: boost::asio::detail::thread_info_base::allocate(boost::asio::detail::thread_info_base*, unsigned long) (thread_info_base.hpp:60)
==29504== by 0x4E8A7A8: boost::asio::asio_handler_allocate(unsigned long, ...) (handler_alloc_hook.ipp:50)
==29504== by 0x4E883DE: void* boost_asio_handler_alloc_helpers::allocate<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(unsigned long, curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}&) (handler_alloc_helpers.hpp:37)
==29504== by 0x4E87F56: void boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::implementation_type&, curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}&) (deadline_timer_service.hpp:185)
==29504== by 0x4E87CA1: boost::asio::async_result<boost::asio::handler_type<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}, void (boost::system::error_code)>::type>::type boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda(boost::system::error_code const&)#1}>(boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::implementation_type&, boost::asio::handler_type&&) (deadline_timer_service.hpp:149)
==29504== by 0x4E87B93: boost::asio::async_result<boost::asio::handler_type<curl::multi::Handle::Private::Timeout::Private::async_wait_for(std::weak_ptr<curl::multi::Handle::Private> const&, std::chrono::duration<long, std::rati...

Read more...

review: Needs Fixing
Revision history for this message
Michi Henning (michihenning) wrote :

Look, I'm starting to feel a bit guilty here. I don't want to hold up this fix unnecessarily. Most certainly, things are *heaps* better now than they were when I opened the bug.

But it's a concern to me that it doesn't run on my Utopic VM. I can build just fine, but the tests blow up. That, by itself, isn't that important to me. But, when I point my LD_LIBRARY_PATH at the libnet-cpp I have built an run the scopes-api unity tests, all the smartscopesproxy tests blow up too, and that *is* important to me :-)

So, on my Utopic machine, the net-cpp I got from the archives works fine, but the one I've built from source doesn't. There *has* to be some difference that matters. I don't have any PPAs installed on this machine, and my packages are all up to date. I'm really stumped on this one.

Would it be possible to instrument libcurl to tell us whatever it couldn't find? I haven't looked at the libcurl source yet. I might try building that and seeing whether I can insert some trace at the point of failure to tell me *what* it is that curl can't find. This has to be the most singularly useless error message I've seen in some time: "I couldn't find something or other. It's your job to guess what it is..." :-(

Revision history for this message
Michi Henning (michihenning) wrote :

Running ldd on the libnet-cpp.so in /usr/lib/x86_64-linux-gnu gives:

libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f636c483000)

That's a symlink to /usr/lib/x86_64-linux-gnu/libcurl.so.4.3.0

Running ldd on the libnet-cpp.so that I've built myself gives:

libcurl-gnutls.so.4 => /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4 (0x00007f4e64426000)

That's a symlink to /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.3.0

That would suggest that libcurl-gnutls.so.4.3.0 has a run-time dependency on something that isn't there?

I tried building with libcurl4-openssl-dev installed instead of libcurl4-gnutls-dev. Interestingly, with that, *only* the http_client_test fails whereas, with libcurl4-gnutls-dev, *both* the http_client_test and the http_client_load_test fail.

Revision history for this message
Michi Henning (michihenning) wrote :

So, summary:

with libcurl4-gnutls-dev installed, things break. With libcurl4-openssl-dev installed, things work. (The failure of the client_http_test in that case is due to not being able to connect to 127.0.0.1:5000 instead of a missing run-time dependency.)

Revision history for this message
Thomas Voß (thomas-voss) wrote :

> So, summary:
>
> with libcurl4-gnutls-dev installed, things break. With libcurl4-openssl-dev
> installed, things work. (The failure of the client_http_test in that case is
> due to not being able to connect to 127.0.0.1:5000 instead of a missing run-
> time dependency.)

Yup, but: the new debian/control setup in this MP takes care of replacing -gnutls with -openssl. So things should be good.

Revision history for this message
Michi Henning (michihenning) wrote :

 > Yup, but: the new debian/control setup in this MP takes care of replacing
> -gnutls with -openssl. So things should be good.

I'm not sure that's sufficient. The problem happened to me when I pulled the branch, typed "cmake", got a successful build, and then had the tests blow up with no sensible indication of what was wrong.

I think the cmake configuration step should check that the run-time dependencies are satisfied. Without this, the next unlucky person to come along and do what I did will get the same surprise.

From memory, I think the python stuff also has some run-time dependencies? These should be verified by cmake too.

54. By Thomas Voß

Augment the exception thrown when setting the default ssl engine option with a more understandable descriptive text.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Michi Henning (michihenning) wrote :

Thanks for all your work and persistence, Thomas!

review: Approve
Revision history for this message
Michi Henning (michihenning) wrote :

The failure on amd64 is the same one I got with my Utopic VM. It was caused by me not having python-flask installed.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches