> For a speed check I tried to see if it has issues going concurrently.
> I think I didn't find the right way to call it - I tried:
> rm -rf output/; CURTIN_VMTEST_KEEP_DATA_FAIL=all ./tools/jenkins-runner -vv
> --nologcapture --processes=10 --process-timeout=3600 tests/storagetest_runner/
>
> I found that with this call the section following section tries to write an
> infinitely huge file:
> Building tarball of curtin: /mnt/nvme/curtin-wesley/partial-testing
>
> Wanted to let you know just in case that infinite write is a bug.

Thanks for pointing that out, that's definitely a bug in how I generate the curtin tarball. I hadn't run the tests in the jenkins runner before, because sparse files don't work right on my /home partition and I didn't want the tests writing huge files there. The jenkins runner puts the curtin tmp dir inside the curtin dir being tested, so tar ends up trying to include the tarball it's generating in itself. I need to make the tests aware of what environment they're running in so they know to omit that directory.

> Later on I found this in the doc:
> nosetests3 --processes=-1 tests/storagetest_runner
>
> That gave me a stuck system - maybe too much cpus (6x2threads) and by that too
> much output?
> In any case the output indenting was totally broken - I had to reset my
> console to scroll again.

Yeah, I've seen the output get messed up too. I think the processes are racing to write to stdout and their writes are getting interleaved, which garbles the output. I haven't tried with more than 4 processes at once, so there may be bugs that only show up with more. I'll look through the log and try to figure out what went wrong there.

> That failed me then with errors, not sure if that is bad or just a wrong call
> - here is the log: http://paste.ubuntu.com/19157752/
> I realized that since this first "hanging" processes -1 run all tests were
> failing this way now.
> All on:
> Traceback (most recent call last):
>   File "/mnt/nvme/curtin-wesley/partial-testing/tests/storagetest_runner/__init__.py", line 385, in test_reported_results
>     self.assertTrue(os.path.isfile(self.storage_log))
> AssertionError: False is not true
>
> Debugging gave me: "(qemu) qemu-system-x86_64: cannot set up guest memory
> 'pc.ram': Cannot allocate memory"
> That likely also was my first hanging - but that could be fixed by freeing
> some up :-)
> But I wonder if we need some sort of "is enough mem avail" prior to call qemu?

Yeah, the test runner eats up memory, though not quite as much as the vmtests do while the target system tarball is being extracted in the VM. A check would definitely be good; I'll look into how to write one. Watching a VM try to run from swap isn't fun :)

> In the following retry it left me again with a good return code, but plenty of
> running qemu processes up.
> That really needs some hardening.
>
> Then I wanted to step down and did only:
> nosetests3 --processes=2 tests/storagetest_runner
> To check if it works at all.
> I got the same console that gets misformatted after a while ending with
> Ran 0 tests in 112.494s
>
> Since all fails keep the logs around here the log file:
> http://paste.ubuntu.com/19158435/
>
>
> TL;DR: concurrent execution needs some fixes and probably a bit hardening
> against shooting itself :-)

Yeah, it looks like there are still quite a few problems there. I'll see if I can get the tests running more reliably in parallel, though some of it may just come down to system resources.

About the good return code on the failed runs: the test suite seems to report success whenever no tests are run and no exceptions are raised. The storagetest_runner itself already behaves as though there was a failure in tearDownClass and keeps the data when no tests are run, so maybe I should also raise an error there, so that nosetests exits with a non-zero return code.
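For the tarball bug from the start of this thread, the fix is roughly to leave the tmp/output directory (and the archive being written) out of the walk, so tar never reads its own growing file. A rough sketch with the stdlib `tarfile` module; `build_tarball` and `exclude_dirs` are hypothetical names, not the current curtin code, and this version only archives files (no empty directories):

```python
import os
import tarfile


def build_tarball(src_dir, tarball_path, exclude_dirs=()):
    """Tar up src_dir, skipping exclude_dirs and the tarball itself.

    Hypothetical helper: excluding the output location is what keeps
    tar from adding its own growing archive.
    """
    src_dir = os.path.abspath(src_dir)
    tarball_path = os.path.abspath(tarball_path)
    excluded = {os.path.abspath(d) for d in exclude_dirs}
    with tarfile.open(tarball_path, "w:gz") as tar:
        for root, dirs, files in os.walk(src_dir):
            # Prune excluded dirs in place so os.walk never descends into them
            dirs[:] = [d for d in dirs if os.path.join(root, d) not in excluded]
            for name in files:
                path = os.path.join(root, name)
                if path == tarball_path:
                    continue  # never add the archive to itself
                tar.add(path, arcname=os.path.relpath(path, src_dir),
                        recursive=False)
```

With the jenkins runner layout, the caller would pass the tmp dir inside the curtin tree as one of the `exclude_dirs`.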
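For the "is enough mem avail" check, one option would be to read `MemAvailable` from /proc/meminfo before starting qemu and bail out early instead of letting qemu fail to allocate `pc.ram`. A minimal sketch, assuming Linux 3.14 or newer (where `MemAvailable` exists); `mem_available_kib` and `check_enough_memory` are hypothetical names, and the caller has to supply the memory figure, which for parallel runs would need to cover every VM (roughly per-VM memory times the number of processes):

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "MemAvailable":
            # The value looks like "   8388608 kB"
            return int(value.split()[0])
    # MemAvailable only exists on Linux 3.14 and newer
    raise ValueError("MemAvailable not found in meminfo")


def check_enough_memory(required_mib, meminfo_path="/proc/meminfo"):
    """Raise before launching qemu if less than required_mib MiB is available.

    Hypothetical helper; required_mib should cover all VMs started in
    parallel. Returns the available memory in KiB.
    """
    with open(meminfo_path) as f:
        avail_kib = mem_available_kib(f.read())
    if avail_kib < required_mib * 1024:
        raise RuntimeError(
            "only %d MiB available, need %d MiB to start qemu"
            % (avail_kib // 1024, required_mib))
    return avail_kib
```

`MemAvailable` is a kernel estimate, so this only catches the obvious cases; it wouldn't stop two processes from passing the check at the same moment and then both starting VMs.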