Merge lp://staging/~nelson-chu/opencompute/add-ocp-cpu-memory-job into lp://staging/opencompute/checkbox

Proposed by Nelson Chu
Status: Superseded
Proposed branch: lp://staging/~nelson-chu/opencompute/add-ocp-cpu-memory-job
Merge into: lp://staging/opencompute/checkbox
Diff against target: 506 lines (+365/-43)
7 files modified
data/whitelists/opencompute-certify-local.whitelist (+0/-43)
debian/changelog (+19/-0)
jobs/TC-001-0001-CPU_Memory.txt.in (+31/-0)
jobs/local.txt.in (+7/-0)
scripts/cpu_info (+100/-0)
scripts/memory_info (+78/-0)
scripts/processor_topology (+130/-0)
To merge this branch: bzr merge lp://staging/~nelson-chu/opencompute/add-ocp-cpu-memory-job
Reviewer Review Type Date Requested Status
Nelson Chu Needs Resubmitting
Jeff Lane  Needs Fixing
Review via email: mp+206113@code.staging.launchpad.net

This proposal supersedes a proposal from 2014-02-12.

This proposal has been superseded by a proposal from 2014-02-14.

Description of the change

Scripts and annotations have been modified.

To post a comment you must log in.
Revision history for this message
Jeff Lane  (bladernr) wrote : Posted in a previous version of this proposal
Download full text (4.6 KiB)

Hi Nelson,

Thank you for breaking this into smaller pieces... it makes review a LOT easier. Now, there are some things that need fixing.

I'll take them one file at a time:
data/whitelists/opencompute-certify-local.whitelist looks fine.

jobs/TC-001-0001-CPU_Memory.txt.in:
1: any job that calls a script that needs root permissions to run must include the 'user: root' definition. See the file 'jobs/cpu.txt.in' for some examples where this is necessary. When I run the script cpu_info without root, the cache data is not returned and the test will fail. So this job needs 'user:root'
2: TC-001-0001-003-Memory_Information also needs 'user: root' added to properly run.

jobs/local.txt.in
1: The word "Verified" should be "Verify" in the description field.

scripts/cpu_info:
1: When I manually run the cpu_info script for the test TC-001-0001-001-CPU_information with and without root permissions, I get the following different outputs:
bladernr@klaatu:~/development/ocp-nelson-memory-cpu-test$ ./scripts/cpu_infoIntel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
Can not found any CPU cache.
bladernr@klaatu:~/development/ocp-nelson-memory-cpu-test$ echo $?
30
bladernr@klaatu:~/development/ocp-nelson-memory-cpu-test$ sudo ./scripts/cpu_info
Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
L1 Cache 32 KB
L2 Cache 256 KB
L3 Cache 6 MB
L3 cache size less than 20MB.
bladernr@klaatu:~/development/ocp-nelson-memory-cpu-test$ echo $?
50

I don't have a system that meets the test criteria, so it will always fail for me. First, the explanation for failure should be more explicit. Example: 'FAIL: Can not find any CPU cache.' for the first example. This is more important in the second example where 'L3 cache size less than 20MB' looks like part of the lshw output. It would be better more explicitly stated as 'FAIL: L3 cache size less than 20MB.'

2: You don't need all those explicit error codes. Checkbox only knows and stores 0 and Not 0 exit codes. A 0 exit code indicates a test passed. A non-zero exit code indicates a failure. Checkbox does not store the actual exit codes. This is not necessarily something you need to change, but you DO need to be aware of the behaviour in case a script behaves differently than expected when you run it via checkbox.

3: The description and spec for this test says: "CPU model should belong to Intel Xeon processor E5-2600 family..." but your test does not fail on non-Xeon processors. For example, when I comment out the error code return for my cache limit on my laptop, I get this output:
bladernr@klaatu:~/development/ocp-nelson-memory-cpu-test$ sudo ./scripts/cpu_info; echo $?
Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
L1 Cache 32 KB
L2 Cache 256 KB
L3 Cache 6 MB
L3 cache size less than 20MB.
0

but my laptop should clearly fail the test case since it's not up to OCP spec.

4: The output needs to be cleaned up a bit. Sorry, I know English is a second language for you, so I'll try to help as much as I can.
    "Can not parser" should be "Can not parse"
    "Can not found" should be "Can not find"
    "# Parser lshw XML for gather" should be "# Parse lshw XML for gathering"

scripts/memory_info:
1: Needs to be run as ...

Read more...

review: Needs Fixing
Revision history for this message
Nelson Chu (nelson-chu) wrote : Posted in a previous version of this proposal

OK, Thank you for your suggestion.
I will revise it accordingly.

Revision history for this message
Nelson Chu (nelson-chu) wrote : Posted in a previous version of this proposal

Hi Jeff,

I have modified scripts. Please help me review them.
Any suggestion will be appreciated.

Thanks,
Nelson

review: Needs Resubmitting
Revision history for this message
Jeff Lane  (bladernr) wrote : Posted in a previous version of this proposal

Before I can go any further, you have a conflict in data/whitelists/opencompute-certify-local.whitelist

To see this, you should do the following:

bzr branch lp:opencompute/checkbox ocp-checkbox
cd ocp-checkbox
bzr merge lp:~nelson-chu/opencompute/add-ocp-cpu-memory-job

That MAY actually just clean it up... but there's a file conflict in there. so please resolve that and I'll review the rest at that time.

review: Needs Fixing
Revision history for this message
Jeff Lane  (bladernr) wrote :

Hi Nelson.. Most everything looks good now. There are two remaining issues...

1: the changelog now seems to have duplicate entreies (see the diff below).

I am sorry if I wasn't clear, but you need to put your entries BETWEEN the version header and timestamp lines. For example:

checkbox (1.16.13~OCP) UNRELEASED; urgency=low

  [ Jeff Marcom ]
  * Updated scripts/network script from lp:checkbox

 -- Jeff Marcom <email address hidden> Tue, 5 Nov 2013 11:12:04 -0400

You entry should come after the "checkbox (1.16.13~OCP) line and above the first entry by Jeff Marcom like so:

checkbox (1.16.13~OCP) UNRELEASED; urgency=low

  [ Nelson Chu ]
  * data/whitelists/opencompute-certify-local.whitelist - Added new jobs to
    certification whitelist
  * jobs/TC-001-0002-Platform_Controller_Hub.txt - Added new jobs for PCH
    test cases
  * jobs/local.txt.in - Added job to parse new PCH job file
  * scripts/check_sata_port - new script to verify SATA port speed
  * scripts/check_usb_port - new script to verify USB version

  [ Jeff Marcom ]
  * Updated scripts/network script from lp:checkbox

 -- Jeff Marcom <email address hidden> Tue, 5 Nov 2013 11:12:04 -0400

2: the memory_info script doesn't seem to work... I keep getting this when I try to run it:
bladernr@mini-ubuntu:~/development/nelson-cpu-memory$ sudo ./scripts/memory_info
Fail: Cannot find memory slot

If I run lshw manually, I see memory class objects in the listing:
<node id="bank:0" claimed="true" class="memory" handle="DMI:001F">
       <description>DIMM</description>
       <product>None</product>
       <vendor>None</vendor>
       <physid>0</physid>
       <serial>None</serial>
       <slot>A0</slot>
       <size units="bytes">1073741824</size>
       <width units="bits">64</width>
      </node>

which does list all the required attributes (vendor, description, slot and size).

Can you show me an example of XML from lshw that is correct and works, and an example of passing script output

3: one other thing I just noticed... you do this a lot:

  print("some error message")
  return SOME_NON_ZERO_INTEGER

If you do it this way, your error message will not appear in the checkbox results. Checkbox will add stdout if a test passes and stderr if a test returns a non-zero exit code.

To correct this, ANY message that should be printed when a script exits in an error condition like this:

  print("FAIL: Some necessary criteria was not met")
  return 50

should be instead be printed to stderr like this:
  print("FAIL: Some necessary criteria was not met", file=sys.stderr)
  return 50

This will save your sanity in the long run if a test fails and you don't see any output in the results file.

review: Needs Fixing
2171. By Nelson Chu

Modify cpu_info and memory_info scripts. Revise debian/changelog file.

Revision history for this message
Nelson Chu (nelson-chu) wrote :

Hi Jeff,

I glad I am getting closer.

1.) I misunderstood what you meant. Sorry for that...

2.) This is a strange issue. When I run memory_info script on Winterfell is fine.

But, run on other server both good and bad.

I parse lshw XML from server that occur this issue. It appears this section as below:
      <node id="bank" class="memory" handle="DMI:0057">
       <description>FLASH Non-volatile 33 MHz (30.3 ns)</description>
       <product>25Q Series</product>
       <vendor>Micron/Numonyx</vendor>
       <physid>0</physid>
       <size units="bytes">16777216</size>
       <width units="bits">8</width>
       <clock units="Hz">33000000</clock>
      </node>

So XML parser cannot gather memory info correctly...

I think it work fine with Winterfell, the successful output is:
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelA_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelA_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelB_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelB_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelC_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelC_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelD_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelD_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelE_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelE_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelF_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelF_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelG_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelG_Dimm2 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelH_Dimm1 size=8GB.
DIMM DDR3 1333 MHz (0.8 ns) vendor=Samsung slot=ChannelH_Dimm2 size=8GB.
Total number of DIMMs is 16.

if you insist I will modify the script.

3.) I didn't notice that before. I have modified it correctly.

I am appreciated greatly your help.

Nelson

review: Needs Resubmitting
2172. By Nelson Chu

debian/changelog file and memory_info script

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches