Merge lp://staging/~wallyworld/juju-core/oxygen-data-parse into lp://staging/~juju/juju-core/trunk
- oxygen-data-parse
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 1223 |
Proposed branch: | lp://staging/~wallyworld/juju-core/oxygen-data-parse |
Merge into: | lp://staging/~juju/juju-core/trunk |
Diff against target: |
954 lines (+944/-0) 2 files modified
environs/imagemetadata/simplestreams.go (+510/-0) environs/imagemetadata/simplestreams_test.go (+434/-0) |
To merge this branch: | bzr merge lp://staging/~wallyworld/juju-core/oxygen-data-parse |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Juju Engineering | Pending | ||
Review via email: mp+162308@code.staging.launchpad.net |
Commit message
Description of the change
Initial simple streams support
This branch adds a simple streams package to allow image metadata to be retrieved
from json encoded metadata. The process starts by reading an index file from a known
URL. The URL will be a static URL for EC2 (http://
and for openstack can be published in the keystone catalog or provided from an env
config. This latter work to plug in the URLs from the providers will be done separately.
This branch is just the core capability.
The data model used is a single implementation for all providers (so far EC2 and Openstack).
Various providers use many different metadata attributes but these are ignored - only the common
ones like id, region, arch, vtype etc are required to be read and used for filtering, and only
the id is ultimately required by the calling business logic.
The tests are written to use example data to check the various processing scenarios, but
also can be run live against canonistack or EC2. The live tests are a subset of the local
tests and check that basic parsing and image lookup works with real data from two providers
with significant differences in the exact data attributes used. The canonistack metadata has
been published in a known public bucket.
Ian Booth (wallyworld) wrote : | # |
Roger Peppe (rogpeppe) wrote : | # |
Nice work deciphering a fairly abstruse format. Generally pretty good,
but could use some work. This does not lgtm until the below issues are
resolved.
https:/
File environs/
https:/
environs/
parsing, and filtering Ubuntu image metadata in simplestreams format.
// The simplestream package supports ...
https:/
environs/
d (then it becomes a doc comment)
https:/
environs/
This is not a great name. For a start, despite the launchpad project
name, what we're dealing with here are neither simple nor streams.
Also, this package is specialised for image metadata. A "simplestreams"
package would be much more general than that.
You've called the branch "oxygen-
name for the package? (I'm afraid I'm not familiar with all the
terminology).
https:/
environs/
representing a product id in a known format.
The format is known to you perhaps, but definitely worth mentioning.
// String returns a product id in the following format:
with perhaps a reference to where the convention comes from.
https:/
environs/
Ubuntu release %q", ps.Release))
is this really worth it? it means we need to maintain the list of
release versions above, and it adds one more possible panic, when it
doesn't really seem like a panic is necessarily justified.
https:/
environs/
i find these type names quite confusing.
"ImageMetadata" vs "CloudImagedata" for example - there's no indication
in the name of the very different roles of these types.
I think I'd find it easier to read if the types read hierarchically from
the top down rather than the other way around (i.e. start with
CloudImageMetadata and end with ImageMetadata).
It would also help if each type had a comment describing the role of the
type and what it holds. That goes doubly for the Aliases field with its
rather complex semantics.
Despite the name "simple streams", the data format here is really not
very simple at all. Given that there appears to be no other resource
that documents it decently, I think we should do so here to save other
people going through the same reverse-engineering process.
Also, I think it might be better to unexport all but the types that are
returned by GetImageIdMetadata. Then the package documentation...
Scott Moser (smoser) wrote : | # |
https:/
File environs/
https:/
environs/
On 2013/05/03 14:58:08, rog wrote:
> This is not a great name. For a start, despite the launchpad project
name, what
> we're dealing with here are neither simple nor streams.
I'd argue they are streams. The data describes 'product' streams, ie
"sources of like things that get refreshed".
> You've called the branch "oxygen-
better name
> for the package? (I'm afraid I'm not familiar with all the
terminology).
I'd avoid using the word oxygen. that was a codename that encompassed
much more than just reading image-ids.
https:/
environs/
On 2013/05/03 14:58:08, rog wrote:
> This is not a great name. For a start, despite the launchpad project
name, what
> we're dealing with here are neither simple nor streams.
> Also, this package is specialised for image metadata. A
"simplestreams" package
> would be much more general than that.
> You've called the branch "oxygen-
better name
> for the package? (I'm afraid I'm not familiar with all the
terminology).
Please dont use the word oxygen, as that was a codename for something
larger than this. Perhaps 'product-streams' or 'product-
?
https:/
environs/
Ubuntu release %q", ps.Release))
On 2013/05/03 14:58:08, rog wrote:
> is this really worth it? it means we need to maintain the list of
release
> versions above, and it adds one more possible panic, when it doesn't
really seem
> like a panic is necessarily justified.
This is fallout of the data format. you dont have to be able to
generate the "release-name" -> "product name" map.
You *could* just go digging through for it. However, the index file
lists the products that are contained in it. so if you know that you can
avoid reading lots un-related items and looking at their tags for
'release: precise', 'arch: amd64'.
Ian Booth (wallyworld) wrote : | # |
Please take a look.
https:/
File environs/
https:/
environs/
parsing, and filtering Ubuntu image metadata in simplestreams format.
On 2013/05/03 14:58:08, rog wrote:
> // The simplestream package supports ...
Done.
https:/
environs/
On 2013/05/03 14:58:08, rog wrote:
> d (then it becomes a doc comment)
Done.
https:/
environs/
On 2013/05/03 14:58:08, rog wrote:
> This is not a great name. For a start, despite the launchpad project
name, what
> we're dealing with here are neither simple nor streams.
> Also, this package is specialised for image metadata. A
"simplestreams" package
> would be much more general than that.
> You've called the branch "oxygen-
better name
> for the package? (I'm afraid I'm not familiar with all the
terminology).
I've renamed the package "imagemetadata"
https:/
environs/
representing a product id in a known format.
On 2013/05/03 14:58:08, rog wrote:
> The format is known to you perhaps, but definitely worth mentioning.
> // String returns a product id in the following format:
> with perhaps a reference to where the convention comes from.
The main doc for simplestreams is in the doc/README file for the project
which I've now mentioned in the package doc string.
https:/
environs/
Ubuntu release %q", ps.Release))
On 2013/05/03 14:58:08, rog wrote:
> is this really worth it? it means we need to maintain the list of
release
> versions above, and it adds one more possible panic, when it doesn't
really seem
> like a panic is necessarily justified.
There's no easy way to get the mapping from release-
there's only ever going to need to be a new version added every 6
months. So the releaseVersions map is essentially static for all intents
and purposes. I think the approach I've used is reasonable in this case.
https:/
environs/
On 2013/05/03 14:58:08, rog wrote:
> i find these type names quite confusing.
> "ImageMetadata" vs "CloudImagedata" for example - there's no
indication in the
> name of the very different roles of these types.
> I think I'd find it easier to read if the types read hierarchically
from the top
> down rather than the other way around (i.e. start with
CloudImageMetadata and
> end with ImageMetad...
Martin Packman (gz) wrote : | # |
A couple of comments, I've not gone over everything.
https:/
File environs/
https:/
environs/
map[string]string{
Rather than adding another list of release names we'll need to keep
updated, I'd be tempted to depend on the distro-info-data package then
use /usr/share/
https:/
environs/
fmt.Errorf("cannot unmarshal JSON index metadata: %s", err.Error())
Including the URL included in this error would make debugging easier.
https:/
environs/
fmt.Errorf(
indices.Format, format)
Likewise, including the URL here would be useful.
https:/
File environs/
https:/
environs/
Also checking the error messages are sane here and contain useful
details, rather than just NotNil, would be nice.
Ian Booth (wallyworld) wrote : | # |
Please take a look.
https:/
File environs/
https:/
environs/
map[string]string{
On 2013/05/07 17:33:54, gz wrote:
> Rather than adding another list of release names we'll need to keep
updated, I'd
> be tempted to depend on the distro-info-data package then use
> /usr/share/
Done.
https:/
environs/
fmt.Errorf("cannot unmarshal JSON index metadata: %s", err.Error())
On 2013/05/07 17:33:54, gz wrote:
> Including the URL included in this error would make debugging easier.
Done.
https:/
environs/
fmt.Errorf(
indices.Format, format)
On 2013/05/07 17:33:54, gz wrote:
> Likewise, including the URL here would be useful.
Done.
Ian Booth (wallyworld) wrote : | # |
Please take a look.
Ian Booth (wallyworld) wrote : | # |
Please take a look.
Martin Packman (gz) wrote : | # |
LGTM with the changes
Roger Peppe (rogpeppe) wrote : | # |
LGTM with the below issues addressed.
thanks!
https:/
File environs/
https:/
environs/
NewImageConstra
ImageConstraint {
i'm not entirely convinced of the worth of this function - if i saw a
call to it, i'd have to look up which argument corresponded to which
attribute - a struct literal would be clearer.
https:/
environs/
map[string]string{
since this is a mutable global, this should be guarded with a
sync.Mutex.
How about, rather than caching the image in the image constraint, you
read the distro info once (I think it's not unreasonable to assume that
/usr/share/
just look up the info via the releaseVersions map.
var (
releaseVer
updatedRel
releaseVer
)
func (id *ImageConstraint) Id() (s string, err error) {
vers, err := releaseVersion(
etc
}
func releaseVersion(
releaseVer
defer releaseVersions
if vers, ok := releaseVersions
return vers, nil
}
if updatedReleaseV
return "", errNotFound
}
err := updateDistroInfo()
updatedRel
if err != nil {
return "", err
}
if vers, ok := releaseVersions
return vers, nil
}
return "", errNotFound
}
https:/
environs/
contain a LTS moniker so strip that out.
if len(parts) < 3 {
return fmt.Errorf("syntax error in ...")
}
(or just ignore lines with fewer than the right number of parts).
we don't want to panic because the file isn't correctly formatted.
https:/
environs/
matching the supplied region, arch etc.
this comment seems out of place.
how about just:
// ImageMetadata holds information about a particular cloud image.
?
https:/
environs/
is as specified.
s/specified.
https:/
environs/
GetImageIdMetad
*ImageConstraint) ([]*ImageMetadata, error) {
this name seems a little redundant now that ...
Ian Booth (wallyworld) wrote : | # |
Thanks for the review. I've implemented the suggestions but have a
question about the NewImageConstra
comment.
https:/
File environs/
https:/
environs/
NewImageConstra
ImageConstraint {
On 2013/05/15 17:06:26, rog wrote:
> i'm not entirely convinced of the worth of this function - if i saw a
call to
> it, i'd have to look up which argument corresponded to which attribute
- a
> struct literal would be clearer.
The reason for the function is because the returned ImageConstraint
struct contains an embedded CloudSpec struct and I didn't want to expose
this implementation detail. If I removed this function, then the user
will need to type:
ImageConstraint{
CloudSpec: CloudSpec{
Endpoint: endpoint,
Region: region,
},
Release: release,
Arch: arch,
Stream: stream,
}
The above seems kinda verbose and as I said, exposes the internals of
ImageConstraint. Are you ok with those issues if I remove the function?
https:/
environs/
map[string]string{
On 2013/05/15 17:06:26, rog wrote:
> since this is a mutable global, this should be guarded with a
sync.Mutex.
> How about, rather than caching the image in the image constraint, you
read the
> distro info once (I think it's not unreasonable to assume that
> /usr/share/
and just
> look up the info via the releaseVersions map.
> var (
> releaseVersions
> updatedReleaseV
> releaseVersions map[string]string
> )
> func (id *ImageConstraint) Id() (s string, err error) {
> vers, err := releaseVersion(
> etc
> }
> func releaseVersion(
> releaseVersions
> defer releaseVersions
> if vers, ok := releaseVersions
> return vers, nil
> }
> if updatedReleaseV
> return "", errNotFound
> }
> err := updateDistroInfo()
> updatedReleaseV
> if err != nil {
> return "", err
> }
> if vers, ok := releaseVersions
> return vers, nil
> }
> return "", errNotFound
> }
Done.
https:/
environs/
contain a LTS moniker so strip that out.
On 2013/05/15 17:06:26, rog wrote:
> if len(parts) < 3 {
> return fmt.Errorf("syntax error in ...")
> }
> (or just ignore lines with fewer than the right number of parts).
> we don't want to panic because the file isn't correctly formatted.
Done.
John A Meinel (jameinel) wrote : | # |
LGTM
https:/
File environs/
https:/
environs/
this file won't exist butr that's expected.
typo "butr"
Should this only trap for NoExist? Given we don't really want to fail,
I'm probably fine suppressing all errors, though.
https:/
environs/
Channeling my inner thumper, the name could be improved. :)
https:/
File environs/
https:/
environs/
[]jujutest.
It seems odd that the two bits of content are indented differently. I
would have expected either both to be left justified, or both indented.
(I realize the actual JSON unmarshalling just ignores the extra
whitespace.) Is there a reason to indent one but not the other?
William Reade (fwereade) wrote : | # |
sorry, but not LGTM without further discussion
https:/
File environs/
https:/
environs/
Can we call this "Series" please, for consistency with the rest of the
codebase?
https:/
environs/
This shouldn't be here, I think.
https:/
environs/
We cannot assume that arch is known at this stage. We need to get all
arches (or possibly only those matching an []string (ie the arches we
have tools available for)).
https:/
environs/
fmt.Errorf("Invalid Ubuntu release %q", ic.Release)
"invalid series %q"?
https:/
environs/
I'm not sure this is correct, because of (1) arch and (2) our inability
to determine for sure the release number corresponding to the desired
name. Surely we should be getting every products file that is not
*definitely* inappropriate, and filtering within those by arch and
release (series)?
Roger Peppe (rogpeppe) wrote : | # |
On 16 May 2013 01:24, <email address hidden> wrote:
> Thanks for the review. I've implemented the suggestions but have a
> question about the NewImageConstra
> comment.
>
>
>
> https:/
> File environs/
>
> https:/
> environs/
> NewImageConstra
> ImageConstraint {
> On 2013/05/15 17:06:26, rog wrote:
>>
>> i'm not entirely convinced of the worth of this function - if i saw a
>
> call to
>>
>> it, i'd have to look up which argument corresponded to which attribute
>
> - a
>>
>> struct literal would be clearer.
>
>
> The reason for the function is because the returned ImageConstraint
> struct contains an embedded CloudSpec struct and I didn't want to expose
> this implementation detail. If I removed this function, then the user
> will need to type:
>
> ImageConstraint{
> CloudSpec: CloudSpec{
> Endpoint: endpoint,
> Region: region,
> },
> Release: release,
> Arch: arch,
> Stream: stream,
> }
>
> The above seems kinda verbose and as I said, exposes the internals of
> ImageConstraint. Are you ok with those issues if I remove the function?
Yes. It's all exported, so none of it counts as internals.
It's a pity about the CloudSpec: Cloudspec thing though.
Another possibility would be to export a struct that had all
the fields at one level, and translate into an internal form
(or use that one-level form internally). It hardly matters
though - this is not a function that's going to be called often,
AFAICS.
> https:/
> environs/
> map[string]string{
> On 2013/05/15 17:06:26, rog wrote:
>>
>> since this is a mutable global, this should be guarded with a
>
> sync.Mutex.
>
>> How about, rather than caching the image in the image constraint, you
>
> read the
>>
>> distro info once (I think it's not unreasonable to assume that
>> /usr/share/
>
> and just
>>
>> look up the info via the releaseVersions map.
>
>
>> var (
>> releaseVersions
>> updatedReleaseV
>> releaseVersions map[string]string
>> )
>
>
>> func (id *ImageConstraint) Id() (s string, err error) {
>> vers, err := releaseVersion(
>> etc
>> }
>
>
>> func releaseVersion(
>> releaseVersions
>> defer releaseVersions
>> if vers, ok := releaseVersions
>> return vers, nil
>> }
>> if updatedReleaseV
>> return "", errNotFound
>> }
>> err := updateDistroInfo()
>> updatedReleaseV
>> if err != nil {
>> return "", err
>> }
>> if vers, ok := releaseVersions
>> return vers, ...
Ian Booth (wallyworld) wrote : | # |
As discussed with William on a hangout, this code is just for the
parsing of the simplestreams data. The multiple arch stuff is plugged in
downstream.
https:/
File environs/
https:/
environs/
On 2013/05/16 07:54:54, fwereade wrote:
> Can we call this "Series" please, for consistency with the rest of the
codebase?
Done.
https:/
environs/
fmt.Errorf("Invalid Ubuntu release %q", ic.Release)
On 2013/05/16 07:54:54, fwereade wrote:
> "invalid series %q"?
Done.
https:/
environs/
this file won't exist butr that's expected.
On 2013/05/16 06:18:52, jameinel wrote:
> typo "butr"
> Should this only trap for NoExist? Given we don't really want to fail,
I'm
> probably fine suppressing all errors, though.
Yeah, the consensus seemed to be to not fail
Ian Booth (wallyworld) wrote : | # |
*** Submitted:
Initial simple streams support
This branch adds a simple streams package to allow image metadata to be
retrieved
from json encoded metadata. The process starts by reading an index file
from a known
URL. The URL will be a static URL for EC2
(http://
and for openstack can be published in the keystone catalog or provided
from an env
config. This latter work to plug in the URLs from the providers will be
done separately.
This branch is just the core capability.
The data model used is a single implementation for all providers (so far
EC2 and Openstack).
Various providers use many different metadata attributes but these are
ignored - only the common
ones like id, region, arch, vtype etc are required to be read and used
for filtering, and only
the id is ultimately required by the calling business logic.
The tests are written to use example data to check the various
processing scenarios, but
also can be run live against canonistack or EC2. The live tests are a
subset of the local
tests and check that basic parsing and image lookup works with real data
from two providers
with significant differences in the exact data attributes used. The
canonistack metadata has
been published in a known public bucket.
R=rog, smoser, gz, jameinel, fwereade
CC=
https:/
Reviewers: mp+162308_ code.launchpad. net,
Message:
Please take a look.
Description:
Initial simple streams support
This branch adds a simple streams package to allow image metadata to be cloud-images. ubuntu. com/releases),
retrieved
from json encoded metadata. The process starts by reading an index file
from a known
URL. The URL will be a static URL for EC2
(http://
and for openstack can be published in the keystone catalog or provided
from an env
config. This latter work to plug in the URLs from the providers will be
done separately.
This branch is just the core capability.
The data model used is a single implementation for all providers (so far
EC2 and Openstack).
Various providers use many different metadata attributes but these are
ignored - only the common
ones like id, region, arch, vtype etc are required to be read and used
for filtering, and only
the id is ultimately required by the calling business logic.
The tests are written to use example data to check the various
processing scenarios, but
also can be run live against canonistack or EC2. The live tests are a
subset of the local
tests and check that basic parsing and image lookup works with real data
from two providers
with significant differences in the exact data attributes used. The
canonistack metadata has
been published in a known public bucket.
https:/ /code.launchpad .net/~wallyworl d/juju- core/oxygen- data-parse/ +merge/ 162308
(do not edit description out of merge proposal)
Please review this at https:/ /codereview. appspot. com/9138044/
Affected files: simplestreams/ simplestreams. go simplestreams/ simplestreams_ test.go
A [revision details]
A environs/
A environs/