Merge ~racb/git-ubuntu:rich-history-from-changes-file into git-ubuntu:master
- Git
- lp:~racb/git-ubuntu
- rich-history-from-changes-file
- Merge into master
Status: | Merged |
---|---|
Merged at revision: | be7fa10f2ef0d8d7e8437c61593400d6c0a4a37b |
Proposed branch: | ~racb/git-ubuntu:rich-history-from-changes-file |
Merge into: | git-ubuntu:master |
Diff against target: |
1138 lines (+858/-56) 6 files modified
debian/control (+1/-0) gitubuntu/importer.py (+369/-34) gitubuntu/importer_test.py (+472/-22) gitubuntu/source_information.py (+13/-0) setup.py (+1/-0) snap/snapcraft.yaml (+2/-0) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Bryce Harrington | Approve | ||
Server Team CI bot | continuous-integration | Approve | |
git-ubuntu developers | Pending | ||
Review via email: mp+396756@code.staging.launchpad.net |
Commit message
Make Jenkins happy
Description of the change
Robie Basak (racb) wrote : | # |
Server Team CI bot (server-team-bot) wrote : | # |
FAILED: Continuous integration, rev:fd6bc2c91e9
No commit message was specified in the merge proposal. Click on the following link and set the commit message (if you want jenkins to rebuild you need to trigger it yourself):
https:/
https:/
Executed test runs:
SUCCESS: VM Setup
SUCCESS: Build
SUCCESS: Unit Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Server Team CI bot (server-team-bot) wrote : | # |
FAILED: Continuous integration, rev:db9c906fb0c
No commit message was specified in the merge proposal. Click on the following link and set the commit message (if you want jenkins to rebuild you need to trigger it yourself):
https:/
https:/
Executed test runs:
SUCCESS: VM Setup
SUCCESS: Build
SUCCESS: Unit Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Bryce Harrington (bryce) wrote : | # |
Overall from first pass review looks pretty solid. I really like the approach of relying on the .changes file for carrying the Vcs-* stuff. Not having to do upload tagging will be a welcome usability improvement.
I suppose the main thing you're looking for feedback on is about using git cli vs pygit2 for retrieving arbitrary urls. I see what you mean that the api as-is requires creating a local repository object first; all the examples online I see for this are taking that approach. I suppose you could define a namespace to import the temporary repositories into, however I also don't see too much problem just calling the git cli directly in this case.
Taking a step back, git-ubuntu seems to end up needing to use this type of workaround often with pygit2, which makes me wonder if it would be worth the effort to introduce a wrapper class with a convenience API that hides the details of whether pygit2 or the git cli is used? Seems like it would be cleaner, although not sure it'd be worth the effort. I vaguely recall we discussed something like this before... Anyway just a thought.
Beyond that, a few small notes fwiw:
84e0a21bc1ad004
- Good refactoring to move warnings up a level
- test_validate_
9ae83e0d9a6e80e
- I wonder if validate_
- "to that upload tag" -> "to the upload tag"
ae13bed802f8c46
- LGTM
db9c906fb0cc4cb
- If you plan to add pydocs for the parse_* routines, it could handy to have an example of the line(s) the routine will parse as valid. Or even just having that present in a comment would be useful.
- Presumably this relies on code changes to dput and/or launchpad, so links to appropriate references (bugs or commits) in the changelog message or comments might be useful for us to check on status of those external development tasks later.
- Is the output of get_changes_
- Should urllib.
- Alternate way to check for keys:
- has_required_fields = all(
- field in changes
- for field in ['Vcs-Git', 'Vcs-Git-Commit', 'Vcs-Git-Refs']
- )
- if not has_required_
- return None
+ if not changes.keys() >= ['Vcs-Git', 'Vcs-Git-Commit', 'Vcs-Git-Refs']:
+ return None
- Test cases are a bit hard to grok
Robie Basak (racb) wrote : | # |
On Tue, Jan 26, 2021 at 12:25:16AM -0000, Bryce Harrington wrote:
> I suppose the main thing you're looking for feedback on is about using git cli vs pygit2 for retrieving arbitrary urls. I see what you mean that the api as-is requires creating a local repository object first; all the examples online I see for this are taking that approach. I suppose you could define a namespace to import the temporary repositories into, however I also don't see too much problem just calling the git cli directly in this case.
Yes, and thanks. But also, are you happy with my approach to input
validation, given Colin's comments? A couple of pitfalls, for example,
are ensuring that Vcs-Git-Refs: does not contain ':'s, since these could
be used to overwrite arbitrary local refs, and that Vcs-Git-Url: doesn't
use a scheme like ext::, which could be used to execute arbitrary code.
Though these would need Launchpad to give me a bad/malicious changes
file, and Launchpad is already limiting what it accepts to changes files
that have been signed by uploaders. Still, it seemed appropriate to
check these for ourselves, rather than relying on uploaders, since the
importer shouldn't be able to be subverted by arbitrary uploaders
either.
> Taking a step back, git-ubuntu seems to end up needing to use this type of workaround often with pygit2, which makes me wonder if it would be worth the effort to introduce a wrapper class with a convenience API that hides the details of whether pygit2 or the git cli is used? Seems like it would be cleaner, although not sure it'd be worth the effort. I vaguely recall we discussed something like this before... Anyway just a thought.
This does exist - the GitUbuntuRepository class. However I'm reluctant
to add more to that class when there's just one case where a particular
feature is needed. That class provides the git_run method, which we are
using - so it's not any harder to audit "exceptional" git calls this
way.
> Beyond that, a few small notes fwiw:
>
> 84e0a21bc1ad004
> - Good refactoring to move warnings up a level
> - test_validate_
I see what you're saying. I'm on the fence. I'm not sure it'd make
things clearer, since we'd have to provide a None (or equivalent) for
when an exception is raised, which will then make it look like a return
value is expected. I think this is an either-way thing so I'd like to
leave it as it is.
> 9ae83e0d9a6e80e
> - I wonder if validate_
I think the spec explains the requirement? I'm reluctant to have it
explained in two places - I'd rather that one refers to the other.
> - "to that upload tag" -> "to the upload tag"
Will fix - thanks!
> ae13bed802f8c46
> - LGTM
>
> db9c906fb0cc4cb
> - If you plan to add pydocs for...
Bryce Harrington (bryce) wrote : | # |
On Tue, Jan 26, 2021 at 03:18:22PM -0000, Robie Basak wrote:
> On Tue, Jan 26, 2021 at 12:25:16AM -0000, Bryce Harrington wrote:
> > I suppose the main thing you're looking for feedback on is about using git cli vs pygit2 for retrieving arbitrary urls. I see what you mean that the api as-is requires creating a local repository object first; all the examples online I see for this are taking that approach. I suppose you could define a namespace to import the temporary repositories into, however I also don't see too much problem just calling the git cli directly in this case.
>
> Yes, and thanks. But also, are you happy with my approach to input
> validation, given Colin's comments? A couple of pitfalls, for example,
> are ensuring that Vcs-Git-Refs: does not contain ':'s, since these could
> be used to overwrite arbitrary local refs, and that Vcs-Git-Url: doesn't
> use a scheme like ext::, which could be used to execute arbitrary code.
Yeah I like the approach you're taking of testing positively for allowed
characters. Then, the restrictions can be relaxed if/when new chars or
special cases are encountered.
For the URL regex, I ran it over Vcs-Git lines in the control files of
packages I have checked out locally:
for file in $(find . -name "control"); do grep Vcs-Git ${file} | xargs ~/check-re.py; done
Here's a few examples of types of things that didn't match:
https:/
git:/
<email address hidden>
https:/
https:/
The last two look more like errors than something that should be
supported. + and ^git might be worth considering.
Are you aware of Vcs-Git strings with ? in them? That char may be
pretty rare for git urls.
I suppose the more fundamental question than how git-ubuntu handles
these cases, is how they'll be handled by dput->launchpad. Presumably
they'll be filtering the links to an allowed subset? Or
quoting/escaping invalid chars? If we can assume LP is filtering the
weird stuff, that of course simplifies what needs to be tested for
here.
I notice you're including * as an allowed character for git refs; is
that intended, like for supporting wildcarding in refs?
Could you do a sampling of refs from the imported packages to see what
range of characters this'll likely be seeing in practice? (Might also
be useful to run the dump through to test it.) Old projects and
projects that have been migrated from one VCS to another sometimes have
weird bits in their ancient history.
Minor cosmetic, but in two of the regexes you list letters then numbers,
and in the third the reverse. This would be more consistent:
+VCS_GIT_
Technically I suppose you really only need [a-f0-9]{40} for SHAs?
I don't think your patch is validating LP user or project names, but saw
it came up in the IRC discussion. Colin's advice matches my own
experiences. I had tried implementing some validators on bug reports
way back when but Launchpad's acceptance algorithms are...
Server Team CI bot (server-team-bot) wrote : | # |
FAILED: Continuous integration, rev:0a97e045cce
https:/
Executed test runs:
SUCCESS: VM Setup
FAILED: Build
Click here to trigger a rebuild:
https:/
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:0a97e045cce
https:/
Executed test runs:
SUCCESS: VM Setup
SUCCESS: Build
SUCCESS: Unit Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:efad8227a09
https:/
Executed test runs:
SUCCESS: VM Setup
SUCCESS: Build
SUCCESS: VM Reset
SUCCESS: Unit Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Robie Basak (racb) wrote : | # |
I think I've addressed all outstanding feedback so far. Some specifics:
On Tue, Jan 26, 2021 at 10:27:16PM -0000, Bryce Harrington wrote:
> For the URL regex, I ran it over Vcs-Git lines in the control files of
> packages I have checked out locally:
>
> for file in $(find . -name "control"); do grep Vcs-Git ${file} | xargs ~/check-re.py; done
>
> Here's a few examples of types of things that didn't match:
>
> https:/
> git://anonscm.
> <email address hidden>
> https:/
> https:/
>
> The last two look more like errors than something that should be
> supported. + and ^git might be worth considering.
I'm only accepting https:// URLs for now. I've incorporated all the
above patterns into existing tests, including the ones that should be
rejected for now.
> Are you aware of Vcs-Git strings with ? in them? That char may be
> pretty rare for git urls.
Good shout. Added into the regexp and added a test.
> I suppose the more fundamental question than how git-ubuntu handles
> these cases, is how they'll be handled by dput->launchpad. Presumably
> they'll be filtering the links to an allowed subset? Or
> quoting/escaping invalid chars? If we can assume LP is filtering the
> weird stuff, that of course simplifies what needs to be tested for
> here.
Actually dput/Launchpad won't filter anything. Whatever the uploader
puts in, we'll get out at the other end. I care from the importer's end
that we aren't subverted by malicious input, and that we are able to
fetch the URL if we decide it's acceptable (barring network issues).
> I notice you're including * as an allowed character for git refs; is
> that intended, like for supporting wildcarding in refs?
It was intended, but I've now removed that following Colin's feedback,
and changed the header name from Vcs-Git-Refs to Vcs-Git-Ref. I've
adjusted tests accordingly.
> Could you do a sampling of refs from the imported packages to see what
> range of characters this'll likely be seeing in practice? (Might also
> be useful to run the dump through to test it.) Old projects and
> projects that have been migrated from one VCS to another sometimes have
> weird bits in their ancient history.
I'm not expecting to get refs like from existing Vcs-Git repositories.
These aren't normally based on git-ubuntu branches, and probably will
continue not to be. Instead I expect people to branch from git-ubuntu
branches, make their changes, push to a new ref somewhere in Launchpad,
and then provide those in Vcs-Git/
changes file. This might well mean that Vcs-Git in changes files for
git-ubuntu will typically not match the Vcs-Git in debian/control.
My goal in testing my validation in tests for parsing/validating these
fields is therefore slightly different. What URLs and refs will
git-ubuntu users push to, _on Launchpad_ (as this will be an initial
requirement), and will we correctly validate all those?
My thinking for a belt and braces approach here i...
Bryce Harrington (bryce) wrote : | # |
Thanks, the updates look good and answers to all the questions sound fine.
I did one more code review pass and didn't spot anything to mention.
They critical piece for this patch is the filtration of potential malicious input. It looks like more than enough thought has gone into this versus the (probably low) level of risk we're likely to see in practice.
LGTM, +1.
There was an error fetching revisions from git servers. Please try again in a few minutes. If the problem persists, contact Launchpad support.
Request for some input from Bryce please. I suggest you start from the fetch_rich_ history_ from_changes_ file() docstring which I think should be good enough to give you an intro to what I'm up to. See also: https:/ /irclogs. ubuntu. com/2021/ 01/22/% 23launchpad. html