Merge into trunk : ideas : Code : Ubuntu CI Engine

Status:	Work in progress
Proposed branch:	lp://staging/~vila/uci-engine/ideas
Merge into:	lp://staging/uci-engine
Diff against target:	193 lines (+158/-1), 4 files modified:
- .bzrignore (+4/-0)
- docs/Makefile (+8/-1)
- docs/architecture.rst (+134/-0)
- docs/images/ticket-worker.dot (+12/-0)
To merge this branch:	bzr merge lp://staging/~vila/uci-engine/ideas

Reviewer	Review Type	Date Requested	Status
Canonical CI Engineering		2014-04-01	Pending
Review via email: mp+213601@code.staging.launchpad.net

Description of the change

Throwing ideas to fuel discussion, not proposing to merge.

Best read with:

$ (cd docs ; make html)
$ firefox docs/_build/html/architecture.html

so you get the pretty picture (the ticket worker diagram from docs/images/ticket-worker.dot).

The ticket worker is intended to implement various workflows, each defining a set of isolated tasks. It could replace the lander's use of Jenkins and provide a more agile architecture, with less coupling between workers.

The main targeted change is to define the API via the messages exchanged between the workers.
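To make "the API is the messages" concrete, here is a minimal sketch assuming messages travel as JSON documents. The `TaskRequest`/`TaskResult` classes, their fields and the `"series"` input are hypothetical illustrations, not something defined by this proposal.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class TaskRequest:
    """Hypothetical message from the ticket worker to a task worker."""
    ticket_id: int
    task: str
    inputs: dict = field(default_factory=dict)


@dataclass
class TaskResult:
    """Hypothetical message from a task worker back to the ticket worker."""
    ticket_id: int
    task: str
    outputs: dict = field(default_factory=dict)
    succeeded: bool = True


_TYPES = {cls.__name__: cls for cls in (TaskRequest, TaskResult)}


def encode(msg) -> str:
    # Only plain JSON types cross a queue: the wire format is the contract.
    return json.dumps({"type": type(msg).__name__, "body": asdict(msg)})


def decode(text: str):
    data = json.loads(text)
    return _TYPES[data["type"]](**data["body"])


# Illustrative values only.
req = TaskRequest(ticket_id=9, task="imagebuild", inputs={"series": "trusty"})
print(encode(req))
```

Any worker that can produce and consume these documents can take part, whatever it is implemented with.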

Francis Ginther (fginther) wrote on 2014-04-02:

This is nice and straightforward. Well thought out and described (not half-baked like mine have been :-) ).

To perform different workflows, can I assume that the ticket worker is created with that knowledge as an input from the ticket system? For example, ticket 9 needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does this convey what tasks can be retried and which ones fail the ticket?

You mention "the task send an outgoing message listing the outputs to another queue." What (or whose) queue is this sent to? Is it a queue owned by this ticket worker? Or does the imagebuilder send a message to the test-runner's queue? Or...

I've also been thinking about the possibility of duplicate tasks. I think with a lot of what we're doing, there's the possibility of a worker or task that keeps running but is unable to communicate, and so just merrily proceeds. Meanwhile, the owner of the worker declares it dead and starts up a new one to replace it. I'm not confident we can completely eradicate duplicates, so I am considering how to live with these in the back of my mind.

Vincent Ladeuil (vila) wrote on 2014-04-02:

Thanks for the thoughtful review, I have some answers below and will
incorporate them in the proposal asap.

> To perform different workflows, can I assume that the ticket worker is created
> with that knowledge as an input from the ticket system? For example, ticket 9
> needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does
> this convey what tasks can be retried and which ones fail the ticket?

A ticket is associated with a single workflow. The workflow defines which
tasks should be done, their order and how/if they are retried. I first
thought that only the failing task itself could be retried, but we may want
to allow retrying a previous task instead. For example, the test run fails
but we retry the image building. I think we still want to keep the
workflow as an ordered list though.
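A minimal sketch of a workflow kept as an ordered list, where a step may name an earlier task to restart from when it fails. `Step`, `next_step`, `max_attempts` and `retry_from` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """One task in a workflow (hypothetical structure)."""
    task: str
    max_attempts: int = 1
    # On failure, restart from this earlier task instead of the failing one.
    retry_from: Optional[str] = None


def next_step(workflow: List[Step], failed: str,
              attempts: Dict[str, int]) -> Optional[str]:
    """Return the task to run after `failed` fails, or None to fail the ticket.

    `attempts` maps a task name to the number of attempts already made.
    """
    names = [s.task for s in workflow]
    step = workflow[names.index(failed)]
    if attempts.get(failed, 0) >= step.max_attempts:
        return None  # retry budget exhausted: the ticket fails
    target = step.retry_from or failed
    # The workflow stays an ordered list: retries may only go backwards.
    if names.index(target) > names.index(failed):
        raise ValueError("retry_from must name an earlier task")
    return target


workflow = [
    Step("imagebuild"),
    Step("test-run", max_attempts=2, retry_from="imagebuild"),
]
print(next_step(workflow, "test-run", {"test-run": 1}))  # imagebuild
print(next_step(workflow, "test-run", {"test-run": 2}))  # None
```

Different tickets (say one doing A/B/C and another only A/B) would simply carry different `workflow` lists.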

> You mention "the task send an outgoing message listing the outputs to
> another queue." What (or whose) queue is this sent to?

Right, each worker has two queues:
- the incoming queue is shared across workers,
- the output queue is unique.

Just like we do today (and just like Evan did).

Or maybe we don't need to make the output queue unique? I.e., the ticket id
can be either part of the queue name or part of the message.
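The two addressing options can be sketched as follows; the `<worker class>.output[.<ticket id>]` naming scheme is a hypothetical convention, not one the engine defines.

```python
def per_ticket_queue(worker_class: str, ticket_id: int) -> str:
    """Option 1: the ticket id is part of the output queue name."""
    return "%s.output.%d" % (worker_class, ticket_id)


def shared_queue_message(worker_class: str, ticket_id: int, payload: dict):
    """Option 2: one output queue per worker class; the ticket id is in the message.

    Returns a (queue name, message body) pair.
    """
    body = dict(payload, ticket_id=ticket_id)
    return "%s.output" % worker_class, body


print(per_ticket_queue("imagebuilder", 9))  # imagebuilder.output.9
print(shared_queue_message("imagebuilder", 9, {"image": "image-9"}))
```

With option 2 the consumer has to read the message to know which ticket it belongs to, but the set of queues no longer depends on the tickets in flight.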

> Is it a queue owned by this ticket worker?

If it's unique, yes. Or rather, it's a queue between the ticket worker and
the task worker.

> Or does the imagebuilder send a message to the test-runner's queue?

No, task workers communicate via the ticket worker, never directly. That's
exactly the coupling we want to avoid.
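A single-process sketch of that mediation: task workers only read their own input queue and reply to the ticket worker, which forwards outputs to the next task. In-process `queue.Queue` objects stand in for the broker, and the queue names, task functions and message fields are all hypothetical; in reality each worker would be a separate process.

```python
import queue

# In-process queues stand in for the message broker (assumption).
INPUTS = {name: queue.Queue() for name in ("imagebuilder", "test-runner")}
RESULTS = queue.Queue()  # task workers -> ticket worker

# Hypothetical task implementations: each sees only its own input message.
TASKS = {
    "imagebuilder": lambda msg: {"image": "image-%d" % msg["ticket_id"]},
    "test-runner": lambda msg: {
        "passed": msg["image"] == "image-%d" % msg["ticket_id"]},
}


def run_task_worker(name: str) -> None:
    """Handle one message, replying to the ticket worker, never to a peer."""
    msg = INPUTS[name].get()
    RESULTS.put({"ticket_id": msg["ticket_id"], "task": name, **TASKS[name](msg)})


def ticket_worker(ticket_id: int, workflow: list) -> dict:
    """Drive the workflow, feeding each task's outputs to the next task."""
    data = {"ticket_id": ticket_id}
    for task in workflow:
        INPUTS[task].put(data)
        run_task_worker(task)  # would run concurrently in a real deployment
        data = {**data, **RESULTS.get()}
        del data["task"]
    return data


print(ticket_worker(9, ["imagebuilder", "test-runner"]))
```

The imagebuilder never learns that a test runner exists; swapping or reordering tasks only changes the workflow handed to the ticket worker.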

> Or...

Or we define a single queue between the classes of workers instead of having
them specific to a ticket. Now that you've asked... I think this may be
simpler as it would significantly reduce the number of queues, making the
controller's work easier.
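A back-of-the-envelope count of the two layouts, assuming every worker class has one shared input queue and every ticket uses every worker class (both simplifying assumptions for illustration):

```python
def queue_count(worker_classes: int, tickets: int, per_ticket: bool) -> int:
    """Number of queues the controller has to manage (hypothetical model).

    Input queues: one per worker class, shared across workers.
    Output queues: one per (class, ticket) pair, or one per class.
    """
    inputs = worker_classes
    outputs = worker_classes * tickets if per_ticket else worker_classes
    return inputs + outputs


print(queue_count(3, 100, per_ticket=True))   # 303
print(queue_count(3, 100, per_ticket=False))  # 6
```

With per-class queues the count stays constant as the number of tickets grows.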

>
> I've also been thinking about the possibility of duplicate tasks. I think
> with a lot of what we're doing, there's the possibility of a worker or task
> running but it's unable to communicate and so just merrily proceeds.

/me nods

> Meanwhile, the owner of the worker declares it dead and starts up a new
> one to replace it. I'm not confident we can completely eradicate
> duplicates so I am considering how to live with these in the back of my
> mind.

We cannot completely avoid duplicate tasks at the system level, so it's
possible (after some network issue or worker death) that the same task is
executed twice in parallel or that some artifacts have already been created
(though we may want to require the task worker to upload its artifacts only
when the job is done, to reduce potential issues).
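One way to honour "upload only when the job is done" is to build each artifact privately and publish it in a single step. This sketch assumes a local directory stands in for the data store; `publish_artifact` is a hypothetical helper. `os.replace` is atomic on POSIX when source and destination are on the same filesystem, so a reader, or a duplicate task, never sees a half-written artifact.

```python
import os
import tempfile


def publish_artifact(store_dir: str, name: str, content: bytes) -> str:
    """Write the artifact to a private temp file, then publish it atomically."""
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        final_path = os.path.join(store_dir, name)
        # Atomic rename: the artifact appears complete or not at all.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

This only removes partially written artifacts; two duplicate tasks would still both publish, which is what the controller has to deal with.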

In that case, the controller is responsible for ignoring the duplicate when
it is detected.
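A minimal sketch of "first result wins": the controller records results by a key identifying the task attempt and drops any later result with the same key. It assumes each result carries such a hashable key; the `Controller` class and its methods are hypothetical.

```python
class Controller:
    """Record task results, ignoring duplicates (hypothetical sketch)."""

    def __init__(self):
        self._results = {}

    def record(self, key, result) -> bool:
        """Store `result` under `key`; return False if it is a duplicate."""
        if key in self._results:
            return False  # e.g. a worker wrongly declared dead finished too
        self._results[key] = result
        return True

    def result_for(self, key):
        return self._results.get(key)


c = Controller()
print(c.record((9, "imagebuild", 1), "first"))   # True
print(c.record((9, "imagebuild", 1), "second"))  # False
```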

I was thinking that the ticket worker would create a message with (ticket
id, task id, task number), with task number being incremented to make it
unique for a ticket. This gives a way to represent the multiple attempts.
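A sketch of the ticket worker side of that scheme: each message carries (ticket id, task id, task number), and the task number is incremented per ticket, so a retry of the same task gets a new number. `TaskMessageFactory` and the message field names are hypothetical. As the next paragraph notes, this triple alone may not be unique once duplicate workers are involved.

```python
import itertools
from collections import defaultdict


class TaskMessageFactory:
    """Build (ticket id, task id, task number) messages (hypothetical)."""

    def __init__(self):
        # One independent counter per ticket, starting at 1.
        self._counters = defaultdict(lambda: itertools.count(1))

    def new_message(self, ticket_id, task_id) -> dict:
        return {
            "ticket_id": ticket_id,
            "task_id": task_id,
            "task_number": next(self._counters[ticket_id]),
        }


f = TaskMessageFactory()
print(f.new_message(9, "imagebuild")["task_number"])  # 1
print(f.new_message(9, "test-run")["task_number"])    # 2
print(f.new_message(9, "test-run")["task_number"])    # 3 (a retry)
```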

But now that you mention duplicate tasks, I realized this may not be enough
to create unique identifiers for the data store. So, the task worker should
create its artifacts with its node id (in addition to the above) and the
ticket worker will h...

Unmerged revisions

419. By Vincent Ladeuil on 2014-03-31: Add a state automaton diagram for the ticket worker.
418. By Vincent Ladeuil on 2014-03-29: Pointers to rabbit for the never failing cluster configuration.
417. By Vincent Ladeuil on 2014-03-29: Brain dump for phase-1.
