Merge lp://staging/~vila/uci-engine/ideas into lp://staging/uci-engine

Proposed by Vincent Ladeuil
Status: Work in progress
Proposed branch: lp://staging/~vila/uci-engine/ideas
Merge into: lp://staging/uci-engine
Diff against target: 193 lines (+158/-1)
4 files modified
.bzrignore (+4/-0)
docs/Makefile (+8/-1)
docs/architecture.rst (+134/-0)
docs/images/ticket-worker.dot (+12/-0)
To merge this branch: bzr merge lp://staging/~vila/uci-engine/ideas
Reviewer                  Review Type  Date Requested  Status
Canonical CI Engineering                               Pending
Review via email: mp+213601@code.staging.launchpad.net

Description of the change

Throwing ideas to fuel discussion, not proposing to merge.

Best read with:

$ (cd docs ; make html)
$ firefox docs/_build/html/architecture.html

so you get the pretty picture.

The ticket worker is intended to implement various workflows defining isolated tasks. It could replace the lander's use of Jenkins and provide a more agile architecture.

The main targeted change is to define the API via the messages exchanged between the workers.

Francis Ginther (fginther) wrote :

This is nice and straightforward. Well thought out and described (not half-baked like mine have been :-) ).

To perform different workflows, can I assume that the ticket worker is created with that knowledge as an input from the ticket system? For example, ticket 9 needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does this convey what tasks can be retried and which ones fail the ticket?

You mention "the task send an outgoing message listing the outputs to another queue." What (or whose) queue is this sent to? Is it a queue owned by this ticket worker? Or does the imagebuilder send a message to the test-runner's queue? Or...

I've also been thinking about the possibility of duplicate tasks. I think with a lot of what we're doing, there's the possibility of a worker or task running but being unable to communicate, so it just merrily proceeds. Meanwhile, the owner of the worker declares it dead and starts up a new one to replace it. I'm not confident we can completely eradicate duplicates, so I am considering how to live with these in the back of my mind.

Vincent Ladeuil (vila) wrote :

Thanks for the thoughtful review, I have some answers below and will
incorporate them in the proposal asap.

> To perform different workflows, can I assume that the ticket worker is created
> with that knowledge as an input from the ticket system? For example, ticket 9
> needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does
> this convey what tasks can be retried and which ones fail the ticket?

A ticket is associated with a single workflow. The workflow defines which
tasks should be done, their order and how/if they are retried. I first
thought that only the failing task itself could be retried, but I think we
may want to allow the case where a previous task is retried instead. For
example, the test run fails but we retry the image building. I think we
still want to keep the workflow as an ordered list though.
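
A minimal sketch of such a workflow, assuming a hypothetical `Step` record
and `next_step` helper (neither name comes from the proposal). Each step
carries its own retry limit and, optionally, the earlier task to restart
from when it fails:

```python
from dataclasses import dataclass
from typing import Optional


class TicketFailed(Exception):
    """Raised when a step has used up its attempts (hypothetical)."""


@dataclass(frozen=True)
class Step:
    task: str                         # task name, e.g. "build-image"
    max_attempts: int = 1             # total attempts allowed for this step
    retry_from: Optional[str] = None  # earlier task to restart from on failure


def next_step(workflow, current, succeeded, failures):
    """Return the index of the next step to run, or None when the
    workflow is complete. `failures` counts failed attempts per task."""
    step = workflow[current]
    if succeeded:
        following = current + 1
        return following if following < len(workflow) else None
    failures[step.task] = failures.get(step.task, 0) + 1
    if failures[step.task] >= step.max_attempts:
        raise TicketFailed(step.task)
    if step.retry_from is None:
        return current  # retry the failing task itself
    # Retry an earlier task instead (e.g. rebuild the image).
    return [s.task for s in workflow].index(step.retry_from)


# The workflow stays an ordered list, as suggested above.
WORKFLOW = [
    Step("build-image"),
    Step("run-tests", max_attempts=2, retry_from="build-image"),
]
```

With this workflow, a first failure of `run-tests` sends the ticket back to
`build-image`; a second failure raises `TicketFailed`.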

> You mention "the task send an outgoing message listing the outputs to
> another queue." What (or whose) queue is this sent to?

Right, each worker has two queues:
- the incoming queue is shared across workers,
- the output queue is unique.

Just like we do today (and just like Evan did).

Or maybe we don't need to make the output queue unique? I.e. the ticket id
can be either part of the queue name or part of the message.
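
The two options can be sketched as follows. The queue naming scheme and the
message fields are assumptions made for illustration only; no broker is
involved, each function just returns the queue name and the serialized
message body:

```python
import json


def per_ticket_route(worker_class, ticket_id, outputs):
    """Option 1: one output queue per ticket, the ticket id is part of
    the queue name (hypothetical naming scheme)."""
    queue_name = "{}.output.{}".format(worker_class, ticket_id)
    return queue_name, json.dumps({"outputs": outputs})


def shared_route(worker_class, ticket_id, outputs):
    """Option 2: one shared output queue, the ticket id travels in the
    message itself."""
    queue_name = "{}.output".format(worker_class)
    body = json.dumps({"ticket_id": ticket_id, "outputs": outputs})
    return queue_name, body
```

The second option keeps the number of queues independent of the number of
tickets, at the cost of the consumer having to filter or route on
`ticket_id`.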

> Is it a queue owned by this ticket worker?

If it's unique, yes. Or rather, it's a queue between the ticket worker and
the task worker.

> Or does the imagebuilder send a message to the test-runner's queue?

No, task workers communicate via the ticket worker, never directly. That's
exactly the coupling we want to avoid.

> Or...

Or we define a single queue between the classes of workers instead of having
them specific to a ticket. Now that you've asked... I think this may be
simpler as it would significantly reduce the number of queues, making the
controller work easier.
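
A rough in-memory sketch of that layout, using Python's `queue.Queue` as a
stand-in for the real broker. The queue names, message fields and function
names are all hypothetical. There is one incoming queue per class of task
worker and one results queue back to the ticket workers; task workers never
write to each other's queues:

```python
import queue

# One queue per class of worker, shared by all tickets (assumed layout).
QUEUES = {
    "image-builder": queue.Queue(),
    "test-runner": queue.Queue(),
    "results": queue.Queue(),
}


def dispatch(ticket_id, task, inputs):
    """Ticket worker side: ask a class of task workers to run a task."""
    QUEUES[task].put({"ticket_id": ticket_id, "task": task, "inputs": inputs})


def task_worker_once(task, handler):
    """Task worker side: process one message and report the outputs back
    to the ticket worker, never to another task worker."""
    msg = QUEUES[task].get_nowait()
    outputs = handler(msg["inputs"])
    QUEUES["results"].put(
        {"ticket_id": msg["ticket_id"], "task": task, "outputs": outputs})


def ticket_worker_once(next_task):
    """Consume one result and forward its outputs as the inputs of the
    following task, if the workflow defines one."""
    msg = QUEUES["results"].get_nowait()
    following = next_task.get(msg["task"])
    if following is not None:
        dispatch(msg["ticket_id"], following, msg["outputs"])
    return msg
```

Since the ticket id travels in every message, the number of queues depends
only on the number of worker classes.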

>
> I've also been thinking about the possibility of duplicate tasks. I think
> with a lot of what we're doing, there's the possibility of a worker or task
> running but it's unable to communicate and so just merrily proceeds.

/me nods

> Meanwhile, the owner of the worker declares it dead and starts up a new
> one to replace it. I'm not confident we can completely eradicate
> duplicates so am considering how to live with these in the back of my
> mind.

We cannot completely avoid duplicate tasks at the system level, so it's
possible (after some network issue or worker death) that the same task is
executed twice in parallel or that some artifacts have already been created
(though we may want to constrain the task worker to upload artifacts only
when the job is done, to reduce potential issues).

In that case, the controller is responsible for ignoring the duplicate when
detected.
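
One possible way to ignore duplicates is "first result wins". This is only a
sketch: the `Controller` class, its `accept` method and the choice of
(ticket id, task id) as the key are assumptions, not part of the proposal:

```python
class Controller:
    """Keeps the first result seen for a given task and drops later
    ones (hypothetical sketch)."""

    def __init__(self):
        self._results = {}

    def accept(self, ticket_id, task_id, outputs):
        """Record the outputs and return True, or return False when a
        result for the same task has already been recorded."""
        key = (ticket_id, task_id)
        if key in self._results:
            return False  # duplicate: ignore it
        self._results[key] = outputs
        return True

    def outputs_for(self, ticket_id, task_id):
        return self._results[(ticket_id, task_id)]
```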

I was thinking that the ticket worker would create a message with (ticket
id, task id, task number), with task number being incremented to make it
unique for a ticket. This gives a way to represent the multiple attempts.
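
As a sketch, assuming a hypothetical `TaskMessage` tuple and a per-ticket
counter (the names are illustrative only):

```python
import itertools
from collections import namedtuple

# The identifier carried by each message sent to a task worker.
TaskMessage = namedtuple("TaskMessage", "ticket_id task_id task_number")


class TicketWorker:
    """Numbers every task it sends so that repeated attempts of the same
    task get distinct identifiers within a ticket."""

    def __init__(self, ticket_id):
        self.ticket_id = ticket_id
        self._numbers = itertools.count(1)

    def new_message(self, task_id):
        return TaskMessage(self.ticket_id, task_id, next(self._numbers))
```

Two attempts at the same task then differ only by their task number, which
is what lets them be told apart.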

But now that you mention duplicate tasks, I realize this may not be enough
to create unique identifiers for the data store. So, the task worker should
create its artifacts with its node id (in addition to the above) and the
ticket worker will h...


Unmerged revisions

419. By Vincent Ladeuil

Add a state automaton diagram for the ticket worker.

418. By Vincent Ladeuil

Pointers to rabbit for the never failing cluster configuration.

417. By Vincent Ladeuil

Brain dump for phase-1.
