Merge lp://staging/~michael.nelson/launchpad/db-changes-build-generalisation-new into lp://staging/launchpad/db-devel

Proposed by Michael Nelson
Status: Merged
Merged at revision: 9405
Proposed branch: lp://staging/~michael.nelson/launchpad/db-changes-build-generalisation-new
Merge into: lp://staging/launchpad/db-devel
Diff against target: 736 lines (+484/-113)
5 files modified
database/sampledata/current-dev.sql (+112/-44)
database/sampledata/current.sql (+112/-44)
database/schema/comments.sql (+26/-17)
database/schema/patch-2207-57-0.sql (+210/-0)
database/schema/security.cfg (+24/-8)
To merge this branch: bzr merge lp://staging/~michael.nelson/launchpad/db-changes-build-generalisation-new
Reviewer Review Type Date Requested Status
Björn Tillenius (community) db Approve
Stuart Bishop (community) db Approve
Review via email: mp+25594@code.staging.launchpad.net

Description of the change

This branch is the schema change required for:

https://blueprints.edge.launchpad.net/soyuz/+spec/build-generalisation
https://dev.launchpad.net/LEP/GeneralBuildHistories

A slightly out-of-date visual for the schema change can be seen here:
http://people.ubuntu.com/~wgrant/launchpad/buildfarm/new-build-model-again.png

Overview
========
This schema change splits the current Build table into three: BuildFarmJob, PackageBuild and BinaryPackageBuild, and then migrates the data across. It also updates foreign key references, and some old views that are still used by the codebase to use the new tables.

I did chat with stub about this and he said with only a few hundred thousand records, performance isn't an issue, but it would be good to get him to confirm that after reading through the patch (we'll also test it on dogfood).

There is a pipeline of over 7k lines of code changes that need to land with this that ensures all the tests pass. It's all approved, currently ending with:

https://code.edge.launchpad.net/~michael.nelson/launchpad/567922-binarypackagebuild-new-table-8/+merge/25515

I just need to merge those changes with a fresh db-devel and resolve conflicts, before getting this on dogfood for some serious testing.

To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Hi stub or BjornT:

When I ran this schema change (together with the code changes) through ec2 test, I see the following error in test_sampledata - it seems related to moving the old build table to the todrop namespace rather than deleting it leaves the constraints in place - I assume I just remove the constraints on todrop.build?

http://pastebin.ubuntu.com/436693/

Also, there are a bunch of other failures:

http://pastebin.ubuntu.com/436699/

which are (I assume) because listReferences(cursor(), 'libraryfilealias', 'id') still returns 'build' as a from table?
http://pastebin.ubuntu.com/436710/

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Never mind... dropping the constraints on todrop.build worked.

Revision history for this message
Stuart Bishop (stub) wrote :

Yes, you need to manually drop the constraints in the DB patch I'm afraid.

Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (3.4 KiB)

This looks great. Comments, naming standards, indexes - all pretty good.

+CREATE TABLE BinaryPackageBuild (
+ id serial PRIMARY KEY,
+ package_build integer NOT NULL CONSTRAINT binarypackagebuild__package_build__fk REFERENCES packagebuild,
+ distro_arch_series integer NOT NULL CONSTRAINT binarypackagebuild__distro_arch_series__fk REFERENCES distroarchseries,
+ source_package_release integer NOT NULL CONSTRAINT binarypackagebuild__source_package_release__fk REFERENCES sourcepackagerelease

I believe Sourcepackage is one word in Launchpad's vocabulary, so that last column should be sourcepackage_release. Confirm with Bjorn.

+CREATE TABLE BinaryPackageBuild (
+ id serial PRIMARY KEY,
+ package_build integer NOT NULL CONSTRAINT binarypackagebuild__package_build__fk REFERENCES packagebuild,
+ distro_arch_series integer NOT NULL CONSTRAINT binarypackagebuild__distro_arch_series__fk REFERENCES distroarchseries,
+ source_package_release integer NOT NULL CONSTRAINT binarypackagebuild__source_package_release__fk REFERENCES sourcepackagerelease
+);
+
+CREATE UNIQUE INDEX binarypackagebuild__package_build__idx ON binarypackagebuild(package_build);
+-- Indexes that we can no longer create:
+-- CREATE UNIQUE INDEX binarypackagebuild__distro_arch_series_uniq__idx ON binarypackagebuild(distro_arch_series, source_package_release, archive)
+-- CREATE INDEX binarypackagebuild__distro_arch_series__status__idx ON binarypackagebuild(distro_arch_series, status?)
+-- CREATE INDEX binarypackagebuild__distro_arch_series__date_finished ON binarypackagebuild(distro_arch_series, date_finished)
+CREATE INDEX binarypackagebuild__source_package_release_idx ON binarypackagebuild(source_package_release);

We might need an index on distro_arch_series too

CREATE INDEX binarypackagebuild__distro_arch_series__idx ON BinaryPackageBuild(distro_arch_series);

+-- Step 2
+-- Migrate the current data from the build table to the newly created
+-- relationships.
+CREATE OR REPLACE FUNCTION migrate_build_rows() RETURNS integer
+LANGUAGE plpgsql AS
+$$
+DECLARE
+ build_info RECORD;
+ rows_migrated integer;
+ buildfarmjob_id integer;
+ packagebuild_id integer;
+BEGIN
+ rows_migrated := 0;
+ FOR build_info IN
+ SELECT
+ build.id,
+ build.processor,
+ archive.require_virtualized AS virtualized,
+ -- Currently we do not know if a build was virtual or not? (it's
+ -- only on the archive and the builder, both of which can
+ -- change). IBuild.is_virtualized just queries the archive.
+ build.datecreated AS date_created,
+ (build.datebuilt - build.buildduration) AS date_started,
+ build.datebuilt AS date_finished,
+ build.date_first_dispatched,
+ build.builder,
+ build.buildstate AS status,
+ build.buildlog AS log,
+ build.archive,
+ build.pocket,
+ build.upload_log,
+ build.dependencies,
+ build.distroarchseries AS distro_arch_series,
+ build.sourcepackagerelease AS source_package_release,
+ build.build_warnings -- We don't seem to use this in LP code at all?
+ FROM
+ build JOIN archive ON build.archive = archive.id

We have to have 'ORDER BY Build.id' at the end here. Any data migration in a DB patch is run individually on each database replica. We have to guarantee that it will run iden...

Read more...

review: Approve (db)
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (5.2 KiB)

On Thu, May 20, 2010 at 4:29 PM, Stuart Bishop
<email address hidden> wrote:
> Review: Approve db
> This looks great. Comments, naming standards, indexes - all pretty good.

Thanks Stuart.

>
>
> +CREATE TABLE BinaryPackageBuild (
> + id serial PRIMARY KEY,
> + package_build integer NOT NULL CONSTRAINT binarypackagebuild__package_build__fk REFERENCES packagebuild,
> + distro_arch_series integer NOT NULL CONSTRAINT binarypackagebuild__distro_arch_series__fk REFERENCES distroarchseries,
> + source_package_release integer NOT NULL CONSTRAINT binarypackagebuild__source_package_release__fk REFERENCES sourcepackagerelease
>
> I believe Sourcepackage is one word in Launchpad's vocabulary, so that last column should be sourcepackage_release. Confirm with Bjorn.

The current devel code has sourcepackagerelease (as well as lots of
other non-underscored vars seen in the schema change - datecreated,
buildlog, etc.), and part of this change is to start using underscores
consistently.

Grepping for "source_package" in the tip of db-devel has 1113 hits
(skip over the wadl ;) ) for variable names "source_package" in
registry, as well as things like "source_package_name",
"source_package_format" etc. (admittedly, there are many more for
"sourcepackage", due to the old non-underscored variable and column
names (sourcepackagename, sourcepackagerelease etc.)

Since no one commented on the suggestion in the diagram at:

http://people.ubuntu.com/~wgrant/launchpad/buildfarm/new-build-model-again.png

I've gone ahead and code source_package_release through the pipeline
of changes. Bjorn, if you feel strongly about it, I can change the
schema and the code (or just change the schema and field definition
for the moment?), but I'd prefer to leave it for the moment, and if
you'd like the change, do a separate branch later to standardise
(we'll need to do this anyway as we've lots of other classes that
still use names without any underscores).

>
>
> +CREATE TABLE BinaryPackageBuild (
> + id serial PRIMARY KEY,
> + package_build integer NOT NULL CONSTRAINT binarypackagebuild__package_build__fk REFERENCES packagebuild,
> + distro_arch_series integer NOT NULL CONSTRAINT binarypackagebuild__distro_arch_series__fk REFERENCES distroarchseries,
> + source_package_release integer NOT NULL CONSTRAINT binarypackagebuild__source_package_release__fk REFERENCES sourcepackagerelease
> +);
> +
> +CREATE UNIQUE INDEX binarypackagebuild__package_build__idx ON binarypackagebuild(package_build);
> +-- Indexes that we can no longer create:
> +-- CREATE UNIQUE INDEX binarypackagebuild__distro_arch_series_uniq__idx ON binarypackagebuild(distro_arch_series, source_package_release, archive)
> +-- CREATE INDEX binarypackagebuild__distro_arch_series__status__idx ON binarypackagebuild(distro_arch_series, status?)
> +-- CREATE INDEX binarypackagebuild__distro_arch_series__date_finished ON binarypackagebuild(distro_arch_series, date_finished)
> +CREATE INDEX binarypackagebuild__source_package_release_idx ON binarypackagebuild(source_package_release);
>
> We might need an index on distro_arch_series too
>
> CREATE INDEX binarypackagebuild__distro_arch_series__idx ON BinaryPackageBuild(distro_a...

Read more...

Revision history for this message
Björn Tillenius (bjornt) wrote :

The schema changes look good. I'm having a hard time following the migration part, but I trust that Stuart reviewed it, and that you test it on dogfood properly.

review: Approve (db)
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Thanks Bjorn. I've another question for you both: the patch moves the old build table to todrop.build and deletes its constraints.

Should I have a contingency patch that (1) restores the old table, (2) restores its constraints (3) moves the new tables to the todrop namespace?

I'm kindof nervous about landing this as I don't see a way to back it out if need be (given the size etc.)

I'll get this tested today on dogfood.

Revision history for this message
Stuart Bishop (stub) wrote :

You could write a patch to do that now and probably waste your time, or you could write it if you actually need it. The trick will not be renaming the tables, but migrating updated data back.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

{{{
launchpad@mawson:/srv/launchpad.net/codelines/current$ time PGUSER=postgres LPCONFIG=dogfood bin/py
./database/schema/upgrade.py -vv -p
2010-05-26 11:58:50 INFO Applying patches to unreplicated environment.
2010-05-26 11:58:50 INFO Applying trusted.sql
2010-05-26 11:58:50 DEBUG Committing changes
2010-05-26 11:58:50 DEBUG Found patch 2207.57.0 -- ./database/schema/patch-2207-57-0.sql
2010-05-26 11:58:50 INFO Applying ./database/schema/patch-2207-57-0.sql
^[2010-05-26 13:41:57 DEBUG Committing changes
2010-05-26 13:41:57 INFO Applying comments.sql
2010-05-26 13:42:03 DEBUG Committing changes
2010-05-26 13:42:03 DEBUG Committing changes

real 103m14.459s
user 0m1.340s
sys 0m0.430s
launchpad@mawson
}}}

Revision history for this message
Stuart Bishop (stub) wrote :

Dogfood isn't great for timings. Staging timings are better.

If we think it is too slow, or we can't wait to land on staging to be sure, I'll need to refactor the migration.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

{{{
11:01 < stub> noodles775: It took 17 minutes on staging, so maybe 35-40 mins to apply to production.
11:01 < noodles775> stub: thanks. I assume that's way too long? What is acceptable?
11:02 < stub> Thats up to the release manager and losas
11:02 < noodles775> OK.
11:10 < stub> noodles775: This version should be identical and takes 8 minutes, so 16-20 mins on production: http://paste.ubuntu.com/440285/
11:11 < noodles775> Thanks stub... I'll update and send it off to ec2... but at this time I don't think I'll be landing it.. there seem to be other issues with teh current db-devel on dogfood that need looking in to.
}}}

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: