Wednesday, November 11, 2009

Automatic Promotion of stable code

We are working to achieve automatic promotion of "good" revisions to main. First of all, what do we mean by "good"? Right now we define it as a code revision of pi on which a set of builds and tests has run successfully. When this happens, all the code up to that revision is selected for promotion to the stable repository, main.

Our goals:
  • Main only includes stable code.
  • Code is promoted from the development repositories to main as fast as possible, that is, as fast as the code quality permits.
Current Model
We have pi, main and a Continuous Integration engine:
  • pi: The pre-integration repository. Developers push directly here.
  • main: The stable repository. A manual merge from pi is done before each release.
  • builds: Integrated with the Hudson tool, which runs about 12 builds and tests on main and pi.
This is how pi and main fit in our Continuous Integration engine:
  • Developers commit locally and push their changes to pi.
  • There are around 12 builds and tests polling the SCM for changes to the pi repository. If there are changes, the builds are run. If they fail, developers are notified in the IRC channel #openbravo and on the development mailing list.
The main repository comes into play at release time:
  • When the release start date comes, we make sure that all the tests in pi run successfully.
  • Once everything is green and good, we merge pi into main.
  • QA starts the manual test process, testing those parts that are not automated yet.
  • If major issues are found, we transplant changesets from pi into main.
Issues/Drawbacks of this model
  • Main does not always contain stable code. Whenever we do a transplant there is a risk of breaking something, so we need to run all the tests in main again. As a result, main can only be considered reliable at release time, and we want the tip of the repository to always be trustworthy.
  • Experience has shown that pi tends to be unstable, which annoys developers and the release engineering team (us).
  • If a transplant is required, the risk grows with the number of changesets pushed to pi since the merge to main was done, because pi is further ahead in features and fixes. And we don't want to freeze pi.
Proposed Model
We use Mercurial as our SCM, so its distributed nature will be an invaluable help in solving this problem. We are already running a bunch of builds and tests. This is great. We now want to automatically mark the revisions that pass them as "good" or "tested".

Need for integration stage
Doing integration in main is a bad idea, and doing it in pi is even worse. We already have the pre-integration repository and the final main repository, so having an integration repository is a natural choice that fits in this model.

Now we have pi, main, int and builds
How would this work then with an integration repository? (let's call it "int")
  • Developers work on pi. A set of tests is run on pi to detect silly problems.
  • int pulls from pi from time to time, and triggers a set of builds and tests.
  • If all the builds and tests run successfully, we can consider the tip of int as "good". So int pushes all the changesets to main.
  • Repeat the process forever.
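
As a rough illustration, one pass of this loop could be sketched in Python as follows. This is only a sketch under assumptions: the function name promote_once, the repository locations and the checks are hypothetical, and the hg commands are handed to an injected run callable so that nothing here claims to be our actual job configuration.

```python
from typing import Callable, List, Sequence

Command = List[str]


def promote_once(
    pi_url: str,
    int_path: str,
    main_url: str,
    checks: Sequence[Callable[[str], bool]],
    run: Callable[[Command], object],
) -> bool:
    """Pull pi into int, run every check, and push to main only if all pass."""
    # Bring the latest pi changesets into int and update its working copy.
    run(["hg", "pull", "--repository", int_path, "--update", pi_url])

    # Each check (hypothetical) receives the int path and returns pass/fail.
    for check in checks:
        if not check(int_path):
            return False  # leave main untouched; try again on the next cycle

    # Everything is green: the tip of int is "good", so promote it.
    run(["hg", "push", "--repository", int_path, main_url])
    return True
```

In a real setup, run would probably wrap something like subprocess.run. Note that hg push exits with a non-zero code when there is nothing to push, so such a wrapper should not treat that case as a failure.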
Step 1: Create a first/top job that does the following:
* Pulls the last successful changesets from pi into int.
* Clones int locally on the system (int-1).

Step 2: Incremental build for PostgreSQL (erp_devel_int-inc-pgsql).
* This job polls from int-1. If there are changes it runs the job.
* If the job is successful, it pushes the changesets to a new repository, int-2.

Step 3: Incremental build for Oracle (erp_devel_int-inc-oracle).
* This job polls from int-2. If there are changes it runs the job.
* If the job is successful, it pushes the changesets to a new repository, int-3.
(. . .)
Step 11: Smoke test on Oracle (erp_devel_int-oracle-smoke-test).
* This job polls from int-10. If there are changes it runs the job.
* If the job is successful, it pushes the changesets to a new repository, int-11.

Step 12: Promote pi to main (erp_promote_pi_to_main)
* If the job is successful, it pushes the changesets from int-11 to main
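
The chain of staging repositories can be modeled with a small toy simulation. This is a sketch under simplifying assumptions, not our Hudson setup: history is linear, a repository is just a list of changeset numbers, only three of the twelve steps are modeled, and the names Repo, Stage and poll_all are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Repo:
    """Toy repository: a linear list of changeset numbers (no branches)."""
    name: str
    changesets: List[int] = field(default_factory=list)


@dataclass
class Stage:
    """A job that polls `source` and pushes to `target` when it passes."""
    name: str
    source: Repo
    target: Repo
    job: Callable[[int], bool]  # gets the tip revision, returns pass/fail

    def poll(self) -> bool:
        # Changesets present in source but not yet in target.
        incoming = self.source.changesets[len(self.target.changesets):]
        if not incoming:
            return False  # no changes: the job is not triggered
        if not self.job(self.source.changesets[-1]):
            return False  # job failed: nothing is pushed downstream
        self.target.changesets.extend(incoming)
        return True


def poll_all(stages: List[Stage]) -> List[str]:
    """Poll each stage once, in order; return the names of those that pushed."""
    return [stage.name for stage in stages if stage.poll()]


# Shortened chain: the post describes twelve steps, three are modeled here.
int1, int2, int3, main = (Repo(n) for n in ("int-1", "int-2", "int-3", "main"))
stages = [
    Stage("inc-pgsql", int1, int2, lambda rev: True),
    Stage("inc-oracle", int2, int3, lambda rev: rev != 2),  # pretend rev 2 breaks Oracle
    Stage("promote", int3, main, lambda rev: True),
]

int1.changesets += [1, 2]
first_pass = poll_all(stages)   # Oracle fails on rev 2, so main stays empty
int1.changesets.append(3)
second_pass = poll_all(stages)  # rev 3 passes everywhere and reaches main
```

In the first pass only the PostgreSQL stage pushes, so the promote stage sees no changes and does nothing. In the second pass all three stages push and main ends up with changesets 1 to 3.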

What do we achieve with this model?
  • Logical order: if an incremental job fails, a full build will not be triggered. The full job polls for changes from int-3, and since the incremental job failed, no push to int-3 has happened.
  • The revision tested by the last job has been tested by all the previous jobs, so we have the guarantee that it has passed all the tests and we can push it to main.
  • The short jobs do not have to wait for the long jobs. Not all the revisions tested by job 1 are tested by job 2, but the opposite is always true: all the revisions tested by job 2 have been tested by job 1.
  • The model does not depend on a specific Continuous Integration software.
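
The point about short and long jobs can be checked with a tiny simulation. It is a sketch with made-up numbers: each repository is reduced to its tip revision number, one changeset arrives per tick, every job passes, and the slow job is only free to poll every slow_every ticks. The function name simulate is hypothetical.

```python
from typing import Set, Tuple


def simulate(ticks: int, slow_every: int) -> Tuple[Set[int], Set[int]]:
    """Return the sets of revisions tested by a fast job and by a slow job."""
    int1 = int2 = int3 = 0  # tip revision of each toy repository
    tested_by_fast: Set[int] = set()
    tested_by_slow: Set[int] = set()
    for tick in range(1, ticks + 1):
        int1 = tick  # assumption: one new changeset lands in int-1 per tick
        # Fast job: polls int-1 on every tick and pushes to int-2.
        if int1 > int2:
            tested_by_fast.add(int1)
            int2 = int1
        # Slow job: only free to poll int-2 every `slow_every` ticks.
        if tick % slow_every == 0 and int2 > int3:
            tested_by_slow.add(int2)
            int3 = int2
    return tested_by_fast, tested_by_slow


fast, slow = simulate(ticks=6, slow_every=3)
```

With these inputs the fast job tests revisions 1 to 6 while the slow job only tests 3 and 6. Every revision the slow job sees has already passed the fast one, and the fast job never waits.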
