Buildhub2

Buildhub2 is an index of build information for Firefox, Firefox Dev Edition, Thunderbird, and Fennec.

Production server: https://buildhub.moz.tools/

Overview

Every time Taskcluster builds a version of Firefox, Fennec, etc., the built files are put into an S3 bucket. One of the files that always accompanies a build is called buildhub.json, which we download, validate, and index into a PostgreSQL database as well as Elasticsearch.

When files are saved to the S3 bucket, their filenames get added to an SQS queue which is consumed by the daemon. The daemon looks at the filenames and indexes the buildhub.json ones into Buildhub2.

Buildhub2 has a webapp which is a single-page-app that helps you make Elasticsearch queries and displays the results.

Buildhub2 has an API which you can use to query the data.

For more on these, see the user docs.

First Principles

Buildhub2 reflects data on archive.mozilla.org.

Buildhub2 will never modify, create, or remove build data from the buildhub.json files that are discovered and indexed. If the data is wrong, it needs to be fixed on archive.mozilla.org.

Buildhub2 records are immutable.

When a buildhub.json file is first indexed, its primary key becomes a hash of its content. If the buildhub.json under the same URL is later modified, indexing it again leads to a new record in Buildhub2.
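
For illustration, the sketch below shows the general idea of a content hash acting as the key; the exact hashing scheme Buildhub2 uses may differ.

# A sketch of content-hash-as-primary-key; the exact scheme may differ.
import hashlib
import json

def record_hash(build_document):
    # Serialize deterministically so identical content always produces the same hash.
    serialized = json.dumps(build_document, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()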

Using Buildhub2

Products supported by Buildhub2

Buildhub2 indexes build information that exists on archive.mozilla.org.

If you want build information for your product in Buildhub2, you’ll need to change the release process to additionally add files to archive.mozilla.org in the same way that Firefox does.

Fields in Buildhub2

Buildhub2 records have the same structure as buildhub.json files on archive.mozilla.org.

Example: https://archive.mozilla.org/pub/firefox/candidates/68.0b7-candidates/build1/linux-x86_64/en-US/buildhub.json

{
  "build": {
    "as": "/builds/worker/workspace/build/src/clang/bin/clang -std=gnu99",
    "cc": "/builds/worker/workspace/build/src/clang/bin/clang -std=gnu99",
    "cxx": "/builds/worker/workspace/build/src/clang/bin/clang++",
    "date": "2019-06-03T18:14:08Z",
    "host": "x86_64-pc-linux-gnu",
    "id": "20190603181408",
    "target": "x86_64-pc-linux-gnu"
  },
  "download": {
    "date": "2019-06-03T20:49:46.559307+00:00",
    "mimetype": "application/octet-stream",
    "size": 63655677,
    "url": "https://archive.mozilla.org/pub/firefox/candidates/68.0b7-candidates/build1/linux-x86_64/en-US/firefox-68.0b7.tar.bz2"
  },
  "source": {
    "product": "firefox",
    "repository": "https://hg.mozilla.org/releases/mozilla-beta",
    "revision": "ed47966f79228df65b6326979609fbee94731ef0",
    "tree": "mozilla-beta"
  },
  "target": {
    "channel": "beta",
    "locale": "en-US",
    "os": "linux",
    "platform": "linux-x86_64",
    "version": "68.0b7"
  }
}

If you want different fields, the Taskcluster task will need to be changed to include the new information. Additionally, Buildhub2 will need to adjust the schema. Please open up an issue with your request.

Website

You can query build information using the website at https://buildhub.moz.tools/.

The search box uses Elasticsearch querystring syntax.

Example: All records for a given build id

Search for:

build.id:20170713200529

API

The API endpoint is at: https://buildhub.moz.tools/api/search

You can query it by passing in Elasticsearch search queries as HTTP POST payloads.

Example: Is this an official build id?

Is 20170713200529 an official build id?

We can query for records where build.id has that value, limit the size to 0 so we’re not getting records back, and then check the total.

$ curl -s -X POST https://buildhub.moz.tools/api/search \
    -d '{"size": 0, "query": {"term": {"build.id": "20170713200529"}}}' | \
    jq .hits.total
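
The same query can be made from Python; a sketch using the requests library, equivalent to the curl command above:

# A sketch using the requests library; equivalent to the curl command above.
import requests

query = {"size": 0, "query": {"term": {"build.id": "20170713200529"}}}
response = requests.post("https://buildhub.moz.tools/api/search", json=query)
print(response.json()["hits"]["total"])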

Example: What is the Mercurial commit ID for a given build id?

What is the Mercurial commit ID for a given build id?

Query for the build id and only return 1 record. Extract the specific value using jq.

$ curl -s -X POST https://buildhub.moz.tools/api/search \
    -d '{"size": 1, "query": {"term": {"build.id": "20170713200529"}}}' | \
    jq '.hits.hits[] | ._source.source.revision'

Example: What platforms are available for a given build id?

What platforms are available for a given build id?

To get this, we want to do an aggregation on target.platform. We set the size to 0 so it returns only the aggregations and not the individual records for the query.

$ curl -s -X POST https://buildhub.moz.tools/api/search \
    -d '{"size": 0, "query": {"term": {"build.id": "20170713200529"}}, "aggs": {"platforms": {"terms": {"field": "target.platform"}}}}' | \
    jq '.aggregations.platforms.buckets'

Architecture and Overview

High-level

Mozilla builds versions of Firefox, Fennec, etc., and the built files are uploaded to an S3 bucket. With each build, a buildhub.json file is created that contains all of the information we intend to store and make searchable.

When a new file is added (or edited) in S3, it triggers an event notification that goes to an AWS SQS queue.

The daemon consumes the SQS queue, looking for keys that exactly match the buildhub.json filename. Since the SQS message only contains the name of the S3 object, the daemon downloads that file, validates its content, and stores it in both PostgreSQL and Elasticsearch.

The four parts of Buildhub are:

  1. The Django web server

  2. The SQS consumer daemon script

  3. The PostgreSQL and Elasticsearch databases that make search possible

  4. A create-react-app-based React app for the UI, which essentially runs SearchKit

Flow

  1. TaskCluster builds, for example, a Firefox-79-installer.exe and a buildhub.json.

  2. TaskCluster uploads these files into S3.

  3. An S3 configuration triggers an SQS event that puts this S3-write into the queue.

  4. Buildhub2 processor daemon polls the SQS queue and gets the file creation event.

  5. Buildhub2 processor daemon downloads the buildhub.json file from S3 using Python boto3.

  6. Buildhub2 processor daemon reads its payload and validates it against the JSON Schema.

  7. Buildhub2 processor daemon inserts the JSON into PostgreSQL using the Django ORM.

  8. The JSON is then inserted into Elasticsearch.
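
The daemon’s part of these steps (4 through 8) roughly corresponds to a loop like the sketch below. This is a simplified illustration, not the real implementation; the handle_document() stub and the queue URL are assumptions, and credentials/region come from the environment.

# A simplified sketch of the daemon loop, not the real implementation.
# The handle_document() stub and the queue URL are assumptions.
import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

def handle_document(document):
    # Placeholder for "validate, then insert into PostgreSQL and Elasticsearch".
    print(document["source"]["product"], document["target"]["version"])

def process_queue(queue_url):
    while True:
        response = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=10)
        for message in response.get("Messages", []):
            body = json.loads(message["Body"])
            for record in body.get("Records", []):
                key = record["s3"]["object"]["key"]
                if key.endswith("buildhub.json"):
                    obj = s3.get_object(Bucket=record["s3"]["bucket"]["name"], Key=key)
                    handle_document(json.loads(obj["Body"].read()))
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])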

Validation

The validation step before storing anything is to check that the data in the buildhub.json file matches the schema.yaml file. Since TaskCluster builds the buildhub.json file and this service picks it up asynchronously and with a delay, there is at the moment no easy way to know when an invalid buildhub.json file was built.

If you want to change the schema.yaml make sure it matches the schema used inside mozilla-central when the buildhub.json files are created.
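
A minimal sketch of that validation step, assuming the jsonschema and PyYAML libraries and a schema.yaml/buildhub.json pair on disk (the project’s actual validation code may differ):

# A minimal sketch of validating a buildhub.json against schema.yaml;
# the project's actual validation code may differ.
import json
import jsonschema
import yaml

with open("schema.yaml") as f:
    schema = yaml.safe_load(f)

with open("buildhub.json") as f:
    document = json.load(f)

# Raises jsonschema.exceptions.ValidationError if the document does not conform.
jsonschema.validate(instance=document, schema=schema)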

Keys

The following keys are tracked in the code, each with a different purpose. Generally, the pattern is that every key starts with a “context” keyword followed by an underscore, for example sqs_. That prefix primarily makes it possible to trace a key back to the source code, but it also acts as a form of namespace.
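
For illustration only, the sketch below shows how the naming convention plays out when processing one SQS message; the emit() helper is hypothetical and stands in for the real metrics client.

# Hypothetical illustration of the key naming convention, not the real metrics client.
def emit(key, value=1):
    # The "sqs_" prefix ties the metric back to the SQS consumer code.
    print("metric %s += %s" % (key, value))

emit("sqs_messages")       # one SQS message received
emit("sqs_key_matched")    # its S3 key matched a buildhub.json file
emit("sqs_inserted")       # the record was new and got inserted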

sqs_process_buildhub_json_key

Timer.

How long it takes to consider a buildhub.json S3 key. This involves both downloading it from S3 and attempting to insert it into our database. That “attempt to insert” means the hash is calculated and looked up; depending on whether it was found, the record is inserted or nothing happens.

sqs_inserted

Incr.

Count of inserts that were new and actually inserted into the database coming from the SQS queue.

sqs_not_inserted

Incr.

Count of buildhub.json records that we attempted to insert but rejected because they were already in the database.

sqs_messages

Incr.

This is a count of messages received by consuming the SQS queue. Assume this to be equal to the number of messages deleted from the SQS queue. Fewer messages can be deleted than received in the unexpected cases where a message triggers a Python exception (caught in Sentry).

Note! The total of sqs_inserted + sqs_not_inserted is not equal to sqs_messages because of files that don’t match what we’re looking to process.

sqs_key_matched

Incr.

Incremented every time an S3 record is received whose S3 key we match. Expect this number to equal sqs_inserted + sqs_not_inserted.

sqs_not_key_matched

Incr.

Every message received (see sqs_messages) can contain multiple records of different types. We only look at the S3 records, and of those, some S3 keys can be quickly ignored as not matched. That is what this increment counts.

So roughly, this number is sqs_messages minus sqs_inserted minus sqs_not_inserted.

api_search_records

Gauge.

A count of the number of builds found by Elasticsearch in each /api/search request.

api_search_requests

Incr.

A count of the number of requests received to be proxied to Elasticsearch. Note that every increment is accompanied by a tag of the form method:$METHOD, for example method:POST.

backfill_inserted

Incr.

A count of builds inserted by the backfill job that we did not already have. If this number goes up, it means the SQS consumption is failing.

backfill_not_inserted

Incr.

When running the backfill, we iterate through all keys in the S3 bucket, and to avoid having to download every single matched key, we keep each key’s full path and ETag in the database to make the lookups faster. If a key and ETag are not recognized and we download and attempt to insert the file but end up not needing to, this increment goes up. Expect this number to stay very near zero in a healthy environment.

backfill_listed

Incr.

When running the backfill, this is a count of the number of S3 objects we download per page. To get insight into the total number of S3 objects considered, look at this number over a window of time.

backfill_matched

Incr.

When running the backfill, we quickly filter all keys, per batch, down to the ones that we consider. This is a count of that. It’s an increment per batch. Similar to backfill_listed, to get an insight into the total, look at this count over a window of time.

backfill

Timer.

How long it takes to run the whole backfill job. This includes iterating over every single S3 key.

kinto_migrated

Incr.

When we run the migration from Kinto, this is a count of the number of messages (per batch) received from batch-fetching the legacy Kinto database.

kinto_inserted

Incr.

A count of the number of builds inserted by the Kinto migration. One useful aspect of this is that you can run the Kinto migration repeatedly until this number stops incrementing.

Development environment

You can set up a Buildhub2 development environment that runs on your local machine for development and testing.

Setting up

To set up a dev environment, install the following:

  • Docker

  • make

  • git

  • bash

Clone the repo from GitHub at https://github.com/mozilla-services/buildhub2.

Then do the following:

# Build the Docker images
$ make build

# Wipe and initialize services
$ make setup

If make setup fails, run the following command to see detailed logs:

$ docker-compose up

Once you’ve done that, you can run Buildhub2.

Configuration

The Django settings depend on there being an environment variable called DJANGO_CONFIGURATION.

# If production
DJANGO_CONFIGURATION=Prod

# If stage
DJANGO_CONFIGURATION=Stage

You need to set a random DJANGO_SECRET_KEY. It should be sufficiently random and a decent length:

DJANGO_SECRET_KEY=sSJ19WAj06QtvwunmZKh8yEzDdTxC2IPUXfea5FkrVGNoM4iOp
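
One way to generate a suitable value (a sketch using Python’s standard library; any sufficiently long random string works):

# Generate a random value suitable for DJANGO_SECRET_KEY.
import secrets
print(secrets.token_urlsafe(50))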

The ALLOWED_HOSTS setting needs to be a list of valid domains that will be used to reach the service from the outside. If there is only a single domain, it doesn’t need to list any others. For example:

DJANGO_ALLOWED_HOSTS=buildhub.mozilla.org

For Sentry, the key is SENTRY_DSN, which is sensitive. For the front-end (which hasn’t been built yet at the time of writing) we also need the public key, called SENTRY_PUBLIC_DSN. For example:

SENTRY_DSN=https://bb4e266xxx:d1c1eyyy@sentry.prod.mozaws.net/001
SENTRY_PUBLIC_DSN=https://bb4e266xxx@sentry.prod.mozaws.net/001

Content Security Policy (CSP) headers are on by default. To change the URL where violations are sent, set DJANGO_CSP_REPORT_URI. By default it’s set to '', meaning that unless it is set, it won’t be included as a header. See the MDN documentation on report-uri for more info.

To configure writing to BigQuery, the following variables will need to be set:

DJANGO_BQ_ENABLED=True
DJANGO_BQ_PROJECT_ID=...
DJANGO_BQ_DATASET_ID=...
DJANGO_BQ_TABLE_ID=...

The project and dataset will need to be provisioned before running the server with this functionality enabled. Additionally, credentials will need to be passed to the server. If it is running in Google Compute Engine, this is configured through the default service account. To run this via docker-compose, the following lines in docker-compose.yml will need to be un-commented:

volumes:
  ...
  # - ${GOOGLE_APPLICATION_CREDENTIALS}:/tmp/credentials

In addition, set the following variable after downloading the service account credentials from IAM & admin > Service accounts in the Google Cloud Platform console for the project.

GOOGLE_APPLICATION_CREDENTIALS=/path/to/keyfile.json
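
A quick way to sanity-check the credentials and the provisioned table is a sketch like the following, using the google-cloud-bigquery client; the project, dataset, and table ids below are placeholders.

# A sketch only: confirm the credentials can see the provisioned table.
# The project, dataset, and table ids are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
table = client.get_table("my-gcp-project.my_dataset.my_table")
print(table.num_rows)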

Run make test and check that none of the tests are skipped.

Adding data

FIXME: How to add data to your local instance?

Running the webapp

The webapp consists of a part that runs on the server powered by Django and a part that runs in the browser powered by React.

To run all the required services, the server, and a service that builds the static assets needed by the browser ui, do:

$ make run

This will start the server on port 8000 and the web ui on port 3000.

You can use http://localhost:3000 with your browser to use the web interface and curl/requests/whatever to use the API.

Running the daemon

Buildhub2 has a daemon that polls SQS for events and processes new files on archive.mozilla.org.

You can run the daemon with:

$ make daemon

You can quit it with Ctrl-C.

Troubleshooting

Below are some known issues you might run into and their workarounds.

  • Elasticsearch fails with the following error during make setup:

elasticsearch    | ERROR: [1] bootstrap checks failed
elasticsearch    | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

This can be worked around by running:

$ sysctl -w vm.max_map_count=262144

If you want this to be permanent across restarts, you also need to add this value to /etc/sysctl.conf.

Development conventions and howto

Conventions

License preamble

All code files need to start with the MPLv2 header:

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

Linting

We use flake8 for linting Python code. See https://github.com/mozilla-services/buildhub2/blob/master/.flake8 for rules.

We use black for fixing Python code formatting. The specific version is in requirements.txt so use that one in your IDE.

We use eslint for linting and fixing JS code.

CI will fail if linting raises any errors.

To run linting tasks on Python and JS files:

$ make lintcheck

To run lint-based fixing tasks on Python and JS files:

$ make lintfix

Documentation

We use Sphinx to generate documentation. Documentation is written using reStructuredText.

To build the docs, do:

$ make docs

You can view the docs in docs/_build/html/index.html in your browser.

Documentation is published at https://buildhub2.readthedocs.io/ every time changes land in master branch.

Backend (webapp server and daemon)

The backend is written in Python using Django. This covers both the backend web server and the daemon.

Maintaining dependencies

All Python requirements needed for development and production need to be listed in requirements.txt with sha256 hashes.

The most convenient way to modify this is to run hashin. For example:

$ pip install hashin
$ hashin Django==1.10.99
$ hashin other-new-package

This will automatically update your requirements.txt but it won’t install the new packages. To do that, you need to exit the shell and run:

$ make build

To check which Python packages are outdated, use piprot in a shell:

$ make shell
root@...:/app# pip install piprot
root@...:/app# piprot -o requirements.txt

The -o flag means it only lists requirements that are out of date.

Note

A good idea is to install hashin and piprot globally on your computer instead. It doesn’t require a virtual environment if you use pipx.

Frontend (ui)

The ui is a React single-page-app. It makes API calls to the backend to retrieve data.

All source code is in the ./ui directory. More specifically, ./ui/src contains the files you’re most likely going to edit to change the front-end.

All CSS is loaded with yarn by either drawing from .css files installed in the node_modules directory or from imported .css files inside the ./ui/src directory.

The project is based on create-react-app, so the main rendering engine is React. There is no server-side rendering. The idea is that all requests (unless explicitly routed in Nginx) that don’t immediately find a static file should fall back on ./ui/build/index.html. For example, loading https://buildhub.moz.tools/uploads/browse will actually load ./ui/build/index.html, which renders the .js bundle, which loads react-router, which in turn figures out which component to render and display based on the path (“/uploads/browse” for example).

Handling dependencies

A “primitive” way of changing dependencies is to edit the list of dependencies in ui/package.json and running docker-compose build ui. This is not recommended.

A much better way to change dependencies is to use yarn. Use the yarn installed in the Docker ui container. For example:

$ docker-compose run ui bash
> yarn outdated                   # will display which packages can be upgraded today
> yarn upgrade date-fns --latest  # example of upgrading an existing package
> yarn add new-hotness            # adds a new package

When you’re done, you have to rebuild the ui Docker container:

$ docker-compose build ui

Your change should result in changes to ui/package.json and ui/yarn.lock, both of which need to be checked in and committed.

Tools

Postgres/psql

To access the Postgres database, do:

$ make psql

Elasticsearch

To access Elasticsearch, you can use the Elasticsearch API against http://localhost:9200.
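
For example, a sketch using the requests library to list the local indices and run a quick query (the index name Buildhub2 writes to depends on your configuration):

# A sketch of poking the local Elasticsearch directly with the requests library.
import requests

# List the indices to find the one Buildhub2 writes to.
print(requests.get("http://localhost:9200/_cat/indices?v").text)

# Run the same kind of query the API proxies, directly against Elasticsearch.
query = {"size": 1, "query": {"match_all": {}}}
response = requests.post("http://localhost:9200/_search", json=query)
print(response.json()["hits"]["total"])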

Deployment

Buildhub2 has two server environments: stage and prod.

Buildhub2 images are located on Docker Hub.

Notifications for deployment status are in #buildhub on Slack.

Deploy to Stage

Stage is at: https://stage.buildhub2.nonprod.cloudops.mozgcp.net/

To deploy to stage, tag the master branch and push the tag:

$ make tag

Deploy to Prod

Prod is at: https://buildhub.moz.tools/

To deploy to prod, ask ops to promote the tag on stage.

Backfilling

There’s a ./manage.py backfill command that uses the S3 API to iterate over every single key in an S3 bucket, filter out those called *buildhub.json and then check to see if we have those records.

The script takes FOREVER to run. The Mozilla production S3 bucket used for all builds contains over 60 million objects, and when listing them you can only read 1,000 keys at a time.

When iterating over all S3 keys, it first filters out the *buildhub.json ones, compares the S3 keys and ETags with what is in the database, and inserts/updates accordingly.
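
The listing itself is standard S3 pagination; below is a sketch of that idea using boto3. The bucket name is a placeholder (see Configuration below), and the real command also compares ETags against the database and records resume state.

# A sketch of the listing step only; the real backfill also compares ETags
# against the database and records resume state. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each page returns at most 1,000 keys, which is why the full run takes so long.
for page in paginator.paginate(Bucket="example-bucket"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("buildhub.json"):
            print(obj["Key"], obj["ETag"])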

Configuration

The S3 bucket it uses is called net-mozaws-prod-delivery-inventory-us-east-1 and lives in us-east-1. It’s the default in the configuration. If you need to override it, set, for example:

DJANGO_S3_BUCKET_URL=https://s3-us-west-2.amazonaws.com/buildhub-sqs-test

If you know in advance which S3 bucket is mentioned in the SQS payloads, you can set that up with:

DJANGO_SQS_S3_BUCKET_URL=https://s3-us-west-2.amazonaws.com/mothership

If either of these is set, it is tested during startup to make sure you have the relevant read access.

Reading the S3 bucket is public and doesn’t require AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, but to read the SQS queue these need to be set up.

AWS_ACCESS_KEY_ID=AKI....H6A
AWS_SECRET_ACCESS_KEY=....

Note

The access key ID and secret access keys are not prefixed with DJANGO_.

How to run it

Get ops to run:

$ ./manage.py backfill

This uses settings.S3_BUCKET_URL which is the DJANGO_S3_BUCKET_URL environment variable.

The script will dump information about files it’s seen into a .json file on disk (see settings.RESUME_DISK_LOG_FILE aka. DJANGO_RESUME_DISK_LOG_FILE which is /tmp/backfill-last-successful-key.json by default). With this file, it’s possible to resume the backfill from where it last finished. This is useful if the backfill breaks due to an operational error or even if you Ctrl-C the command the first time. To make it resume, you have to set the flag --resume:

$ ./manage.py backfill --resume

You can set this flag from the very beginning too. If there’s no file on disk to get resume information from, it will just start from scratch.

Migrating from Kinto (over HTTP)

Note

This can be removed after Buildhub has been decommissioned.

If you intend to migrate from the old Buildhub’s Kinto database you need to run:

$ ./manage.py kinto-migration http://localhost:8888

That URL obviously depends on where the Kinto server is hosted. If the old Kinto database contains old legacy records that don’t conform, you might get errors like:

Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: ['c:/builds/moz2_slave/m-rel-w64-00000000000000000000/build/', 'src/vs2015u3/VC/bin/amd64/link.exe'] is not of type 'string'

Failed validating 'type' in schema['properties']['build']['properties']['ld']:
    {'description': 'Executable', 'title': 'Linker', 'type': 'string'}

On instance['build']['ld']:
    ['c:/builds/moz2_slave/m-rel-w64-00000000000000000000/build/',
    'src/vs2015u3/VC/bin/amd64/link.exe']

Then simply run:

$ ./manage.py kinto-migration http://localhost:8888 --skip-validation

Note that during an early period, while the old Kinto database is still getting populated, you can run this command repeatedly and it will continue where it left off.

Note

If you have populated a previously empty PostgreSQL database with records from the Kinto database, you have to run ./manage.py reindex-elasticsearch again.

Migrating from Kinto (by PostgreSQL)

Note

This can be removed after Buildhub has been decommissioned.

A much faster way to migrate from Kinto (legacy Buildhub) is to have a dedicated PostgreSQL connection.

Once that’s configured you simply run:

$ ./manage.py kinto-database-migration

This will validate every single record and crash if any single record is invalid. If you’re confident that all the records about to be migrated are valid, you can run:

$ ./manage.py kinto-database-migration --skip-validation

Another option is to run the migration and run validation on each record, but instead of crashing, simply skip the invalid ones. In fact, this is the recommended way to migrate:

$ ./manage.py kinto-database-migration --skip-invalid

Keep an eye on the log output about the number of invalid records skipped.

It will migrate every single record in one sweep (but broken up into batches of 10,000 rows at a time). If it fails, you can most likely just try again.

Also, see the note above about the need to run ./manage.py reindex-elasticsearch afterwards.

Configuration

When doing the migration from Kinto, you can either rely on HTTP or connect directly to a Kinto database. The latter works by optionally setting up a separate PostgreSQL connection; the kinto-migration script will then be able to talk directly to that database. It’s disabled by default.

To enable it, the same “rules” apply as for DATABASE_URL, except the variable is called KINTO_DATABASE_URL. E.g.:

KINTO_DATABASE_URL="postgres://username:password@hostname/kinto"
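
A hedged sketch of how such a second connection might appear in the Django settings, assuming dj-database-url; the project’s actual settings code may be organized differently.

# A sketch only; the real settings module may wire this up differently.
import dj_database_url

DATABASES = {
    "default": dj_database_url.config(env="DATABASE_URL"),
    # Optional second connection used by the Kinto migration commands:
    "kinto": dj_database_url.config(env="KINTO_DATABASE_URL"),
}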

Testing

Unit tests

Buildhub2 has a suite of unit tests for Python. We use pytest to run them.

$ make test

If you need to run specific tests or pass in different arguments, you can run bash in the base container and then run pytest with whatever args you want. For example:

$ make shell
root@...:/app# pytest

SQS Functional testing

By default, for local development you can consume the SQS queue set up for Dev. For this you need AWS credentials. You need to set up your AWS IAM Dev credentials in ~/.aws/credentials (under default) or in .env.

The best tool for putting objects into S3 and populating the Dev SQS queue is s3-file-maker. To do that, run, on your host:

cd "$GOPATH/src"
git clone https://github.com/mostlygeek/s3-file-maker.git
cd s3-file-maker
dep ensure
go build main.go
./main [--help]

Note

This SQS queue can only be consumed by one person at a time.
