Cluster Telemetry

Overview

Cluster Telemetry (CT) lets you run Telemetry benchmarks, Lua scripts, and other tasks, with multiple repository patches applied, across Alexa’s top 1 million web pages. Developers can use the framework to measure the performance of a patch against the most popular pages on the web on both Desktop and Android.

SKP files are a binary format for the draw commands Chromium sends to Skia for rasterization. The project began with the goal of collecting a large repository of 10k SKP files. After incremental changes in approach, that repository has grown to ~900k SKP files, and the framework now supports running all Telemetry benchmarks. The top-level feature request for this project was skia:1268.

A web application on App Engine automates capturing new archives and running Telemetry benchmarks at the click of a button. Results are emailed to the requester, and the web application keeps a complete history of runs with links to results. You can run Telemetry benchmarks at http://ct.skia.org.

The framework can also run Lua scripts over the SKP repository to scrape web pages. Running a Lua scraping script on ~900k SKP files takes only a few minutes.

These are the different parts of the framework:

  • Chromium Perf Tryserver. Documentation here. Webpage here.
  • Skia Correctness Tryserver. Documentation here. Webpage here.
  • Run Lua Scripts. Documentation about Lua bindings is here. Webpage here.

Framework Usage

The Chromium Perf tryserver in CT has been used to gather perf data over the top 10k web pages for the following Chromium projects:

  • Slimming paint
  • Performance data for layer squashing and compositing overlap map
  • SkPaint in Graphics Context
  • Culling
  • New paint dictionary

blink-dev threads discussing how to make Chrome faster using the results gathered from CT:

Documents detailing data generated by the framework:

The framework has also been used to run multiple Lua scripts that scrape the SKP repositories for the following: chars-vs-glyphs, bitmap transform types, gradient color counter, 3 color gradient checks, etc. This has helped the Skia team determine which parts of the library to optimize and focus on.

All runs are recorded here.

System Architecture

System Diagram

[Figure: CT System Diagram]

Detailed explanation of steps

  1. User submits an Admin task (rebuild Chrome, recreate pagesets, recreate webpage archives), Lua script task, or Telemetry benchmark task using the App Engine web application here.

  2. Each task is exposed by the web application in JSON. The CT master polls the web application and picks up new tasks.

  3. The master pushes new tasks to all the workers using the master scripts here (in a new process so that the poller is not blocked). The master scripts then check to see when the workers are done with the requested task.

  4. The workers execute the task using the worker scripts here. All generated artifacts (CSV files, logs, SKP files, archives, etc.) are then stored locally and copied to Google Storage.

  5. The master scripts periodically check the workers to see when they are done with the requested task. Once the workers are done, the generated artifacts are read from Google Storage and consolidated (if required).

  6. The master scripts then email results of the task to the user who requested it. The master scripts also update the status of the task to completed on App Engine.
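
The flow in the steps above can be illustrated with a minimal Go sketch. It is not the actual CT code: the Task fields, the Worker type, and the ParseTasks, Dispatch, and Consolidate functions are hypothetical names chosen for this example. It also assumes that workers can be modeled as in-process functions run in goroutines (the real master starts a new process and talks to remote machines), and it leaves out Google Storage, email, and the App Engine status update.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"sync"
)

// Task is a hypothetical shape for one pending task as the web
// application might expose it in JSON (step 2). Field names are assumed.
type Task struct {
	ID        int    `json:"id"`
	Type      string `json:"type"` // e.g. "admin", "lua", "telemetry"
	Requester string `json:"requester"`
}

// ParseTasks decodes the JSON body that a poll of the web application returns.
func ParseTasks(body []byte) ([]Task, error) {
	var tasks []Task
	if err := json.Unmarshal(body, &tasks); err != nil {
		return nil, err
	}
	return tasks, nil
}

// Worker stands in for one worker machine. It runs the task and returns
// the CSV artifact it produced (step 4).
type Worker func(t Task) string

// Dispatch sends the task to every worker concurrently, waits until all
// of them are done, and returns their artifacts in worker order (steps 3 and 5).
func Dispatch(t Task, workers []Worker) []string {
	artifacts := make([]string, len(workers))
	var wg sync.WaitGroup
	for i, w := range workers {
		wg.Add(1)
		go func(i int, w Worker) {
			defer wg.Done()
			artifacts[i] = w(t)
		}(i, w)
	}
	wg.Wait()
	return artifacts
}

// Consolidate merges per-worker CSV outputs into one CSV: the header line
// is kept once, and the data rows of every worker are appended (step 5).
func Consolidate(csvs []string) string {
	var out []string
	for _, c := range csvs {
		lines := strings.Split(strings.TrimSpace(c), "\n")
		if lines[0] == "" {
			continue // this worker produced no output
		}
		if len(out) == 0 {
			out = append(out, lines[0]) // header from the first non-empty CSV
		}
		out = append(out, lines[1:]...)
	}
	return strings.Join(out, "\n")
}

func main() {
	// Sample poll response; the real JSON format is not specified here.
	body := []byte(`[{"id": 1, "type": "telemetry", "requester": "user@example.com"}]`)
	tasks, err := ParseTasks(body)
	if err != nil {
		panic(err)
	}
	workers := []Worker{
		func(t Task) string { return "page,ms\na.com,10" },
		func(t Task) string { return "page,ms\nb.com,12" },
	}
	for _, t := range tasks {
		merged := Consolidate(Dispatch(t, workers))
		// In CT, this is where results would be emailed to the requester (step 6).
		fmt.Printf("task %d for %s:\n%s\n", t.ID, t.Requester, merged)
	}
}
```

Each worker writes only to its own slot in the artifacts slice, so the goroutines need no lock, and the merged CSV keeps a stable worker order.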

Code

Cluster Telemetry is primarily written in Go with a few Python scripts. The framework lives in master/ct and the App Engine code lives in master/appengine_scripts.

Contact Us

If you have questions, please email cluster-telemetry@chromium.org or contact rmistry@ directly.