Benchmark #19

Merged May 6, 2022 (25 commits)
Commits
1b22445 (jelmervdl, May 3, 2022): WASM translator to accept options for native/embedded and number of w…
95fbeea (jelmervdl, May 3, 2022): Add benchmark page
0b4a386 (jelmervdl, May 3, 2022): Less log spamming plz
b494c9d (jelmervdl, May 3, 2022): Measure startup, quality of life improvements
f13820a (jelmervdl, May 4, 2022): Add controls for lang/html config, add first translation timing for l…
82ccbc3 (jelmervdl, May 4, 2022): Add code to record page chunks during usage
e18790e (jelmervdl, May 4, 2022): Allow for configuration of batch size in WASM translator
83202fa (jelmervdl, May 4, 2022): More benchmarks, and make them toggleable
f7b6d3b (jelmervdl, May 4, 2022): Make cache size configurable (and disable in benchmarks)
50968c4 (jelmervdl, May 4, 2022): Add Configure command for TranslateLocally
968542f (jelmervdl, May 4, 2022): Merge branch 'main' into benchmark
5085732 (jelmervdl, May 4, 2022): Add test for cache size
821db60 (jelmervdl, May 4, 2022): Add little bar charts in the table background
f2c94cc (jelmervdl, May 5, 2022): Less "smart" implementation for `lazy`
3263321 (jelmervdl, May 5, 2022): Fix untar issue in untar itself
229655a (jelmervdl, May 5, 2022): Support multiple runs
cc03f8c (jelmervdl, May 5, 2022): Do the warm-up, but describe columns better
1d2eccd (jelmervdl, May 5, 2022): batch-size 8 is default right now, makes more sense to test that one
cb1f7cf (jelmervdl, May 5, 2022): Not using it but I like my Python `asyncio.as_completed` implementation
fa0300d (jelmervdl, May 5, 2022): Fix up popup
0cc6897 (jelmervdl, May 5, 2022): Clean up record functionality
65a6d5d (jelmervdl, May 5, 2022): Hide all of this behind a "developer mode" toggle
548621f (jelmervdl, May 5, 2022): Disable a bunch of benchmarks by default
b6dbc9a (jelmervdl, May 5, 2022): Remove debug prints
88f3a98 (jelmervdl, May 5, 2022): Remove the GEMM patching bit
3 changes: 0 additions & 3 deletions .github/workflows/publish_artifact.yml
@@ -96,9 +96,6 @@ jobs:
- name: ccache epilog
run: |
ccache -s # Print current cache stats
- name: Import GEMM library from a separate wasm module
working-directory: bergamot-translator/build-wasm-without-wormhole
run: bash ../wasm/patch-artifacts-import-gemm-module.sh
- name: Upload wasm artifact
uses: actions/upload-artifact@v2
with:
2 changes: 1 addition & 1 deletion extension/3rd_party/js-untar/untar.js
@@ -1,6 +1,6 @@
/* globals Blob: false, Promise: false, console: false, Worker: false, ProgressivePromise: false */

var workerScriptUri = '3rd_party/js-untar/untar-worker.js';
var workerScriptUri = browser.runtime.getURL('3rd_party/js-untar/untar-worker.js');

var global = window || this;

10 changes: 9 additions & 1 deletion extension/contentScript.js
@@ -72,6 +72,8 @@ on('Update', diff => {
}
});

const sessionID = new Date().getTime();

const inPageTranslation = new InPageTranslation({
translate(text, user) {
console.assert(state.from !== undefined && state.to !== undefined);
@@ -88,7 +90,13 @@ const inPageTranslation = new InPageTranslation({
user,

// data useful for the scheduling
priority: user.priority || 0
priority: user.priority || 0,

// data useful for recording
session: {
id: sessionID,
url: document.location.href
}
}
});
}
102 changes: 100 additions & 2 deletions extension/controller/backgroundScript.js
@@ -128,7 +128,11 @@ class Tab extends EventTarget {
totalTranslationRequests: 0,
modelDownloadRead: undefined,
modelDownloadSize: undefined,
record: false,
recordedPagesCount: undefined,
recordedPagesURL: undefined
};

this.frames = new Map();

this._scheduledUpdateEvent = null;
@@ -225,6 +229,69 @@ function showPopup(event) {
}
}

class Recorder {
#pages;

constructor(backing) {
this.#pages = new Map();
}

record({from, text, html, session: {url}}) {
// Unique per page url
if (!this.#pages.has(url))
this.#pages.set(url, {
url,
from,
texts: [],
});

this.#pages.get(url).texts.push(text);
}

get size() {
return this.#pages.size;
}

clear() {
this.#pages.clear();
}

exportAXML() {
const root = document.implementation.createDocument('', '', null);
const dataset = root.createElement('dataset');

this.#pages.forEach(page => {
const doc = root.createElement('doc');
doc.setAttribute('origlang', page.from);
doc.setAttribute('href', page.url);

const src = root.createElement('src');
src.setAttribute('lang', page.from);

page.texts.forEach((text, i) => {
const p = root.createElement('p');

const seg = root.createElement('seg');
seg.setAttribute('id', i + 1);

seg.appendChild(root.createTextNode(text));
p.appendChild(seg);

src.appendChild(p);
});

doc.appendChild(src);
dataset.appendChild(doc);
});

root.appendChild(dataset);

const serializer = new XMLSerializer();
const xml = serializer.serializeToString(root);
return new Blob([xml], {type: 'application/xml'});
}
}

// Supported translation providers
const providers = {
'translatelocally': TLTranslationHelper,
@@ -233,7 +300,13 @@ const providers = {

// Global state (and defaults)
const state = {
provider: 'wasm'
provider: 'wasm',
options: {
workers: 1, // be kind to the user's pc
cacheSize: 20000, // remember website boilerplate
useNativeIntGemm: true // faster is better (unless it is buggy: https://github.com/browsermt/marian-dev/issues/81)
},
developer: false // should we show the option to record page translation requests?
};

// State per tab
@@ -262,7 +335,7 @@ let provider = new class {
state.provider = 'wasm';
}

this.#provider = new providers[state.provider]();
this.#provider = new providers[state.provider](state.options);

this.#provider.onerror = err => {
console.error('Translation provider error:', err);
@@ -286,6 +359,8 @@ let provider = new class {
}
};

const recorder = new Recorder();

/**
* Connects the port of a content-script or popup with the state management
* mechanism of the tab. This allows the content-script to make UpdateRequest
@@ -384,6 +459,17 @@ function connectContentScript(contentScript) {
pendingTranslationRequests: state.pendingTranslationRequests + 1,
totalTranslationRequests: state.totalTranslationRequests + 1
}));

// If we're recording requests from this tab, add the translation
// request. Also disabled when developer setting is false since
// then there are no controls to turn it on/off.
if (state.developer && tab.state.record) {
recorder.record(message.data);
tab.update(state => ({
recordedPagesCount: recorder.size
}));
}

provider.get().translate({...message.data, _abortSignal})
.then(response => {
if (!response.request._abortSignal.aborted) {
@@ -476,6 +562,18 @@ function connectPopup(popup) {
case 'TranslateAbort':
tab.abort();
break;

case 'ExportRecordedPages':
popup.postMessage({
command: 'DownloadRecordedPages',
data: {
name: 'recorded-pages.xml',
url: URL.createObjectURL(recorder.exportAXML())
}
});
recorder.clear();
tab.update(state => ({recordedPagesCount: 0}));
break;
}
});
}
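The `Recorder` added in this file keys recorded text on the page URL, not on the session id, so repeated visits to the same page end up in a single `<doc>`, and the `from` language of the first request for a URL is the one that sticks. A DOM-free sketch of just that grouping step, runnable outside the browser. `PageLog` and `toJSON` are hypothetical names used for illustration and the XML export is deliberately left out:

```javascript
// Hypothetical stand-in for the grouping done by Recorder.record().
// Not part of the PR; it only mirrors the Map-per-URL bookkeeping.
class PageLog {
  #pages = new Map();

  record({from, text, session: {url}}) {
    // One entry per page URL; created on the first request for that URL.
    if (!this.#pages.has(url))
      this.#pages.set(url, {url, from, texts: []});

    this.#pages.get(url).texts.push(text);
  }

  get size() {
    return this.#pages.size;
  }

  // Plain-object view of the recorded pages, in insertion order.
  toJSON() {
    return [...this.#pages.values()];
  }
}

const log = new PageLog();
log.record({from: 'de', text: 'Hallo', session: {id: 1, url: 'https://example.org/a'}});
log.record({from: 'de', text: 'Welt', session: {id: 2, url: 'https://example.org/a'}});
log.record({from: 'fr', text: 'Bonjour', session: {id: 3, url: 'https://example.org/b'}});

console.log(log.size); // 2
console.log(JSON.stringify(log, null, 2));
```

Since `recordedPagesCount` is fed from `recorder.size`, the counter shown to the user counts distinct URLs rather than translation requests.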
31 changes: 19 additions & 12 deletions extension/controller/translation/TLTranslationHelper.js
@@ -23,18 +23,16 @@ class PortChannel {
return new PromiseWithProgress((resolve, reject, update) => {
const id = ++this.serial;
this.pending.set(id, {resolve, reject, update});
console.log('Sending', {id, command, data})
this.port.postMessage({id, command, data});
})
}

onMessage(message) {
console.log('Received', message);

if (message.id === undefined) {
console.warn('Ignoring message from translateLocally that was missing the id', message);
return;
}

const {resolve, reject, update} = this.pending.get(message.id);

if (!message.update)
@@ -54,7 +52,11 @@ class PortChannel {
*/
class TLTranslationHelper {

constructor() {
constructor(options) {
this.threads = Math.max(options?.workers || 1, 1);

this.cacheSize = Math.max(options?.cacheSize || 0, 0);

this.client = lazy(this.loadNativeClient.bind(this));

// registry of all available models and their urls: Promise<List<Model>>
@@ -78,16 +80,21 @@ class PortChannel {
}

async loadNativeClient() {
return new Promise((resolve, reject) => {
const port = browser.runtime.connectNative('translatelocally');
const port = browser.runtime.connectNative('translatelocally');

port.onDisconnect.addListener(() => {
if (port.error)
this.onerror(port.error);
});

port.onDisconnect.addListener(() => {
if (port.error)
this.onerror(port.error);
});
const channel = new PortChannel(port);

resolve(new PortChannel(port));
await channel.request('Configure', {
threads: this.threads,
cacheSize: this.cacheSize
});

return channel;
}

/**
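The constructor above clamps its numeric options with the `Math.max(value || fallback, minimum)` idiom, and the WASM helper in this PR does the same for `workers` and `batchSize`. A minimal sketch of that idiom in isolation, assuming a hypothetical `normalizeOptions` helper that is not part of the PR; the fallbacks are the ones visible in the diff:

```javascript
// Hypothetical helper for illustration only. It applies the same
// `Math.max(value || fallback, minimum)` clamping seen in the constructors.
function normalizeOptions(options) {
  return {
    workers: Math.max(options?.workers || 1, 1),
    cacheSize: Math.max(options?.cacheSize || 0, 0),
    batchSize: Math.max(options?.batchSize || 8, 1),
  };
}

// No options at all: every field falls back to its default.
console.log(normalizeOptions(undefined));

// Negative and zero values: `workers` is clamped up to 1, and `batchSize: 0`
// is falsy, so it is treated as "not set" and becomes 8.
console.log(normalizeOptions({workers: -2, cacheSize: 20000, batchSize: 0}));
```

One consequence of using `||` here is that an explicit `0` cannot be told apart from a missing value. That is harmless for `cacheSize`, where the fallback is already 0, but it means `batchSize: 0` yields 8 rather than the minimum of 1.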
40 changes: 26 additions & 14 deletions extension/controller/translation/WASMTranslationHelper.js
@@ -6,14 +6,10 @@
/* global modelRegistryRootURL, modelRegistryRootURLTest, modelRegistry,importScripts */


const BATCH_SIZE = 8; // number of requested translations

const CACHE_NAME = "bergamot-translations";

const MAX_DOWNLOAD_TIME = 60000; // TODO move this

const MAX_WORKERS = 1;

/**
* Little wrapper around the message passing API to keep track of messages and
* their responses in such a way that you can just wait for them by awaiting
@@ -22,7 +18,6 @@ const MAX_WORKERS = 1;
class WorkerChannel {
constructor(worker) {
this.worker = worker;
this.worker.onerror = this.onerror.bind(this);
this.worker.onmessage = this.onmessage.bind(this);
this.serial = 0;
this.pending = new Map();
@@ -36,10 +31,6 @@ class WorkerChannel {
})
}

onerror(error) {
throw new Error(`Error in worker: ${error.message}`);
}

onmessage({data: {id, message, error}}) {
if (id === undefined)
return;
@@ -61,7 +52,16 @@ class WorkerChannel {
*/
class WASMTranslationHelper {

constructor() {
/**
* options:
* cacheSize: 0
* useNativeIntGemm: false
* workers: 1
* batchSize: 8
*/
constructor(options) {
this.options = options || {};

// registry of all available models and their urls: Promise<List<Model>>
this.registry = lazy(this.loadModelRegistery.bind(this));

@@ -74,12 +74,21 @@ class WorkerChannel {
// List of active workers (and a flag to mark them idle or not)
this.workers = [];

// Maximum number of workers
this.workerLimit = Math.max(this.options.workers || 0, 1);

// List of batches we push() to & shift() from
this.queue = [];

// batch serial to help keep track of batches when debugging
this.batchSerial = 0;

// Number of requests in a batch before it is ready to be translated in
// a single call. Bigger is better for throughput (better matrix packing)
// but worse for latency since you'll have to wait for the entire batch
// to be translated.
this.batchSize = Math.max(this.options.batchSize || 8, 1);

// Error handler for all errors that are async, not tied to a specific
// call and that are unrecoverable.
this.onerror = err => console.error('WASM Translation Worker error:', err);
@@ -94,8 +103,11 @@ class WorkerChannel {
loadWorker() {
// TODO is this really not async? Can I just send messages to it from
// the start and will they be queued or something?
const worker = new Worker('controller/translation/WASMTranslationWorker.js');
worker.onerror = (err) => this.onerror(err);
const worker = new Worker(browser.runtime.getURL('controller/translation/WASMTranslationWorker.js'));
worker.onerror = this.onerror.bind(this);

// Initialisation options
worker.postMessage({options: this.options});

// Little wrapper around the message passing api of Worker to make it
// easy to await a response to a sent message.
@@ -345,7 +357,7 @@ class WorkerChannel {
let worker = this.workers.find(worker => worker.idle);

// No worker free, but space for more?
if (!worker && this.workers.length < MAX_WORKERS) {
if (!worker && this.workers.length < this.workerLimit) {
worker = {
idle: true,
worker: this.loadWorker()
@@ -451,7 +463,7 @@ class WorkerChannel {
let batch = this.queue.find(batch => {
return batch.key === key
&& batch.priority === priority
&& batch.requests.length < BATCH_SIZE
&& batch.requests.length < this.batchSize
});

// No batch or full batch? Queue up a new one
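The last hunk makes the batch lookup depend on the configurable `this.batchSize` instead of the old `BATCH_SIZE` constant. A rough sketch of how that lookup behaves once a batch fills up, written as a free function. `enqueue`, the request shape, and the `'en-de'` key are assumptions for illustration; only the three-part `find` condition is taken from the diff:

```javascript
// Sketch only: a standalone version of the "find an open batch or start a
// new one" step. The real helper keeps the queue on `this.queue`.
function enqueue(queue, request, batchSize) {
  const {key, priority} = request;

  // Same condition as in the diff: matching key, matching priority,
  // and still room for another request.
  let batch = queue.find(batch => {
    return batch.key === key
      && batch.priority === priority
      && batch.requests.length < batchSize;
  });

  // No batch or full batch? Queue up a new one.
  if (!batch) {
    batch = {key, priority, requests: []};
    queue.push(batch);
  }

  batch.requests.push(request);
  return batch;
}

const queue = [];

// Nine requests with the same key and priority and a batch size of 8:
// the ninth request no longer fits and opens a second batch.
for (let i = 0; i < 9; i++)
  enqueue(queue, {key: 'en-de', priority: 0, text: `sentence ${i}`}, 8);

// A different priority never shares a batch, even if there is room.
enqueue(queue, {key: 'en-de', priority: 1, text: 'urgent'}, 8);

console.log(queue.map(batch => batch.requests.length)); // [ 8, 1, 1 ]
```

This is the trade-off described in the comment on `this.batchSize`: a larger value packs more requests into one translation call, but a request in a batch waits for the whole batch to be translated.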