havelessbemore - New Submission #2

Open: wants to merge 68 commits into base: main
5ede677
Commit solution
havelessbemore May 21, 2024
92f19b5
Remove unused CHANGELOG.md
havelessbemore May 21, 2024
7535b1a
Add benchmark and README.md
havelessbemore May 22, 2024
d23d1c2
Flatten value arrays from 1 per worker to 1 global. Allows removal of…
havelessbemore May 22, 2024
e511efe
Rename trie to utf8Trie
havelessbemore May 22, 2024
128a8a2
Refactor utf8 constants
havelessbemore May 22, 2024
111786e
Rename utf8Trie constants
havelessbemore May 22, 2024
eb5ba80
Rename utf8Trie constants again
havelessbemore May 22, 2024
5a880aa
Refactor print() function for utf8trie. Stop update for variable numb…
havelessbemore May 22, 2024
9ee4bf6
Update best time in README.md
havelessbemore May 22, 2024
360f17e
Merge mins and maxes arrays
havelessbemore May 22, 2024
a483f80
Merge values in memory
havelessbemore May 23, 2024
981cfe5
Rename utf8Trie constants again, Refactor main()
havelessbemore May 23, 2024
ea5d0c8
Parallelize trie merging
havelessbemore May 23, 2024
e7d8f0f
Minify output
havelessbemore May 23, 2024
3e83ee7
Update best time in README.md
havelessbemore May 23, 2024
54c0d04
Add utility functions for working with Workers
havelessbemore May 23, 2024
6402106
Format files
havelessbemore May 23, 2024
9819a4f
Refactor worker message listener
havelessbemore May 23, 2024
6026a8f
Refactor main
havelessbemore May 23, 2024
4918ed6
Add MIT license
havelessbemore May 23, 2024
8110879
Update linter config
havelessbemore May 23, 2024
473bbd7
Add get() utf8trie function, Fix bug in mergeLeft() function where re…
havelessbemore May 23, 2024
5d8d084
Update comments, Rename utf8 constants, Remove unused constants
havelessbemore May 23, 2024
20f3d17
Format files
havelessbemore May 23, 2024
4fb575b
Refactor index
havelessbemore May 23, 2024
e800428
Update specs in README.md
havelessbemore May 23, 2024
1b06173
Comment out possible constants from utf8Trie
havelessbemore May 24, 2024
955536e
Simplify the chunk processing loop in the worker
havelessbemore May 24, 2024
cbfb74d
Update README.md
havelessbemore May 24, 2024
df29090
Try different methods for parsing temperature
havelessbemore May 24, 2024
dc1a881
Update README.md to reflect new avg
havelessbemore May 24, 2024
fc8ed7d
Format files
havelessbemore May 24, 2024
44da3a3
Remove 'type' property from message responses as they are unused, Sho…
havelessbemore May 25, 2024
526a936
Refactor parseDouble
havelessbemore May 25, 2024
c8b5027
Refactor trie merging; Previously tries were merged as if they were p…
havelessbemore May 25, 2024
e41c6cc
Format files
havelessbemore May 25, 2024
7e6ccb1
Disable eslint rule for Response type
havelessbemore May 25, 2024
a5cd0c8
Change builder from rollup to esbuild
havelessbemore May 25, 2024
da0ef67
Change CHAR_CODE_ constants into CharCode const enum
havelessbemore May 25, 2024
79dd58f
Change UTF8_ constants into UTF8 const enum
havelessbemore May 25, 2024
c327ef0
Update worker, stream and constraint constants
havelessbemore May 25, 2024
4c90e7d
Update utf8trie constants into const enums
havelessbemore May 25, 2024
2518251
Update README.md
havelessbemore May 25, 2024
998734b
Update benchmark results
havelessbemore May 25, 2024
7060be3
Add eslint ignore rules
havelessbemore May 25, 2024
1caffc3
Referencing const enum value A from a separate file prevented some va…
havelessbemore May 26, 2024
05dbbd8
Simplify finding an entry's semicolon
havelessbemore May 26, 2024
c2a68a3
Refactor worker's processing loop
havelessbemore May 26, 2024
38fd452
Remove unused NOTICE file, Add '' to package.json
havelessbemore May 26, 2024
1952208
Open file once and share fd with workers
havelessbemore May 26, 2024
ea182b0
Change message type from strings to numbers, Refactor mergeLeft() fun…
havelessbemore May 26, 2024
4ed9b06
Update highWaterMark constants
havelessbemore May 26, 2024
847f2b0
Replace createReadStream with manual streaming. This allows for bette…
havelessbemore May 26, 2024
7a4ed23
Format files
havelessbemore May 26, 2024
8d24cde
Update benchmark results
havelessbemore May 26, 2024
b7c7f19
Remove config constant CHUNK_SIZE_MIN
havelessbemore May 27, 2024
1a3de36
Replace fs/promises with sync fs functions
havelessbemore May 27, 2024
de75285
Make worker process() function synchronous to avoid async overhead
havelessbemore May 27, 2024
da6ba82
Replace Math.max and Math.min with ternaries
havelessbemore May 27, 2024
8160806
Update benchmark results
havelessbemore May 27, 2024
9043357
Refactor worker
havelessbemore May 27, 2024
a716e87
Refactor worker again
havelessbemore May 28, 2024
b92c185
Refactor worker hot loop
havelessbemore May 28, 2024
da092eb
Align reads with system page size
havelessbemore May 31, 2024
a9e603c
Refactor worker hot loop
havelessbemore May 31, 2024
9acc20a
Refactor constants
havelessbemore May 31, 2024
9e8604a
Update benchmark results
havelessbemore May 31, 2024
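
Several commits above ("Merge mins and maxes arrays", "Replace Math.max and Math.min with ternaries", "Parallelize trie merging") concern how per-station statistics are updated and combined. The following is a minimal sketch of that idea, not the submission's actual code: the submission keeps these values in shared typed arrays, whereas the `Stats` object and the names `newStats`, `addSample`, `mergeStats`, and `formatMean` are hypothetical and chosen only for illustration. Temperatures are assumed to be stored as integers in tenths of a degree.

```typescript
// Hypothetical per-station record; values are in tenths of a degree.
interface Stats {
  min: number;
  max: number;
  sum: number;
  count: number;
}

// Start a record from the first sample seen for a station.
function newStats(value: number): Stats {
  return { min: value, max: value, sum: value, count: 1 };
}

// Fold one more sample into a record, using ternaries rather than
// Math.min / Math.max.
function addSample(stats: Stats, value: number): void {
  stats.min = stats.min <= value ? stats.min : value;
  stats.max = stats.max >= value ? stats.max : value;
  stats.sum += value;
  ++stats.count;
}

// Combine the record of one worker into the record of another.
function mergeStats(into: Stats, from: Stats): void {
  into.min = into.min <= from.min ? into.min : from.min;
  into.max = into.max >= from.max ? into.max : from.max;
  into.sum += from.sum;
  into.count += from.count;
}

// Round the mean to the nearest tenth and print with one decimal.
function formatMean(stats: Stats): string {
  return (Math.round(stats.sum / stats.count) / 10).toFixed(1);
}

const oslo = newStats(123); // 12.3
addSample(oslo, -45); // -4.5
addSample(oslo, 300); // 30.0
console.log(oslo.min, oslo.max, formatMean(oslo));
```

Keeping everything as integers until output avoids floating-point accumulation in the hot loop; whether ternaries actually beat `Math.min`/`Math.max` depends on the engine and should be measured.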
18 changes: 18 additions & 0 deletions calculate_average_havelessbemore.sh
@@ -0,0 +1,18 @@
#!/bin/sh
#
# Copyright 2023 The original authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

time node --enable-source-maps src/main/nodejs/havelessbemore/dist/index.mjs measurements.txt
3 changes: 3 additions & 0 deletions src/main/nodejs/havelessbemore/.prettierignore
@@ -0,0 +1,3 @@
.husky
dist
docs
1 change: 1 addition & 0 deletions src/main/nodejs/havelessbemore/AUTHORS
@@ -0,0 +1 @@
Michael Rojas <dev.michael.rojas@gmail.com>
21 changes: 21 additions & 0 deletions src/main/nodejs/havelessbemore/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (C) 2024-2024 Michael Rojas <dev.michael.rojas@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
58 changes: 58 additions & 0 deletions src/main/nodejs/havelessbemore/README.md
@@ -0,0 +1,58 @@
# The One Billion Row Challenge with Node.js

## Run

1. If needed, follow steps 1-2 in [Running the challenge](../../../../README.md#running-the-challenge) to create a `measurements.txt` file.

2. Run:

```bash
./calculate_average_havelessbemore.sh measurements.txt
```

## Benchmark

### Results

- Min: 11.1s
- Avg: 11.6s
- Max: 12.0s
- Samples: 10 runs, 5s apart

#### Specs:

- Machine:

  - Model: MacBook Air
  - Chip: Apple M2
  - Cores: 8 (4 performance + 4 efficiency)
  - Memory: 8 GB
  - OS: macOS Sonoma

- Other:
  - Node.js: v20.14.0
  - Input file: ~13.8 GB

## Build

If you'd like to rebuild the project:

1. Navigate to the project subdirectory

```bash
cd ./src/main/nodejs/havelessbemore
```

2. Install dev dependencies (TypeScript, bundler, etc.)

```bash
npm install
```

3. Build

```bash
npm run build
```

Output is built in the `dist/` directory as ECMAScript (`.mjs`) modules.
65 changes: 65 additions & 0 deletions src/main/nodejs/havelessbemore/benchmarks/bench.ts
@@ -0,0 +1,65 @@
import { mkdtemp, rm } from "fs/promises";
import { join, resolve } from "path";
import { stdout } from "process";
import { availableParallelism, tmpdir } from "os";
import { fileURLToPath } from "url";

import { Bench, Task } from "tinybench";

import { run } from "../src/main";

// INPUT
const filePath = process.argv[2];
const maxWorkers = availableParallelism();
const workerPath = resolve(
  fileURLToPath(import.meta.url),
  "../../dist/index.mjs",
);

// OUTPUT
const dir = await mkdtemp(join(tmpdir(), "1brc-"));

// BENCHMARK
let i = 0;
let t0 = 0;
const bench = new Bench({ iterations: 5 });
bench.add(
  `1BRC`,
  async () => {
    const outPath = join(dir, `out_${i}.txt`);
    return run(filePath, workerPath, maxWorkers, outPath);
  },
  {
    beforeAll: () => {
      t0 = performance.now();
    },
    beforeEach: function (): void {
      const elapsed = toSeconds(performance.now() - t0);
      console.log(`${this.name} (${elapsed}s): Running iteration ${++i}...`);
    },
  },
);

await bench.run();

// CLEANUP
await rm(dir, { recursive: true, force: true });

// REPORTING
function toRecord(task: Task): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  out.Name = task.name;
  out["Min (s)"] = toSeconds(task.result?.min);
  out["Max (s)"] = toSeconds(task.result?.max);
  out["Avg (s)"] = toSeconds(task.result?.mean);
  out.Samples = +(task.result?.samples ?? []).length;
  return out;
}

function toSeconds(ms: number | undefined): number {
  return Math.floor(ms ?? 0) / 1000;
}

console.table(bench.tasks.map((task) => toRecord(task)));
const time = bench.tasks.reduce((sum, t) => sum + t.result!.totalTime, 0);
stdout.write(`Total time: ${toSeconds(time)}s\n`);
3 changes: 3 additions & 0 deletions src/main/nodejs/havelessbemore/dist/index.mjs
@@ -0,0 +1,3 @@
import{availableParallelism as J}from"node:os";import{fileURLToPath as z}from"node:url";import{isMainThread as ee,parentPort as w}from"node:worker_threads";import{closeSync as K,createWriteStream as V,fstatSync as Q,openSync as $}from"node:fs";import{stdout as j}from"node:process";function h(e,r,t){return e>r?e<=t?e:t:r}function N(e){return e=Math.ceil(e*1),e+=16384-e%16384,h(e,16384,16777216)}function P(e,r){return e=Math.ceil(e/r),e+=16384-e%16384,h(e,16384,16777216)}function T(e,r,t){for(;--t>=0;)if(e[t]===r)return t;return-1}function O(e,r,t,a){let f=1;for(;t<a;){f+=2+1*(r[t++]-32);let I=e[f+0];I===0&&(I=e[0],I+218>e.length&&(e=U(e,I+218)),e[0]+=218,e[f+0]=I,e[I+0]=e[1]),f=I}return[e,f]}function C(e=0,r=655360){r=Math.max(219,r);let t=new Int32Array(new SharedArrayBuffer(r<<2));return t[0]=219,t[1]=e,t}function U(e,r=0){let t=e[0];r=Math.max(r,Math.ceil(t*1.618033988749895));let a=new Int32Array(new SharedArrayBuffer(r<<2));for(let f=0;f<t;++f)a[f]=e[f];return a}function q(e,r,t,a){let f=[],I=[[r,1,t,1]];do{let m=I.length;for(let i=0;i<m;++i){let[n,s,R,E]=I[i],p=e[R][E+1];if(p!==0){let c=e[n][s+1];c!==0?a(c,p):e[n][s+1]=p}s+=2,E+=2;let D=E+216;for(;E<D;){let c=e[R][E+0];if(c!==0){let u=e[R][c+0];R!==u&&(c=e[R][c+1]);let o=e[n][s+0];if(o===0)o=e[n][0],o+2>e[n].length&&(e[n]=U(e[n],o+2),f.push(n)),e[n][0]+=2,e[n][s+0]=o,e[n][o+0]=u,e[n][o+1]=c;else{let l=e[n][o+0];n!==l&&(o=e[n][o+1]),I.push([l,o,u,c])}}s+=1,E+=1}}I.splice(0,m)}while(I.length>0);return f}function Z(e,r,t,a,f="",I){let m=new Array(r.length+1);m[0]=[t,3,0];let i=0,n=!1;do{let[s,R,E]=m[i];if(E>=216){--i;continue}m[i][1]+=1,++m[i][2];let p=e[s][R+0];if(p===0)continue;let D=e[s][p+0];s!==D&&(p=e[s][p+1],s=D),r[i]=E+32,m[++i]=[s,p+2,0];let c=e[s][p+1];c!==0&&(n&&a.write(f),n=!0,I(a,r,i,c))}while(i>=0)}import{Worker as F}from"node:worker_threads";function B(e){let r=new F(e);return r.on("error",t=>{throw t}),r.on("messageerror",t=>{throw t}),r.on("exit",t=>{if(t>1||t<0)throw new Error(`Worker 
${r.threadId} exited with code ${t}`)}),r}function b(e,r){return new Promise(t=>{e.once("message",t),e.postMessage(r)})}async function H(e,r,t,a=""){t=h(t,1,256);let f=$(e,"r"),m=Q(f).size,i=P(m,t),n=N(i),s=new SharedArrayBuffer(1e4*t+1<<4),R=new Uint32Array(s,0,1),E=new Int16Array(s),p=new Int16Array(s,2),D=new Uint32Array(s,4),c=new Float64Array(s,8),u=new Array(t),o=[],l=new Array(t);for(let _=0;_<t;++_){let S=B(r);l[_]=b(S,{type:0,id:_,fd:f,fileSize:m,pageSize:i,chunkSize:n,counts:D,maxes:p,mins:E,page:R,sums:c}).then(async g=>{let y=g.id;for(u[y]=g.trie;o.length>0;){let d=await b(S,{type:1,a:y,b:o.pop(),counts:D,maxes:p,mins:E,sums:c,tries:u});for(let L of d.ids)u[L]=d.tries[L]}return o.push(y),S.terminate()})}await Promise.all(l),K(f);let A=V(a,{fd:a.length<1?j.fd:void 0,flags:"a",highWaterMark:1048576}),M=Buffer.allocUnsafe(100);A.write("{"),Z(u,M,o[0],A,", ",X),A.end(`}
`);function X(_,S,g,y){let d=Math.round(c[y<<1]/D[y<<2]);_.write(S.toString("utf8",0,g)),_.write("="),_.write((E[y<<3]/10).toFixed(1)),_.write("/"),_.write((d/10).toFixed(1)),_.write("/"),_.write((p[y<<3]/10).toFixed(1))}}import{readSync as v}from"fs";var k=11*48,W=111*48;function G(e,r,t){return e[r]===45?(++r,r+4>t?k-10*e[r]-e[r+2]:W-100*e[r]-10*e[r+1]-e[r+3]):r+4>t?10*e[r]+e[r+2]-k:100*e[r]+10*e[r+1]+e[r+3]-W}function x({id:e,fd:r,fileSize:t,pageSize:a,chunkSize:f,counts:I,maxes:m,mins:i,page:n,sums:s}){let R=(u,o)=>{i[u<<3]=o,m[u<<3]=o,I[u<<2]=1,s[u<<1]=o},E=(u,o)=>{u<<=3,i[u]=i[u]<=o?i[u]:o,m[u]=m[u]>=o?m[u]:o,++I[u>>1],s[u>>2]+=o},p=Buffer.allocUnsafe(f+16384),D=e*1e4,c=C(e);for(;;){let u=a*Atomics.add(n,0,1);if(u>=t)break;let o=u>0?16384:0;v(r,p,0,o,u-o);let l=T(p,10,o),A=Math.min(t,u+a);for(++l;u<A;u+=f){let M=16384-o+l;for(p.copyWithin(M,l,o),o=16384,l=M,M=Math.min(f,A-u),M=v(r,p,o,M,u),M+=o;o<M;++o){if(p[o]!==10)continue;let X=o-5;p[X]!==59&&(X+=1|1+~(p[X-1]===59));let _;[c,_]=O(c,p,l,X),l=o+1;let S=G(p,X+1,o);_+=1,c[_]!==0?E(c[_],S):(c[_]=++D,R(D,S))}}}return{id:e,trie:c}}function Y({a:e,b:r,tries:t,counts:a,maxes:f,mins:I,sums:m}){return{ids:q(t,e,r,(n,s)=>{n<<=3,s<<=3,I[n]=I[n]<=I[s]?I[n]:I[s],f[n]=f[n]>=f[s]?f[n]:f[s],a[n>>1]+=a[s>>1],m[n>>2]+=m[s>>2]}),tries:t}}if(ee){let e=z(import.meta.url);H(process.argv[2],e,J())}else w.addListener("message",e=>{if(e.type===0)w.postMessage(x(e));else if(e.type===1)w.postMessage(Y(e));else throw new Error("Unknown message type")});
//# sourceMappingURL=index.mjs.map
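
For readers who do not want to unpick the minified bundle: the function `G` above appears to read the temperature straight from the byte buffer as an integer number of tenths, relying on the challenge's fixed format of one or two integer digits, a decimal point, and one fractional digit, with an optional leading minus sign. A readable sketch of the same idea follows. The names `parseTenths` and `ascii` are hypothetical, and this version subtracts the ASCII code of `0` per digit instead of using the precomputed `11 * 48` and `111 * 48` offsets seen in the bundle.

```typescript
const MINUS = 45; // ASCII "-"
const ZERO = 48; // ASCII "0"

// Parses a temperature of the form -?d.d or -?dd.d found in
// buf[start, end) and returns it in tenths of a degree.
// Assumes well-formed input; no validation is performed.
function parseTenths(buf: Uint8Array, start: number, end: number): number {
  let sign = 1;
  if (buf[start] === MINUS) {
    sign = -1;
    ++start;
  }
  // Three bytes remain for "d.d"; otherwise four remain for "dd.d".
  if (end - start === 3) {
    return sign * (10 * (buf[start] - ZERO) + (buf[start + 2] - ZERO));
  }
  return (
    sign *
    (100 * (buf[start] - ZERO) +
      10 * (buf[start + 1] - ZERO) +
      (buf[start + 3] - ZERO))
  );
}

// Hypothetical helper: encode an ASCII string as bytes.
function ascii(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; ++i) {
    out[i] = text.charCodeAt(i);
  }
  return out;
}

const line = ascii("Oslo;-12.3\n");
// The value starts after the semicolon (index 5) and ends at the newline (index 10).
console.log(parseTenths(line, 5, 10)); // -123
```

Skipping string conversion and `parseFloat` is the point of the technique; the trade-off is that any input outside the expected format produces a wrong number rather than an error.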
7 changes: 7 additions & 0 deletions src/main/nodejs/havelessbemore/dist/index.mjs.map

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions src/main/nodejs/havelessbemore/esbuild.config.js
@@ -0,0 +1,19 @@
import * as esbuild from "esbuild";

import pkg from "./package.json" with { type: "json" };

/** @type {import('esbuild').BuildOptions} */
const options = {
  entryPoints: ["src/index.ts"],
  bundle: true,
  minify: true,
  platform: "node",
  sourcemap: true,
  target: "ESNext",
};

await esbuild.build({
  ...options,
  format: "esm",
  outfile: pkg.module,
});
12 changes: 12 additions & 0 deletions src/main/nodejs/havelessbemore/eslint.config.js
@@ -0,0 +1,12 @@
import eslint from "@eslint/js";
import prettierConfig from "eslint-config-prettier";
import tseslint from "typescript-eslint";

/** @type {import('@typescript-eslint/utils').TSESLint.FlatConfig.ConfigArray} */
export default tseslint.config(
  { ignores: ["dist"] },
  eslint.configs.recommended,
  ...tseslint.configs.recommended,
  ...tseslint.configs.stylistic,
  prettierConfig,
);