feat(rst): Add bee-remote performance benchmarking #147
base: develop
Conversation
- Add hidden arguments for simulating job workloads and uploading results to remote storage targets
- Support mocking BeeGFS and the storage target
- Fix MockNode.disconnect Error argument index
Force-pushed from 115626c to e19bf00
Overall benchmarking functionality looks good! I had a handful of suggestions on how to integrate this into Remote and polish some of the UX around the developer options (not all of which were introduced in this PR).
question: what do you have to do to get make generate-notices to work on your machine? I was going to add a KB for that.
```go
pflag.Bool("developer.dump-config", false, "Dump the full configuration and immediately exit.")
pflag.CommandLine.MarkHidden("developer.dump-config")
pflag.Uint32("developer.benchmark-target", 0, "Specify a target to perform an upload sync operation (0 uses mock target)")
pflag.Int("developer.benchmark-count", 0, "Specify a target to perform an upload sync operation (0 disables performance benchmarking)")
```
todo(blocking): I don't think the help text is correct on this option (seems to be a partial copy/paste of benchmark-target).
```go
dbPath := "/var/lib/beegfs/remote/benchmark.badger"
defer os.RemoveAll(dbPath)
```
suggestion(blocking): We should also double check whatever path we use to the benchmark DB does not already exist and refuse to startup if it does (and don't delete it).
```go
if err := pflag.CommandLine.Set("job.path-db", dbPath); err != nil {
	fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
	os.Exit(1)
}
```
suggestion(non-blocking): We should probably support the user providing a custom BadgerDB path (say on an NVMe-oF attached disk). It would be helpful to be able to benchmark using the actual device used for BadgerDB.
Maybe use the job.path-db specified by the user (or the default) but create the benchmark DB directory under the parent directory of the actual DB? Maybe as bench.path.badger to avoid conflicts with Sync if we add a benchmark mode there later on.
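The path derivation above could look something like this; `benchDBPath` is an illustrative helper, and `bench.path.badger` is the name proposed in this review, not an existing convention:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// benchDBPath derives a benchmark DB directory that lives next to the real
// BadgerDB (the user's job.path-db or its default), so benchmarks exercise
// the same underlying device, while using a distinct directory name to
// avoid clashing with a future Sync benchmark mode.
func benchDBPath(jobPathDB string) string {
	return filepath.Join(filepath.Dir(jobPathDB), "bench.path.badger")
}

func main() {
	fmt.Println(benchDBPath("/var/lib/beegfs/remote/path.badger"))
}
```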
```go
pprofPort := "16060"
if err := pflag.CommandLine.Set("developer.perf-profiling-port", pprofPort); err != nil {
	fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
	os.Exit(1)
}
```
suggestion(blocking): Now that we use it for multiple things how about we rename developer.perf-profiling-port to developer.address and set a default IP and port (maybe 127.0.0.1:6060 - I think 6060 is canonical for pprof) using the flag default instead of here?
Then have a separate developer.enable-pprof option to control enabling pprof instead of controlling if pprof is enabled based on whether a port is set.
I also think this is important because I'm not sure we should use the gRPC server address as the "dev tools" benchmark debug/vars address, and right now pprof is listening on 0.0.0.0 by default. I assume neither of these is something people will want to expose externally.
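A sketch of the suggested flag layout, using the standard library flag package for illustration (the project uses spf13/pflag, whose API mirrors it); `developer.address` and `developer.enable-pprof` are names proposed in this review, not existing options:

```go
package main

import (
	"flag"
	"fmt"
)

// Proposed developer options: a single loopback address for all dev tooling
// with a flag default (rather than being set programmatically), plus an
// explicit opt-in for pprof instead of inferring it from a port being set.
var (
	devAddress = flag.String("developer.address", "127.0.0.1:6060",
		"Address for developer tooling (pprof, benchmark debug/vars). Loopback by default so nothing is exposed externally.")
	enablePprof = flag.Bool("developer.enable-pprof", false,
		"Enable pprof HTTP endpoints on developer.address.")
)

func main() {
	fmt.Println("defaults:", *devAddress, *enablePprof)
}
```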
```go
// If we hang here, probably BeeGFS itself is not reachable.
mountPoint, err := filesystem.NewFromMountPoint(initialCfg.MountPoint)
if err != nil {
	logger.Fatal("unable to access BeeGFS mount point", zap.Error(err))
}
```
nitpick: It'd be nice to let people benchmark without having BeeGFS mounted. As written this always requires a mount even if BenchmarkMockBeeGFS is set.
```go
inMountPath := "performance-testing.tmp"
path := filepath.Join(mountPath, inMountPath)
```
suggestion(non-blocking): Maybe make this user configurable? For example you could rework developer.benchmark-mock-beegfs into something like developer.bench-beegfs-path and have it default to an empty string which will use a mock file system.
Also I noticed the temp path isn't cleaned up after the test. I think this is fine (presuming we reuse the files for subsequent tests) but should probably log at the end of the test if there were temp files created in case the user wants to clean up.
```go
if err := benchmarkSubmit(benchmarkCtx, jobMgr, measurementChan, target, targetType, workRequests, inMountPath, count); err != nil {
	return fmt.Errorf("failed to complete benchmark: %w", err)
}
```
suggestion(blocking): When the benchmark completes this returns immediately without waiting for benchmarkMeasure() to finish printing results. If the target type is mocked this means the results are interspersed with other logs from shutting down:
```
2025-04-07T21:25:21.776Z info connected to node {"component": "workermgr", "nodeID": "0", "nodeName": "test-node-0"}
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
2025-04-07T21:25:33.765Z info shutdown signal received
2025-04-07T21:25:33.765Z info attempting to stop gRPC server {"component": "server"}
2025-04-07T21:25:33.765Z info shutting down because the app is shutting down {"component": "job"}
2025-04-07T21:25:33.765Z info Lifetime L0 stalled for: 0s {"component": "job", "database": "pathDB"}
Benchmark Complete.
Files transferred: 100000
Duration: 6.17 sec
Average: 16200.22 jobs/s
Average Without Outliers: 17050.87 jobs/s
Maximum: 22781.38 jobs/s
Minimum: 10681.30 jobs/s
2025-04-07T21:25:33.815Z info
Level 0 [ ]: NumTables: 03. Size: 60 MiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done {"component": "job", "database": "pathDB"}
2025-04-07T21:25:33.819Z info stopped manager {"component": "job"}
2025-04-07T21:25:33.819Z info shutdown all components, exiting
```
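One way to fix the interleaving is to have the measuring goroutine own a done channel and block on it after submission finishes, so the summary is fully printed before returning. A self-contained sketch; `runBenchmark` and the channel wiring are illustrative, not the PR's actual functions:

```go
package main

import "fmt"

// runBenchmark submits measurements, then waits for the measuring goroutine
// to drain the channel and print its summary before returning. This mirrors
// the suggested fix: don't return until benchmarkMeasure() has finished.
func runBenchmark(jobs int) float64 {
	measurements := make(chan float64)
	done := make(chan struct{})
	var total float64
	go func() { // stands in for benchmarkMeasure()
		defer close(done)
		for m := range measurements {
			total += m
		}
		fmt.Printf("Benchmark Complete. Files transferred: %.0f\n", total)
	}()
	for i := 0; i < jobs; i++ { // stands in for benchmarkSubmit()
		measurements <- 1
	}
	close(measurements)
	<-done // block until results are fully printed; only then shut down
	return total
}

func main() {
	runBenchmark(5)
}
```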
```go
if target == 0 {
	// target type is mocked so cancel immediately after testing
	defer cancel()
}

benchmarkCtx, benchmarkCancel := context.WithCancel(ctx)
defer benchmarkCancel()
```
suggestion(blocking): Is there ever a reason to leave Remote running after a benchmark considering the DB is a temp/benchmarking DB that will be removed when the app shuts down? It seems this just has the potential to cause confusion if people start actually using the app normally.
```go
	logger.Info("shutdown all components, exiting")
}
```
suggestion(blocking): I'd prefer to keep the main.go file limited to the main function as much as possible. Could we add an internal/dev package with the benchmark functionality?
I think we could set this up similarly to the other components, where there is a dev.NewBench() function that returns a BenchMgr type that also has a logger, so we can eliminate the Printf statements in the benchmark functions and ensure the benchmark output is always logged where the user expects.
As part of this you could make the Developer section of the AppConfig a named struct that lives in the dev package so you can just pass that to NewBench(). That would eliminate a lot of the setup inside main(); for example, ideally we would just call BenchMgr.Run() if benchmarkCount != 0.
Support for https://github.com/ThinkParQ/bee-remote/issues/31: Add Benchmarking Support for Remote Targets
This change introduces developer-facing benchmarking support to assist with evaluating performance during development.
Key updates:
- `--developer.benchmark-*` arguments for simulating job workloads and uploading results to remote storage targets.
- A fix for `MockNode.disconnect()` where the `Error` argument index was incorrect.

These changes help facilitate performance validation and enable development workflows without depending on live infrastructure. Here's an example output and the artifacts produced.
```
$ go run rst/remote/cmd/beegfs-remote/main.go \
    --cfg-file /etc/beegfs/beegfs-remote.toml \
    --mount-point /mnt/beegfs \
    --developer.benchmark-count 100000 \
    --developer.benchmark-target 0 \
    --developer.benchmark-mock-work-requests 1 \
    --developer.benchmark-mock-beegfs
Mocking BeeGFS mount point
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
Benchmark Complete.
Files transferred: 100000
Duration: 6.07 sec
Average: 16466.71 jobs/s
Average Without Outliers: 16533.14 jobs/s
Maximum: 17791.62 jobs/s
Minimum: 15124.06 jobs/s
```

Artifacts:
- results.csv
- summary.log
- pprof-debug-vars.ndjson