Skip to content

Conversation

@swartzn
Copy link
Contributor

@swartzn swartzn commented Apr 2, 2025

Support for (https://github.com/ThinkParQ/bee-remote/issues/31): Add Benchmarking Support for Remote Targets

This change introduces developer-facing benchmarking support to assist with evaluating performance during development.

Key updates:

  • Introduces hidden --developer.benchmark-* arguments for simulating job workloads and uploading results to remote storage targets.
  • Add mock remote storage target implementation with an option to mock BeeGFS mount point.
  • Fixes an issue in MockNode.disconnect() where the Error argument index was incorrect.

These changes help facilitate performance validation and enable development workflows without depending on live infrastructure. Here's an example output and artifacts produced.

$ go run rst/remote/cmd/beegfs-remote/main.go \
      --cfg-file /etc/beegfs/beegfs-remote.toml \
      --mount-point /mnt/beegfs \
      --developer.benchmark-count 100000 \
      --developer.benchmark-target 0 \
      --developer.benchmark-mock-work-requests 1 \
      --developer.benchmark-mock-beegfs
Mocking BeeGFS mount point
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
Benchmark Complete.

Files transferred: 100000
Duration: 6.07 sec
Average: 16466.71 jobs/s
Average Without Outliers: 16533.14 jobs/s
Maximum: 17791.62 jobs/s
Minimum: 15124.06 jobs/s
$ tree benchmark-1743605081/
benchmark-1743605081/
├── pprof-debug-vars.ndjson
├── results.csv
└── summary.log

results.csv
summary.log
pprof-debug-vars.ndjson

- Add hidden arguments for simulating job workloads and uploading results to remote storage targetswq
- Support mocking BeeGFS and storage target
- Fix MockNode.disconnect Error index
@swartzn swartzn force-pushed the swartzn/add-developer-benchmarking branch from 115626c to e19bf00 Compare April 2, 2025 19:13
@swartzn swartzn requested a review from iamjoemccormick April 3, 2025 13:28
Copy link
Member

@iamjoemccormick iamjoemccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall benchmarking functionality looks good! I had a handful of suggestions on how to integrate this into Remote and polish some of the UX around the developer options (not all of which were introduced in this PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what do you have to do to get make generate-notices to work on your machine? I was going to add a KB for that.

pflag.Bool("developer.dump-config", false, "Dump the full configuration and immediately exit.")
pflag.CommandLine.MarkHidden("developer.dump-config")
pflag.Uint32("developer.benchmark-target", 0, "Specify a target to perform an upload sync operation (0 uses mock target)")
pflag.Int("developer.benchmark-count", 0, "Specify a target to perform an upload sync operation (0 disables performance benchmarking)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo(blocking): I don't think the help text is correct on this option (seems to be a partial copy/paste of benchmark-target).

Comment on lines +137 to +138
dbPath := "/var/lib/beegfs/remote/benchmark.badger"
defer os.RemoveAll(dbPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking): We should also double check whatever path we use to the benchmark DB does not already exist and refuse to startup if it does (and don't delete it).

Comment on lines +140 to +143
if err := pflag.CommandLine.Set("job.path-db", dbPath); err != nil {
fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
os.Exit(1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(non-blocking): We should probably support if the user wants to use a custom BadgerDB path (say to a NVMeOf attached disk). It would be helpful to be able to benchmark using the actual device used for BadgerDB.

Maybe user the job.path-db specified by the user (or the default) but create the benchmark DB directory under the parent directory of the actual DB? Maybe as bench.path.badger to avoid conflicts with Sync if we add a benchmark mode there later on.

Comment on lines +144 to +148
pprofPort := "16060"
if err := pflag.CommandLine.Set("developer.perf-profiling-port", pprofPort); err != nil {
fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
os.Exit(1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking): Now that we use it for multiple things how about we rename developer.perf-profiling-port to developer.address and set a default IP and port (maybe 127.0.0.1:6060 - I think 6060 is canonical for pprof) using the flag default instead of here?

Then have a separate developer.enable-pprof option to control enabling pprof instead of controlling if pprof is enabled based on whether a port is set.

I also think this is important because I'm not sure we should use the gRPC server address as the "dev tools" benchmark debug/vars address, and right now pprof is listening on 0.0.0.0 by default. I assume neither of these people will want to expose externally.

// If we hang here, probably BeeGFS itself is not reachable.
mountPoint, err := filesystem.NewFromMountPoint(initialCfg.MountPoint)
if err != nil {
logger.Fatal("unable to access BeeGFS mount point", zap.Error(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: It'd be nice to let people benchmark without having BeeGFS mounted. This way always requires a mount even if they have set BenchmarkMockBeeGFS.

Comment on lines +332 to +333
inMountPath := "performance-testing.tmp"
path := filepath.Join(mountPath, inMountPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(non-blocking): Maybe make this user configurable? For example you could rework developer.benchmark-mock-beegfs into something like developer.bench-beegfs-path and have it default to an empty string which will use a mock file system.

Also I noticed the temp path isn't cleaned up after the test. I think this is fine (presuming we reuse the files for subsequent tests) but should probably log at the end of the test if there were temp files created in case the user wants to clean up.

Comment on lines +353 to +355
if err := benchmarkSubmit(benchmarkCtx, jobMgr, measurementChan, target, targetType, workRequests, inMountPath, count); err != nil {
return fmt.Errorf("failed to complete benchmark: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking): When the benchmark completes this returns immediately without waiting for benchmarkMeasure() to finish printing results. If the target type is mocked this means the results are interspersed with other logs from shutting down:

2025-04-07T21:25:21.776Z	info	connected to node	{"component": "workermgr", "nodeID": "0", "nodeName": "test-node-0"}
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
2025-04-07T21:25:33.765Z	info	shutdown signal received
2025-04-07T21:25:33.765Z	info	attempting to stop gRPC server	{"component": "server"}
2025-04-07T21:25:33.765Z	info	shutting down because the app is shutting down	{"component": "job"}
2025-04-07T21:25:33.765Z	info	Lifetime L0 stalled for: 0s	{"component": "job", "database": "pathDB"}
Benchmark Complete.

Files transferred: 100000
Duration: 6.17 sec
Average: 16200.22 jobs/s
Average Without Outliers: 17050.87 jobs/s
Maximum: 22781.38 jobs/s
Minimum: 10681.30 jobs/s

2025-04-07T21:25:33.815Z	info	
Level 0 [ ]: NumTables: 03. Size: 60 MiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done	{"component": "job", "database": "pathDB"}
2025-04-07T21:25:33.819Z	info	stopped manager	{"component": "job"}
2025-04-07T21:25:33.819Z	info	shutdown all components, exiting

Comment on lines +324 to +330
if target == 0 {
// target type is mocked so cancel immediately after testing
defer cancel()
}

benchmarkCtx, benchmarkCancel := context.WithCancel(ctx)
defer benchmarkCancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking): Is there ever a reason to leave Remote running after a benchmark considering the DB is a temp/benchmarking DB that will be removed when the app shuts down? It seems this just has the potential to cause confusion if people start actually using the app normally.


logger.Info("shutdown all components, exiting")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking): I'd prefer to keep the main.go file limited to the main function as much as possible. Could we add an internal/dev package with the benchmark functionality?

I think we could set this up similarly to the other components where there is a dev.NewBench() function that returns a BenchMgr type that also has a logger so we can eliminate the Printf statements in the benchmark functions and ensures the benchmark output always is logged where the user expects.

As part of this you could make the Developer section of the AppConfig a named struct that lives in the dev package so you can just pass that to NewBench(). That would eliminate a lot of the setup inside the main() function. For example ideally above we would just call BenchMgr.Run() if benchmarkCount != 0 above.

@iamjoemccormick iamjoemccormick changed the base branch from main to develop April 24, 2025 16:23
@iamjoemccormick iamjoemccormick requested a review from a team as a code owner April 24, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants