feat(rst): Add bee-remote performance benchmarking #147
base: develop
Conversation
- Add hidden arguments for simulating job workloads and uploading results to remote storage targets
- Support mocking BeeGFS and the storage target
- Fix MockNode.disconnect Error argument index
Force-pushed from 115626c to e19bf00
Overall benchmarking functionality looks good! I had a handful of suggestions on how to integrate this into Remote and polish some of the UX around the developer options (not all of which were introduced in this PR).
question: what do you have to do to get make generate-notices to work on your machine? I was going to add a KB for that.
```go
pflag.Bool("developer.dump-config", false, "Dump the full configuration and immediately exit.")
pflag.CommandLine.MarkHidden("developer.dump-config")
pflag.Uint32("developer.benchmark-target", 0, "Specify a target to perform an upload sync operation (0 uses mock target)")
pflag.Int("developer.benchmark-count", 0, "Specify a target to perform an upload sync operation (0 disables performance benchmarking)")
```
todo(blocking): I don't think the help text is correct on this option (seems to be a partial copy/paste of benchmark-target).
```go
dbPath := "/var/lib/beegfs/remote/benchmark.badger"
defer os.RemoveAll(dbPath)
```
suggestion(blocking): We should also double check whatever path we use to the benchmark DB does not already exist and refuse to startup if it does (and don't delete it).
```go
if err := pflag.CommandLine.Set("job.path-db", dbPath); err != nil {
	fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
	os.Exit(1)
}
```
suggestion(non-blocking): We should probably support the user providing a custom BadgerDB path (say on an NVMe-oF attached disk). It would be helpful to be able to benchmark using the actual device used for BadgerDB.
Maybe use the job.path-db specified by the user (or the default) but create the benchmark DB directory under the parent directory of the actual DB? Maybe as bench.path.badger to avoid conflicts with Sync if we add a benchmark mode there later on.
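The path derivation above could look something like this; `benchDBPath` is an illustrative helper, and `bench.path.badger` is the name proposed in this review, not an existing convention:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// benchDBPath derives a benchmark DB directory that lives next to the real
// BadgerDB (the user's job.path-db or its default), so benchmarks exercise
// the same underlying device, while using a distinct directory name to
// avoid clashing with a future Sync benchmark mode.
func benchDBPath(jobPathDB string) string {
	return filepath.Join(filepath.Dir(jobPathDB), "bench.path.badger")
}

func main() {
	fmt.Println(benchDBPath("/var/lib/beegfs/remote/path.badger"))
}
```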
```go
pprofPort := "16060"
if err := pflag.CommandLine.Set("developer.perf-profiling-port", pprofPort); err != nil {
	fmt.Printf("unable to set temporary BadgerDB path for performance benchmarking: %s\n", err)
	os.Exit(1)
}
```
suggestion(blocking): Now that we use it for multiple things how about we rename developer.perf-profiling-port to developer.address and set a default IP and port (maybe 127.0.0.1:6060 - I think 6060 is canonical for pprof) using the flag default instead of here?
Then have a separate developer.enable-pprof option to control enabling pprof instead of controlling if pprof is enabled based on whether a port is set.
I also think this is important because I'm not sure we should use the gRPC server address as the "dev tools" benchmark debug/vars address, and right now pprof is listening on 0.0.0.0 by default. I assume neither of these is something people will want to expose externally.
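A sketch of the suggested flag layout, using the standard library flag package for illustration (the project uses spf13/pflag, whose API mirrors it); `developer.address` and `developer.enable-pprof` are names proposed in this review, not existing options:

```go
package main

import (
	"flag"
	"fmt"
)

// Proposed developer options: a single loopback address for all dev tooling
// with a flag default (rather than being set programmatically), plus an
// explicit opt-in for pprof instead of inferring it from a port being set.
var (
	devAddress = flag.String("developer.address", "127.0.0.1:6060",
		"Address for developer tooling (pprof, benchmark debug/vars). Loopback by default so nothing is exposed externally.")
	enablePprof = flag.Bool("developer.enable-pprof", false,
		"Enable pprof HTTP endpoints on developer.address.")
)

func main() {
	fmt.Println("defaults:", *devAddress, *enablePprof)
}
```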
```go
// If we hang here, probably BeeGFS itself is not reachable.
mountPoint, err := filesystem.NewFromMountPoint(initialCfg.MountPoint)
if err != nil {
	logger.Fatal("unable to access BeeGFS mount point", zap.Error(err))
}
```
nitpick: It'd be nice to let people benchmark without having BeeGFS mounted. As written this always requires a mount even if BenchmarkMockBeeGFS is set.
```go
inMountPath := "performance-testing.tmp"
path := filepath.Join(mountPath, inMountPath)
```
suggestion(non-blocking): Maybe make this user configurable? For example you could rework developer.benchmark-mock-beegfs into something like developer.bench-beegfs-path and have it default to an empty string which will use a mock file system.
Also I noticed the temp path isn't cleaned up after the test. I think this is fine (presuming we reuse the files for subsequent tests) but should probably log at the end of the test if there were temp files created in case the user wants to clean up.
```go
if err := benchmarkSubmit(benchmarkCtx, jobMgr, measurementChan, target, targetType, workRequests, inMountPath, count); err != nil {
	return fmt.Errorf("failed to complete benchmark: %w", err)
}
```
suggestion(blocking): When the benchmark completes this returns immediately without waiting for benchmarkMeasure() to finish printing results. If the target type is mocked this means the results are interspersed with other logs from shutting down:
```
2025-04-07T21:25:21.776Z info connected to node {"component": "workermgr", "nodeID": "0", "nodeName": "test-node-0"}
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
2025-04-07T21:25:33.765Z info shutdown signal received
2025-04-07T21:25:33.765Z info attempting to stop gRPC server {"component": "server"}
2025-04-07T21:25:33.765Z info shutting down because the app is shutting down {"component": "job"}
2025-04-07T21:25:33.765Z info Lifetime L0 stalled for: 0s {"component": "job", "database": "pathDB"}
Benchmark Complete.
Files transferred: 100000
Duration: 6.17 sec
Average: 16200.22 jobs/s
Average Without Outliers: 17050.87 jobs/s
Maximum: 22781.38 jobs/s
Minimum: 10681.30 jobs/s
2025-04-07T21:25:33.815Z info
Level 0 [ ]: NumTables: 03. Size: 60 MiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done {"component": "job", "database": "pathDB"}
2025-04-07T21:25:33.819Z info stopped manager {"component": "job"}
2025-04-07T21:25:33.819Z info shutdown all components, exiting
```
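One way to fix the interleaving is to have the measuring goroutine own a done channel and block on it after submission finishes, so the summary is fully printed before returning. A self-contained sketch; `runBenchmark` and the channel wiring are illustrative, not the PR's actual functions:

```go
package main

import "fmt"

// runBenchmark submits measurements, then waits for the measuring goroutine
// to drain the channel and print its summary before returning. This mirrors
// the suggested fix: don't return until benchmarkMeasure() has finished.
func runBenchmark(jobs int) float64 {
	measurements := make(chan float64)
	done := make(chan struct{})
	var total float64
	go func() { // stands in for benchmarkMeasure()
		defer close(done)
		for m := range measurements {
			total += m
		}
		fmt.Printf("Benchmark Complete. Files transferred: %.0f\n", total)
	}()
	for i := 0; i < jobs; i++ { // stands in for benchmarkSubmit()
		measurements <- 1
	}
	close(measurements)
	<-done // block until results are fully printed; only then shut down
	return total
}

func main() {
	runBenchmark(5)
}
```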
```go
if target == 0 {
	// target type is mocked so cancel immediately after testing
	defer cancel()
}

benchmarkCtx, benchmarkCancel := context.WithCancel(ctx)
defer benchmarkCancel()
```
suggestion(blocking): Is there ever a reason to leave Remote running after a benchmark considering the DB is a temp/benchmarking DB that will be removed when the app shuts down? It seems this just has the potential to cause confusion if people start actually using the app normally.
```go
	logger.Info("shutdown all components, exiting")
}
```
suggestion(blocking): I'd prefer to keep the main.go file limited to the main function as much as possible. Could we add an internal/dev package with the benchmark functionality?
I think we could set this up similarly to the other components, where there is a dev.NewBench() function that returns a BenchMgr type that also has a logger, so we can eliminate the Printf statements in the benchmark functions and ensure the benchmark output is always logged where the user expects.
As part of this you could make the Developer section of the AppConfig a named struct that lives in the dev package so you can just pass that to NewBench(). That would eliminate a lot of the setup inside main(); for example, ideally we would just call BenchMgr.Run() if benchmarkCount != 0.
Support for https://github.com/ThinkParQ/bee-remote/issues/31: Add Benchmarking Support for Remote Targets
This change introduces developer-facing benchmarking support to assist with evaluating performance during development.
Key updates:
- `--developer.benchmark-*` arguments for simulating job workloads and uploading results to remote storage targets.
- A fix for `MockNode.disconnect()` where the `Error` argument index was incorrect.

These changes help facilitate performance validation and enable development workflows without depending on live infrastructure. Here's an example output and the artifacts produced.
```
$ go run rst/remote/cmd/beegfs-remote/main.go \
    --cfg-file /etc/beegfs/beegfs-remote.toml \
    --mount-point /mnt/beegfs \
    --developer.benchmark-count 100000 \
    --developer.benchmark-target 0 \
    --developer.benchmark-mock-work-requests 1 \
    --developer.benchmark-mock-beegfs
Mocking BeeGFS mount point
Create temporary files...
Temporary files created. Path: performance-testing.tmp
Run performance benchmark against mocked target...
Performance benchmark complete.
Benchmark Complete.
Files transferred: 100000
Duration: 6.07 sec
Average: 16466.71 jobs/s
Average Without Outliers: 16533.14 jobs/s
Maximum: 17791.62 jobs/s
Minimum: 15124.06 jobs/s
```

Artifacts:
- results.csv
- summary.log
- pprof-debug-vars.ndjson