Merged
45 commits
77488cf
commit
Oct 8, 2025
feb4771
commit
Oct 8, 2025
41ceaa4
update backend role typehints and enum
Oct 8, 2025
8a24e71
update where we check FORGE_DISABLE_METRICS
Oct 8, 2025
3f3bc51
remove protected import
Oct 8, 2025
d82c354
Merge branch 'timestamp_logging_diff1' into timestamp_logging_diff2
Oct 8, 2025
4fe2611
protect import
Oct 8, 2025
8759bc8
Merge branch 'timestamp_logging_diff1' into timestamp_logging_diff2
Oct 8, 2025
fbb4a9e
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 8, 2025
d81a4ed
record_metric uses dataclass Metric
Oct 8, 2025
1e2255d
commit
Oct 8, 2025
a94c612
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 8, 2025
5b477e8
commit
Oct 9, 2025
f2b3eed
commit
Oct 9, 2025
471b88a
revert
Oct 9, 2025
1a02784
Merge branch 'timestamp_logging_diff2_5' into timestamp_logging_diff3
Oct 9, 2025
fa4895f
remove unnecessary code
Oct 9, 2025
7bb1fe7
better logging
Oct 9, 2025
43d5d27
docs/names
Oct 9, 2025
c97eb98
Merge branch 'timestamp_logging_diff2_5' into timestamp_logging_diff3
Oct 9, 2025
75355a2
commit
Oct 9, 2025
70e9c67
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 9, 2025
12f77c9
Merge branch 'timestamp_logging_diff3' into timestamp_logging_diff4
Oct 9, 2025
1186aec
update cfg back to true
Oct 9, 2025
a02ea75
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 13, 2025
aa00898
Merge branch 'timestamp_logging_diff3' into timestamp_logging_diff4
Oct 13, 2025
7d89f5c
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 14, 2025
370c4e4
remove callstack, get meshname in provisioner
Oct 14, 2025
9e77930
get name from proc mesh
Oct 14, 2025
93b0cad
simplify + unit tests
Oct 14, 2025
84363b1
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 14, 2025
e3c7a99
Merge branch 'timestamp_logging_diff3' into timestamp_logging_diff4
Oct 14, 2025
77e426b
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 15, 2025
e901ad5
address comments
Oct 15, 2025
e42059b
Merge branch 'timestamp_logging_diff3' into timestamp_logging_diff4
Oct 15, 2025
f52408e
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 15, 2025
6fc11bb
fix merge
Oct 15, 2025
72660f5
simplify comments
Oct 15, 2025
69f9f8c
renaming of var + better docs
Oct 15, 2025
7e74326
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 20, 2025
3eedb8b
docs and logs
Oct 21, 2025
cd04223
Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…
Oct 21, 2025
15f6496
update old configs
Oct 21, 2025
82b3e1a
increase timeout
Oct 21, 2025
9a5f5a9
raise error if cfg arg is missing
Oct 21, 2025
4 changes: 2 additions & 2 deletions .meta/mast/qwen3_14b_mast.yaml
@@ -19,9 +19,9 @@ metric_logging:
   wandb:
     project: "grpo-training"
     group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    logging_mode: global_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
4 changes: 2 additions & 2 deletions .meta/mast/qwen3_1_7b_mast.yaml
@@ -19,9 +19,9 @@ metric_logging:
   wandb:
     project: "grpo-training"
     group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    logging_mode: global_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
4 changes: 2 additions & 2 deletions .meta/mast/qwen3_32b_mast.yaml
@@ -19,9 +19,9 @@ metric_logging:
   wandb:
     project: "grpo-training"
     group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    logging_mode: global_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
4 changes: 2 additions & 2 deletions .meta/mast/qwen3_4b_mast.yaml
@@ -19,9 +19,9 @@ metric_logging:
   wandb:
     project: "grpo-training"
     group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    logging_mode: global_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
4 changes: 2 additions & 2 deletions .meta/mast/qwen3_8b_mast.yaml
@@ -19,9 +19,9 @@ metric_logging:
   wandb:
     project: "grpo-training"
     group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    logging_mode: global_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
2 changes: 1 addition & 1 deletion apps/grpo/main.py
@@ -304,7 +304,7 @@ async def main(cfg: DictConfig):
     else:
         provisioner = await init_provisioner()

-    metric_logging_cfg = cfg.get("metric_logging", {"console": {"log_per_rank": False}})
+    metric_logging_cfg = cfg.get("metric_logging", {})
     mlogger = await get_or_create_metric_logger(process_name="Controller")
     await mlogger.init_backends.call_one(metric_logging_cfg)
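Note on the hunk above: the fallback passed to `cfg.get` is now an empty dict, and the commit list includes "raise error if cfg arg is missing". A minimal sketch of what a per-backend check for the new `logging_mode` key could look like follows. The helper name `validate_metric_logging_cfg` and the constant `VALID_LOGGING_MODES` are hypothetical and are not taken from forge; the three accepted values come from the YAML comment in this PR. The real check in forge may live elsewhere and behave differently.

```python
# Hypothetical helper, not part of forge: verifies that every backend entry
# under `metric_logging` carries a recognised `logging_mode`.
VALID_LOGGING_MODES = {"global_reduce", "per_rank_reduce", "per_rank_no_reduce"}


def validate_metric_logging_cfg(metric_logging_cfg: dict) -> None:
    """Raise ValueError if any backend lacks a valid `logging_mode`."""
    for backend_name, backend_cfg in metric_logging_cfg.items():
        mode = backend_cfg.get("logging_mode")
        if mode is None:
            # Old configs that still use `reduce_across_ranks` end up here.
            raise ValueError(
                f"Backend '{backend_name}' is missing required key 'logging_mode'"
            )
        if mode not in VALID_LOGGING_MODES:
            raise ValueError(
                f"Backend '{backend_name}' has unknown logging_mode '{mode}'; "
                f"expected one of {sorted(VALID_LOGGING_MODES)}"
            )


# An empty config (the new default in main.py) has no backends to check.
validate_metric_logging_cfg({})
validate_metric_logging_cfg({"console": {"logging_mode": "global_reduce"}})
```

With a check like this, a config that was not migrated from `reduce_across_ranks` fails at startup instead of silently falling back to a default mode.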
8 changes: 4 additions & 4 deletions apps/grpo/qwen3_1_7b.yaml
@@ -16,11 +16,11 @@ rollout_threads: 1 # Recommended to set equal to policy.num_replicas
 # Observability configuration
 metric_logging:
   wandb:
-    project: "grpo-training"
-    group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    project: grpo-training
+    group: grpo_exp_${oc.env:USER}
+    logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
8 changes: 4 additions & 4 deletions apps/grpo/qwen3_32b.yaml
@@ -19,11 +19,11 @@ rollout_threads: 32 # make this 4x the number of policy replicas seems to work w
 # Observability configuration
 metric_logging:
   wandb:
-    project: "grpo-training"
-    group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    project: grpo-training
+    group: grpo_exp_${oc.env:USER}
+    logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
8 changes: 4 additions & 4 deletions apps/grpo/qwen3_8b.yaml
@@ -12,11 +12,11 @@ off_by_n: 1 # Off by one by default
 # Observability configuration
 metric_logging:
   wandb:
-    project: "grpo-training"
-    group: "grpo_exp_${oc.env:USER}"
-    reduce_across_ranks: True
+    project: grpo-training
+    group: grpo_exp_${oc.env:USER}
+    logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
-    reduce_across_ranks: True
+    logging_mode: global_reduce

 # Dataset configuration
 dataset:
3 changes: 3 additions & 0 deletions src/forge/observability/__init__.py
@@ -12,7 +12,9 @@
 from .metrics import (
     BackendRole,
     ConsoleBackend,
+    get_logger_backend_class,
     LoggerBackend,
+    LoggingMode,
     MaxAccumulator,
     MeanAccumulator,
     Metric,
@@ -43,6 +45,7 @@
     "BackendRole",
     # Enums
     "Reduce",
+    "LoggingMode",
     # Utility functions
     "get_proc_name_with_rank",
     # Actor classes
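Note on the export above: the body of `LoggingMode` is not shown in this part of the diff. As a rough illustration only, an enum covering the three values listed in the YAML comment (`global_reduce`, `per_rank_reduce`, `per_rank_no_reduce`) could be sketched as below. The class here is a stand-in that reuses the name; its member names and the `parse_logging_mode` helper are assumptions, not forge's actual definitions.

```python
from enum import Enum


class LoggingMode(Enum):
    """Stand-in sketch; the real forge.observability.LoggingMode may differ."""

    GLOBAL_REDUCE = "global_reduce"
    PER_RANK_REDUCE = "per_rank_reduce"
    PER_RANK_NO_REDUCE = "per_rank_no_reduce"


def parse_logging_mode(value) -> LoggingMode:
    """Hypothetical helper: turn a YAML string (or an enum member) into a LoggingMode."""
    if isinstance(value, LoggingMode):
        return value
    try:
        # Enum lookup by value, e.g. LoggingMode("global_reduce").
        return LoggingMode(value)
    except ValueError:
        valid = [m.value for m in LoggingMode]
        raise ValueError(
            f"Unknown logging_mode {value!r}; expected one of {valid}"
        ) from None


mode = parse_logging_mode("global_reduce")
print(mode)
```

Backing the config string with an enum means a typo in a YAML file is rejected when the config is parsed, before any backend is initialised.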