log details to metadata for run analytics #992

Merged Mar 23, 2024 (67 commits). Changes shown from all commits.

Commits
c0ac767
add `uses_llmfoundry`, `model_name`, and `llmfoundry_run_type`
angel-ruiz7 Feb 23, 2024
fccabd9
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Feb 23, 2024
fe8948b
remove `uses_llmfoundry` flag, prefix with `mosaicml/llmfoundry/`, an…
angel-ruiz7 Feb 26, 2024
faede2c
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Feb 26, 2024
1e28dec
get `model_name` from `pretrained_model_name_or_path`
angel-ruiz7 Feb 28, 2024
a935c42
Merge branch 'main' of github.com:mosaicml/llm-foundry into angel/log…
angel-ruiz7 Feb 28, 2024
e8bab05
fix quotes
angel-ruiz7 Feb 28, 2024
f49d6b3
add TODO comments and remove redundant flushing from `eval.py`
angel-ruiz7 Feb 28, 2024
f726fe2
get `llmfoundry_run_subtype` for training runs
angel-ruiz7 Feb 28, 2024
40e3b83
check for `mosaicml_logger` before logging
angel-ruiz7 Feb 28, 2024
599807c
fix `reportUnboundVariable` linting error
angel-ruiz7 Feb 28, 2024
a46cc8b
add `tokenizer` and train/eval loader names
angel-ruiz7 Feb 28, 2024
ed2dead
use brackets to get `name` from `model_config`
angel-ruiz7 Feb 28, 2024
4165d59
add cloud provider from load / save paths
angel-ruiz7 Mar 2, 2024
598ccb1
add `num_workers` for `eval_loader` and `train_loader`
angel-ruiz7 Mar 2, 2024
7cb9e0f
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 2, 2024
43be314
try to fix key error
angel-ruiz7 Mar 2, 2024
09dcd56
Revert "try to fix key error"
angel-ruiz7 Mar 2, 2024
7aa5f34
add `d_model`, `callbacks`, and `vocab_size`
angel-ruiz7 Mar 4, 2024
8afa7ad
format, add `n_heads`
angel-ruiz7 Mar 5, 2024
e296693
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 5, 2024
336d697
format + support `ListConfig` and `DictConfig` for `eval_loader_config`
angel-ruiz7 Mar 6, 2024
850e587
fix access issues with `loader_config`
angel-ruiz7 Mar 6, 2024
d098c6c
use `get()` instead of brackets
angel-ruiz7 Mar 7, 2024
dcaf7a7
use `get` instead of brackets
angel-ruiz7 Mar 7, 2024
5e5fde1
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 7, 2024
dc074de
format code
angel-ruiz7 Mar 7, 2024
c123571
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Mar 7, 2024
347ff19
add `gauntlet_configured` and `icl_configured` fields
angel-ruiz7 Mar 7, 2024
2d66f5a
make sure to not pass in `mosaicml_logger` when `None`
angel-ruiz7 Mar 7, 2024
12e9d39
run formatter
angel-ruiz7 Mar 7, 2024
76fe448
Merge branch 'main' into angel/log-data-for-run-analytics
aspfohl Mar 11, 2024
65bfb66
make small tweaks to naming + access values more carefully
angel-ruiz7 Mar 11, 2024
c1c39ab
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Mar 11, 2024
71383d1
run formatter
angel-ruiz7 Mar 11, 2024
81af3d6
add `False` values to `gauntlet_configured` and `icl_configured`
angel-ruiz7 Mar 11, 2024
cebe1cc
for `eval.py` log metrics inside of `main`
angel-ruiz7 Mar 11, 2024
1f026a9
remove `tokenizer_name` from eval logs
angel-ruiz7 Mar 11, 2024
9edb0cb
fix typing of `metrics` Dictionaries
angel-ruiz7 Mar 11, 2024
2d20405
add a helper function to parse `cloud_provider_data` and `cloud_provi…
angel-ruiz7 Mar 11, 2024
ad46b31
Merge branch 'main' into angel/log-data-for-run-analytics
irenedea Mar 12, 2024
d8e288b
remove `mosaicml_logger = None`
angel-ruiz7 Mar 12, 2024
87a515c
remove comment
angel-ruiz7 Mar 12, 2024
482a949
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Mar 12, 2024
a95b2fd
move helpers into `mosaicmllogger_utils.py`
angel-ruiz7 Mar 12, 2024
265d508
give a description that makes `pydocstyle` happy
angel-ruiz7 Mar 12, 2024
6aecab8
log `cloud_provider_data` and `cloud_provider_checkpoints` from compo…
angel-ruiz7 Mar 12, 2024
ca0df5e
run formatters
angel-ruiz7 Mar 12, 2024
84720dc
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 13, 2024
50b284c
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 15, 2024
7f57a8a
remove TODOs
angel-ruiz7 Mar 15, 2024
31d3b79
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Mar 15, 2024
564fc41
fix import
angel-ruiz7 Mar 18, 2024
ccfb4fa
Revert "fix import"
angel-ruiz7 Mar 18, 2024
3e5f9a4
combine both import methodss for `mosiacmllogger_utils`
angel-ruiz7 Mar 19, 2024
5806493
only import from utils
angel-ruiz7 Mar 19, 2024
b5ed2f3
format files
angel-ruiz7 Mar 19, 2024
07abaa7
build loggers outside of `evaluate_model`
angel-ruiz7 Mar 22, 2024
309f683
merge and resolve conflicts
angel-ruiz7 Mar 22, 2024
0252283
run formatter on `__init__`
angel-ruiz7 Mar 22, 2024
a59fdc9
create `MosaicMLLogger` if it doesn't exist in `eval.py`
angel-ruiz7 Mar 22, 2024
830683e
Merge branch 'main' into angel/log-data-for-run-analytics
angel-ruiz7 Mar 22, 2024
54b78a9
docstring fixes
angel-ruiz7 Mar 22, 2024
be2294a
Merge branch 'angel/log-data-for-run-analytics' of github.com:mosaicm…
angel-ruiz7 Mar 22, 2024
78925c4
oops `loggers` is definitely supposed to be a `List`
angel-ruiz7 Mar 22, 2024
dd792cc
don't add `mosaicml_logger` if it's `None`
angel-ruiz7 Mar 22, 2024
e6b43a8
do the same thing for `train.py`
angel-ruiz7 Mar 22, 2024
8 changes: 8 additions & 0 deletions llmfoundry/utils/__init__.py
@@ -21,6 +21,10 @@
 from llmfoundry.utils.logging_utils import SpecificWarningFilter
 from llmfoundry.utils.model_download_utils import (
     download_from_hf_hub, download_from_http_fileserver, download_from_oras)
+from llmfoundry.utils.mosaicmllogger_utils import (create_mosaicml_logger,
+                                                   find_mosaicml_logger,
+                                                   log_eval_analytics,
+                                                   log_train_analytics)
 from llmfoundry.utils.prompt_files import load_prompts, load_prompts_from_file
 from llmfoundry.utils.registry_utils import (TypedRegistry,
                                              construct_from_registry,
@@ -59,4 +63,8 @@
     'create_registry',
     'construct_from_registry',
     'TypedRegistry',
+    'find_mosaicml_logger',
+    'log_eval_analytics',
+    'log_train_analytics',
+    'create_mosaicml_logger',
 ]
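With these re-exports in place, callers can import the new helpers from the package root rather than the submodule. A one-line sketch (assuming an environment with `llm-foundry` installed):

```python
from llmfoundry.utils import (create_mosaicml_logger, find_mosaicml_logger,
                              log_eval_analytics, log_train_analytics)
```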
154 changes: 154 additions & 0 deletions llmfoundry/utils/mosaicmllogger_utils.py
@@ -0,0 +1,154 @@
+# Copyright 2024 MosaicML LLM Foundry authors
+# SPDX-License-Identifier: Apache-2.0
+import json
+import os
+from typing import Any, Dict, List, Optional, Union
+
+from composer.loggers import MosaicMLLogger
+from composer.loggers.logger_destination import LoggerDestination
+from composer.loggers.mosaicml_logger import (MOSAICML_ACCESS_TOKEN_ENV_VAR,
+                                              MOSAICML_PLATFORM_ENV_VAR)
+from omegaconf import DictConfig, ListConfig
+
+
+def create_mosaicml_logger() -> Union[MosaicMLLogger, None]:
+    """Creates a MosaicMLLogger if the run was sent from the MosaicML platform."""
+    if os.environ.get(MOSAICML_PLATFORM_ENV_VAR, 'false').lower(
+    ) == 'true' and os.environ.get(MOSAICML_ACCESS_TOKEN_ENV_VAR):
+        # Add the MosaicML logger only if the run was sent from the MosaicML
+        # platform and the access token is set; callers are responsible for
+        # checking that one wasn't already configured.
+        return MosaicMLLogger()
+
+
+def find_mosaicml_logger(
+        loggers: List[LoggerDestination]) -> Union[MosaicMLLogger, None]:
+    """Returns the first MosaicMLLogger in the list, or None if there is none."""
+    return next(
+        (logger for logger in loggers if isinstance(logger, MosaicMLLogger)),
+        None)
+
+
+def log_eval_analytics(mosaicml_logger: MosaicMLLogger,
+                       model_configs: ListConfig,
+                       icl_tasks: Union[str, ListConfig],
+                       eval_gauntlet_config: Optional[Union[str, DictConfig]]):
+    """Logs analytics for runs using the `eval.py` script."""
+    metrics: Dict[str, Any] = {
+        'llmfoundry/script': 'eval',
+    }
+
+    if eval_gauntlet_config is not None:
+        metrics['llmfoundry/gauntlet_configured'] = True
+    else:
+        metrics['llmfoundry/gauntlet_configured'] = False
+
+    if isinstance(icl_tasks, str):
+        metrics['llmfoundry/icl_configured'] = True
+    elif len(icl_tasks) > 0:
+        metrics['llmfoundry/icl_configured'] = True
+    else:
+        metrics['llmfoundry/icl_configured'] = False
+
+    metrics['llmfoundry/model_configs'] = []
+    for model_config in model_configs:
+        model_config_data = {}
+        if model_config.get('vocab_size', None) is not None:
+            model_config_data['vocab_size'] = model_config.get('vocab_size')
+        if model_config.get('d_model', None) is not None:
+            model_config_data['d_model'] = model_config.get('d_model')
+        if model_config.get('n_heads', None) is not None:
+            model_config_data['n_heads'] = model_config.get('n_heads')
+
+        if len(model_config_data) > 0:
+            metrics['llmfoundry/model_configs'].append(
+                json.dumps(model_config_data, sort_keys=True))
+    mosaicml_logger.log_metrics(metrics)
+    mosaicml_logger._flush_metadata(force_flush=True)
+
+
+def log_train_analytics(mosaicml_logger: MosaicMLLogger,
+                        model_config: DictConfig,
+                        train_loader_config: DictConfig,
+                        eval_loader_config: Union[DictConfig, ListConfig, None],
+                        callback_configs: Union[DictConfig, None],
+                        tokenizer_name: str, load_path: Union[str, None],
+                        icl_tasks_config: Optional[Union[ListConfig, str]],
+                        eval_gauntlet: Optional[Union[DictConfig, str]]):
+    """Logs analytics for runs using the `train.py` script."""
+    train_loader_dataset = train_loader_config.get('dataset', {})
+    metrics: Dict[str, Any] = {
+        'llmfoundry/tokenizer_name':
+            tokenizer_name,
+        'llmfoundry/script':
+            'train',
+        'llmfoundry/train_loader_name':
+            train_loader_config.get('name'),
+        'llmfoundry/train_loader_workers':
+            train_loader_dataset.get('num_workers'),
+    }
+
+    if callback_configs is not None:
+        metrics['llmfoundry/callbacks'] = [
+            name for name, _ in callback_configs.items()
+        ]
+
+    if eval_gauntlet is not None:
+        metrics['llmfoundry/gauntlet_configured'] = True
+    else:
+        metrics['llmfoundry/gauntlet_configured'] = False
+
+    if icl_tasks_config is not None:
+        if isinstance(icl_tasks_config, str):
+            metrics['llmfoundry/icl_configured'] = True
+        elif len(icl_tasks_config) > 0:
+            metrics['llmfoundry/icl_configured'] = True
+        else:
+            metrics['llmfoundry/icl_configured'] = False
+    else:
+        metrics['llmfoundry/icl_configured'] = False
+
+    if train_loader_dataset.get('hf_name', None) is not None:
+        metrics['llmfoundry/train_dataset_hf_name'] = train_loader_dataset.get(
+            'hf_name', None)
+    if train_loader_config.get('name') == 'finetuning':
+        metrics['llmfoundry/train_task_type'] = 'INSTRUCTION_FINETUNE'
+    elif train_loader_config.get('name') == 'text':
+        if load_path is not None or model_config.get('pretrained') == True:
+            metrics['llmfoundry/train_task_type'] = 'CONTINUED_PRETRAIN'
+        else:
+            metrics['llmfoundry/train_task_type'] = 'PRETRAIN'
+
+    if eval_loader_config is not None:
+        metrics['llmfoundry/eval_loaders'] = []
+
+        if isinstance(eval_loader_config, ListConfig):
+            eval_loader_configs: ListConfig = eval_loader_config
+        else:
+            eval_loader_configs = ListConfig([eval_loader_config])
+
+        for loader_config in eval_loader_configs:
+            eval_loader_info = {}
+            eval_loader_dataset = loader_config.get('dataset', {})
+            eval_loader_info['name'] = loader_config.get('name')
+            eval_loader_info['num_workers'] = eval_loader_dataset.get(
+                'num_workers', None)
+            if eval_loader_dataset.get('hf_name', None) is not None:
+                eval_loader_info['dataset_hf_name'] = eval_loader_dataset.get(
+                    'hf_name')
+
+            # Log as a key-sorted JSON string, so that we can easily parse it in Spark / SQL
+            metrics['llmfoundry/eval_loaders'].append(
+                json.dumps(eval_loader_info, sort_keys=True))
+
+    if model_config['name'] == 'hf_causal_lm':
+        metrics['llmfoundry/model_name'] = model_config.get(
+            'pretrained_model_name_or_path')
+    if model_config.get('vocab_size', None) is not None:
+        metrics['llmfoundry/vocab_size'] = model_config.get('vocab_size')
+    if model_config.get('d_model', None) is not None:
+        metrics['llmfoundry/d_model'] = model_config.get('d_model')
+    if model_config.get('n_heads', None) is not None:
+        metrics['llmfoundry/n_heads'] = model_config.get('n_heads')
+
+    mosaicml_logger.log_metrics(metrics)
+    mosaicml_logger._flush_metadata(force_flush=True)
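Taken together, the helpers implement a find-or-create pattern: reuse a `MosaicMLLogger` the user already configured, otherwise create one only when the run came from the MosaicML platform, then emit the analytics payload. A minimal sketch of how the pieces compose (config values are hypothetical; assumes `llm-foundry` is installed):

```python
from typing import List

from composer.loggers.logger_destination import LoggerDestination
from omegaconf import OmegaConf as om

from llmfoundry.utils import (create_mosaicml_logger, find_mosaicml_logger,
                              log_train_analytics)

loggers: List[LoggerDestination] = []  # whatever the YAML config built

# Reuse an existing MosaicMLLogger, or create one if the platform env vars are set.
mosaicml_logger = find_mosaicml_logger(loggers)
if mosaicml_logger is None:
    mosaicml_logger = create_mosaicml_logger()
    if mosaicml_logger is not None:
        loggers.append(mosaicml_logger)

if mosaicml_logger is not None:
    # Hypothetical config fragments, mirroring what train.py passes in.
    model_config = om.create({
        'name': 'hf_causal_lm',
        'pretrained_model_name_or_path': 'gpt2',
    })
    train_loader_config = om.create({'name': 'finetuning', 'dataset': {}})
    log_train_analytics(mosaicml_logger, model_config, train_loader_config,
                        eval_loader_config=None, callback_configs=None,
                        tokenizer_name='gpt2', load_path=None,
                        icl_tasks_config=None, eval_gauntlet=None)
```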
40 changes: 24 additions & 16 deletions scripts/eval/eval.py
@@ -11,7 +11,6 @@

 import pandas as pd
 import torch
-from composer.loggers import MosaicMLLogger
 from composer.loggers.logger_destination import LoggerDestination
 from composer.models.base import ComposerModel
 from composer.trainer import Trainer
@@ -21,6 +20,9 @@
 from rich.traceback import install
 from transformers import PreTrainedTokenizerBase

+from llmfoundry.utils import (create_mosaicml_logger, find_mosaicml_logger,
+                              log_eval_analytics)
+
 install()
 from llmfoundry.models.model_registry import COMPOSER_MODEL_REGISTRY
 from llmfoundry.utils.builders import (add_metrics_to_eval_loaders,
@@ -69,7 +71,7 @@ def evaluate_model(
     eval_loader_config: Optional[Union[DictConfig, ListConfig]],
     fsdp_config: Optional[Dict],
     num_retries: int,
-    loggers_cfg: Dict[str, Any],
+    loggers: List[LoggerDestination],
     python_log_level: Optional[str],
     precision: str,
     eval_gauntlet_df: Optional[pd.DataFrame],
@@ -103,20 +105,9 @@
     if eval_gauntlet_callback is not None:
         callbacks.append(eval_gauntlet_callback)

-    loggers: List[LoggerDestination] = [
-        build_logger(name, logger_cfg)
-        for name, logger_cfg in loggers_cfg.items()
-    ]
-
     if metadata is not None:
-        # Flatten the metadata for logging
-        loggers_cfg.pop('metadata', None)
-        loggers_cfg.update(metadata, merge=True)
-
-        # Find the MosaicMLLogger
-        mosaicml_logger = next(
-            (logger for logger in loggers if isinstance(logger, MosaicMLLogger)),
-            None)
+        mosaicml_logger = find_mosaicml_logger(loggers)

         if mosaicml_logger is not None:
             mosaicml_logger.log_metrics(metadata)
@@ -153,7 +144,6 @@
     assert composer_model is not None

     log.info(f'Building trainer for {model_cfg.model_name}...')
-
     trainer = Trainer(
         run_name=run_name,
         seed=seed,
@@ -297,6 +287,24 @@ def main(cfg: DictConfig) -> Tuple[List[Trainer], pd.DataFrame]:
     models_df = None
     composite_scores = None
     trainers = []
+
+    loggers: List[LoggerDestination] = [
+        build_logger(name, logger_cfg)
+        for name, logger_cfg in loggers_cfg.items()
+    ]
+
+    mosaicml_logger = find_mosaicml_logger(loggers)
+    if mosaicml_logger is None:
+        mosaicml_logger = create_mosaicml_logger()
+        # create_mosaicml_logger returns None if the run isn't on the MosaicML platform
+        if mosaicml_logger is not None:
+            loggers.append(mosaicml_logger)
+
+    # Analytics are only logged when running on the MosaicML platform
+    if mosaicml_logger is not None:
+        log_eval_analytics(mosaicml_logger, model_configs, icl_tasks,
+                           eval_gauntlet_config)
+
     for model_cfg in model_configs:
         (trainer, logger_keys, eval_gauntlet_callback,
          eval_gauntlet_df) = evaluate_model(
@@ -311,7 +319,7 @@
             eval_loader_config=eval_loader_config,
             fsdp_config=fsdp_config,
             num_retries=num_retries,
-            loggers_cfg=loggers_cfg,
+            loggers=loggers,
             python_log_level=python_log_level,
             precision=precision,
             eval_gauntlet_df=eval_gauntlet_df,
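For downstream analytics, it helps to see the shape of the payload `log_eval_analytics` emits: flat `llmfoundry/`-prefixed keys, with per-model configs serialized as key-sorted JSON strings so they parse cleanly in Spark / SQL. A hypothetical payload for a run with ICL tasks configured but no gauntlet:

```python
# Illustrative only: the exact values depend on the run's YAML config.
{
    'llmfoundry/script': 'eval',
    'llmfoundry/gauntlet_configured': False,
    'llmfoundry/icl_configured': True,
    'llmfoundry/model_configs': [
        '{"d_model": 2048, "n_heads": 16, "vocab_size": 50368}',
    ],
}
```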
23 changes: 13 additions & 10 deletions scripts/train/train.py
@@ -12,9 +12,6 @@
 import torch
 from composer import Trainer
 from composer.core.callback import Callback
-from composer.loggers import MosaicMLLogger
-from composer.loggers.mosaicml_logger import (MOSAICML_ACCESS_TOKEN_ENV_VAR,
-                                              MOSAICML_PLATFORM_ENV_VAR)
 from composer.metrics.nlp import InContextLearningMetric
 from composer.profiler import (JSONTraceHandler, Profiler, TraceHandler,
                                cyclic_schedule)
@@ -23,6 +20,9 @@
 from omegaconf import OmegaConf as om
 from rich.traceback import install

+from llmfoundry.utils import (create_mosaicml_logger, find_mosaicml_logger,
+                              log_train_analytics)
+
 install()

 from transformers import PreTrainedTokenizerBase
@@ -449,14 +449,11 @@ def main(cfg: DictConfig) -> Trainer:
         for name, logger_cfg in logger_configs.items()
     ] if logger_configs else []

-    mosaicml_logger = next(
-        (logger for logger in loggers if isinstance(logger, MosaicMLLogger)),
-        None)
+    mosaicml_logger = find_mosaicml_logger(loggers)
     if mosaicml_logger is None:
-        if os.environ.get(MOSAICML_PLATFORM_ENV_VAR, 'false').lower(
-        ) == 'true' and os.environ.get(MOSAICML_ACCESS_TOKEN_ENV_VAR):
-            # Adds mosaicml logger to composer if the run was sent from Mosaic platform, access token is set, and mosaic logger wasn't previously added
-            mosaicml_logger = MosaicMLLogger()
+        mosaicml_logger = create_mosaicml_logger()
+        if mosaicml_logger is not None:
+            # mosaicml_logger will be None if the run isn't on the MosaicML platform
             loggers.append(mosaicml_logger)

     if metadata is not None:
@@ -543,6 +540,12 @@
     if eval_gauntlet_callback is not None:
         callbacks.append(eval_gauntlet_callback)

+    if mosaicml_logger is not None:
+        log_train_analytics(mosaicml_logger, model_config, train_loader_config,
+                            eval_loader_config, callback_configs,
+                            tokenizer_name, load_path, icl_tasks_config,
+                            eval_gauntlet_config)
+
     # Build Model
     log.info('Initializing model...')
     with init_context:
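One detail worth spelling out from the wiring above: the train task type logged for analytics is a heuristic over the loader name, `load_path`, and the `pretrained` flag. A standalone restatement of that logic (illustrative only, not part of the PR):

```python
from typing import Optional


def derive_train_task_type(loader_name: str, load_path: Optional[str],
                           pretrained: bool) -> Optional[str]:
    """Mirrors the task-type heuristic in log_train_analytics."""
    if loader_name == 'finetuning':
        return 'INSTRUCTION_FINETUNE'
    if loader_name == 'text':
        # Starting from existing weights implies continued pretraining.
        if load_path is not None or pretrained:
            return 'CONTINUED_PRETRAIN'
        return 'PRETRAIN'
    return None  # no task type is logged for other loader names


assert derive_train_task_type('text', None, False) == 'PRETRAIN'
assert derive_train_task_type('text', 's3://bucket/ckpt', False) == 'CONTINUED_PRETRAIN'
assert derive_train_task_type('finetuning', None, True) == 'INSTRUCTION_FINETUNE'
```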