[Fix] wandb group logging missing columns (#61)
* Refactor logging in lmms_eval package

* Refactor variable names in lmms_eval package

* Update README.md with new features and installation instructions

* Update supported models and datasets

* Delete otter.py file

* Fix capitalization in README.md

* Update image sizes and add new features

* Refactor README.md to improve readability and add new features

* Add description for lmms-eval in README.md

* Update accelerator support in README.md

* Update lmms-eval README with improved description and additional features

* Update README.md with improved task grouping description

* change `Otter-AI/MME` to `lmms-lab/MME`

* Update README.md

* Update README.md

* Remove unused code in mme.yaml

* Squashed commit of the following:

commit b50a1d1
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (#57)

    * add instructblip

    * minicpm_v

    * remove <image> from qwen-vl

    * speed up postprocessing

    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 671aacf
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (#56)

    * refactor vizwizvqa task

    * Delete vqav2_test and vqav2_val YAML files

    * Refactor vqav2_process_results functions

    * Add a pack for vqav2

    * refactor okvqa

    * roll back vizwiz_vqa

    * Fix exact_match calculation in ok_vqa_process_results

    * Update OKVQA dataset name in readme

    * add model_specific_prompt_kwargs

    * add model_specific_prompt_kwargs to vizwiz_vqa

    * add model_specific_prompt_kwargs for vqav2

    * lint

    * fix a small bug for eval_logger

    * Refactor make_table function to display points as "  -  " if value is None

    * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e'

    * Refactor ok_vqa_aggreate_submissions function

    * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15'

    * Refactor VQA submission file saving

    * Update file utils

    * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813'

    * Refactor file path handling and submission generation

    * OKVQA path

    * vizwizvqa file

    * pack cmmmu

    * fix a small metric bug for cmmmu

    * Add higher_is_better flag to submission metric

    * Add CMMMU dataset to README.md

    * Add logging and refactor submission file generation in docvqa utils.py

    * pack docvqa

    * add traceback to print detailed error

    * Refactor docvqa_test_aggregate_results to accept additional arguments

    * Add metric check in evaluator.py and update test.yaml and val.yaml

    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

    * merge textvqa

    * textvqa

    * Modify submission file generation for COCO test results

    * Update test result storage path

    * update coco cap file name

    * Update COCO 2017 Caption dataset name

    * ferret

    * Add Ferret dataset

    * Refactor hb_doc_to_text function to include model-specific prompts

    * Add IconQA and its subtasks

    * Refactor image list creation in doc_to_visual function

    * Add process_results function to default template

    * Update process_results function in iconqa utils.py

    * refactor flickr30k

    * change aggregation function

    * Fix formatting issues and update logging message

    * Fix llava can not handle only text question (no visuals)

    * Fix qwen can not handle no image question (no visuals)

    * Add fuyu prepare accelerator scripts

    * refactor mme

    * naming consistency

    * aggregation_submissions consistency

    * flickr30k naming consistency

    * remove submissions for mme

    * remove unused submission function

    * Refactor infovqa_test.yaml and infovqa_val.yaml

    * Refactor code for improved readability and maintainability

    * stvqa

    * rename sqa

    * Update lmms_eval textcaps files and utils.py

    * Update default prompt for text captions

    * Refactor textcaps_aggregation_result function

    * Add generate_submission_file function and update mathvista_aggregate_results signature

    * Update nocaps_test.yaml and nocaps_val.yaml

    * refactor internal_eval

    * Add internal evaluation datasets

    * pack multidocvqa

    * mmvet

    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re-evaluating

    * Refactor llava wild

    * Refactor llava-bench-coco

    * Add JSON file generation for gpt evaluation details

    * mmmu

    * Remove MMBench English and Chinese tasks

    * Remove unnecessary return statement in mmbench_aggregate_test_results function

    * Fix distributed process group initialization

    * Update dataset paths and group names in mmbench test configs

    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

    * Add torch module import

    * lint

    * Remove IconQA dataset from README.md

    * Add Multi-DocVQA and its submodules

    * Add new datasets and update task names

    * Refactor flickr_aggregation_result function to accept additional arguments

    * Add timeout kwargs in Accelerator constructor

    * Add encoding to be utf-8 for cmmmu

    * Fix llava try and catch, remove torch.distributed.init in main

    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 521ece2
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (#55)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

* add llava main in pyproject

* Update README.md

* Remove unnecessary dependencies and add specific version for llava_repr

* Add dependencies for llava_repr***

* Update README.md

* add some docs on models and command line commands

* remove some lines

* typo

* Update model_guide.md

* Update model_guide.md

* Update README.md

* Update README.md

* Update README.md

* Fix refcocog dataset path

* Record gpt response in eval info

* Resolve conflict

* Fix hallusionbench gpt json saving path

* Rename hallubench gpt output path

* Change remove image to check by type instead of check by names

* More robust check by type

* Remove unnecessary img in data

* Forcing an empty commit.

* Testing

* Delete unnecessary things

* Fix seedbench2 image issue in doc_to_text

* Add conditional exclude for internal eval

* Fix small bugs in list_with_num

* Revise list_with_num model args

* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
Co-authored-by: jzhang38 <a1286225768@gmail.com>
4 people authored Mar 3, 2024
1 parent 5e1c9c7 commit f89a736
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions lmms_eval/logging_utils.py
```diff
@@ -192,9 +192,15 @@ def make_table(columns: List[str], key: str = "results"):
             se = dic[m + "_stderr" + "," + f]
             if se != "N/A":
                 se = "%.4f" % se
-            table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), str(se)])
+            data = [model_name, model_args, k, version, f, n, m, str(v), str(se)]
+            if key == "groups":
+                data = [self.group_names] + data
+            table.add_data(*data)
         else:
-            table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), ""])
+            data = [model_name, model_args, k, version, f, n, m, str(v), ""]
+            if key == "groups":
+                data = [self.group_names] + data
+            table.add_data(*data)

     return table
```
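The row-building change can be sketched in isolation as follows. `make_row` is a hypothetical helper (the real code builds rows inline inside `make_table` and feeds them to a `wandb.Table`); it illustrates why the `groups` table was missing a column: its header has an extra leading group-name field, so each row must be widened to match.

```python
from typing import List, Optional


def make_row(
    model_name: str,
    model_args: str,
    task: str,
    metric: str,
    value: str,
    stderr: str,
    key: str = "results",
    group_name: Optional[str] = None,
) -> List[str]:
    # Base row, matching the "results" table columns.
    data = [model_name, model_args, task, metric, value, stderr]
    # The "groups" table has one extra leading column for the group
    # name, so prepend it there; otherwise row width would not match
    # the table header and wandb would drop or misalign the row.
    if key == "groups":
        data = [group_name] + data
    return data
```

For example, `make_row("llava", "pretrained=x", "mme", "score", "1.0", "0.1", key="groups", group_name="mme_group")` yields a seven-element row whose first entry is the group name, while the default `key="results"` call returns the original six-element row.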
