revisited the README and max token lengths #30
Conversation
shamanez commented Sep 17, 2023
- added `max_token_len` variables
- corrected the path
- verified the README commands for end-to-end training
```diff
     type=int,
     default=128,
     help=(
-        "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated,"
+        "The maximum total passage sequence length after tokenization. Sequences longer than this will be truncated,"
         " sequences shorter will be padded if `--pad_to_max_length` is passed."
```
Is the parameter `--pad_to_max_length` defined? I can't see it in the file.
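If the flag really is missing, a minimal sketch of how it could be declared, assuming the script builds a standard `argparse` parser; the `store_true` action and help text are my assumptions, not code from this PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Boolean flag: if passed, every sequence is padded to the maximum length;
# otherwise padding is left to the collator (dynamic padding).
parser.add_argument(
    "--pad_to_max_length",
    action="store_true",
    help="If passed, pad all samples to the maximum length instead of using dynamic padding.",
)
args = parser.parse_args()
```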
```python
sys.path.append(os.getcwd())  # This is needed to import modules with absolute paths

# ruff: noqa: E402
```
I would remove this empty line.
- There is a `--pad_to_max_length` parameter referenced in the argparser hints, but it is not defined.
- Should we also use `query_max_length` and `passage_max_length` parameters in this script, similar to the `train_rage2e.py` script?
yup
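For illustration, a sketch of what those two parameters might look like if they follow the same pattern as the argument in the diff above; it reuses the hypothetical `parser` from the earlier sketch, and the defaults and help strings are assumptions, since `train_rage2e.py` is not shown in this PR:

```python
# Hypothetical arguments mirroring the suggested query/passage split.
parser.add_argument(
    "--query_max_length",
    type=int,
    default=64,  # assumed default; queries are typically shorter than passages
    help="The maximum total query sequence length after tokenization. Longer sequences are truncated.",
)
parser.add_argument(
    "--passage_max_length",
    type=int,
    default=128,  # matches the default shown in the diff above
    help="The maximum total passage sequence length after tokenization. Longer sequences are truncated.",
)
```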