Multiple beams translate & evaluation with bleu #6
base: main
Conversation
Added translation for multiple beams
adding few shot gpt translation
with open('flores-eng-devtest.csv', 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["eng_Latn-ukr_Cyrl"])
    writer.writeheader()
    for domain in list_of_emails:
list_of_emails?
eng = devtest["sentence_eng_Latn"]

def write_to_csv(list_of_emails):
    with open('flores-eng-devtest.csv', 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=["eng_Latn-ukr_Cyrl"])
fieldnames looks wrong.
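One way to address both review comments above is to give the writer one column per language instead of a single fused `"eng_Latn-ukr_Cyrl"` header, and to name the parameter after what it actually holds. This is a hedged sketch, not the PR's fix; the column names `sentence_eng_Latn` / `sentence_ukr_Cyrl` are assumptions based on the FLORES naming used elsewhere in the diff.

```python
import csv

def write_to_csv(sentence_pairs, path="flores-eng-devtest.csv"):
    # Hypothetical layout: one column per language, one row per pair.
    fieldnames = ["sentence_eng_Latn", "sentence_ukr_Cyrl"]
    with open(path, "w", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for eng, ukr in sentence_pairs:
            writer.writerow(
                {"sentence_eng_Latn": eng, "sentence_ukr_Cyrl": ukr}
            )
```

`newline=""` is the documented way to open a file handed to the `csv` module, so that the writer controls line endings itself.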
@app.command()
def eval_model_multpl_beams_ready_prep(
    source_file_path: Annotated[str, typer.Option()],
Gosh, some docstrings are needed.
ahahhaah, agree
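For the docstring request, a minimal sketch of what this could look like on a plain function mirroring the command's signature (the summary text and argument description here are assumptions, since the PR doesn't state them):

```python
def eval_model_multpl_beams_ready_prep(source_file_path: str) -> None:
    """Evaluate a translation model on pre-processed source sentences.

    Args:
        source_file_path: Path to a plain-text file containing one
            source sentence per line.
    """
    ...  # body elided; see the diff for the actual implementation
```

Typer also surfaces the docstring as the command's `--help` text, so this doubles as CLI documentation.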
)
source_sentences = []
with open(preprocessed_file_path) as f:
    source_sentences = f.readlines()
In [1]: with open("/tmp/foo", "r") as fp_on:
...: lines = fp_on.readlines()
...:
In [2]: lines
Out[2]: ['1\n', '2\n', '3\n', '4\n']
you might want to strip newlines.
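A small sketch of the suggested fix: `str.splitlines()` (or `line.rstrip("\n")` per line) returns the sentences without the trailing newlines that `readlines()` keeps.

```python
def read_sentences(path):
    # Unlike f.readlines(), splitlines() drops the trailing "\n"
    # from every line.
    with open(path) as f:
        return f.read().splitlines()
```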
added mistral translate script
all_prompts.append(translation_prompt)
print(f"Max tokens = {max(all_token_counts)}")
inputs = tokenizer(all_prompts, return_tensors="pt", padding=True)
model.to("cuda")
Loading the model on CPU first and moving it to CUDA afterwards is slower than loading it directly onto the GPU. Check out this patch: 33b3774
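The general pattern being suggested looks roughly like this (a sketch, not the contents of patch 33b3774; `device_map` requires the `accelerate` package, and the model name is a placeholder assumption):

```python
from transformers import AutoModelForCausalLM

model_name = "mistralai/Mistral-7B-v0.1"  # hypothetical checkpoint

# Slower: weights are materialized on CPU, then copied to the GPU.
# model = AutoModelForCausalLM.from_pretrained(model_name)
# model.to("cuda")

# Faster: weights are placed on the GPU as they are loaded.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")
```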
added calc metrics script
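The PR doesn't show the metrics script itself; for published numbers, sacrebleu is the standard tool. As a self-contained sanity-check sketch, a minimal uniform-weight sentence-level BLEU with brevity penalty looks like this (no smoothing, whitespace tokenization — both simplifying assumptions):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of 1..max_n-gram
    precisions, scaled by the brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped n-gram matches against the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0  # zero precision at any order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Without smoothing this returns 0.0 whenever any n-gram order has no matches, which is why real evaluation scripts use sacrebleu's smoothed corpus BLEU instead.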