Workflow for automated translations and removal of duplicate statements using GitHub Actions #18
Comments
This should probably be a workflow_dispatch action so that we don't make a ton of excess API calls. We could make it so that if the number of new statements is below some threshold, it runs automatically; otherwise it posts an error or a message requiring manual approval.
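A minimal sketch of that threshold check, assuming the workflow can count the new statements before calling the translation API. The function name and the threshold value of 100 are hypothetical; the thread does not fix a number.

```python
# Hypothetical cutoff: the thread only says "some threshold".
AUTO_APPROVE_THRESHOLD = 100


def decide_run_mode(num_new_statements: int,
                    threshold: int = AUTO_APPROVE_THRESHOLD) -> str:
    """Return "auto" for small batches and "manual" when approval is needed."""
    if num_new_statements < 0:
        raise ValueError("num_new_statements must be non-negative")
    # Strictly below the threshold: translate without asking.
    # Otherwise: stop and ask a maintainer to approve the run.
    return "auto" if num_new_statements < threshold else "manual"
```

A workflow step could call `decide_run_mode` and exit with a non-zero status on `"manual"`, which would surface the message requiring approval.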
Interesting. That seems like a good setup. Perhaps we should also have a good way to check whether translations are complete and to estimate the cost of completing them?
Should I use the standard text translation pricing tier of $15 per million characters found here: https://aws.amazon.com/translate/pricing/ ?
Great! Should this be on the file level or the corpus level, e.g., something like:
It should be on the file level. Say the only file we haven't run any translations on is test2.csv; then the intended output would be:
Let me know if this is fine!
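A rough sketch of a file-level cost estimate, using the $15 per million characters figure quoted above. The function names, the input shape (file name mapped to its list of statements), and the assumption that each target language is billed separately are all illustrative. `len()` counts Unicode code points, so treat the result as an approximation of the billed character count.

```python
# Standard-tier price quoted in the thread; check the pricing page before relying on it.
PRICE_PER_MILLION_CHARS_USD = 15.0


def estimate_file_cost(statements: list[str],
                       num_target_languages: int = 1) -> float:
    """Estimate the USD cost of translating one file's statements."""
    total_chars = sum(len(statement) for statement in statements)
    # Assumption: every target language is a separate billed pass over the text.
    return total_chars * num_target_languages * PRICE_PER_MILLION_CHARS_USD / 1_000_000


def estimate_costs_per_file(files: dict[str, list[str]],
                            num_target_languages: int = 1) -> dict[str, float]:
    """Map each untranslated file name to its estimated cost."""
    return {
        name: estimate_file_cost(statements, num_target_languages)
        for name, statements in files.items()
    }
```

Passing only the files that have not been translated yet (for example just test2.csv) keeps the report at the file level rather than the corpus level.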
When new statements are added, we should have separate jobs within one workflow that translate all new statements and then remove any duplicate statements produced by the translations. The resulting changes should then be committed to the repository.
One job, called 'Translate Statements', will translate new files in the raw_statements folder. The workflow will be configured to run on workflow_dispatch, so it must be run manually from the Actions tab on GitHub. We should also be able to pass the specific files we want to translate as parameters to the workflow before it is invoked, so we have more manual control over what gets translated in the folder. Steps to run the workflow are found here: https://docs.github.com/en/actions/using-workflows/manually-running-a-workflow.
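One way the translation script could honour a file parameter is sketched below. It assumes the workflow passes a single comma-separated string of file names (a hypothetical input format) and that statement files are CSVs in raw_statements; an empty input falls back to every CSV in the folder.

```python
from pathlib import Path


def select_files(files_input: str, raw_dir: Path) -> list[Path]:
    """Resolve a comma-separated list of file names against raw_dir."""
    available = sorted(raw_dir.glob("*.csv"))
    requested = [name.strip() for name in files_input.split(",") if name.strip()]
    if not requested:
        # No explicit selection: consider every CSV in the folder.
        return available
    by_name = {path.name: path for path in available}
    missing = [name for name in requested if name not in by_name]
    if missing:
        # Fail early so a typo does not silently skip a file.
        raise FileNotFoundError(f"not found in {raw_dir}: {', '.join(missing)}")
    return [by_name[name] for name in requested]
```

For example, `select_files("test2.csv", Path("raw_statements"))` would return only that file, or raise if it does not exist.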
The second job, called 'Remove Any Duplicates', will remove from the original statement file any statements that cause duplicates in any translation, and will remove the same indices from the corresponding translation files so that all statement files keep a consistent structure. This job should then run format_checker.py to confirm there are no errors at the end.
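The index-aligned removal could look roughly like the sketch below. It assumes each file has already been loaded as a list of statements of equal length, keyed by a label such as a language code, that the first occurrence of a duplicate is the one kept, and that duplicates are compared after trimming whitespace and lowercasing. All of these choices, and both function names, are assumptions rather than the project's actual rules.

```python
def find_duplicate_indices(columns: dict[str, list[str]]) -> set[int]:
    """Collect indices whose statement repeats an earlier one in any column."""
    duplicates: set[int] = set()
    for statements in columns.values():
        seen: set[str] = set()
        for index, text in enumerate(statements):
            key = text.strip().lower()  # assumed normalisation
            if key in seen:
                duplicates.add(index)
            else:
                seen.add(key)
    return duplicates


def drop_indices(columns: dict[str, list[str]],
                 indices: set[int]) -> dict[str, list[str]]:
    """Remove the same indices from every column so the rows stay aligned."""
    return {
        name: [text for i, text in enumerate(statements) if i not in indices]
        for name, statements in columns.items()
    }
```

Because the same index set is dropped from the original and from every translation, the files keep matching row counts, which is the kind of consistency format_checker.py would then verify.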