
Adding DeepL column for translation #124

Open
DhammaCharts opened this issue Jun 24, 2022 · 15 comments
Comments

@DhammaCharts

Hi,

I'm wondering if it would be possible to add an optional column that includes a DeepL automatic translation via its API. It would take the English input and produce output in the desired language.

Glossaries are very helpful, as they consistently translate a specific input word into a specific output word for a given translator. https://www.deepl.com/fr/docs-api/managing-glossaries/creating-a-glossary/

API doc https://www.deepl.com/docs-api
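For reference, creating a glossary is a single POST to the API documented above. Here is a minimal sketch in Python using only the standard library; the free-tier host and the auth key are placeholders, and the network call itself is an untested assumption based on the linked docs:

```python
# Sketch of creating a DeepL glossary over the HTTP API.
# DEEPL_API and AUTH_KEY are placeholders, not working credentials.
import json
import urllib.parse
import urllib.request

DEEPL_API = "https://api-free.deepl.com/v2"   # free-tier host per the docs
AUTH_KEY = "your-deepl-auth-key"              # placeholder

def entries_to_tsv(entries: dict) -> str:
    """DeepL expects glossary entries as tab-separated source/target pairs."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in entries.items())

def create_glossary(name: str, source_lang: str, target_lang: str,
                    entries: dict) -> str:
    """Create a glossary and return its id (network call; sketch only)."""
    data = urllib.parse.urlencode({
        "name": name,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "entries": entries_to_tsv(entries),
        "entries_format": "tsv",
    }).encode()
    req = urllib.request.Request(
        f"{DEEPL_API}/glossaries",
        data=data,
        headers={"Authorization": f"DeepL-Auth-Key {AUTH_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["glossary_id"]
```

A glossary created this way can then be referenced by its id in later translate calls, per the docs above.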

There is a free and a paid API. Is SuttaCentral a charity, and could it therefore ask for a free license?
Thanks for your feedback!

@sujato
Contributor

sujato commented Jun 26, 2022

I'm definitely open to this, but there are a few considerations first.

  • Bilara is basically running on fumes at the moment, our main developer for Bilara is taken up with this strange malady called "life", so we need more programming juice to get stuff done.
  • We haven't built a plugin architecture, so it would have to be built from scratch.
  • I'm cautious about using third-party services, so we'd have to be strongly convinced of the benefits before proceeding.
  • Yes, we are a charity and could in principle ask for a free license. Note though that we are then subject to the whims of their policies, and at any point may be faced with the choice to either start paying or stop using the service. If it proves worthwhile, I'm not opposed to paying, people need to earn a living.

So basically it comes down to (a) developer time and (b) proving the usefulness.

@DhammaCharts
Author

DhammaCharts commented Jun 26, 2022

Thank you for your reply, Bhante!

this strange malady called "life",

;_)

so it would have to be from scratch

I have tried to build Bilara locally, not yet successful!

So basically it comes down to (a) developer time and (b) proving the usefulness.

(b) I've not yet met anyone in the French dhamma community of translators who doesn't use DeepL. I've been reading, listening and talking in English for 10+ years, plus living in an English-speaking monastery for 4+ years now, and DeepL is better than me ;-) especially in English-to-French translation, in terms of grammar, vocabulary and sentence structure. It of course lacks context and meaning, but it is quite amazing in and of itself. Pronouns and verb tenses are difficult for DeepL without understanding the context, so every translation still needs very precise human checking and correction.

(a) I've started a quick app that fetches the JSON from bilara-data and creates a DeepL equivalent; here is a sketch of how it works:

[diagram: Bilara Assist workflow]
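For what it's worth, the segment-by-segment core of such an app is small. A rough sketch over a bilara-data JSON file (a flat mapping of segment IDs to strings); the `translate` stub stands in for a real DeepL call, and the file paths are hypothetical:

```python
# Segment-by-segment pipeline: read a bilara-data JSON file
# (segment id -> English text) and emit a parallel file of
# machine translations.
import json

def translate(text: str) -> str:
    """Placeholder for a per-segment DeepL API call."""
    return text  # identity stand-in for the sketch

def machine_translate_file(in_path: str, out_path: str) -> None:
    with open(in_path, encoding="utf-8") as f:
        segments = json.load(f)  # e.g. {"sn5.1:1.1": "...", ...}
    translated = {seg_id: translate(text)
                  for seg_id, text in segments.items()}
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(translated, f, ensure_ascii=False, indent=2)
```

The output keeps the same segment IDs, so it drops straight into the bilara-data layout alongside the source file.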

@noeismet

Unfortunately, I do not have coding skills, but I would happily volunteer for testing and would definitely be an early adopter of a DeepL integration. I use DeepL extensively primarily for speed purposes. I review everything I give it to translate and I am very impressed with the results, which seem to improve over time. I guess that is the idea of AI and deep learning. I use it in two ways:

  1. for Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.
  2. for Site, I do it paragraph by paragraph as the texts are generally more straightforward in their meaning, much less subject to various interpretations. So here no editing in DeepL is required, it's very quick.

@cittadhammo

cittadhammo commented Jun 27, 2022

  1. for Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.

OK, that is interesting, and I was wondering about that. I'll give it a try both ways and see how much difference it makes. Thanks! The app I'm building will do it segment by segment and thus loses the context.

@noeismet

noeismet commented Jun 27, 2022

I'll give it a try both ways and see how much difference it gives

In fact, both ways work, and I guess it's a matter of personal preference and workflow. And speaking of workflow precisely, I think a DeepL integration would improve it.

@blake-sc
Contributor

From taking a glance at the DeepL API, this seems like it'd potentially be very quick to implement.

And as an aside, I have to admit to having noticed that machine learning is getting disgustingly good at doing things. Actually, one reason I'd say "do we need to bother?" is that machine translation is getting so good that sites hardly need to be localized any more, though I'm sure the quality of machine translation still varies heavily with the language pair and domain.

Honestly, the hard part with Bilara is usually the GUI, but I can propose two possible ways to implement machine translation (neither of which involves a new column):

  1. Hit a shortcut, say Ctrl-M, that fetches the machine translation and inserts it into the field. This would be the easiest way.
  2. Include the machine translation as a distinct translation-memory result. It obviously wouldn't have a source text; perhaps put "Automatic translation by DeepL" where the source text goes.

The other way of course would be to initially do the entire translation with DeepL, then proofread it, though that doesn't do a good job of indicating progress.

for Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.

I tried translating from Sabbamitta's German translation to English, and it's interesting how much difference this made. Translating the whole sutta (with segments separated by double-newlines) produced a substantially nicer result than translating segment by segment.

For example taking Mara's verses from sn5.1, when translated as a whole:

"There is no escape from the world,
What shall your seclusion bring you?
Enjoy the pleasures of the senses,
so that you will not regret it later."

When translated segment-by-segment:

"There is no escape from the world,
What good is your seclusion going to do you?
Enjoy the pleasures of the senses,
So you won't regret it later."

The difference is fairly small, but in the first case it clearly seems to have translated the verse as verse, for instance using "will not" instead of "won't" in the last line just to make it longer and fit better, while in the second case it can't recognize it as verse and so translates it as prose.

So it'd definitely be worthwhile feeding the entire English translation into DeepL rather than going segment by segment. With respect to my earlier implementation suggestion, this would just mean that in the background the server, when asked for a machine translation, feeds the whole text into DeepL, caches the result, and returns it segment by segment.
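That flow can be sketched in a few lines. Here `translate_document` is a stand-in for one whole-text DeepL call, and the double-newline join assumes no segment itself contains a blank line and that DeepL preserves the paragraph breaks (the sn5.1 experiment above suggests it does, but this is worth verifying):

```python
# Server-side flow: join all segments with double newlines so DeepL
# sees the whole sutta at once, cache the result, and map the
# translation back onto the original segment IDs.
def translate_document(text: str) -> str:
    """Placeholder for a single whole-text DeepL /v2/translate call."""
    return text  # identity stand-in for the sketch

_cache = {}  # joined source text -> {segment id: translated segment}

def machine_translation(segments: dict) -> dict:
    key = "\n\n".join(segments.values())
    if key not in _cache:
        chunks = translate_document(key).split("\n\n")
        # Zip the translated chunks back onto the original segment IDs.
        _cache[key] = dict(zip(segments.keys(), chunks))
    return _cache[key]
```

In a real deployment the cache key would more likely be the file path plus a content hash, but the join/cache/split round trip is the important part.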

@cittadhammo

Hi @blake-sc

this would just mean in the background the server, when requested for a machine translation, feeds the whole text into DeepL, caches the result, and returns segment by segment.

This would be an amazing solution!

Also, if a personal DeepL glossary could be kept in the translator's user folder, that would be fantastic. You can create glossaries via the API, from what I've read.

If this could be done in a relatively short time, I will give up my small app. From what I've done, I can already create a DeepL-translated JSON segment by segment locally, and I can push it manually to the repo. So I will try again to build Bilara on my computer to see if I can help there.

Thanks for your very nice comment.
Cittadhammo = DhammaCharts

@blake-sc
Contributor

So I will try again to build Bilara on my computer to see if I can help there.

I can help you with that; to get fully functional you need some keys to connect to a repo. It's easiest to use the SuttaCentral Gitter to communicate. I sent you an invite.

@noeismet

noeismet commented Jun 27, 2022

This would be an amazing solution!

I concur!

And I wonder if the suggestion feature would also work in this way, integrated in Bilara? As it's rather useful, I must admit.

[screenshot: DeepL suggestion feature]

@sujato
Contributor

sujato commented Jun 27, 2022

@blake-sc as far as UI goes, I'd recommend the option of adding the ML to the ordinary TM results, just make sure it has a distinct class so we can color-code it or whatever. Also probably a good idea to include an "off" switch for those who distrust our robot overlords.

@blake-sc
Contributor

And, I wonder if the suggestion feature would also work in this way, integrated in Bilara? As it's rather useful I must admit.

I can't really see a way of achieving that. The API can offer a "more" or "less" formal translation but I don't see any way to access suggest functionality via the API, and it'd be a huge pain to program. You'd be best off just copy-pasting into the DeepL web UI.

@sabbamitta

Also probably a good idea to include an "off" switch for those who distrust our robot overlords.

Yes, please. 😄

@cittadhammo

cittadhammo commented Jun 27, 2022

And, I wonder if the suggestion feature would also work in this way, integrated in Bilara? As it's rather useful I must admit.

As far as I can tell, this would not be possible to do in Bilara itself using the API.

A solution would be to "intercept" the string (text file) before it goes to the DeepL API, i.e. having an option to copy the whole sutta to the clipboard (with segments separated by double newlines), then to paste it into the DeepL desktop app and have it side by side with the Pali and English in Bilara:

[screenshot, 2022-06-27: DeepL desktop app side by side with Bilara]

Then one scrolls down the two windows (apps) simultaneously to reach the end of the sutta while correcting the DeepL output in its own app. Once finished, copy-paste the resulting translation (with segments separated by double newlines) into a Bilara input window.

This would allow using DeepL's suggestions, but not Bilara's suggestions directly... ;-( The nice thing, though, is that by suggesting changes to DeepL, it improves over time and keeps your suggestions in mind throughout the sutta. But I don't know how much of an advantage this would give overall compared to the previously mentioned API solution.

@noeismet

I can't really see a way of achieving that. The API can offer a "more" or "less" formal translation but I don't see any way to access suggest functionality via the API, and it'd be a huge pain to program. You'd be best off just copy-pasting into the DeepL web UI.

In my opinion, this would be not possible to do in Bilara itself using the API as far as I can tell.

Yes, I understand; thank you for looking into it.
Having DeepL translations in Bilara on its own would already be a great advantage, and one can always have the DeepL app open on the side if one needs the suggestions.

@cittadhammo

one can always have the DeepL app open on the side if s/he needs the suggestions.

Yes this is what I thought I was gonna do ;-)
