thanks/questions/suggestions #2

Open
MBB232 opened this issue Mar 15, 2020 · 4 comments
Comments

MBB232 commented Mar 15, 2020

Thanks
Thank you for this project. I was trying to (help to) translate something a few years ago, and was disappointed in how few open translation programs were available.
Could this be used to create better grammar checking too? (It needs to do that on the translated text anyway, right?)
I am neither a programmer nor have I tried it yet, but based on your video I have a few suggestions.

Hard to find
If you want more feedback/programmers, it would help if your project was easier to find.
Even the Document Foundation video on YouTube does not link to this GitHub repository; I had to type the address over from the screen.
I could not find your plugin on the LO extensions page https://extensions.libreoffice.org/,
nor as a new feature request on the LO Bugzilla.

Translation GUI
I agree that the sidebar would not be very useful; especially when working with two texts side by side (like when translating), I collapse the sidebar.

Won't it break regular annotations? (I suppose if it gets integrated, a separate mode may exist.) Would a better use of it not be to show when alternative translation options exist?

For comparing text (for versioning), there already exists a split window/separate window mode:
- show the documents above each other so you can scroll through them (like in other compare-document modes).

Feed back translated data
Can you add an UPLOAD button to share documents to improve the AI with a larger database?

  • Get feedback on how much of the auto-generated translation is accepted/rejected
  • Get a unique code from the server for the document(s), so future improvements can update it rather than duplicate the document on the server
    -> maybe directly upload to the OPUS project? The more upstream the documents, the larger the user group and the better the quality, right?

You do not want to do this automatically because:

  • the documents may be proprietary
  • not all users may be fluent in the new language and may create bad translations rather than improve it. (Perhaps let users rate the quality of their translation?)

In addition, it may be possible to get feedback from private data sets - maybe you can get a few school classes or public news, where things need to be translated and corrected anyway, as input?

In one of the last slides you mention wondering if you should include software translations. On the OPUS site, upstream for the data sets, OpenOffice is listed as one of its contributors, as is KDE. I would argue that LibreOffice/Pootle probably has better documentation translated into more languages than those projects.

Donating GPU time
You explain how training takes a lot of computing time, but the setup to prepare and create the models seemed quite complicated.
Also, one of the sites talks about needing 8 GB of video RAM, which is a bit much. But I've got a high-ish gaming card that should be of some help. (I'm not using it during word processing anyway ;-) )
If you (or others in the OpenNMT or OPUS projects) were to prepare some data sets, I would not mind running them for a few days.
Even better would be if you were to set up a way to donate computer time through distributed computing, like BOINC.
Then, as more people start using it and more feedback on the language comes back, it can be fed into the data set and run through the AI again.

MBB232 commented Mar 15, 2020

PS: The number of languages that LO supports is not 30 (as was said in your presentation) but 117.
If you do not only want to translate from English, but want to translate between all of them, would you need to build data sets for all combinations? Because that would be 117 × 116 = 13,572 language pairs.
Even those 30 main languages would give 30 × 29 = 870 combinations.
(Or half that if you can calculate both directions in one go.)
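A minimal sketch of that pair count, assuming one data set per translation direction (the numbers above are just 117 × 116 and 30 × 29):

```python
# Count how many data sets/models would be needed if every language
# can be translated directly into every other language.

def pair_count(n_languages: int, bidirectional: bool = False) -> int:
    """Number of language pairs for n_languages.

    bidirectional=True assumes one model covers both directions,
    halving the count.
    """
    directed = n_languages * (n_languages - 1)
    return directed // 2 if bidirectional else directed

print(pair_count(117))                      # 13572 one-directional pairs
print(pair_count(117, bidirectional=True))  # 6786 if one model covers both ways
print(pair_count(30))                       # 870 pairs for the 30 "main" languages
```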

t-vi commented May 27, 2020

Hey, I'm too slow to answer this; I'll try to get to it in a bit. Thanks for sending the detailed suggestions!

MBB232 commented May 27, 2020

You're welcome.
Since then, I've done a bit more research. You may want to take a look at existing (open) translation projects.
Not only do they offer a good example of the GUI and of what features translation software needs, but if you can offer your software as a plug-in for them, you may get access to a large user base and high-quality feedback from both technicians and translators. (They already have APIs to log into the Google and Microsoft AI translation APIs, so it should not be too hard.)

Current translation of ODF files works via the XLIFF interim format, so getting integration in LibreOffice would still be a worthy goal.

http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/odf2xliff.html
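For reference, a minimal sketch of the extraction half of that workflow, assuming translate-toolkit's converters are installed and using placeholder file names (the merge back into ODF is handled by the companion xliff2odf converter described on the same docs page):

```python
# Pull the translatable text out of an ODF document into XLIFF using
# translate-toolkit's odf2xliff command (see the docs linked above).
# "report.odt" and "report.xlf" are placeholder file names.
import subprocess

subprocess.run(["odf2xliff", "report.odt", "report.xlf"], check=True)

# The resulting report.xlf can then be translated in Lokalize, OmegaT,
# Pootle, or a machine-translation plugin, and merged back into an ODF
# file with xliff2odf.
```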

KDE software (Lokalize): https://kde.org/applications/office/org.kde.lokalize
OmegaT (Java-based, multiplatform): https://omegat.org/
Pootle (used by LibreOffice): https://pootle.translatehouse.org/
Online software used by many FOSS projects: https://translations.launchpad.net/

MBB232 commented May 28, 2020

Thinking about it some more, distributed computing may be a solution for AI in open desktop suites in general.

It is a problem I have been thinking about for some time: how do programs like LibreOffice and GIMP keep up with companies like Microsoft and Adobe on AI features?
The programs themselves are maturing rapidly, but even these high-profile names often have trouble keeping their server costs covered. There is little chance of them hosting a free AI server for millions of users.

A lot of AI processes actually start out as open proofs of concept or are otherwise freely available. But they still need large data sets. And without server capacity for both storage and calculations, they are still not of much use to end users.

So even if (the processing time for) all dictionaries is donated freely (which I calculated would take immense processing power), your project may still have this problem. Without servers to run on, there is little use in having the code and data sets available. Conversely, even with servers and code, there is little use without having free data sets.

However, as torrents, BOINC and Bitcoin have proven, if something offers personal value, a good cause or a small financial benefit, people will offer massive computing power.
A point system that offers usage in exchange for donated processing power on a progressive scale might be able to leverage this, especially because the costs of processing power scale in reverse.

Open AI for language translation may appeal to all three.
For small, occasional end users, it may be free or light enough to run on mobiles without significant impact.
Regular use by writers may need a decent desktop to run it on, but they will probably need that anyway. If the program runs during low-load times, they should see no significant impact.
Heavy users like publishing companies and professional translators would need servers to earn enough 'points' for all their translations. But they would probably need those anyway to cache and use all dictionaries and for other benefits like centrally setting translation favorites.

There may even emerge companies that run dedicated servers for translation and rent out capacity (like seed boxes for torrents, ASICs for Bitcoin, etc.), which is fine if they all contribute back 'blocks' of dictionary improvements.
