Low quality translation compared to google live translator #163
Comments
Well, that's strange. Google sometimes provides multiple translations. I'll debug and see how to get most relevant one. |
They also changed how clients talk to their server, and a new URL has appeared. I found only one user agent that gives the same result as we get in this library, but even that one works through the new endpoint. |
Hi, that's also what I've noticed. The Google Translate website knows some common brand terms, like WordPress, Elementor, etc. But the Google endpoint used by the package doesn't know them and tries to convert them into a random similar term. For example, I tried to translate the sentence "How to create a mega menu with Elementor", and it converted Elementor to "Emoror", "Emerer", "Elementaire", "Element"..., none of which means anything in the destination language (French). I tried wrapping the terms that shouldn't be translated in a tag with the class "notranslate", but the translation is even worse. |
I looked through the response that the server returns for the current URL. It does contain multiple versions of the translation, but none of them are as good as the ones produced by the Google Translate website, which seems to use the new URL that @fissben mentioned. I guess we'll have to reverse engineer the new algorithm (cookies, etc.). I'll post more updates here. |
Here is a working example of the current endpoint (curl):
curl --location --request POST 'https://translate.google.com/_/TranslateWebserverUi/data/batchexecute?rpcids=MkEWBc&hl=ru&soc-app=1&soc-platform=1&soc-device=1&_reqid=53165&rt=c' \
--header 'authority: translate.google.com' \
--header 'pragma: no-cache' \
--header 'cache-control: no-cache' \
--header 'sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"' \
--header 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36' \
--header 'content-type: application/x-www-form-urlencoded;charset=UTF-8' \
--header 'origin: https://translate.google.com' \
--header 'referer: https://translate.google.com/' \
--header 'accept-language: en-US,en;q=0.9,ru-UA;q=0.8,ru;q=0.7,ja-JP;q=0.6,ja;q=0.5,zh-CN;q=0.4,zh-TW;q=0.3,zh;q=0.2,uk;q=0.1' \
--header 'Cookie: NID=213=jMxpp4AcB9CbhtqMEgj78zOxP-71uc_Q_ku6ov-Ffd9FJYrCtiF5xLiWOBZtmQnBnvOXFJMY9qOjEBIA1o5HjiJwWZNisKzNHRO2ekwlsIfQJLsVMdaCBV0X_tNl4QVHbu6sWYniCdkXjDtVMjwID7EAtwTD2WpnD4p_Pr6F48hb_ffQMYXaWYNQDxgmb30jgTi4u0vLfaE1KddtC7E' \
--data-raw 'f.req=%5B%5B%5B%22MkEWBc%22%2C%22%5B%5B%5C%22%D0%9F%D0%B5%D1%80%D1%88%D0%B8%D0%B9%20%D0%BD%D0%B0%D1%86%D1%96%D0%BE%D0%BD%D0%B0%D0%BB%D1%8C%D0%BD%D0%B8%D0%B9%20%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD%20%D0%BF%D0%B5%D1%80%D0%B5%D0%BA%D0%BB%D0%B0%D0%B4%D0%B0%D1%87%5C%22%2C%5C%22uk%5C%22%2C%5C%22en%5C%22%2Ctrue%5D%2C%5Bnull%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D&at=AD08yZn3jSHJ2pLXRNZ-gYpVGrLd%3A1618314364485&' After short investigation I've found that last part of payload |
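The `--data-raw` body above is easier to reason about once it is URL-decoded: `f.req` is a JSON array wrapping the RPC id `MkEWBc` and a second JSON document, serialized as a string, that holds the text, the source language and the target language. Below is a minimal Python sketch of how such a body could be assembled and read back. The function names `build_batchexecute_body` and `decode_f_req` are hypothetical, the layout is inferred only from the single captured request in this thread, and the `at` value is left to the caller because nobody here has established how it is generated.

```python
import json
from urllib.parse import quote, unquote


def build_batchexecute_body(text, source, target, at_token):
    """Assemble a form body shaped like the curl example's --data-raw.

    The layout mirrors one captured request; it is not a documented API.
    """
    # Inner document: [[text, source, target, true], [null]]
    inner = json.dumps([[text, source, target, True], [None]],
                       ensure_ascii=False, separators=(",", ":"))
    # Outer envelope: [[[rpc_id, inner_as_string, null, "generic"]]]
    outer = json.dumps([[["MkEWBc", inner, None, "generic"]]],
                       ensure_ascii=False, separators=(",", ":"))
    # at_token is an opaque value supplied by the caller (origin unknown).
    return ("f.req=" + quote(outer, safe="")
            + "&at=" + quote(at_token, safe="") + "&")


def decode_f_req(body):
    """Return (text, source, target) from a body built as above."""
    encoded = body.split("&", 1)[0].split("=", 1)[1]
    outer = json.loads(unquote(encoded))
    inner = json.loads(outer[0][0][1])
    return tuple(inner[0][:3])
```

Decoding an existing body this way is also a quick check of which part of the payload actually changes between two requests.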
Yes, the quality of translation is rather low compared with Google Translate. Why is that? |
We noticed this sudden degradation of translation quality a couple of weeks ago as well. Just found this issue. Has anyone run any tests after the 18th of April, or is there any new information as to why the quality of the translations changed suddenly? |
When I test this curl command now, it still returns output... so the value of "at" probably doesn't expire? |
The work now is to extract the translated string from what looks like incomplete JSON returned by Google.
|
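One possible reading of the "incomplete JSON" is that the body is not a single JSON document but a short non-JSON prefix followed by several chunks, each on its own line and preceded by a length line. That framing is an assumption here, not something this thread confirms. Under that assumption, a minimal Python sketch can skip every line that is not a JSON array and decode the nested string carried by rows tagged `wrb.fr`. The function name `extract_rpc_payloads` and the `wrb.fr` row layout are assumptions, and where the translated text sits inside the decoded payload is deliberately left open.

```python
import json


def extract_rpc_payloads(raw, rpc_id="MkEWBc"):
    """Collect decoded inner payloads for rpc_id from a chunked response.

    Assumes each JSON chunk fits on one line; lines that are not JSON
    arrays (prefix, blank lines, length lines) are ignored.
    """
    payloads = []
    for line in raw.split("\n"):
        line = line.strip()
        if not line.startswith("["):
            continue
        try:
            rows = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or non-JSON line
        for row in rows:
            if (isinstance(row, list) and len(row) > 2
                    and row[0] == "wrb.fr" and row[1] == rpc_id
                    and isinstance(row[2], str)):
                # The third element is itself JSON, serialized as a string.
                payloads.append(json.loads(row[2]))
    return payloads
```

Printing the returned payload for a known input is probably the quickest way to find the index path of the translated string.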
It's possible that it doesn't expire; however, the value of that parameter differs for each string. It's some kind of hash, but I cannot find out how to generate it. |
Here is the request used by the extension. I think it can also be used :
|
If you change a single character in --data-raw, you'll get "Your client does not have permission to get URL" |
Hi @Blair2004 @henno |
I ended up using this. I created a custom Guzzle request and used DomQuery to extract the translation. Here is how the code looks:
$client = new Client;
$request = $client->request( 'POST', 'https://www.google.com/async/translate?vet=12ahUKEwjT7Maf1O_wAhUDBGMBHdhEBykQqDgwAHoECAIQJg..i&ei=XZqyYJPKMIOIjLsP2ImdyAI&yv=3', [
'headers' => [
'authority' => 'www.google.com',
'sec-ch-ua' => 'Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
'sec-ch-ua-mobile' => '?0',
'user-agent' => collect( $this->randomUserAgent )->shuffle()->first(),
'content-type' => 'application/x-www-form-urlencoded;charset=UTF-8',
'accept' => '*/*',
'origin' => 'https://www.google.com',
'x-client-data' => 'CIe2yQEIpLbJAQipncoBCOH2ygEIqJ3LAQigoMsBCKygywEI8fDLAQiB8ssBCNzyywEIqPPLARiOnssBGJH1ywE=',
'sec-fetch-site' => 'same-origin',
'sec-fetch-mode' => 'cors',
'sec-fetch-dest' => 'empty',
'referer' => 'https://www.google.com/',
'accept-language' => 'en-US,en;q=0.9,fr-FR;q=0.8,fr;q=0.7,sw-TZ;q=0.6,sw;q=0.5,es;q=0.4,de;q=0.3',
'cookie' => 'SEARCH_SAMESITE=CgQIyZIB; SID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIH0I1fT0wH7pVmUnd_kK_QOA.; __Secure-3PSID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIHKrDd2zvJ_c5gnPl4a2MMeA.; HSID=Ayll6tzipONjmj1m4; SSID=AqEGOJ5K3d9zhN9JB; APISID=iY7F7EQuWx9eYks1/AAK2jCg-YLchJ2xIx; SAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; __Secure-3PAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; OTZ=5999998_52_52__52_; NID=216=Ni17mzF6uLOBNG4iasK6JP9GjDmN9BbP-VFSNdu6KgFipkAdhdzCVYo9IWOCbkvmHa6HYd7VAaWO40EnGURxQYczydEHbQFatNbk5wDnZwBw0I8aJN8xlpNDynCxs5vHahDdOSFuEt2ppr-BK90W816xk3QOlzDgU1pyHWv0dJqMEVbpSNDIxUCZAJz8GO1oJq5fv1JfJQDYYZ1BJO6EUXww8kdlmGIrNzhmAAvKHUnhu7PKv98OY6EHT39EMC187f1ewAVZV7zlSgcAKNEzgxcFh6PhtMHH6srqOkxkm0E-6oK1l5KBZZSXkvDvDXu_bD-2t8hj0m8-R7hASU5u9AiScP7zjcxumRtpEt1vRA9WHeLCY-EZQ5R8T1A7vpigqpsh9x8O9zOqRkgXcq4R7zL-ww3ohf3chjkQwLX5J9xLnMreSKQ; 1P_JAR=2021-05-29-19; DV=w-LWwaovX_lHMO5HzDUyvBMlILCam9c7hOlmyewo2gAAACAcuD9TMkTrYAAAAGDtw7_cZzBvRwAAAA; UULE=a+cm9sZTogMQpwcm9kdWNlcjogMTIKdGltZXN0YW1wOiAxNjIyMzE3NjY0MTU1MDAwCmxhdGxuZyB7CiAgbGF0aXR1ZGVfZTc6IDM4NDY5NjMyCiAgbG9uZ2l0dWRlX2U3OiAxMTUwMTU2ODAKfQpyYWRpdXM6IDQ3NTA0NDAKcHJvdmVuYW5jZTogNgo=; SIDCC=AJi4QfHkAEuJQkjqHKaVrOSGMBerdz9iiZVsPsE2rw2KWEfGkcMczh3Oo7pwg-Mjmz1EqsE-YrF3; __Secure-3PSIDCC=AJi4QfHUrZjD571gCm5-jqaOQULhDdqmb5ql92leEnpszMcN1eHpL0R-xACOJwmdgoQyIoE1zBBv',
],
'proxy' => $proxy,
'form_params' => [
'async' => 'translate,sl:' . $sourceLanguage . ',tl:' . $destination . ',st:' . urlencode( $text ) . ',id:1622317680604,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc'
]
]);
$dom = '<div>' . ( ( string ) $request->getBody() ) . '</div>';
$query = new DomQuery( $dom );
return $query->find( '#tw-answ-target-text' )->text();
So far it works; we only need to check the accuracy of the translation. |
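For readers who want to try the same extraction step outside PHP, here is a minimal Python sketch that pulls the text of the element with id `tw-answ-target-text` out of an HTML fragment using only the standard library. The id comes from the snippet above. The class name `TextById` and the function `extract_translation` are hypothetical, and the sketch assumes the endpoint returns an HTML fragment that contains such an element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0   # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.chunks and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_translation(fragment, target_id="tw-answ-target-text"):
    parser = TextById(target_id)
    # Wrap the fragment, as the PHP code does, so it has a single root.
    parser.feed("<div>" + fragment + "</div>")
    parser.close()
    return "".join(parser.chunks).strip()
```

An empty string means the element was not found, which is worth treating as an error rather than as an empty translation.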
Any news on this? I'm translating back and forth between Japanese and English, and the results I get back are much worse than the ones obtained directly via Google Translate's web interface. |
Hi, the solution I've shared so far works for me, but I'm forced to make many requests to Google, which ends in a "Too Many Requests" exception. So I investigated how the Google Translate extension generates the "tk" query parameter. It looks like the value is generated from the content, which is why, as @henno mentioned, the whole request is no longer valid if the body is modified. I've found the function that generates the token from the string being translated. I've only just made this finding; I'll investigate further and see how I can create a similar function in PHP to generate that token. This should be a nice improvement to the library, as we'll also be able to send an array of strings to Google for translation. |
You're awesome!! |
Hi, I'm back with some updates. To use the function that generates the "tk" token, we need a key that is only available in a file provided by Google itself: https://translate.google.com/translate_a/element.js That key should be passed to a class that generates the token. I created a sample class.
class TokenGenerator {
function getKey( $text, $token ) {
$tokenExploded = explode( '.', $token );
$prefix = ( int ) $tokenExploded[0] ?? 0;
for(
$data = [],
$eIndex = 0,
$fIndex = 0;
$fIndex < strlen( $text ); $fIndex++
) {
$stringPosition = $this->charCodeAt( $text, $fIndex );
if ( 128 > $stringPosition ) {
$data[$eIndex++] = $stringPosition;
} else {
if ( 2048 > $stringPosition ) {
$data[$eIndex++] = $stringPosition >> 6 | 192;
} else if (
55296 == ( $stringPosition & 64512 ) &&
$fIndex + 1 < count( $text ) &&
56320 == $this->charCodeAt( $text, $fIndex + 1 ) & 64512
) {
$stringPosition = 65536 + ( ( $stringPosition & 1023 ) << 10 ) + $this->chartCodeAt( ++$fIndex ) & 1023;
$data[$eIndex++] = $stringPosition >> 18 | 240;
$data[$eIndex++] = $stringPosition >> 12 & 63 | 128;
} else {
$data[$eIndex++] = $stringPosition >> 12 | 224;
$data[$eIndex++] = $stringPosition >> 6 & 63 | 128;
$data[$eIndex++] = $stringPosition & 63 | 128;
}
}
}
$text = $token;
for( $e = 0; $e < count( $data ) ; $e++ ) {
$text += $data[$e];
$text = $this->jrChars( $text, '+-a^+6' );
}
$text = $this->jrChars( $text, '+-3^+b+-f' );
$text ^= ( int ) $tokenExploded[1] ?? 0;
if ( 0 > $text ) {
$text = ( ( $text & 2147483647 ) + 2147483648 );
}
return ( ( string ) $text %1E6 ) . ( '.' ) . ( $tokenExploded ^ $token );
}
function charCodeAt($string, $offset) {
$string = mb_substr($string, $offset, 1);
list(, $ret) = unpack('S', mb_convert_encoding($string, 'UTF-16LE'));
return $ret;
}
function jrChars($a, $b) {
for ($c = 0; $c < strlen( $b ) - 2; $c += 3) {
$d = substr( $b, $c + 2);
$d = "a" <= $d ? $this->charCodeAt( $d, 0 ) - 87 : ( int ) $d;
$d = "+" == substr( $b, $c + 1) ? $a >> $d : $a << $d;
$a = "+" == substr( $b, $c ) ? $a + $d & 4294967295 : ( $a ^ $d );
}
return $a;
}
}
$generator = new TokenGenerator;
$generator->getKey( 'Hello World', "451185.3571800534" ); // output : 493811.451184
I'll now run tests against Google to see whether it's effective or not. |
I tried my hand at it, and your example class is a bit broken for me (I changed it to one implementing TokenProviderInterface and added the interface method, but it behaved oddly, especially around charCodeAt), so I spent some time digging the source out of the Google Translate web app page. After a bit of hacking around, I was able to produce this: https://gist.github.com/sudofox/3b7c5b75472392e15891537f0dae2325 The code is deeply nested inside more uglified evals inside JS objects, so I'm not going to trace back to where I found it (I just searched for one of the magic numbers in your example function). Relevant part no. 1:
jp = function(u, S, z, I, D, f, A, K, J, q, Q, x, k) {
for (f = (I = J = 0, []); J < S.length; J++) q = S.charCodeAt(J), 128 > q ? f[I++] = q : (2048 > q ? f[I++] = (D = q >> 6, -193 - 2 * ~(D | 192) + (~D | 192)) : (55296 == -~q + (~q ^ 64512) + (~q & 64512) && J + 1 < S.length && 56320 == (K = S.charCodeAt(J + 1), (K | 0) + (~K ^ 64512) - (K | -64513)) ? (q = 65536 + ((q & u) << 10) + (x = S.charCodeAt(++J), -2 * ~(x & u) - 1 + ~x + (x & -1024)), f[I++] = q >> 18 | 240, f[I++] = (Q = q >> 12 & 63, 128 + (Q & -129))) : f[I++] = (k = q >> 12, (k | 0) + ~(k & 224) - -225), f[I++] = (A = q >> 6 & 63, z - (~A ^ 128) - (~A & 128))), f[I++] = (q | 0) + (q & -64) - 2 * (q ^ 63) + 2 * (~q & 63) | 128);
return f
},
Going to add more comments when I get a sec. Let's solve this together! |
Hi, yes, I noticed my class doesn't generate the right token. I tried to convert the original JavaScript functions into PHP, and it looks like I made a mistake somewhere. |
Hi,
Translator (JS)
So, I created a JavaScript class that translates a JSON file into a given language. It uses the version of Google Translate that the extension uses, which provides better translation results.
Major Benefit
What I like about this approach is that you can submit a list of strings and have them translated and returned in a single request. Previously, using the package, I wasn't able to do that, so for a text with 20 paragraphs I had to perform 20 requests and, as I already mentioned, I ended up with a "Too Many Requests" exception (maybe I was using it the wrong way).
How It Works
1 - I load the file provided by Google that has a key... I then need to create a virtual browser (on NodeJS) so that the token can be added to a window variable.
Now, I'm not sure how we can make this work with this package. I have to highlight how much easier it was to make this work with NodeJS, so I believe we need to have JavaScript involved somehow. I'm out of ideas for now. What do you think the next steps could be? |
@sudofox How is your progress? |
I hate to break it to you but I did a bit of cost analysis on my project and found that the usage of the paid API fell far under the "free" limit, so I switched to the official one. Good luck though :0 |
Same with me. Unfortunately this project is somewhat useless unless this issue is resolved. |
What happens when you replace
with
in |
I got the quality issue fixed by changing the client from webapp to gtx. Does anyone know what gtx stands for, and whether there will be any side effects from changing the client from webapp to gtx? |
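To make the change concrete, here is a minimal Python sketch that builds a request URL in which only the `client` value differs. Only the `client` parameter and its two values (`webapp`, `gtx`) come from this thread. The endpoint in `BASE_URL` and the other parameter names (`sl`, `tl`, `dt`, `q`) are assumptions recalled from the library's source and should be checked against `GoogleTranslate.php` before relying on them. The function name `build_translate_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the library source before use.
BASE_URL = "https://translate.google.com/translate_a/single"


def build_translate_url(text, source, target, client="gtx"):
    """Build a query URL in which `client` selects the backend variant."""
    params = [
        ("client", client),  # "webapp" was the old value, "gtx" the new one
        ("sl", source),      # assumed: source language
        ("tl", target),      # assumed: target language
        ("dt", "t"),         # assumed: request the translated text only
        ("q", text),
    ]
    return BASE_URL + "?" + urlencode(params)
```

Building both URLs and diffing the two responses for the same sentence is a cheap way to confirm whether the client value alone explains the quality difference.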
@sudofox I also tried the "official way", but it was so complicated that after spending an hour trying to fix permission issues with Google Cloud, I gave up. Then I found https://github.com/statickidz/php-google-translate-free which appeared to produce higher quality translations. I snooped in its source code and found a commit that was made on the same day this issue was opened. I noticed that the author had changed the client, so I tried the same in this project and, what do you know, it worked. |
@Stichoza could you try changing the client and see if it fixes this issue for you? If it does, please close this issue and release a new version. |
Hello @henno, I've changed the client as you mentioned, but the quality is still low. I have seen the statickidz branch, but I couldn't solve it. Could you help me? I've also deleted the other params in dt.
|
@arcanaer Did you change webapp to gtx in vendor/stichoza/google-translate-php/src/GoogleTranslate.php? |
@henno Yes, sorry, it was an error in my own implementation, but I can confirm that if you use my code above, it produces better translations. |
Thanks @henno! 🥳 It works and I released a new version Also added |
Hello all, I noticed that the quality has gone down again. Can anyone verify this? |
I noticed that the current repo doesn't translate accurately anymore. It looks like this started a few weeks ago.
For example, I'm trying to translate this phrase from "en" to "ru":
My apologies about my messages, hope they weren't too inconvenient. Hope everything will get back to normal soon.
Here is what I got from Google in the browser:
Приношу свои извинения по поводу моих сообщений, надеюсь, они не были слишком неудобными. Надеюсь, что скоро все вернется на круги своя.
While the library translates it like this:
Мои извинения о моих сообщениях, надеюсь, они не были слишком неудобны. Надеюсь, что все скоро вернется к нормам.
The library's version is a much more literal translation.
Any thoughts?