Low quality translation compared to google live translator #163

Closed
fissben opened this issue Apr 12, 2021 · 35 comments

@fissben

fissben commented Apr 12, 2021

I noticed that the library no longer translates accurately. It looks like this started a few weeks ago.

For example, I'm trying to translate this phrase from "en" to "ru":
My apologies about my messages, hope they weren't too inconvenient. Hope everything will get back to normal soon.

Here is what I get from Google in the browser:
Приношу свои извинения по поводу моих сообщений, надеюсь, они не были слишком неудобными. Надеюсь, что скоро все вернется на круги своя.

While the library translates it like this:
Мои извинения о моих сообщениях, надеюсь, они не были слишком неудобны. Надеюсь, что все скоро вернется к нормам.

Which is a much more literal translation.

Any thoughts?

@Stichoza
Owner

Well, that's strange. Google sometimes provides multiple translations. I'll debug and see how to get the most relevant one.

@fissben
Author

fissben commented Apr 12, 2021

They also changed how the client talks to their server. There is a new URL:
https://translate.google.com/_/TranslateWebserverUi/data/batchexecute..

I found only one user agent that still gets the same result we have in this library:
Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)

But even this one works through the new endpoint.

@Blair2004

Hi, that's what I've noticed too.

The Google Translate website knows some common brand terms, like WordPress, Elementor, etc. But the Google endpoint used by the package doesn't know them and will try to convert them into a random similar term. For example, I tried to translate the sentence "How to create a mega menu with Elementor" and it converts Elementor to "Emoror", "Emerer", "Elementaire", "Element"... which don't mean anything in the destination language (French).

I tried wrapping terms that shouldn't be translated in a tag with the class "notranslate", but the translation is even worse.

@Stichoza
Owner

I looked through the response coming from the server while using the current URL. It does come with multiple versions of the translation, but none of them are as good as the ones from the Google Translate website, which uses the new URL that @fissben mentioned.

I guess we'll have to reverse engineer the new algorithm (cookies, etc). I'll post more updates here.

Stichoza pinned this issue Apr 12, 2021

@fissben
Author

fissben commented Apr 14, 2021

Here is a working example of the current endpoint (curl):

curl --location --request POST 'https://translate.google.com/_/TranslateWebserverUi/data/batchexecute?rpcids=MkEWBc&hl=ru&soc-app=1&soc-platform=1&soc-device=1&_reqid=53165&rt=c' \
--header 'authority: translate.google.com' \
--header 'pragma: no-cache' \
--header 'cache-control: no-cache' \
--header 'sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"' \
--header 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36' \
--header 'content-type: application/x-www-form-urlencoded;charset=UTF-8' \
--header 'origin: https://translate.google.com' \
--header 'referer: https://translate.google.com/' \
--header 'accept-language: en-US,en;q=0.9,ru-UA;q=0.8,ru;q=0.7,ja-JP;q=0.6,ja;q=0.5,zh-CN;q=0.4,zh-TW;q=0.3,zh;q=0.2,uk;q=0.1' \
--header 'Cookie: NID=213=jMxpp4AcB9CbhtqMEgj78zOxP-71uc_Q_ku6ov-Ffd9FJYrCtiF5xLiWOBZtmQnBnvOXFJMY9qOjEBIA1o5HjiJwWZNisKzNHRO2ekwlsIfQJLsVMdaCBV0X_tNl4QVHbu6sWYniCdkXjDtVMjwID7EAtwTD2WpnD4p_Pr6F48hb_ffQMYXaWYNQDxgmb30jgTi4u0vLfaE1KddtC7E' \
--data-raw 'f.req=%5B%5B%5B%22MkEWBc%22%2C%22%5B%5B%5C%22%D0%9F%D0%B5%D1%80%D1%88%D0%B8%D0%B9%20%D0%BD%D0%B0%D1%86%D1%96%D0%BE%D0%BD%D0%B0%D0%BB%D1%8C%D0%BD%D0%B8%D0%B9%20%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD%20%D0%BF%D0%B5%D1%80%D0%B5%D0%BA%D0%BB%D0%B0%D0%B4%D0%B0%D1%87%5C%22%2C%5C%22uk%5C%22%2C%5C%22en%5C%22%2Ctrue%5D%2C%5Bnull%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D&at=AD08yZn3jSHJ2pLXRNZ-gYpVGrLd%3A1618314364485&'

After a short investigation, I've found that the last part of the payload, at=AD08yZn3jSHJ2pLXRNZ-gYpVGrLd%3A1618314364485&, is the most important. The at param is what we are looking for. It seems to be a hash of the payload that is then checked on the backend. Hope it helps with reverse-engineering.
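
For reference, URL-decoding the f.req value in the curl command above gives a small JSON envelope: the RPC id MkEWBc, followed by a JSON string that holds the text with its source and target languages. The sketch below rebuilds that field in JavaScript. The name buildFReq is made up for illustration, the meaning of the trailing true, [null] and "generic" values is not known (they are simply copied from the captured request), and the at parameter is not handled at all.

```javascript
// Rebuild the f.req form field seen in the captured batchexecute request.
// buildFReq is a hypothetical helper; the constant values are copied as-is
// from the capture because their meaning is not explained in this thread.
function buildFReq(text, sourceLang, targetLang, rpcId = "MkEWBc") {
  // The inner payload is serialized to a string first, then embedded in the envelope.
  const inner = JSON.stringify([[text, sourceLang, targetLang, true], [null]]);
  const envelope = JSON.stringify([[[rpcId, inner, null, "generic"]]]);
  return encodeURIComponent(envelope);
}

// Example: the same phrase and language pair as in the curl command above.
const fReq = buildFReq("Перший національний онлайн перекладач", "uk", "en");
console.log("f.req=" + fReq);
```

This only covers the readable half of the request body. Without a valid at value the request may still be rejected.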

@ermeh

ermeh commented Apr 18, 2021

Yes, the quality of translation is rather low compared with Google Translate. Why is that?

@henno

henno commented May 29, 2021

We noticed this sudden degradation of translation quality a couple of weeks ago as well and just found this issue. Has anyone run any tests after the 18th of April, or is there any new information as to why the quality of the translations changed suddenly?

@Blair2004

Regarding the batchexecute curl command @fissben posted above: when I test that curl now, it still returns an output... so probably the value of "at" doesn't expire?

@Blair2004

The work now is to extract the translated string from what looks like incomplete JSON returned by Google.

464
[["wrb.fr","MkEWBc","[[\"Pershyy natsionalʹnyy onlayn perekladach\",null,null,[[[0,[[[null,37]\n]\n,[true]\n]\n]\n]\n,37]\n]\n,[[[null,null,null,null,null,[[\"First National On-line Translator\",[\"First National On-line Translator\",\"The first national online translator\"]\n]\n]\n]\n]\n,\"en\",1,\"uk\",[\"Перший національний онлайн перекладач\",\"uk\",\"en\",true]\n]\n]\n",null,null,null,"generic"]
,["di",156]
,["af.httprm",155,"6538938918244503432",158]
]
26
[["e",4,null,null,536]
]
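
The response above appears to be a sequence of JSON chunks, each preceded by a line that holds only a length, rather than a single broken JSON document. The translation itself sits inside a JSON string nested in the "wrb.fr" entry. The JavaScript sketch below parses that shape. The function names are hypothetical, the length values are ignored instead of validated, the leading )]}' guard is stripped only if it is present, and the index path to the translation is read off this one sample, so it may well differ for other responses.

```javascript
// Split a batchexecute response body into its JSON chunks. Each chunk is
// preceded by a line holding only its length; the length is ignored here.
function parseBatchExecute(body) {
  return body
    .replace(/^\)\]\}'\s*/, "") // drop the leading guard line, if there is one
    .split(/^\d+$/m)            // cut at the length-only lines
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => JSON.parse(chunk));
}

// Find the "wrb.fr" entry for the given RPC id and read the translation out
// of its nested JSON string. The index path is taken from the sample response
// in this thread and is an assumption, not a documented format.
function extractTranslation(body, rpcId = "MkEWBc") {
  for (const chunk of parseBatchExecute(body)) {
    for (const entry of chunk) {
      if (entry[0] === "wrb.fr" && entry[1] === rpcId && typeof entry[2] === "string") {
        const payload = JSON.parse(entry[2]);
        return payload?.[1]?.[0]?.[0]?.[5]?.[0]?.[0] ?? null;
      }
    }
  }
  return null;
}
```

In the sample above, that path points at "First National On-line Translator", and the array next to it holds the alternative translations.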

@Stichoza
Owner

so probably the value of "at" doesn't expire?

It's possible that it doesn't expire; however, the value of that parameter is different for each string. It's some kind of hash, but I cannot find out how to generate it.

@Blair2004

What about using this?

curl 'https://www.google.com/async/translate?vet=12ahUKEwjT7Maf1O_wAhUDBGMBHdhEBykQqDgwAHoECAIQJg..i&ei=XZqyYJPKMIOIjLsP2ImdyAI&yv=3' \
  -H 'authority: www.google.com' \
  -H 'sec-ch-ua: " Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36' \
  -H 'content-type: application/x-www-form-urlencoded;charset=UTF-8' \
  -H 'accept: */*' \
  -H 'origin: https://www.google.com' \
  -H 'x-client-data: CIe2yQEIpLbJAQipncoBCOH2ygEIqJ3LAQigoMsBCKygywEI8fDLAQiB8ssBCNzyywEIqPPLARiOnssBGJH1ywE=' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://www.google.com/' \
  -H 'accept-language: en-US,en;q=0.9,fr-FR;q=0.8,fr;q=0.7,sw-TZ;q=0.6,sw;q=0.5,es;q=0.4,de;q=0.3' \
  -H 'cookie: SEARCH_SAMESITE=CgQIyZIB; SID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIH0I1fT0wH7pVmUnd_kK_QOA.; __Secure-3PSID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIHKrDd2zvJ_c5gnPl4a2MMeA.; HSID=Ayll6tzipONjmj1m4; SSID=AqEGOJ5K3d9zhN9JB; APISID=iY7F7EQuWx9eYks1/AAK2jCg-YLchJ2xIx; SAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; __Secure-3PAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; OTZ=5999998_52_52__52_; NID=216=Ni17mzF6uLOBNG4iasK6JP9GjDmN9BbP-VFSNdu6KgFipkAdhdzCVYo9IWOCbkvmHa6HYd7VAaWO40EnGURxQYczydEHbQFatNbk5wDnZwBw0I8aJN8xlpNDynCxs5vHahDdOSFuEt2ppr-BK90W816xk3QOlzDgU1pyHWv0dJqMEVbpSNDIxUCZAJz8GO1oJq5fv1JfJQDYYZ1BJO6EUXww8kdlmGIrNzhmAAvKHUnhu7PKv98OY6EHT39EMC187f1ewAVZV7zlSgcAKNEzgxcFh6PhtMHH6srqOkxkm0E-6oK1l5KBZZSXkvDvDXu_bD-2t8hj0m8-R7hASU5u9AiScP7zjcxumRtpEt1vRA9WHeLCY-EZQ5R8T1A7vpigqpsh9x8O9zOqRkgXcq4R7zL-ww3ohf3chjkQwLX5J9xLnMreSKQ; 1P_JAR=2021-05-29-19; DV=w-LWwaovX_lHMO5HzDUyvBMlILCam9c7hOlmyewo2gAAACAcuD9TMkTrYAAAAGDtw7_cZzBvRwAAAA; UULE=a+cm9sZTogMQpwcm9kdWNlcjogMTIKdGltZXN0YW1wOiAxNjIyMzE3NjY0MTU1MDAwCmxhdGxuZyB7CiAgbGF0aXR1ZGVfZTc6IDM4NDY5NjMyCiAgbG9uZ2l0dWRlX2U3OiAxMTUwMTU2ODAKfQpyYWRpdXM6IDQ3NTA0NDAKcHJvdmVuYW5jZTogNgo=; SIDCC=AJi4QfHkAEuJQkjqHKaVrOSGMBerdz9iiZVsPsE2rw2KWEfGkcMczh3Oo7pwg-Mjmz1EqsE-YrF3; __Secure-3PSIDCC=AJi4QfHUrZjD571gCm5-jqaOQULhDdqmb5ql92leEnpszMcN1eHpL0R-xACOJwmdgoQyIoE1zBBv' \
  --data-raw 'async=translate,sl:fr,tl:en,st:Hello%20There,id:1622317680604,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc' \
  --compressed

This is the endpoint used on Google's search results page (SERP).
[image]

It doesn't seem to have a signature.

@Blair2004

Also, I'm using a Google Chrome extension to translate selected text on web pages. By looking at its source code, we can probably see how it proceeds.

[image]

@Blair2004

Here is the request used by the extension. I think it can also be used:

curl 'https://translate.googleapis.com/translate_a/t?anno=3&client=tee&format=html&v=1.0&key&logld=vTE_20210503_00&sl=auto&tl=it&tc=2&sr=1&tk=67691.518207&mode=1' \
  -H 'authority: translate.googleapis.com' \
  -H 'sec-ch-ua: " Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -H 'accept: */*' \
  -H 'origin: https://wptavern.com' \
  -H 'x-client-data: CIe2yQEIpLbJAQipncoBCKidywEIoKDLAQisoMsBCNzyywEIqPPLARiOnssB' \
  -H 'sec-fetch-site: cross-site' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://wptavern.com/' \
  -H 'accept-language: en-US,en;q=0.9,fr-FR;q=0.8,fr;q=0.7,sw-TZ;q=0.6,sw;q=0.5,es;q=0.4,de;q=0.3' \
  --data-raw 'q=Skip%20to%20content&q=WordPress%20Tavern&q=%C2%B7&q=WordPress%20News%20%E2%80%94%20Free%20as%20in%20Beer.&q=Search%20for%3A&q=%0A%09%09%09Navigation%09%09&q=About&q=Contact&q=Podcast&q=News&q=Opinion&q=Plugins&q=Themes&q=Events&q=The%20Automattic%20Theme%20Team%20Announces%20Blockbase%2C%20Its%20New%20Block%20Parent%20Theme&q=%3Ca%20i%3D0%3EJustin%20Tadlock%3C%2Fa%3E%3Ca%20i%3D1%3E%C2%B7%3C%2Fa%3E&q=May%2028%2C%202021&q=%3Ca%20i%3D0%3E%C2%B7%3C%2Fa%3E%3Ca%20i%3D1%3ENo%20Comments%3C%2Fa%3E&q=Any%20WordPress%20company%20that%20builds%20and%20maintains%20themes%20worth%20its%20salt%20is%20already%20doing%20at%20least%20some%20preliminary%20work%20as%E2%80%89%E2%80%A6%E2%80%89&q=Continue%20reading%C2%A0The%20Automattic%20Theme%20Team%20Announces%20Blockbase%2C%20Its%20New%20Block%20Parent%20Theme%C2%A0%E2%86%92&q=Happy%2018th%20Birthday%2C%20WordPress&q=%3Ca%20i%3D0%3ESarah%20Gooding%3C%2Fa%3E%3Ca%20i%3D1%3E%C2%B7%3C%2Fa%3E&q=May%2027%2C%202021&q=WordPress%20is%20celebrating%2018%20years%20today%20since%20the%20first%20release%20of%20the%20software%20to%20the%20general%20public.%20That%20release%20post%2C%E2%80%89%E2%80%A6%E2%80%89&q=Continue%20reading%C2%A0Happy%2018th%20Birthday%2C%20WordPress%C2%A0%E2%86%92&q=Gutenberg%2010.7%20Integrates%20With%20the%20Pattern%20Directory%2C%20Introduces%20New%20Block%20Design%20Controls' \
  --compressed

@henno

henno commented Jun 6, 2021

If you change a single character in the --data-raw of the extension request that @Blair2004 posted above, you'll get:

Your client does not have permission to get URL /translate_a/t?anno=3&client=tee&format=html&v=1.0&key&logld=vTE_20210503_00&sl=auto&tl=it&tc=2&sr=1&tk=67691.518207&mode=1 from this server.

@taiviemthoi

Hi @Blair2004 @henno
I understand that tk=67691.518207 is derived from the content being translated.
I tried generating a token for new content using Stichoza\GoogleTranslate\Tokens\GoogleTokenGenerator, but it didn't work.
Can you tell me how the token is generated?

@Blair2004

I ended up using the https://www.google.com/async/translate endpoint. I created a custom Guzzle request and used DomQuery to extract the translation. Here is how the code looks:

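// Body of a translate method: assumes GuzzleHttp\Client and a DomQuery class are imported,
// and that $proxy, $sourceLanguage, $destination and $text are defined by the caller.
$client 	=	new Client;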
        $request 	=	$client->request( 'POST', 'https://www.google.com/async/translate?vet=12ahUKEwjT7Maf1O_wAhUDBGMBHdhEBykQqDgwAHoECAIQJg..i&ei=XZqyYJPKMIOIjLsP2ImdyAI&yv=3', [
            'headers'	=>	[
                'authority'			=>	'www.google.com',
                'sec-ch-ua'			=>	'" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
                'sec-ch-ua-mobile'	=>  '?0',
                'user-agent'		=>	collect( $this->randomUserAgent )->shuffle()->first(),
                'content-type'		=>	'application/x-www-form-urlencoded;charset=UTF-8',
                'accept'			=>	'*/*',
                'origin'			=>	'https://www.google.com',
                'x-client-data'		=>	'CIe2yQEIpLbJAQipncoBCOH2ygEIqJ3LAQigoMsBCKygywEI8fDLAQiB8ssBCNzyywEIqPPLARiOnssBGJH1ywE=',
                'sec-fetch-site'	=>	'same-origin',
                'sec-fetch-mode'	=>	'cors',
                'sec-fetch-dest'	=>	'empty',
                'referer'			=>	'https://www.google.com/',
                'accept-language'	=>	'en-US,en;q=0.9,fr-FR;q=0.8,fr;q=0.7,sw-TZ;q=0.6,sw;q=0.5,es;q=0.4,de;q=0.3',
                'cookie'			=>	'SEARCH_SAMESITE=CgQIyZIB; SID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIH0I1fT0wH7pVmUnd_kK_QOA.; __Secure-3PSID=-AdTEiYBHQxwyi6tM21CKmx0c4Y1a4q433Cacx-mQACfzwIHKrDd2zvJ_c5gnPl4a2MMeA.; HSID=Ayll6tzipONjmj1m4; SSID=AqEGOJ5K3d9zhN9JB; APISID=iY7F7EQuWx9eYks1/AAK2jCg-YLchJ2xIx; SAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; __Secure-3PAPISID=2ZivtOqq29XgIoHy/ArdWxr5KbTd4RtM9J; OTZ=5999998_52_52__52_; NID=216=Ni17mzF6uLOBNG4iasK6JP9GjDmN9BbP-VFSNdu6KgFipkAdhdzCVYo9IWOCbkvmHa6HYd7VAaWO40EnGURxQYczydEHbQFatNbk5wDnZwBw0I8aJN8xlpNDynCxs5vHahDdOSFuEt2ppr-BK90W816xk3QOlzDgU1pyHWv0dJqMEVbpSNDIxUCZAJz8GO1oJq5fv1JfJQDYYZ1BJO6EUXww8kdlmGIrNzhmAAvKHUnhu7PKv98OY6EHT39EMC187f1ewAVZV7zlSgcAKNEzgxcFh6PhtMHH6srqOkxkm0E-6oK1l5KBZZSXkvDvDXu_bD-2t8hj0m8-R7hASU5u9AiScP7zjcxumRtpEt1vRA9WHeLCY-EZQ5R8T1A7vpigqpsh9x8O9zOqRkgXcq4R7zL-ww3ohf3chjkQwLX5J9xLnMreSKQ; 1P_JAR=2021-05-29-19; DV=w-LWwaovX_lHMO5HzDUyvBMlILCam9c7hOlmyewo2gAAACAcuD9TMkTrYAAAAGDtw7_cZzBvRwAAAA; UULE=a+cm9sZTogMQpwcm9kdWNlcjogMTIKdGltZXN0YW1wOiAxNjIyMzE3NjY0MTU1MDAwCmxhdGxuZyB7CiAgbGF0aXR1ZGVfZTc6IDM4NDY5NjMyCiAgbG9uZ2l0dWRlX2U3OiAxMTUwMTU2ODAKfQpyYWRpdXM6IDQ3NTA0NDAKcHJvdmVuYW5jZTogNgo=; SIDCC=AJi4QfHkAEuJQkjqHKaVrOSGMBerdz9iiZVsPsE2rw2KWEfGkcMczh3Oo7pwg-Mjmz1EqsE-YrF3; __Secure-3PSIDCC=AJi4QfHUrZjD571gCm5-jqaOQULhDdqmb5ql92leEnpszMcN1eHpL0R-xACOJwmdgoQyIoE1zBBv',
            ],
            'proxy'                 =>  $proxy,
            'form_params'			=>	[
                'async'	=>	'translate,sl:' . $sourceLanguage . ',tl:' . $destination . ',st:' . urlencode( $text ) . ',id:1622317680604,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc'
            ]
        ]);

        $dom 	=	'<div>' . ( ( string ) $request->getBody() ) . '</div>';
        $query 	=	new DomQuery( $dom );

        return $query->find( '#tw-answ-target-text' )->text();

So far it works, we only need to figure out the accuracy of the translation.

@sudofox

sudofox commented Jun 17, 2021

Any news on this? I'm translating back and forth between Japanese and English, and the results I get back are much worse than the ones obtained directly via Google Translate's web interface.

@Blair2004

Hi, the solution I've shared so far works for me, but I'm forced to make many requests to Google, which ends in a "too many requests" exception. So I've investigated how the Google Translate extension generates the "tk" query parameter.

It looks like the value is generated from the content. That's why, as @henno mentioned, the whole request is no longer valid if the body is modified. As shown in the image below, I've found the function that generates the token from the string being translated.

[screenshot]

The function itself looks like this.

[image]

I've only just made this finding. I'll investigate more and see how I can create a similar function in PHP to generate that token. This should be a nice improvement to the library, as we'll also be able to send an array of strings to Google for translation.

@sudofox

sudofox commented Jun 20, 2021

You're awesome!!

@Blair2004

Hi, I'm back with some updates. In order to use the function that generates the "tk" token, we need a key that is only available in a file provided by Google itself: https://translate.google.com/translate_a/element.js

[image]

That key should be passed to a class that generates the token. I created a sample class.

class TokenGenerator
{
    /**
     * Generate the "tk" value for $text from the key found in element.js
     * (for example "451185.3571800534").
     */
    public function getKey(string $text, string $token): string
    {
        $tokenExploded = explode('.', $token);
        $prefix        = (int) ($tokenExploded[0] ?? 0);
        $suffix        = (int) ($tokenExploded[1] ?? 0);

        // PHP strings are byte strings, so a UTF-8 encoded $text already gives
        // the byte sequence that the JavaScript original builds by hand.
        $hash = $prefix;
        for ($i = 0; $i < strlen($text); $i++) {
            $hash = $this->jrChars(($hash + ord($text[$i])) & 0xFFFFFFFF, '+-a^+6');
        }

        $hash = $this->jrChars($hash, '+-3^+b+-f');

        // The value is kept as an unsigned 32-bit number throughout, so the
        // "if negative" fix-up of the JavaScript version is not needed here.
        $hash = ($hash ^ $suffix) & 0xFFFFFFFF;
        $hash %= 1000000;

        return $hash . '.' . ($hash ^ $prefix);
    }

    /**
     * Apply the operations encoded in $ops, three characters at a time:
     * "+" or "^" (add or xor), "+" or "-" (shift right or left), shift amount.
     * Emulates JavaScript's 32-bit integers; requires 64-bit PHP.
     */
    private function jrChars(int $a, string $ops): int
    {
        for ($c = 0; $c < strlen($ops) - 2; $c += 3) {
            $d = $ops[$c + 2];
            $d = $d >= 'a' ? ord($d) - 87 : (int) $d;
            $d = $ops[$c + 1] === '+' ? $a >> $d : ($a << $d) & 0xFFFFFFFF;
            $a = $ops[$c] === '+' ? ($a + $d) & 0xFFFFFFFF : $a ^ $d;
        }

        return $a;
    }
}

$generator = new TokenGenerator;
echo $generator->getKey('Hello World', '451185.3571800534');

I'll now do tests with Google to see whether it's effective or not.

@sudofox

sudofox commented Jun 22, 2021

I tried my hand at it, and your example class is a bit broken for me (I changed it to one implementing TokenProviderInterface and added the interface method, but it was wacky, especially around charCodeAt), so I spent a bit of time trying to dig up the source from within the gtranslate webapp page. After a bit of hacking around, I was able to produce this: https://gist.github.com/sudofox/3b7c5b75472392e15891537f0dae2325

It's what you see starting here:

[Screenshot from 2021-06-22 11-29-39]

which is deeply nested inside more uglified evals inside JS objects, not going to track back to where I found it (just searched for one of the magic numbers in your example function to find it)

Relevant part no. 1:

      jp = function(u, S, z, I, D, f, A, K, J, q, Q, x, k) {
        for (f = (I = J = 0, []); J < S.length; J++) q = S.charCodeAt(J), 128 > q ? f[I++] = q : (2048 > q ? f[I++] = (D = q >> 6, -193 - 2 * ~(D | 192) + (~D | 192)) : (55296 == -~q + (~q ^ 64512) + (~q & 64512) && J + 1 < S.length && 56320 == (K = S.charCodeAt(J + 1), (K | 0) + (~K ^ 64512) - (K | -64513)) ? (q = 65536 + ((q & u) << 10) + (x = S.charCodeAt(++J), -2 * ~(x & u) - 1 + ~x + (x & -1024)), f[I++] = q >> 18 | 240, f[I++] = (Q = q >> 12 & 63, 128 + (Q & -129))) : f[I++] = (k = q >> 12, (k | 0) + ~(k & 224) - -225), f[I++] = (A = q >> 6 & 63, z - (~A ^ 128) - (~A & 128))), f[I++] = (q | 0) + (q & -64) - 2 * (q ^ 63) + 2 * (~q & 63) | 128);
        return f
      },

Going to add more comments when I get a sec. Let's solve this together!
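
Read next to its magic numbers (192, 224, 240, 63, 55296, 56320), the obfuscated jp function above appears to be a UTF-8 encoder: expressions such as -193 - 2 * ~(D | 192) + (~D | 192) reduce to plain D | 192. Assuming that reading is right, a readable JavaScript version could look like the sketch below. The name utf8Bytes is made up, and the roles of the extra parameters u and z in the original are guesses (they look like the constants 1023 and 128).

```javascript
// Readable UTF-8 encoder: returns the byte values for a JavaScript (UTF-16)
// string. This is a de-obfuscated reading of jp, not a verified equivalent.
function utf8Bytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    let code = text.charCodeAt(i);
    if (code < 128) {
      bytes.push(code); // 1 byte: ASCII
    } else if (code < 2048) {
      bytes.push((code >> 6) | 192, (code & 63) | 128); // 2 bytes
    } else if (
      (code & 64512) === 55296 && // high surrogate...
      i + 1 < text.length &&
      (text.charCodeAt(i + 1) & 64512) === 56320 // ...followed by a low surrogate
    ) {
      // Combine the surrogate pair into one code point, then emit 4 bytes.
      code = 65536 + ((code & 1023) << 10) + (text.charCodeAt(++i) & 1023);
      bytes.push(
        (code >> 18) | 240,
        ((code >> 12) & 63) | 128,
        ((code >> 6) & 63) | 128,
        (code & 63) | 128
      );
    } else {
      bytes.push((code >> 12) | 224, ((code >> 6) & 63) | 128, (code & 63) | 128); // 3 bytes
    }
  }
  return bytes;
}
```

For well-formed strings this gives the same bytes as the built-in TextEncoder, which is probably the simpler choice when porting.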

@Blair2004

Hi, yes, I noticed my class doesn't generate the right token. I tried to convert the original JavaScript functions to PHP, and it looks like I made a mistake somewhere.

@Blair2004

Hi,
a small update from my end. I haven't been able to make the "tk" token encoder work properly, so I decided to involve JavaScript. Since the original code is written in JavaScript, the work gets easier.

Translator (JS)

So, I created a JavaScript class that translates a (JSON) file into a given language. It uses the Google translation endpoint used by the extension, which gives better translation results.

Major Benefit

What I like about this approach is that you can submit a list of strings and have them translated and returned with just one request. Previously, using the package, I wasn't able to do that, so for a text with 20 paragraphs I was forced to perform 20 requests and, as I already mentioned, I ended up with a "Too Many Requests" exception (maybe I was using it the wrong way).
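
The batching described here matches the shape of the extension request posted earlier in the thread, where every string travels as its own q field in a single form body. A minimal sketch of building such a body follows. The name buildBatchBody is hypothetical, and the tk value that has to accompany the request is not computed here.

```javascript
// Encode a list of strings as repeated q fields (q=...&q=...), the shape seen
// in the --data-raw of the extension request. encodeURIComponent is used so
// spaces become %20, as in the capture.
function buildBatchBody(strings) {
  return strings.map((text) => "q=" + encodeURIComponent(text)).join("&");
}

const body = buildBatchBody(["Skip to content", "WordPress Tavern", "Search for:"]);
console.log(body);
```

How many strings one request may carry before Google rejects it is not established in this thread.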

How It works

1 - I load the file provided by Google that has the key. I then need to create a virtual browser (in Node.js) so that the key can be added to a window variable.
2 - I create a class that uses the functions extracted from the Google Chrome translator extension; it is used to generate the token.
3 - I read the JSON file to translate (passed as a CLI argument) and join its strings to generate the token.
4 - I run the class with all the necessary information and get an array of translated strings as the result.
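
Step 2 can be sketched in plain JavaScript. The version below follows the structure of the PHP sample class earlier in the thread (same operation strings, +-a^+6 and +-3^+b+-f) and is not copied from the extension's source. The names applyOps and generateToken are hypothetical, the key format "number.number" is assumed from the example in that class, and it hashes a single string because the thread does not say how several strings are joined. Whether Google accepts tokens produced this way is not confirmed here.

```javascript
// Apply a compact operation string such as "+-a^+6". Each 3-character group is
// [combine: "+" add, otherwise xor][shift: "+" unsigned right, otherwise left][amount: digit or a-f].
function applyOps(value, ops) {
  for (let i = 0; i < ops.length - 2; i += 3) {
    const amountChar = ops.charAt(i + 2);
    const amount = amountChar >= "a" ? amountChar.charCodeAt(0) - 87 : Number(amountChar);
    const shifted = ops.charAt(i + 1) === "+" ? value >>> amount : value << amount;
    value = ops.charAt(i) === "+" ? (value + shifted) & 4294967295 : value ^ shifted;
  }
  return value;
}

// Hash the UTF-8 bytes of the text, seeded and finished with the two halves of the key.
function generateToken(text, key) {
  const [prefix, suffix] = key.split(".").map(Number);
  let hash = prefix;
  for (const byte of new TextEncoder().encode(text)) {
    hash = applyOps(hash + byte, "+-a^+6");
  }
  hash = applyOps(hash, "+-3^+b+-f");
  hash ^= suffix;
  if (hash < 0) {
    hash = (hash & 2147483647) + 2147483648; // reinterpret as unsigned 32-bit
  }
  hash %= 1e6;
  return `${hash}.${hash ^ prefix}`;
}

console.log(generateToken("Hello World", "451185.3571800534"));
```

Because the bitwise operators in JavaScript already work on 32-bit integers, no extra masking is needed here, which is likely part of why the Node.js route felt easier than the PHP port.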

[screenshot]

Now, I'm not sure how we can make this work with this package. I have to highlight how much easier it was to make this work with Node.js, so I believe we need to have JavaScript involved somehow. I'm out of ideas for now. What do you think the possible next steps are?

@henno

henno commented Jul 10, 2021

@sudofox How is your progress?

@sudofox

sudofox commented Jul 12, 2021

I hate to break it to you but I did a bit of cost analysis on my project and found that the usage of the paid API fell far under the "free" limit, so I switched to the official one. Good luck though :0

@henno

henno commented Jul 25, 2021

I hate to break it to you but I did a bit of cost analysis on my project and found that the usage of the paid API fell far under the "free" limit, so I switched to the official one. Good luck though :0

Same with me. Unfortunately this project is somewhat useless unless this issue is resolved.

@henno

henno commented Jul 25, 2021

What happens when you replace

'client' => 'webapp',

with

'client' => 'gtx',

in vendor/stichoza/google-translate-php/src/GoogleTranslate.php?

@henno

henno commented Jul 26, 2021

I got the quality issue fixed by changing the client from webapp to gtx. Does anyone know what gtx stands for, and will there be any side effects from changing the client from webapp to gtx?

@henno

henno commented Jul 26, 2021

@sudofox I also tried the "official way", but it was so complicated that after spending an hour trying to fix permission issues with Google Cloud, I gave up. Then I found https://github.com/statickidz/php-google-translate-free, which appeared to produce higher quality translations. I snooped in its source code and found this commit, which was made on the same day this issue was opened. I noticed that he had changed the client, tried the same in this project and, what do you know, it worked.

@henno

henno commented Aug 1, 2021

@Stichoza could you try changing the client to see if it fixes this issue for you? If it does, please close this issue and release a new version.

@carlosvaldesweb

carlosvaldesweb commented Aug 4, 2021

Hello @henno, I've changed the client as you mentioned, but the quality is still low. I've looked at the statickidz branch, but I couldn't solve it. Could you help me? I've also deleted the other params in dt.

protected $urlParams = [
        'client'   => 'gtx',
        'hl'       => 'en',
        'dt'       => [
            't',   // Translate
        ],
        'sl'       => null, // Source language
        'tl'       => null, // Target language
        'q'        => null, // String to translate
        'ie'       => 'UTF-8', // Input encoding
        'oe'       => 'UTF-8', // Output encoding
        'multires' => 1,
        'otf'      => 0,
        'pc'       => 1,
        'trs'      => 1,
        'ssel'     => 0,
        'tsel'     => 0,
        'kc'       => 1,
        'tk'       => null,
    ];
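
To make the effect of these parameters concrete, here is a sketch of how such an array could end up in a query string. It assumes that array values like dt are sent as repeated keys and that null entries are left out; both are assumptions about the request format, not taken from the library's source. buildQuery is a hypothetical helper, and only a few of the parameters are shown.

```javascript
// Serialize parameters into a query string, repeating the key for array
// values (dt=t&dt=bd) and skipping parameters that are null or undefined.
function buildQuery(params) {
  const pairs = [];
  for (const [key, value] of Object.entries(params)) {
    if (value === null || value === undefined) continue;
    for (const item of Array.isArray(value) ? value : [value]) {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(item))}`);
    }
  }
  return pairs.join("&");
}

const query = buildQuery({
  client: "gtx", // "webapp" is the value reported above to give lower quality
  sl: "en",
  tl: "ru",
  dt: ["t"],
  q: "Hello there",
  tk: null,
});
console.log(query);
```

Seen this way, the fix discussed above changes a single value in the request, which makes it easy to compare both clients side by side.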

@henno

henno commented Aug 4, 2021

@arcanaer Did you change webapp to gtx in vendor/stichoza/google-translate-php/src/GoogleTranslate.php?

@carlosvaldesweb

@henno Yes, sorry, it was an error in my code implementation. I can confirm that if you use my code above, it gives better translations.

@Stichoza
Owner

Stichoza commented Aug 5, 2021

Thanks @henno! 🥳 It works and I released a new version v4.1.5.

Also added ->setClient() method so if you want to use low quality translation you can type ->setClient('webapp')

Stichoza unpinned this issue Aug 5, 2021

@henno

henno commented Oct 24, 2021

Hello all,

I noticed that the quality has gone down again. Can anyone verify this?
