There seems to be an issue with the Yandex translation module #4
Hello and thank you for taking the time to report a bug from
But now that you are reporting it, I think it would be nice to have a way of using the public APIs and falling back to the other APIs (maybe with an authentication method added so that you can enter your credentials). Are you using this API: https://cloud.yandex.com/docs/translate/operations/translate ? If so, could you explain to me how to use the API keys (how to pass them in the request, as I don't really understand the folder system)? Thank you for your report
|
I am using your yandex.py code and your API key (self._id) - I misspoke in my earlier email. I have been testing so many translation APIs that I am having difficulty keeping them straight. :( I could not sign up with Yandex.Translate because you first have to have a Yandex.Cloud account, and the cloud account requires a cell phone number to activate it. I did set up a Yandex mail account, but now I have to add a cell phone number before I can log in, so that is not working either. I hope that helps! I am happy to help you debug or extend this code. Mark |
Sorry, I don't really know about the Yandex.Translate API ID system as the ID I'm using is from ssut/py-googletrans#268 (comment) I guess that if your ID doesn't work (403 means Forbidden), you might have mis-pasted it or the ID isn't valid. Let me know if you find any solution! |
My apologies for not being clear in my earlier posts. I am writing to you regarding the yandex.py module in this github account - https://github.com/Animenosekai/translate/blob/main/translatepy/translators/yandex.py. I assumed, perhaps incorrectly, that this line in that code represents some sort of API key.
This module, yandex.py, does not seem to be working.
I added a print statement in your yandex.py code to see the status_code and json code.
Are you seeing the same thing? Do you have a solution to make the yandex.py module in this github account working again? Thanks, Mark |
Yandex.Translate's API is currently used as a fallback when nothing else has worked, as there is a low chance of it working: When the Example: >>> from translatepy import Translator
>>> t = Translator()
>>> t.translate("Hello", "Japanese")
Source (English): Hello
Result (Japanese): こんにちは # worked Instead of >>> from translatepy.translators import yandex
>>> y = yandex.YandexTranslate()
>>> y.translate("Hello", "jp")
(None, None) # didn't work If For example, it now works fine for me, but Yandex enforces very strict rate-limiting and bot-detection rules, and I'm pretty sure that refreshing it more than 10 times won't work
|
Your example works because you are getting the translation from Google or Bing, and not Yandex. When you call Can you show us how you were able to show that the The whole point of my post was to inform you that your yandex code may not be working. Perhaps Yandex no longer honors requests using v1.0 of the API because I believe the current version is 1.5, and you are using version 1.0. There may be other issues as well (e.g. GET versus POST requests). The translator Only Bing and Google translators work in your code, as far as I can tell. And, as shown below, they can be called individually without using your
It would help if you included some unit tests for each translation module to show that each translation service you offer in your code actually works. Also, if you know that a service does not work, I suggest removing that service from your code until you have updated it and can show it is working. Or, at a minimum, modify the README.md file to say which services work and which ones don't. Mark |
I've just checked Reverso and it seems to work fine for me: >>> reverso.ReversoTranslate().translate("Hello", "fra")
('eng', 'Bonjour') Check that the source_language and the destination_language are in the correct format: you seem to use the ISO 639-1 alpha-2 code format, while Reverso uses ISO 639-2/3 alpha-3 codes. This is why using the Translator class is better: it falls back to different translators in case something goes wrong, provides a caching system, and converts the languages to the correct format automatically, whether you gave the language name in any language, the alpha-2 code or the alpha-3 code. Also, Reverso uses POST requests on their site, so I'm using them too. As for Yandex, I just checked, and it does seem to be an issue caused by a misunderstanding of how their API works. I just committed a fix, but make sure that _sid is set to something other than an empty string |
Thanks for looking into the yandex translator! I will check out the changes over the weekend. I will check for the _sid value. Do you have any idea when the rate limiting times out? As a suggestion, could you add an optional argument to I agree your current implementation of always going through your translators one at a time in a fixed order will work for many users. I am just making a suggestion to give your code more flexibility for different users. Thanks! Mark |
Yeah, that seems like a nice idea! I'm going to look into it when I finish some of my stuff on my other projects (and school, yikes, holidays are coming soon though). |
Hmm, it does seem to trigger the captcha system on Yandex Translate if it's called too many times for me. Has anyone got this yet https://gyazo.com/c0e06ba89c8fea6047744ab74e548ad7 ? |
Yes it actually is what happens when you are "rate-limited", when they detect that a bot is using it |
Animenosekai, I have found a possible issue with your code that you may want to look at. I was trying to figure out why Yandex would not do any translations for me while I was testing other services (Google, Bing, Reverso). In looking at the output from the debug runs, I saw I was also hitting the Yandex server every time I made translation, even though I was not accessing the Yandex server for a translation, but was using Bing or Google or Reverso. I looked at your code and found the following. When you instantiate the Translator class in the
Which works well because Google, Bing, and Reverso either have In looking at the output, it takes only ~3 Translator objects instantiations to trigger the Yandex server to stop listening to requests from your code. If you implement my suggestion above to allow a user to decide which translation service to use, then you will have to make sure that using other services does not "turn off" the Yandex service as a side effect. Instead, you might want to hold off getting the A further enhancement would be to add an exponential backoff when using Yandex for translations. In the translate function for the What do you think? Mark |
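A minimal sketch of the "hold off until needed" suggestion above, assuming the SID can be fetched lazily on first use instead of at construction time. `LazySidTranslator`, `fetch_sid` and `fake_fetch` are hypothetical names for illustration, not part of translatepy:

```python
from typing import Callable, Optional


class LazySidTranslator:
    """Hypothetical stand-in for a Yandex translator that defers the SID fetch."""

    def __init__(self, fetch_sid: Callable[[], str]) -> None:
        # Nothing is requested here, so creating the object never hits the server.
        self._fetch_sid = fetch_sid
        self._sid: Optional[str] = None

    @property
    def sid(self) -> str:
        # Fetch on first use only, then reuse the cached value.
        if not self._sid:
            self._sid = self._fetch_sid()
        return self._sid


calls = []


def fake_fetch() -> str:
    """Stand-in for the real page download; records each call."""
    calls.append(1)
    return "example-sid"


# Creating several translators performs no fetch at all.
translators = [LazySidTranslator(fake_fetch) for _ in range(3)]
fetches_after_construction = len(calls)

# The first access triggers exactly one fetch; later accesses reuse it.
first_sid = translators[0].sid
second_sid = translators[0].sid
print(fetches_after_construction, len(calls), first_sid)
```

With this shape, users who only call Google, Bing or Reverso would never contact Yandex as a side effect of creating a Translator.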
Yes, true, have you got that? Might have to make it store the _sid in a txt file and use it that way, and make it update it if it gives an error. |
@Animenosekai "In a variety of computer networks, binary exponential backoff or truncated binary exponential backoff refers to an algorithm used to space out repeated retransmissions of the same block of data, often to avoid network congestion." In your particular use case, you want to retry getting the For example, Be sure to include a Finally, there are many many python implementations of this type of algorithm, usually as a decorator to a method, that performs the retry loop on either values or exceptions. Just google "python exponential backoff retry example" for lots of examples. It is also not hard to roll your own, as your use case is not complicated. I hope that helps! Mark |
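For readers unfamiliar with the technique, here is a rough sketch of exponential backoff with random jitter, along the lines described above. It is not the code from this thread: `retry_with_backoff`, its parameters and the fake fetcher are hypothetical, and the real fetch would be whatever retrieves the SID.

```python
import random
import time
from typing import Callable, Optional


def retry_with_backoff(
    fetch: Callable[[], Optional[str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it returns a truthy value or the attempts run out."""
    for attempt in range(max_retries):
        value = fetch()
        if value:
            return value
        if attempt < max_retries - 1:
            # Double the wait on every failure and add up to one second of
            # jitter so that several clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.random())
    return None  # give up; the caller decides what to do next


# Usage with a fake fetcher that fails twice, then succeeds.
# Waits are recorded instead of slept so the example runs instantly.
results = iter([None, None, "example-sid"])
waits = []
sid = retry_with_backoff(lambda: next(results), sleep=waits.append)
print(sid, [round(w, 2) for w in waits])
```

Capping `max_retries` matters here: without a limit, a long rate-limit on the server side would block the caller indefinitely.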
@Animenosekai regarding 1ff4acb, it might be cleaner to update the
|
This seems like a great idea, but wouldn't it mean making a function where the user waits for the SID, and therefore a very long blocking operation? Wouldn't it be better, for example, to add a |
I am not sure I understand your question, but let me try to answer what I think you are asking. The blocking stops after (1) a valid code is retrieved from Yandex, or (2) the I am assuming that you don't know (1) how long Yandex blocks a bot, and (2) how long the I am not sure what information the If you mean If you mean You could also reset An untested attempt at fleshing out the code above:
The function either returns the Let me know if I missed the idea behind your questions. Mark |
Basically the problem is that this would block the user for too long: Admitting that the user wants to translate with Yandex ( # the user request a translation
t.translate("Hello", "Japanese")
# then the refreshSID method will be called (the steps below run inside its retry loop, where n is the retry count)
    # it will try to download the webpage (which already takes quite long)
    data = get("https://translate.yandex.com/", headers=self._headers).text
    # it will do its lightweight operations
    sid_position = data.find("Ya.reqid = '")
    if sid_position != -1:
        data = data[sid_position + 12:]
        self._sid = data[:data.find("';")]
        break  # great, the user only had to wait about as long as the page download
    else:  # but here comes what I consider too long for the user to wait
        time.sleep((2 ** n) + (random.randint(0, 1000) / 1000))  # the user needs to wait ~2 seconds first
        # then it will redownload the webpage
        # then, if the SID is not found, the user will need to wait ~4 seconds plus the download time
        # then ~8 seconds
# And imagining that self.MAX_RETRIES == 2, the user will need to wait at worst more than ~ 15 seconds without any return What I meant with # the user request a translation
t.translate("Hello", "Japanese")
# if the SID is blank or is not working, the refreshSID method is called
if time() - self.lastTried > 600:  # more than 10 minutes have passed since we last tried to get the SID
    # it will do the stuff to get the SID
    data = get("https://translate.yandex.com/", headers=self._headers).text
    sid_position = data.find("Ya.reqid = '")
    if sid_position != -1:  # yay, it got it!
        data = data[sid_position + 12:]
        self._sid = data[:data.find("';")]
        self.lastTried = time()  # maybe keep that in a file
        # we will retry the translation and it will return it
    else:
        self.lastTried = time()  # maybe keep that in a file
        # the translation will return None
else:
    pass  # do nothing, as we know that Yandex will rate-limit us if we ping their website too much
    # the translation will basically return None |
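The cooldown idea in the snippet above can be made self-contained roughly as follows. This is only a sketch: `SidRefresher`, `fetch_page` and `clock` are hypothetical names, and the page download is injected so the example runs without touching Yandex. The `Ya.reqid = '` marker is the one used in the snippets in this thread.

```python
import time
from typing import Callable, Optional


class SidRefresher:
    """Refresh the SID at most once per cooldown window (hypothetical helper)."""

    MARKER = "Ya.reqid = '"

    def __init__(
        self,
        fetch_page: Callable[[], str],
        cooldown: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_page = fetch_page
        self._cooldown = cooldown
        self._clock = clock
        self._last_tried: Optional[float] = None
        self.sid = ""

    def refresh(self) -> bool:
        """Return True if a new SID was stored, False if skipped or not found."""
        now = self._clock()
        if self._last_tried is not None and now - self._last_tried < self._cooldown:
            return False  # tried recently; do not ping the site again
        self._last_tried = now
        page = self._fetch_page()
        start = page.find(self.MARKER)
        if start == -1:
            return False  # probably a captcha page; the caller returns None
        start += len(self.MARKER)
        end = page.find("';", start)
        if end == -1:
            return False
        self.sid = page[start:end]
        return True


# Usage with a fake clock and fake pages: first a captcha, then a real page.
now = [0.0]
pages = ["<html>captcha</html>", "<script>Ya.reqid = 'abc123';</script>"]
refresher = SidRefresher(lambda: pages.pop(0), cooldown=600.0, clock=lambda: now[0])
first = refresher.refresh()   # captcha page: no SID found
now[0] = 10.0
second = refresher.refresh()  # inside the cooldown: skipped, no download
now[0] = 700.0
third = refresher.refresh()   # cooldown elapsed: SID found
print(first, second, third, refresher.sid)
```

The call never sleeps, so a translation request returns immediately either way; the trade-off is that Yandex stays unavailable until the window has passed.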
I started the code for the retry with A sample set of wait times (seconds) for 5 iterations of the wait cycles:
The only reason for the wait is that the The reason for the growing wait time is to allow the "bot wait time" on the Yandex server to time out, so your code can get a valid Your second solution is also valid, but you are only allowing the user to access the Yandex server once every 10 minutes. This is equivalent to setting Some testing would tell us a little bit about the Yandex server's "bot wait time", but as I said earlier that number can change at any time. I don't think there is a need to store the |
This "bot wait time" is a captcha that prevents requests from getting the desired webpage (with the SID in it) and instead displays a challenge that requires a human (typing the word shown). This "rate-limit" occurs quite frequently and lasts quite long (more than 10 minutes). The snippet I wrote isn't waiting 10 minutes for a good answer; rather, it is saying "Well, I've already tried not so long ago, so I should wait a little before retrying to get an SID and just say that I couldn't proceed with the request". Also, the SID does not seem to be a one-time token, as I have already used it for multiple requests. It could maybe have a timeout of a day, an hour, etc.
Sorry for not being very clear...😔 |
Another method would be to implement sort of a mix between my snippet of code and your, running it endlessly (before finding a working SID), resuming it when the SID isn't working again. The refreshing function would be running in another thread, in the background so that it is seamless to the user + it would mean quicker results for the user |
The _sid seems to last like a few days or so before it expires, when it does it throws this error: {u'message': u'Session has expired', u'code': 406}. |
@Animenosekai Apologies for not fully understanding how Yandex treats the Based on your post, my "bot_wait_time" is the same as your "rate limit" - the time the Yandex server refuses to service a request (get the I like your idea of the background thread. Programming threads is fun and challenging to take care of all the timing issues and edge cases. Or, just save If you save the value to disk, you will have to figure out what to do in the use case that the Mark |
Nice! I'm planning on implementing it in the next week as I was working on another project ~
Maybe return None until I can get a new SID? |
Wow the _sid I had stored into a txt file lasted 8 days before it expired, my bot only updated it once, by removing the old _sid and replacing it with the new _sid. |
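A rough sketch of the "store the _sid in a file and replace it when it expires" approach described above. The 406 code is the one quoted earlier in the thread; `call_with_stored_sid`, `load_sid`, `save_sid` and the fake request are hypothetical helpers, and the shape of a successful response is an assumption made only for the demo.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

SESSION_EXPIRED = 406  # code of the "Session has expired" error quoted in the thread


def load_sid(path: Path) -> str:
    """Return the stored SID, or an empty string if nothing is stored yet."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return ""


def save_sid(path: Path, sid: str) -> None:
    path.write_text(sid, encoding="utf-8")


def call_with_stored_sid(
    path: Path,
    request: Callable[[str], Dict],
    fetch_sid: Callable[[], str],
) -> Dict:
    """Reuse the stored SID and fetch a new one only when the session expired."""
    sid = load_sid(path)
    if not sid:
        sid = fetch_sid()
        save_sid(path, sid)
    response = request(sid)
    if response.get("code") == SESSION_EXPIRED:
        sid = fetch_sid()
        save_sid(path, sid)
        response = request(sid)  # retry once with the fresh SID
    return response


def fake_request(sid: str) -> Dict:
    """Fake API call: the old SID is rejected, any other SID is accepted."""
    if sid == "old-sid":
        return {"code": SESSION_EXPIRED, "message": "Session has expired"}
    return {"code": 200}  # assumed success shape, for the demo only


with tempfile.TemporaryDirectory() as folder:
    sid_file = Path(folder) / "sid.txt"
    save_sid(sid_file, "old-sid")
    result = call_with_stored_sid(sid_file, fake_request, lambda: "new-sid")
    stored = load_sid(sid_file)

print(result["code"], stored)
```

Because the SID is only refetched after an explicit expiry error, the Yandex page is downloaded roughly once per SID lifetime rather than once per run.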
Lmao, yea that's why it's better using the base Also, what do you mean by "banned my ip": Is it just that you need to solve captchas when you go to their site or you are even forbidden to go to the website and verify that you are a human? |
Yeah they cut off the connection to my ip completely! Like "this site can't be reached" was what I was getting lol for the whole yandex.com. |
Can you access the website with a VPN though? |
Yes, thats how I noticed I got ip banned aha |
lmaoooo well I guess people shouldn't try to use Yandex Translate too much |
Yes they have a pretty strict rate limiting/bot detecting system which triggers captcha and even bans your IP if you use it too much (that's why I'm calling it last in the |
Yes, I noticed that Yandex very greedily does not want to give out SID (session ID). I did not know that the Russians are so greedy (although I myself am Russian)
Wow! It's overkill |
After 5 hours of experimentation, today I managed to get Yandex Translator - I found a bug (or feature) in the REST API method tr.json, which allows you not to use (and not parse) SID. In a few hours I think I will finally write it all into Python code. |
Hey everyone, is the yandex translate still working for you? |
Yea we fixed it and now that you are reminding me let me just publish the new version ~~ |
I just published v1.7 on PyPI. You should now be able to update: pip install --upgrade translatepy |
Yeah I thought there was an update on yandex's end but it was just my file system messing up lol. But I think there was an update on the Bing translate now. |
The Bing translator has very strict limits on the number of requests per minute/hour/day (the exact limits are not known), and there are no methods to bypass the Bing API restriction yet (for more information, see the following message). One option is to use a proxy, or to use other, more stable services such as Yandex: it does not care about the number of requests, and its translation quality seems to me better than Google Translate's (at least for the languages of the post-Soviet states). But Bing's request restrictions are not even close to DeepL's, which are just some kind of hell |
Let's go back to Bing Translator. In principle, there is one loophole that I think will allow us to bypass all the request restrictions: using Microsoft Translate. As far as I know, they both use the same engine. The only difference is that Microsoft Translate requires an API key that is linked to an account and charged for use, and as I understand it, it is intended for the corporate segment. But if we look at the Microsoft Translate mobile application, we can see that the application generates the x-mt-signature and x-clienttraceid headers based on some data, and the server provides a free translation. x-clienttraceid is just a regular v4 UUID, but x-mt-signature looks like a combination of an HMAC-SHA256 value, the time and some other unknown data. If we can solve this riddle, we will have a stable Bing translator. |
bing translate added in a token and key, they seem to expire pretty fast, maybe every 5 minutes. |
Yes, that's exactly right. See #13
I ran some tests - the token and the key are valid for at least 10 minutes |
Hi guys, after looking into the bing translate to find where the token and key is stored I have found it!
|
Yes, thanks. I have already implemented this in the upcoming alpha version 2.0: |
Nice!! |
I'm very sorry for not helping much... I'll have holidays in July and August, so I'll be fully able to code on |
wow such a great thread sad to be needing to close this as v2 will come in no time solving the issue ~ |
The strangest behavior that ever happened to me with Python... Today I decided to run tests to see if all the translators work correctly. As it turns out, Yandex is messed up again and is not working. The weird thing about this situation is that Yandex Translate requests work fine with cURL, but I can't get them to work with Python 3 requests (I only managed to run it in Python 3.10 on my Android device using Termux). I do not even know how to explain this. I think we need to debug the requests with mitmproxy and see what the difference between them is; maybe then we can understand the pattern.
cURL command:
curl -X GET "https://translate.yandex.net/api/v1/tr.json/detect?sid=3ad888660a4e4bbe8f88b3e0e591e2be&srv=android&text=%D0%A1%D3%99%D0%BB%D0%B5%D0%BC%2C%20%D0%B1%D0%B5%D1%80%D1%96%D0%BF%20%D0%BE%D1%82%D1%8B%D1%80%20%D0%B5%D0%BA%D0%B5%D0%BD&hint=en"
Python 3 code converted from the cURL command with the utility https://curlconverter.com/:
import requests
params = {
'sid': '3ad888660a4e4bbe8f88b3e0e591e2be',
'srv': 'android',
'text': 'Сәлем, беріп отыр екен',
'hint': 'en',
}
response = requests.get('https://translate.yandex.net/api/v1/tr.json/detect', params=params)
response.status_code
# Result - 403
Maybe it's because v1 is now deprecated and some requests no longer work, but then why did I manage to make a successful request on an Android device with Python 3.10?!
frosty@frost-pc:~$ curl https://translate.yandex.net/api/v1/tr.json/detect
{"code":410,"message":"Yandex.Translate API v1 is no longer active. Please migrate to API v1.5: https://tech.yandex.com/translate/."} |
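One quick sanity check before reaching for a proxy is to confirm that the query string itself is not the difference. The sketch below uses only the standard library (not `requests`, which may encode spaces differently) to percent-encode the same parameters and compare them with the query string from the cURL command. The variable names are illustrative.

```python
from urllib.parse import quote, urlencode

# Parameters copied from the Python snippet in the comment above.
params = {
    "sid": "3ad888660a4e4bbe8f88b3e0e591e2be",
    "srv": "android",
    "text": "Сәлем, беріп отыр екен",
    "hint": "en",
}

# Query string copied from the cURL command in the comment above.
curl_query = (
    "sid=3ad888660a4e4bbe8f88b3e0e591e2be&srv=android"
    "&text=%D0%A1%D3%99%D0%BB%D0%B5%D0%BC%2C%20%D0%B1%D0%B5%D1%80%D1%96%D0%BF"
    "%20%D0%BE%D1%82%D1%8B%D1%80%20%D0%B5%D0%BA%D0%B5%D0%BD&hint=en"
)

# quote() encodes spaces as %20, matching the style used in the cURL URL.
python_query = urlencode(params, quote_via=quote)
same_query = python_query == curl_query
print(same_query)
```

If the two strings match, the 403 is more likely caused by something outside the URL, such as the request headers, which is what a later comment in this thread suspects.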
With the "httpx" library, Yandex Translate works perfectly. |
It might be a problem with the headers that we set up by default. |
I acquired a key from Yandex per the documentation, and added it to the self._id in yandex.py. I then tried the following test:
YandexTranslate().translate("Er ist klug.", 'en', source_language='de')
And received this response:
When I googled for the error code from Yandex, I found this page: https://yandex.com/dev/translate/doc/dg/concepts/api-keys.html, which implies the free API was discontinued in May 2020 for non-residents of the Russian Federation.
What am I doing wrong, or has the api been discontinued?
Thanks!
Mark