There seems to be an issue with the Yandex translation module #4

Closed
pmi123 opened this issue Feb 10, 2021 · 53 comments
Assignees
Labels
enhancement New feature or request

Comments

@pmi123

pmi123 commented Feb 10, 2021

I acquired a key from Yandex per the documentation, and added it to the self._id in yandex.py. I then tried the following test:

YandexTranslate().translate("Er ist klug.", 'en', source_language='de')

And received this response:

[2021-02-10 15:14:10] DEBUG [urllib3.connectionpool._new_conn:815] Starting new HTTPS connection (1): translate.yandex.com:443
[2021-02-10 15:14:11] DEBUG [urllib3.connectionpool._make_request:396] https://translate.yandex.com:443 "GET / HTTP/1.1" 302 None
[2021-02-10 15:14:11] DEBUG [urllib3.connectionpool._make_request:396] https://translate.yandex.com:443 "GET /showcaptcha?cc=1&retpath=https%3A//translate.yandex.com/%3F_d8b298bcbb08a8bc220250756257504f&t=0/1612995251/ed1baa2e6a042ec73b1baca9e7b53471&s=d8a68d97e1858e6fff641ad5605ad257 HTTP/1.1" 200 6231
[2021-02-10 15:14:11] DEBUG [urllib3.connectionpool._new_conn:815] Starting new HTTPS connection (1): translate.yandex.net:443
[2021-02-10 15:14:12] DEBUG [urllib3.connectionpool._make_request:396] https://translate.yandex.net:443 "GET /api/v1/tr.json/translate?id=1308a84a.6016deed.0c4881a2.74722d74657874-3-0&srv=tr-text&lang=en&reason=de&format=text HTTP/1.1" 403 44
[2021-02-10 15:14:12] DEBUG [translation_services.yandex.translate:85] status_code=403, json_code=406

When I googled for the error code from Yandex, I found this page: https://yandex.com/dev/translate/doc/dg/concepts/api-keys.html, which implies the free API was discontinued in May 2020 for non-residents of the Russian Federation.

What am I doing wrong, or has the api been discontinued?

Thanks!

Mark

@Animenosekai
Owner

Animenosekai commented Feb 11, 2021

Hello and thank you for taking the time to report a bug from translatepy

translatepy is currently not using Yandex's Official Public API but rather the API used by Yandex.Translate

But now that you are reporting it, I think that it would be nice to have a way of using the Public APIs and fall back to the other APIs. (Maybe an authentication method added to enter your credentials)

Are you using this API: https://cloud.yandex.com/docs/translate/operations/translate

If so, could you explain to me how to use the API keys (how to pass them in the request as I don't really understand the folder system)
Or are you using another API?

Thank you for your report

Animenosekai

@Animenosekai Animenosekai self-assigned this Feb 11, 2021
@Animenosekai Animenosekai added the enhancement New feature or request label Feb 11, 2021
@pmi123
Author

pmi123 commented Feb 12, 2021

I am using your yandex.py code and your API key (self._id) - I misspoke in my earlier email. I have been testing so many translation apis that I am having difficulty keeping them straight. :(

I could not sign up with Yandex.Translate because you first have to have a Yandex.Cloud account, and the cloud account requires a cell phone number to activate the account. I did set up a yandex mail account, but now I have to add a cell phone before I can log in, so that is not working, either. I hope that helps!

I am happy to help you debug or extend this code.

Mark

@Animenosekai
Owner

Sorry, I don't really know about the Yandex.Translate API ID system as the ID I'm using is from ssut/py-googletrans#268 (comment)

I guess that if your ID doesn't work (403 means Forbidden), you might have mis-pasted it or the ID isn't valid.

Let me know if you find any solution!

@pmi123
Author

pmi123 commented Feb 12, 2021

My apologies for not being clear in my earlier posts. I am writing to you regarding the yandex.py module in this github account - https://github.com/Animenosekai/translate/blob/main/translatepy/translators/yandex.py. I assumed, perhaps incorrectly, that this line in that code represents some sort of API key.

self._id = "1308a84a.6016deed.0c4881a2.74722d74657874-3-0"

This module, yandex.py, does not seem to be working.

Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import translatepy
>>> print(translatepy.__version__)
translatepy v1.1.3
>>> import safeIO
>>> from translatepy.translators import yandex
>>> y = yandex.YandexTranslate()
>>> y.translate("Er ist ein kluger Kerl", 'en', source_language='de')
request.status_code=403, request.json()['code']=406
(None, None)
>>> y.translate("Bonjour, je m'appelle Mark", 'en', source_language='fr')
request.status_code=403, request.json()['code']=406
(None, None)
>>> y.translate("È un ragazzo intelligente.", 'en', source_language='it')
request.status_code=403, request.json()['code']=406
(None, None)
>>> 

I added a print statement in your yandex.py code to see the status_code and json code.

    def translate(self, text, destination_language, source_language="auto"):
        """
        Translates the given text to the given language
        """
        try:
            if source_language is None:
                source_language = "auto"
            if isinstance(source_language, Language):
                source_language = source_language.yandex_translate
            url = self._base_url + "translate?id=" + self._id + "&srv=tr-text&lang=" + str(destination_language) +"&reason=" + str(source_language) + "&format=text"
            request = get(url, headers=self._headers, data={'text': str(text), 'options': '4'})
            print("request.status_code=%s, request.json()['code']=%s" % (request.status_code, request.json()['code']))  # added for debugging
            if request.status_code < 400 and request.json()["code"] == 200:
                data = loads(request.text)
                return str(data["lang"]).split("-")[0], data["text"][0]
            else:
                return None, None
        except:
            return None, None

Are you seeing the same thing? Do you have a solution to make the yandex.py module in this github account working again?

Thanks,

Mark

@Animenosekai
Owner

Animenosekai commented Feb 12, 2021

Yandex.Translate's API is currently used as a fallback when nothing else worked, as there is a low chance of it working:

When the YandexTranslate class is initialised, it calls the refreshSID method which parses the website to find a Session ID (SID, referenced as _sid). That's why it is recommended to use the default translate method from the high level Translator class and not directly use the translators' classes.

Example:
Use

>>> from translatepy import Translator
>>> t = Translator()
>>> t.translate("Hello", "Japanese")
Source (English): Hello
Result (Japanese): こんにちは # worked

Instead of

>>> from translatepy.translators import yandex
>>> y = yandex.YandexTranslate()
>>> y.translate("Hello", "jp")
(None, None) # didn't work

If refreshSID doesn't succeed on getting a Session ID, the translation won't work and None will be returned.
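
To make the parsing step concrete, here is a minimal sketch of pulling a session ID out of the page source. The marker string `Ya.reqid = '` is the one used in the refreshSID code quoted later in this thread; the function name `extract_sid` and the sample page fragment are made up for illustration, and the real page layout may differ or change.

```python
from typing import Optional

SID_MARKER = "Ya.reqid = '"  # marker string taken from the refreshSID code in this thread


def extract_sid(html: str) -> Optional[str]:
    """Return the session ID embedded in the page, or None if it is absent."""
    start = html.find(SID_MARKER)
    if start == -1:
        # e.g. a captcha page was served instead of the translator page
        return None
    start += len(SID_MARKER)
    end = html.find("';", start)
    if end == -1:
        return None
    return html[start:end]


sample_page = "<script>Ya.reqid = 'abc.123';</script>"  # made-up page fragment
print(extract_sid(sample_page))
print(extract_sid("<html>captcha</html>"))
```

When the marker is missing the function returns None, which corresponds to the case above where the translation cannot proceed.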

For example, it now works fine for me, but Yandex applies very strict rate-limiting and bot-detection rules, and I'm pretty sure that refreshing it more than 10 times won't work.

(By the way, you don't need to import safeIO when using translatepy. It should be imported automatically)

Animenosekai

@pmi123
Author

pmi123 commented Feb 12, 2021

Your example works because you are getting the translation from Google or Bing, and not Yandex.

When you call y = yandex.YandexTranslate() as shown above, refreshSID is being called, so it should work. I believe there is nothing different in the way I am calling the yandex translator than how your Translator class calls the yandex translator. If I am wrong, please let me know.

Can you show us how you were able to show that the yandex.py module is working?

The whole point of my post was to inform you that your yandex code may not be working. Perhaps Yandex no longer honors requests using v1.0 of the API because I believe the current version is 1.5, and you are using version 1.0. There may be other issues as well (e.g. GET versus POST requests).

The translator reverso.py also appears to not be working. It returns code 400. I have never used the reverso.py module before today and I only made one request to the server for my example below. I don't think it is an issue of accessing the server multiple times that is causing this error.

Only Bing and Google translators work in your code, as far as I can tell. And, as shown below, they can be called individually without using your Translator class. Since they are the first translator modules called in your Translator class, you always get a translation by using that class.

>>> from translatepy.translators import reverso
>>> r = reverso.ReversoTranslate()
>>> r.translate("È un ragazzo intelligente.", 'en', source_language='it')
request.status_code=400
(None, None)
>>> from translatepy.translators import bing
>>> b = bing.BingTranslate()
>>> b.translate("È un ragazzo intelligente.", 'en', source_language='it')
('it', "He's a smart guy.")
>>> from translatepy.translators import google
>>> g = google.GoogleTranslate()
>>> g.translate("È un ragazzo intelligente.", 'en', source_language='it')
('it', 'He is a smart guy.')
>>>

It would help if you included some unit tests for each translation module to show that each translation service you offer in your code actually works.

Also, if you know that a service does not work, I suggest removing that service from your code until you update your code and can show it is working. Or, at a minimum, modify the README.md file to say which services work and which ones don't work.

Mark

@Animenosekai
Owner

Animenosekai commented Feb 12, 2021

I've just checked Reverso and it seems to work fine for me:

>>> reverso.ReversoTranslate().translate("Hello", "fra")
('eng', 'Bonjour')

Check that the source_language and the destination_language are in the correct format: You seem to use the ISO 639-1 Alpha 2 code format while Reverso uses ISO 639-2/3 Alpha-3

This is why using the Translator class is better, as it falls back to different translators in case something goes wrong, provides a caching system and converts the languages to the correct format automatically, whether you gave the language name in any language, the alpha-2 code or the alpha-3 code.
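
As a rough illustration of that kind of conversion (not translatepy's actual implementation), the sketch below normalizes a language name, an alpha-2 code, or an alpha-3 code to an ISO 639-3 alpha-3 code. The tables and the helper name `to_alpha3` are hypothetical and cover only a handful of languages; only "eng" and "fra" are confirmed in this thread as codes Reverso accepts.

```python
# Illustrative subset only; a real table would cover many more languages.
ALPHA2_TO_ALPHA3 = {"en": "eng", "fr": "fra", "it": "ita", "de": "deu", "ja": "jpn"}
NAME_TO_ALPHA2 = {
    "english": "en",
    "french": "fr",
    "italian": "it",
    "german": "de",
    "japanese": "ja",
}


def to_alpha3(language: str) -> str:
    """Normalize a language name, alpha-2 code, or alpha-3 code to alpha-3."""
    key = language.strip().lower()
    if key in ALPHA2_TO_ALPHA3.values():
        return key  # already an alpha-3 code we know about
    key = NAME_TO_ALPHA2.get(key, key)  # map a name to alpha-2 if possible
    try:
        return ALPHA2_TO_ALPHA3[key]
    except KeyError:
        raise ValueError(f"Unknown language: {language!r}") from None


print(to_alpha3("it"), to_alpha3("French"), to_alpha3("eng"))
```

With a helper like this, the earlier Reverso call would receive 'ita' and 'eng' rather than 'it' and 'en'.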

Also, Reverso uses POST requests on their site, so I'm also using it.

As for Yandex, I just checked and it does seem to be an issue stemming from a misunderstanding of how their API works.

I just committed a fix, but make sure that _sid is set to something other than an empty string '', as that also seems to be part of the problem (verify that you are not rate-limited by going to their site in private browsing).

@pmi123
Author

pmi123 commented Feb 12, 2021

Thanks for looking into the yandex translator! I will check out the changes over the weekend.

I will check for the _sid value. Do you have any idea when the rate limiting times out?

As a suggestion, could you add an optional argument to Translator.translate() that allows the user to specify the translation service (Bing, Google, Yandex, etc.)? Not all translation services are equal for every language. Some users may want the option to bypass Google or Bing for one of the other translation services. Or, compare translations from each service. One cannot do that with your code as it always starts with Google, and the user has no option to specify which translation service should be used.

I agree your current implementation of always going through your translators one at a time in a fixed order will work for many users. I am just making a suggestion to give your code more flexibility for different users.
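
A minimal sketch of what such an option could look like, assuming each backend is a callable that returns a (source_language, text) tuple or (None, None) on failure, as the translator classes in this thread do. The function name `translate_with`, the `service` parameter, and the default order are hypothetical, and the backends below are stand-ins so the example runs offline.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Result = Tuple[Optional[str], Optional[str]]  # (source_language, translated_text)
Backend = Callable[[str, str], Result]


def translate_with(
    backends: Dict[str, Backend],
    text: str,
    destination_language: str,
    service: Optional[str] = None,
    order: Sequence[str] = ("google", "bing", "reverso", "yandex"),  # assumed order
) -> Result:
    """Use only `service` if given; otherwise try each backend in `order`."""
    names = [service] if service is not None else list(order)
    for name in names:
        backend = backends.get(name)
        if backend is None:
            continue  # unknown or unconfigured service
        result = backend(text, destination_language)
        if result != (None, None):
            return result
    return None, None


# Stand-in backends so the sketch runs without network access.
fake_backends = {
    "google": lambda text, dest: ("en", f"[google:{dest}] {text}"),
    "yandex": lambda text, dest: (None, None),  # simulates a rate-limited service
}
print(translate_with(fake_backends, "Hello", "ja"))
print(translate_with(fake_backends, "Hello", "ja", service="yandex"))
```

Leaving `service` unset keeps the current fixed-order behaviour, so existing users would not see a change.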

Thanks!

Mark

@Animenosekai
Owner

Yea that seems like a nice idea!

I'm going to look into it when I finish some of my stuff on my other projects (and school yikes, holidays are coming soon though).

@NawtJ0sh

Hmm it does seem to trigger the captcha system on yandex translate if it's called too many times for me. Has anyone got this yet https://gyazo.com/c0e06ba89c8fea6047744ab74e548ad7 ?

@Animenosekai
Owner

Yes it actually is what happens when you are "rate-limited", when they detect that a bot is using it

@pmi123
Author

pmi123 commented Feb 13, 2021

Animenosekai,

I have found a possible issue with your code that you may want to look at. I was trying to figure out why Yandex would not do any translations for me while I was testing other services (Google, Bing, Reverso). In looking at the output from the debug runs, I saw I was also hitting the Yandex server every time I made a translation, even though I was not accessing the Yandex server for a translation, but was using Bing or Google or Reverso. I looked at your code and found the following.

When you instantiate the Translator class in the init method you have:

    def __init__(self) -> None:
        self.google_translate = GoogleTranslate()
        self.yandex_translate = YandexTranslate()
        self.bing_translate = BingTranslate()
        self.reverso_translate = ReversoTranslate()

Which works well because Google, Bing, and Reverso either have pass in their init methods, or some local assignments. However, in the YandexTranslate class, you make a call to the Yandex server to get the _sid value. This happens every time a Translator class is created. In your code as it is written now, it does not matter because 99.999% of the time your code will use Google or Bing or Reverso for the translation. However, in the event those services decide to not process your request, Yandex will also most likely not process the request because your code has been hitting their server for the _sid each time a new Translator object is created.

In looking at the output, it takes only ~3 Translator object instantiations to trigger the Yandex server to stop listening to requests from your code. If you implement my suggestion above to allow a user to decide which translation service to use, then you will have to make sure that using other services does not "turn off" the Yandex service as a side effect.

Instead, you might want to hold off getting the _sid value from the Yandex server until you really plan to use the Yandex service for a translation. Perhaps make it the first step in your translate function for the YandexTranslate class instead of in the init method.
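
A minimal sketch of that lazy approach, assuming a hypothetical `fetch_sid` callable that does the actual request. The class name `LazySession` and the fake fetcher are invented for illustration; nothing here is taken from yandex.py.

```python
from typing import Callable, List, Optional


class LazySession:
    """Fetch the session ID on first use instead of in __init__."""

    def __init__(self, fetch_sid: Callable[[], Optional[str]]) -> None:
        self._fetch_sid = fetch_sid  # hypothetical callable that contacts the server
        self._sid = ""               # nothing is fetched at construction time

    def sid(self) -> str:
        if not self._sid:
            # first use (or earlier attempts failed): contact the server now
            self._sid = self._fetch_sid() or ""
        return self._sid


calls: List[int] = []


def fake_fetch() -> str:
    """Stand-in for the real request; records that it was called."""
    calls.append(1)
    return "sid-123"


session = LazySession(fake_fetch)
requests_after_init = len(calls)  # creating the object makes no request
session.sid()
session.sid()                     # second call reuses the stored value
print(requests_after_init, len(calls))
```

Creating any number of such objects costs no requests; only the first call to `sid()` reaches the server.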

A further enhancement would be to add an exponential backoff when using Yandex for translations. In the translate function for the YandexTranslate class, you could call self.refreshSID method, check the return value, and start an exponential backoff timer to see if waiting will open the connection. It would take some testing to see if this is feasible, as the Yandex server might block for an hour or more, so the backoff will not be feasible. Does the documentation for Yandex say anything about this?

What do you think?

Mark

@NawtJ0sh

> > Hmm it does seem to trigger the captcha system on yandex translate if it's called too many times for me. Has anyone got this yet https://gyazo.com/c0e06ba89c8fea6047744ab74e548ad7 ?
>
> Yes it actually is what happens when you are "rate-limited", when they detect that a bot is using it

Yes true, have you got that? Might have to make it store the _sid into a txt file and use it that way, and make it update it if it gives an error.

@Animenosekai
Owner

Waiting for CodeQL but 1ff4acb seems to work.

I'll upload it soon!

@pmi123 What do you mean "exponential backoff"?

@pmi123
Author

pmi123 commented Feb 14, 2021

@Animenosekai "In a variety of computer networks, binary exponential backoff or truncated binary exponential backoff refers to an algorithm used to space out repeated retransmissions of the same block of data, often to avoid network congestion."
source: https://en.wikipedia.org/wiki/Exponential_backoff

In your particular use case, you want to retry getting the _sid after waiting a certain amount of time. While waiting in a retry loop, instead of calling time.sleep(x) where x = const (i.e. for a fixed amount of time), use time.sleep(y), where y = an exponentially growing variable based on how many times one has waited for a response.

For example, y = (2 ** n) + (random.randint(0, 1000) / 1000), where n = the number of retries. The random part is just to keep the timing of the retries, well, random to a degree. If your retry code is in a method that accesses an api, and there are many users using that method at the same time, then all the retries could end up in lockstep, and no user will get a response. The random bit prevents this lockstep. It is not really needed in your use case, but it doesn't hurt. It would also defeat a Yandex server from "thinking", "Well, this request from url=z is coming in every x=2 seconds, so let's assume it is a bot and block it."

Be sure to include a MAX_RETRIES value in the retry loop to break out of the loop and admit the retries failed, or your retry method may run forever.

Finally, there are many many python implementations of this type of algorithm, usually as a decorator to a method, that performs the retry loop on either values or exceptions. Just google "python exponential backoff retry example" for lots of examples. It is also not hard to roll your own, as your use case is not complicated.
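
One possible shape for such a decorator, as a sketch only: it retries while the wrapped function returns None, sleeping 2**n seconds plus up to one second of jitter between attempts. The names `retry_with_backoff` and `backoff_delay` are made up, and the `sleep` parameter exists so the demo can record delays instead of actually waiting.

```python
import functools
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(n: int) -> float:
    """Delay before retry number n: 2**n seconds plus up to 1 second of jitter."""
    return (2 ** n) + (random.randint(0, 1000) / 1000)


def retry_with_backoff(max_retries: int, sleep: Callable[[float], None] = time.sleep):
    """Retry the wrapped function while it returns None, up to max_retries times."""

    def decorator(func: Callable[..., Optional[T]]) -> Callable[..., Optional[T]]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Optional[T]:
            for n in range(max_retries + 1):
                result = func(*args, **kwargs)
                if result is not None:
                    return result
                if n < max_retries:
                    sleep(backoff_delay(n))
            return None  # every attempt failed

        return wrapper

    return decorator


delays: List[float] = []          # the demo records delays instead of sleeping
attempts = {"count": 0}


@retry_with_backoff(max_retries=3, sleep=delays.append)
def flaky() -> Optional[str]:
    """Stand-in for a request that succeeds on the third attempt."""
    attempts["count"] += 1
    return "ok" if attempts["count"] == 3 else None


result = flaky()
print(result, len(delays))
```

In real use the default `time.sleep` would apply, so the caller blocks for the sum of the delays when every attempt fails.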

I hope that helps!

Mark

@pmi123
Author

pmi123 commented Feb 14, 2021

@Animenosekai regarding 1ff4acb, it might be cleaner to update the refreshSID() method to check the value of _sid, refresh it, and also retry refreshing until you get a value, or return None for the translation. You only have to make a few simple changes to translate, language, etc. instead of sprinkling if self._sid == "" throughout your code. Just an idea.

def __init__(self):
    self._base_url = "https://translate.yandex.net/api/v1/tr.json/"
    self._sid = ""
    self._headers = self._header()
        
def refreshSID(self):
    while self._sid in ["", " "]:
        data = get("https://translate.yandex.com/", headers=self._headers).text
        sid_position = data.find("Ya.reqid = '")
        if sid_position != -1:
            data = data[sid_position + 12:]
            self._sid = data[:data.find("';")]
            break
        else:
            # sleep some amount of time
            # keep track of retries and stop after MAX_RETRIES
            break  # placeholder so this sketch is valid Python and terminates
    return self._sid

def translate(self, text, destination_language, source_language="auto"):
    if self.refreshSID():
        ...  # continue with the existing code
    else:
        return None, None

@Animenosekai
Owner

Animenosekai commented Feb 14, 2021

This seems like a great idea, but won't it mean making a function where the user waits for the SID, and therefore a very long blocking operation?

Wouldn't it be better, for example, to add a lastTried variable and refresh only if (2 ** n) + (random.randint(0, 1000) / 1000) seconds have passed since the last attempt to retrieve the _sid?

@pmi123
Author

pmi123 commented Feb 14, 2021

I am not sure I understand your question, but let me try to answer what I think you are asking.

The blocking stops after (1) a valid code is retrieved from Yandex, or (2) the MAX_RETRIES has been reached. Blocking is roughly 1 sec, 2 sec, 4 sec, and so on (each plus up to 1 sec of random jitter). The value of MAX_RETRIES is a trade-off between the value of the translation, how long a user will wait, how long Yandex will block, and how long the sid is valid.

I am assuming that you don't know (1) how long Yandex blocks a bot, and (2) how long the self._sid is valid. Some testing might shed some light on the values of these two quantities, and help determine if it is feasible to wait or the code should pick another translation service to use. If the results of some testing show that Yandex blocks a bot for too long (based on the user's expectations), then the strategy should be to use another translation service instead of waiting for Yandex.

I am not sure what information the lastTried variable holds. The self._sid has enough information to determine whether the _sid is valid (except as noted above).

If you mean lastTried is the number of retries used the last time the refreshSID was run, I think it would be better to bake that value into the initial value of n. In other words, if some testing shows that it takes ~2 seconds to get a valid _sid (e.g. Yandex blocks bots for roughly 2 seconds), then the code should start with n=3, and avoid the first two retries, as they will only aggravate the Yandex servers.

If you mean lastTried is the "age" of the _sid, then one could use that value to determine if a new value for _sid is needed. However, if you don't know the lifespan of the _sid, I am not sure how to use this value. The code above will eventually get a valid _sid if the current one has expired.

You could also reset _sid after each operation is concluded (ie. translate, language, etc.), if the documentation or some testing says that the _sid is just a one shot value and has to be determined for every access to the Yandex servers.

An untested attempt at fleshing out the code above:

    # requires: import random, time (and `get` from requests, as in yandex.py)
    def refreshSID(self):
        n = 0  # number of retries so far
        while self._sid in ["", " "]:
            data = get("https://translate.yandex.com/", headers=self._headers).text
            sid_position = data.find("Ya.reqid = '")
            if sid_position != -1:
                # the SID sits between "Ya.reqid = '" (12 characters) and "';"
                data = data[sid_position + 12:]
                self._sid = data[:data.find("';")]
                break
            elif n < self.MAX_RETRIES:
                # wait 2**n seconds plus up to 1 second of random jitter, then retry
                time.sleep((2 ** n) + (random.randint(0, 1000) / 1000))
                n += 1
            else:
                # give up after MAX_RETRIES retries; _sid stays ""
                break
        return self._sid

The function returns either the _sid value or "". In the translate function, one can test the self._sid value to see if it is valid.

Let me know if I missed the idea behind your questions.

Mark

@Animenosekai
Owner

Basically the problem is that this would block the user for too long:

Admitting that the user wants to translate with Yandex (Translator(use_google=False, use_bing=False, use_reverso=False, use_yandex=True)):

# the user request a translation
t.translate("Hello", "Japanese")
# then the refreshSID method will be called
# It will try to download the webpage (which is already quite long)
data = get("https://translate.yandex.com/", headers=self._headers).text
# it will do his lightweight operations
sid_position = data.find("Ya.reqid = '")
if sid_position != -1:
    data = data[sid_position + 12:]
    self._sid = data[:data.find("';")]
    break; # great, the user had to wait only ~ the time to download the webpage
else: # but here comes what I consider too long for the user to wait
    time.sleep((2 ** n) + (random.randint(0, 1000) / 1000)) # the user needs to wait ~ 2 seconds first

# then it will redownload the webpage
# then if the sid is not found the user will need to wait ~ 4 seconds and the download time
# then ~ 8 seconds
# And imagining that self.MAX_RETRIES == 2, the user will need to wait at worst more than ~ 15 seconds without any return

What I meant with lastTried is this:

# the user request a translation
t.translate("Hello", "Japanese")
# if the SID is blank or is not working, the refreshSID method is called

if time() - self.lastTried > 600: # if the duration between the last time we tried to get the SID and now is greater than 10 minutes
    # it will do the stuff to get the sid
    data = get("https://translate.yandex.com/", headers=self._headers).text
    sid_position = data.find("Ya.reqid = '")
    if sid_position != -1: # yay it got it!
        data = data[sid_position + 12:]
        self._sid = data[:data.find("';")]
        self.lastTried = time() # maybe keep that in a file
        # we will retry the translation and it will return it
    else:
        self.lastTried = time() # maybe keep that in a file
        # the translation will return None
else:
    pass # do nothing as we know that yandex will rate-limit us if we ping too much their website
    # the translation will basically return None
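
A self-contained sketch of the same lastTried idea, written so it can run without network access. The class name `CooldownRefresher`, the injected `fetch_sid` callable, and the injected clock are assumptions made for the example; the 600-second cooldown is the value from the snippet above.

```python
import time
from typing import Callable, List, Optional


class CooldownRefresher:
    """Attempt to fetch a new SID at most once per `cooldown` seconds."""

    def __init__(
        self,
        fetch_sid: Callable[[], Optional[str]],
        cooldown: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_sid = fetch_sid
        self._cooldown = cooldown
        self._clock = clock
        self._last_tried: Optional[float] = None  # None means "never tried"
        self.sid = ""

    def refresh(self) -> str:
        now = self._clock()
        if self._last_tried is not None and now - self._last_tried <= self._cooldown:
            return self.sid  # tried recently: do not contact the server again
        self._last_tried = now
        self.sid = self._fetch_sid() or ""
        return self.sid


now = [0.0]                 # fake clock so the demo does not wait
calls: List[float] = []


def fake_fetch() -> Optional[str]:
    calls.append(now[0])
    return None             # simulate getting the captcha page


refresher = CooldownRefresher(fake_fetch, cooldown=600.0, clock=lambda: now[0])
refresher.refresh()         # first attempt contacts the server
now[0] = 300.0
refresher.refresh()         # within the cooldown: skipped
now[0] = 601.0
refresher.refresh()         # cooldown elapsed: tries again
print(calls)
```

The caller never blocks: `refresh()` returns immediately with either a SID or an empty string, and translate can return None in the latter case.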

@pmi123
Author

pmi123 commented Feb 15, 2021

I started the code for the retry with n = 0, so the first wait time is between 1 and 2 seconds. The wait time on the 2nd iteration (the last one when MAX_RETRIES = 2) is between 2 and 3 seconds, so the total wait is roughly 3 to 5 seconds.

A sample set of wait times (seconds) for 5 iterations of the wait cycles:

>>> for n in range(0,5):
...     print("n=%s, wait=%s sec" % (n, (2**n) + (random.randint(0,1000)/1000)))
... 
n=0, wait=1.003 sec
n=1, wait=2.9699999999999998 sec
n=2, wait=4.244 sec
n=3, wait=8.597 sec
n=4, wait=16.457 sec

The only reason for the wait is that the _sid is not valid. You don't know the reason why a valid _sid was not returned. Perhaps a Yandex server is going through a reboot or db cleanup, or too much traffic on the network, or Yandex thinks you are a bot, or..... One way to automatically fix the problem is to wait and try again. However, if your wait time is too short, then each time you hit the Yandex server, you will reset the "bot wait time" on the server and never get through.

The reason for the growing wait time is to allow the "bot wait time" on the Yandex server to time out, so your code can get a valid _sid. You can change the parameters of the wait time to make them smaller. But you run the risk of always resetting the Yandex "bot wait time" because your intervals are too short and you will never get a response. Bottom line, we don't know the Yandex server's "bot wait time", nor do we know when it might change. It could be 3 seconds today, 10 seconds tomorrow, or based on some algorithm that throttles responses to these requests based on current network traffic. Hence the growing wait time. The goal is to get a Yandex translation in the least amount of time, up to a certain limit, then quit trying.
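
To put numbers on the trade-off, the sketch below computes the best-case and worst-case total sleep time when every attempt fails, assuming the retry loop from earlier in this thread (sleeps for n = 0 up to MAX_RETRIES - 1, each 2**n seconds plus 0 to 1 second of jitter). The function name is made up, and the time spent downloading the page is not included.

```python
from typing import Tuple


def total_wait_bounds(max_retries: int) -> Tuple[int, int]:
    """Return (min, max) total sleep in seconds if every attempt fails."""
    base = sum(2 ** n for n in range(max_retries))  # equals 2**max_retries - 1
    # each of the max_retries sleeps adds between 0 and 1 second of jitter
    return base, base + max_retries


for retries in (2, 4, 5):
    print(retries, total_wait_bounds(retries))
```

Under these assumptions MAX_RETRIES = 2 costs between 3 and 5 seconds in total, and MAX_RETRIES = 4 between 15 and 19 seconds.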

Your second solution is also valid, but you are only allowing the user to access the Yandex server once every 10 minutes. This is roughly equivalent to setting n=9 in the first example (2**9 = 512 seconds), and you found n=4 to be too long to wait. This solution would be better if you knew the life of an _sid, and how many times I can hit the server with the same _sid and get a valid response. However, _sid could easily be a one-time value that has to be refreshed on each request to the server. Maybe I can hit the Yandex server every 3 seconds to get a new code, so in your 10 minute wait I could have made 200 translations. Is there any documentation on this?

Some testing would tell us a little bit about the Yandex server's "bot wait time", but as I said earlier that number can change at any time.

I don't think there is a need to store the self.lastTried in a text file. When the program starts, just go get the _sid. If it fails, then set your self.lastTried. In your code, you aren't reusing the _sid, so if it is initialized at program start, you are free to go get it.

@Animenosekai
Owner

Animenosekai commented Feb 15, 2021

@pmi123

This "bot wait time" is a captcha: it prevents requests from getting the desired webpage (with the SID in it) and instead displays a challenge that requires a human action (typing the word shown).

This "rate-limit" occurs quite frequently and is quite long (more than 10 minutes).

The snippet I wrote isn't waiting for a good answer for 10 minutes but rather is saying "Well, I've already tried not so long ago, I should wait a little before retrying to get an SID and I should just say that I couldn't proceed with the request".

Also, the SID does not seem to be a one-time token as I already used it for multiple requests. It could maybe have a timeout of a day, an hour, etc.

MAX_RETRIES=4 (32 seconds, the trying time adds up at each iteration) is wayyy too long for the user: imagine writing a program which needs translations and having your program stopped just because another module is trying to refresh its ID when it might not even succeed.

> Is there any documentation on this?

Nope sorry, I searched for it but couldn't find anything (quite normal since it's not a public API)

> but you are only allowing the user to access the Yandex server once every 10 minutes

In fact, not really. I'm allowing the module to go try to fetch a new SID every 10 minutes. If it couldn't find any, the translate method will just return None

Sorry for not being very clear...😔

@Animenosekai
Owner

Animenosekai commented Feb 15, 2021

Another method would be to implement sort of a mix between my snippet of code and yours, running it endlessly (until it finds a working SID) and resuming it when the SID stops working.

The refreshing function would be running in another thread, in the background so that it is seamless to the user + it would mean quicker results for the user
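
A simplified sketch of a background refresher, assuming a hypothetical `fetch_sid` callable. It refreshes on a fixed interval rather than only when the SID stops working, so it is a variant of the idea rather than the exact scheme described above. The class and method names are invented for illustration.

```python
import threading
from typing import Callable, Optional


class BackgroundRefresher:
    """Refresh the SID in a daemon thread so callers never block on it."""

    def __init__(self, fetch_sid: Callable[[], Optional[str]], interval: float = 600.0) -> None:
        self._fetch_sid = fetch_sid
        self._interval = interval
        self._lock = threading.Lock()
        self._sid = ""
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def current_sid(self) -> str:
        with self._lock:
            return self._sid  # "" means no working SID yet

    def _run(self) -> None:
        while not self._stop.is_set():
            sid = self._fetch_sid()
            if sid:
                with self._lock:
                    self._sid = sid
            # Event.wait doubles as a sleep that stop() can interrupt
            self._stop.wait(self._interval)


fetched = threading.Event()


def fake_fetch() -> str:
    """Stand-in for the real request."""
    fetched.set()
    return "sid-123"


refresher = BackgroundRefresher(fake_fetch, interval=600.0)
refresher.start()
fetched.wait(timeout=5)   # wait until the background thread has fetched once
refresher.stop()
print(refresher.current_sid())
```

The translate method would call `current_sid()` and return None right away when it is empty, instead of waiting for a refresh.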

@NawtJ0sh

The _sid seems to last like a few days or so before it expires, when it does it throws this error: {u'message': u'Session has expired', u'code': 406}.

@pmi123
Author

pmi123 commented Feb 19, 2021

@Animenosekai Apologies for not fully understanding how Yandex treats the sID. Given the 10 min delay before Yandex will return a new sID, I agree your approach in this instance is much better than the exponential backoff method I proposed.

Based on your post, my "bot_wait_time" is the same as your "rate limit" - the time the Yandex server refuses to service a request (get the sID). I should have defined my terms better...it would have created less confusion.

I like your idea of the background thread. Programming threads is fun and challenging to take care of all the timing issues and edge cases. Or, just save lastTried to disk and check it as needed.

If you save the value to disk, you will have to figure out what to do in the use case that the lastTried value was deleted from the file system by some other program than yours. At that point, the options are (1) sID="", or (2) sID=some value. How do you decide to refresh the sID or not in the second case? If it has been less than 10 minutes since the last refresh, will you get a new valid sID, or will Yandex freeze you out for 10 minutes? If it has been more than 10 minutes, then it should be OK to refresh the sID.
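
One way to handle the missing-file case, as a sketch: treat a missing or unreadable file as "never tried", which allows an immediate refresh. That is only one possible policy; the more cautious alternative is to treat it as "just tried" and wait out the cooldown. The file format, function names, and 600-second cooldown below are assumptions for the example.

```python
import json
import os
import tempfile
import time
from typing import Optional


def load_last_tried(path: str) -> float:
    """Return the stored timestamp, or 0.0 if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return float(json.load(f)["last_tried"])
    except (OSError, ValueError, KeyError, TypeError):
        return 0.0  # policy choice: a lost file counts as "never tried"


def save_last_tried(path: str, timestamp: float) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"last_tried": timestamp}, f)


def may_refresh(path: str, cooldown: float = 600.0, now: Optional[float] = None) -> bool:
    """True if more than `cooldown` seconds have passed since the last attempt."""
    now = time.time() if now is None else now
    return now - load_last_tried(path) > cooldown


path = os.path.join(tempfile.mkdtemp(), "last_tried.json")
missing_file = may_refresh(path, now=1000.0)   # no file yet
save_last_tried(path, 1000.0)
too_soon = may_refresh(path, now=1300.0)       # 300 seconds later
after_cooldown = may_refresh(path, now=1700.0) # 700 seconds later
print(missing_file, too_soon, after_cooldown)
```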

Mark

@Animenosekai
Owner

Nice! I'm planning on implementing it in the next week as I was working on another project ~

> If it has been less than 10 minutes since the last refresh, will you get a new valid sID, or will Yandex freeze you out for 10 minutes?

Maybe return None until I can get a new SID?

@NawtJ0sh

Wow the _sid I had stored into a txt file lasted 8 days before it expired, my bot only updated it once, by removing the old _sid and replacing it with the new _sid.
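
A rough sketch of that file-based approach: read the SID from a text file, fetch a new one only when the file is empty or missing, and force a refresh when the server reports an expired session (the 406 "Session has expired" reply mentioned earlier). The function names and the `fetch_sid` callable are hypothetical.

```python
import os
import tempfile
from typing import Callable, List, Optional


def read_cached_sid(path: str) -> str:
    """Return the cached SID, or "" if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read().strip()
    except OSError:
        return ""


def get_sid(path: str, fetch_sid: Callable[[], Optional[str]], force_refresh: bool = False) -> str:
    """Return the cached SID, fetching and storing a new one when needed."""
    sid = "" if force_refresh else read_cached_sid(path)
    if not sid:
        sid = fetch_sid() or ""
        if sid:
            with open(path, "w", encoding="utf-8") as f:
                f.write(sid)  # replace the old SID with the new one
    return sid


fetched: List[int] = []


def fake_fetch() -> str:
    """Stand-in for the real request; returns a new value on each call."""
    fetched.append(1)
    return f"sid-{len(fetched)}"


cache = os.path.join(tempfile.mkdtemp(), "sid.txt")
first = get_sid(cache, fake_fetch)                          # fetched and cached
second = get_sid(cache, fake_fetch)                         # read back from the file
renewed = get_sid(cache, fake_fetch, force_refresh=True)    # e.g. after an expiry error
print(first, second, renewed)
```

With a SID that lasts several days, this keeps the number of requests to the Yandex page very low.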

@Animenosekai
Owner

> Wow, seems like yandex.com banned my ip, it won't let me connect to it, watch out lol

Lmao, yea that's why it's better using the base Translator() class as it will change to other services when one is returning errors.

Also, what do you mean by "banned my ip": Is it just that you need to solve captchas when you go to their site or you are even forbidden to go to the website and verify that you are a human?

@NawtJ0sh

NawtJ0sh commented Mar 14, 2021

Also, what do you mean by "banned my ip"? Do you just need to solve captchas when you go to their site, or are you even blocked from reaching the website to verify that you are a human?

Yeah they cut off the connection to my ip completely! Like "this site can't be reached" was what I was getting lol for the whole yandex.com.

@Animenosekai
Owner

Yeah they cut off the connection to my ip completely! Like "this site can't be reached" was what I was getting lol for the whole yandex.com.

Can you access the website with a VPN though?

@NawtJ0sh

Can you access the website with a VPN though?

Yes, that's how I noticed I got IP banned aha

@Animenosekai
Owner

Animenosekai commented Mar 14, 2021

lmaoooo well I guess people shouldn't try to use Yandex Translate too much

@ZhymabekRoman

This comment has been minimized.

@Animenosekai
Owner

Yeah, I couldn't even imagine how hard it would be to fix the Yandex Translate module, and I still couldn't fix it. The only thing I have implemented so far is refactoring the code.

Yes, they have a pretty strict rate-limiting/bot-detection system, which triggers captchas and even bans your IP if you use it too much (that's why I call it last in the Translator class).
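The idea of calling the most fragile service last can be sketched roughly as follows. This is not translatepy's actual implementation; `translate_with_fallback` and the stub services are hypothetical stand-ins used only to show the ordering.

```python
def translate_with_fallback(text, services):
    """Try each (name, callable) pair in order and return the first success.

    services is an ordered list; the most rate-limited service goes last
    so it is only contacted when everything before it has failed.
    """
    errors = []
    for name, translate in services:
        try:
            return name, translate(text)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all services failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real translation backends.
def flaky_service(text):
    raise ConnectionError("captcha required")


def stable_service(text):
    return text.upper()  # placeholder for a real translation


if __name__ == "__main__":
    # The first service fails, so the result comes from the second one.
    print(translate_with_fallback("hello", [("flaky", flaky_service),
                                            ("stable", stable_service)]))
```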

@ZhymabekRoman
Contributor

Yes, they have a pretty strict rate-limiting/bot-detection system, which triggers captchas

Yes, I noticed that Yandex is very stingy about giving out the SID (session ID). I did not know that Russians were so greedy (although I am Russian myself).

even bans your IP if you use it too much

Wow! It's overkill

@ZhymabekRoman
Contributor

ZhymabekRoman commented Apr 26, 2021

After 5 hours of experimentation, today I managed to get Yandex Translate working: I found a bug (or feature) in the REST API method tr.json that lets you skip using (and parsing) the SID. I think I will finally write it all up as Python code in a few hours.

@NawtJ0sh

Hey everyone, is the yandex translate still working for you?

@Animenosekai
Owner

Hey everyone, is the yandex translate still working for you?

Yeah, we fixed it, and now that you remind me, let me just publish the new version ~~

@Animenosekai
Owner

Hey everyone, is the yandex translate still working for you?

I just published v1.7 on PyPI.

You should now be able to update translatepy with the usual command:

pip install --upgrade translatepy

@NawtJ0sh

Yeah, I thought there was an update on Yandex's end, but it was just my file system messing up lol. But I think there has been an update to Bing Translate now.

@ZhymabekRoman
Contributor

But I think there has been an update to Bing Translate now.

The Bing translator has very strict limits on the number of requests per minute/hour/day (the exact numbers are not known), and there is no way to bypass the Bing API restriction yet (for more information, see the following message). One option is to use a proxy, or to use other, more stable services such as Yandex: it does not care about the number of requests, and its translation quality seems to me better than Google Translate's (at least for the languages of the post-Soviet countries). But Bing's request limits are nowhere near as bad as DeepL's, which are just some kind of hell.

@ZhymabekRoman
Contributor

ZhymabekRoman commented May 26, 2021

Let's go back to Bing Translator. In principle, there is one loophole that I think would let us bypass all the request restrictions: using Microsoft Translate. As far as I know, they both use the same engine. The only difference is that Microsoft Translate requires an API key that is linked to an account and charged for use, and as I understand it, it is intended for the corporate segment. But if we look at the Microsoft Translate mobile application, we can see that the application generates the x-mt-signature and x-clienttraceid headers based on some data, and the server translates for free. x-clienttraceid is just a regular v4 UUID, but x-mt-signature looks like a combination of an HMAC-SHA256 value, a timestamp, and some other unknown data. If we can solve this riddle, we will have a stable Bing translator.
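For reference, the two primitives mentioned here look like this in Python. This only illustrates a v4 UUID and a generic HMAC-SHA256; the secret, the message layout and the base64 encoding below are all placeholders, since the real x-mt-signature scheme is unknown in this thread.

```python
import base64
import hashlib
import hmac
import time
import uuid


def new_client_trace_id():
    """Return a random v4 UUID string, the format described for x-clienttraceid."""
    return str(uuid.uuid4())


def hmac_sha256_b64(secret, message):
    """Return a base64-encoded HMAC-SHA256 of message under secret.

    This is the generic primitive only; it is NOT the actual signing scheme.
    """
    digest = hmac.new(secret, message.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder inputs: the real key and the signed fields are not known.
    placeholder_secret = b"not-the-real-key"
    placeholder_message = f"{int(time.time() * 1000)}:example-payload"
    print(new_client_trace_id())
    print(hmac_sha256_b64(placeholder_secret, placeholder_message))
```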

[Image: photo_2021-05-26_15-43-43]

@NawtJ0sh

fromLang: auto-detect
text: kanker
to: en
token: 1D03dhmjKLPeQvzr4OpEdrGFhDA-hPC9
key: 1622167441181

Bing Translate added a token and key. They seem to expire pretty fast, maybe every 5 minutes.

@ZhymabekRoman
Contributor

bing translate added in a token and key

Yes, that's exactly right. See #13

maybe every 5 minutes.

I ran some tests - the token and the key are valid for at least 10 minutes

@NawtJ0sh

Hi guys, after looking into Bing Translate to find where the token and key are stored, I have found it!

params_RichTranslateHelper = [1622472153451,"VvgFaimiFuqUEoaS5Z8r9IyKcNoGVkPO",900000];
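A minimal sketch of pulling these values out of the page source, assuming the array keeps the shape shown above (numeric key first, then the token string, matching the key/token form fields posted earlier). `extract_bing_params` is a hypothetical helper, not the code used in translatepy, and the meaning of the third value is a guess: 900000 could be a lifetime in milliseconds (15 minutes), which would fit the "at least 10 minutes" observation.

```python
import json
import re

# Matches: params_RichTranslateHelper = [ ... ];
PARAMS_RE = re.compile(r"params_RichTranslateHelper\s*=\s*(\[[^\]]*\])")


def extract_bing_params(page_source):
    """Return the key, token and third value from the page, or None if absent."""
    match = PARAMS_RE.search(page_source)
    if match is None:
        return None
    try:
        values = json.loads(match.group(1))
    except ValueError:
        return None
    if not isinstance(values, list) or len(values) != 3:
        return None
    key, token, third_value = values
    # third_value is possibly a validity period in ms; that is an assumption.
    return {"key": key, "token": token, "third_value": third_value}


if __name__ == "__main__":
    sample = ('params_RichTranslateHelper = '
              '[1622472153451,"VvgFaimiFuqUEoaS5Z8r9IyKcNoGVkPO",900000];')
    print(extract_bing_params(sample))
```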

@ZhymabekRoman
Contributor

ZhymabekRoman commented May 31, 2021

after looking into Bing Translate to find where the token and key are stored, I have found it!

Yes, thanks. I have already implemented this in the upcoming alpha version 2.0:

https://github.com/ZhymabekRoman/translate/blob/80ce159757f1b6a5ac20c2559474f66cae8488b8/translatepy/translators/bing.py#L69-L73

@NawtJ0sh

Yes, thanks. I have already implemented this in the upcoming alpha version 2.0
https://github.com/ZhymabekRoman/translate/blob/80ce159757f1b6a5ac20c2559474f66cae8488b8/translatepy/translators/bing.py#L69-L73

Nice!!

@Animenosekai
Owner

Animenosekai commented May 31, 2021

I'm very sorry for not helping much...

I'll have holidays in July and August, so I'll be fully able to work on translate then

@Animenosekai
Owner

Wow, such a great thread. Sad to have to close this, as v2 will come in no time and solve the issue ~

@ZhymabekRoman
Contributor

ZhymabekRoman commented Jun 15, 2022

The strangest behavior that ever happened to me with Python...

Today I decided to run tests to see if all the translators work correctly. As it turns out, Yandex is messed up again and is not working. The weird thing about this situation is that Yandex Translate requests work fine with cURL, but I can't get them to work with Python 3 requests (I only managed to run it in Python 3.10 on my Android device using Termux). I do not even know how to explain this. I think we need to debug requests with Mitmproxy and see what differs between the requests; maybe then we can understand the pattern.

cURL command:

curl -X GET "https://translate.yandex.net/api/v1/tr.json/detect?sid=3ad888660a4e4bbe8f88b3e0e591e2be&srv=android&text=%D0%A1%D3%99%D0%BB%D0%B5%D0%BC%2C%20%D0%B1%D0%B5%D1%80%D1%96%D0%BF%20%D0%BE%D1%82%D1%8B%D1%80%20%D0%B5%D0%BA%D0%B5%D0%BD&hint=en"

Python 3 code converted from the cURL command with the utility https://curlconverter.com/:

import requests

params = {
    'sid': '3ad888660a4e4bbe8f88b3e0e591e2be',
    'srv': 'android',
    'text': 'Сәлем, беріп отыр екен',
    'hint': 'en',
}

response = requests.get('https://translate.yandex.net/api/v1/tr.json/detect', params=params)
print(response.status_code)
# Result reported at the time: 403

Maybe it's because V1 is now deprecated, so some requests do not work. But then why did I manage to make a successful request on an Android device with Python 3.10?!

frosty@frost-pc:~$ curl https://translate.yandex.net/api/v1/tr.json/detect
{"code":410,"message":"Yandex.Translate API v1 is no longer active. Please migrate to API v1.5: https://tech.yandex.com/translate/."}

@ZhymabekRoman
Contributor

With the "httpx" library, Yandex Translate works perfectly.
@Animenosekai, what do you think about it?

@Animenosekai
Owner

With the "httpx" library, Yandex Translate works perfectly.
@Animenosekai, what do you think about it?

It might be a problem with the headers that we set up by default.
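To check the header hypothesis, the headers captured from each client (for example with Mitmproxy, as suggested earlier) could be compared with a small helper like the one below. `diff_headers` is a hypothetical name, and the sample values are only the typical defaults of cURL and requests with placeholder version numbers, not captures from this thread.

```python
def diff_headers(first, second):
    """Compare two header mappings, ignoring the case of header names.

    Returns (only_in_first, only_in_second, changed), where changed maps a
    header name to its (first_value, second_value) pair.
    """
    a = {name.lower(): value for name, value in first.items()}
    b = {name.lower(): value for name, value in second.items()}
    only_in_first = {k: a[k] for k in sorted(a.keys() - b.keys())}
    only_in_second = {k: b[k] for k in sorted(b.keys() - a.keys())}
    changed = {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}
    return only_in_first, only_in_second, changed


# Placeholder samples: typical defaults, versions elided on purpose.
CURL_HEADERS = {
    "Host": "translate.yandex.net",
    "User-Agent": "curl/X.Y.Z",
    "Accept": "*/*",
}
REQUESTS_HEADERS = {
    "Host": "translate.yandex.net",
    "User-Agent": "python-requests/X.Y.Z",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
}

if __name__ == "__main__":
    only_curl, only_requests, changed = diff_headers(CURL_HEADERS, REQUESTS_HEADERS)
    print("only cURL:", only_curl)
    print("only requests:", only_requests)
    print("different values:", changed)
```

Any header that shows up in only one capture, or with a different value, is a candidate to override in the failing client and retest.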

@Animenosekai Animenosekai unpinned this issue Aug 29, 2023

4 participants