opensubtitles isn't what you want to use #1

Open
vankasteelj opened this issue May 22, 2016 · 5 comments

@vankasteelj

Each IMDb request you make to OpenSubtitles increases the load on their servers, as they parse just in time.

As your app clearly doesn't have anything to do with subtitles, you should switch to a real metadata provider, like OMDb, TheMovieDB, TVDB or Trakt. By using OpenSubtitles to identify files, you're 'taking' bandwidth that OpenSubtitles users would want to use to, well, access subtitles.

Take a look at https://github.com/vankasteelj/trakt.tv and the https://github.com/vankasteelj/trakt.tv-matcher plugin.

@sarathkcm
Owner

sarathkcm commented May 22, 2016

I am just starting off with the basics, and the app is going to have subtitle download/upload features when the first version is released. The Trakt-based client seems nice, but I am afraid it cannot match the hash-based identification OpenSubtitles provides.
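For readers unfamiliar with the hash-based identification mentioned here, the following is a minimal sketch of the commonly documented OpenSubtitles file hash: the file size plus a 64-bit wrapping sum over the first and last 64 KiB of the file. The function name `opensubtitles_hash` and the choice to raise `ValueError` for files smaller than 128 KiB are assumptions of this sketch, not part of the app discussed in this issue.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024  # 64 KiB read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep the running sum within 64 bits


def opensubtitles_hash(path):
    """Return the hash of the file at `path` as 16 lowercase hex digits."""
    file_size = os.path.getsize(path)
    if file_size < 2 * CHUNK_SIZE:
        # Assumption of this sketch: refuse files too small for two chunks.
        raise ValueError("file is smaller than 128 KiB")

    total = file_size
    with open(path, "rb") as f:
        for offset in (0, file_size - CHUNK_SIZE):
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            # Treat the chunk as little-endian unsigned 64-bit integers
            # and add each one to the running sum, wrapping at 64 bits.
            for (value,) in struct.iter_unpack("<Q", chunk):
                total = (total + value) & MASK_64
    return f"{total:016x}"
```

Because the hash depends only on the size and the two ends of the file, it is cheap to compute even for large video files, which is what makes it attractive for identifying files without relying on their names.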

Probably all the subtitle downloaders out there display movie metadata, so it's not going to be much different in this case. For a single user, the data is stored in a file, and the app only identifies a movie and adds it to the database when a new movie is added to the collection. I am not sure, but I think there are restrictions in place if a client creates high traffic to the API.
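The "identify once, then store in a file" behaviour described above could look roughly like the sketch below. Everything here is hypothetical: the `LocalLibrary` class, the JSON file layout, and the `identify` callable (which stands in for whatever remote lookup the app performs) are illustrative names, not the app's actual code.

```python
import json
from pathlib import Path


class LocalLibrary:
    """Keeps identified movies in a JSON file, keyed by file hash."""

    def __init__(self, store_path, identify):
        self.store_path = Path(store_path)
        self.identify = identify  # hypothetical callable: file_hash -> dict
        if self.store_path.exists():
            text = self.store_path.read_text(encoding="utf-8")
            self.entries = json.loads(text)
        else:
            self.entries = {}

    def add(self, file_hash):
        """Return stored details, calling the remote lookup only for new files."""
        if file_hash in self.entries:
            return self.entries[file_hash]
        details = self.identify(file_hash)
        self.entries[file_hash] = details
        self.store_path.write_text(
            json.dumps(self.entries, indent=2), encoding="utf-8"
        )
        return details
```

With a store like this, each file triggers at most one remote request over the lifetime of the collection, no matter how often the app is restarted.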

I will keep this issue open to track using the Trakt API first and then, if it does not find a result, falling back to hash-based search. What I have in mind is to mix metadata from different providers to gather as much data as possible about a movie/series, but the first version is going to rely only on OpenSubtitles and probably TMDb, since there is a lot of other work required to build a usable version.
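One way to picture the "try one provider first, fall back to the next, then mix the metadata" plan is the sketch below. The provider callables are placeholders: no real Trakt, OpenSubtitles, or TMDb client calls are shown, and the names `identify_movie` and `merge_metadata` are assumptions made for illustration.

```python
def identify_movie(file_info, providers):
    """Try each (name, lookup) pair in order; return the first hit.

    `lookup` is a hypothetical callable that returns a dict of metadata,
    or None when the provider cannot identify the file.
    """
    for name, lookup in providers:
        result = lookup(file_info)
        if result is not None:
            return name, result
    return None, None


def merge_metadata(results):
    """Combine dicts from several providers; earlier providers win on conflicts."""
    merged = {}
    for result in results:
        for key, value in result.items():
            if value is not None and key not in merged:
                merged[key] = value
    return merged
```

Ordering the `providers` list puts the cheapest or preferred lookup first, so the hash-based search is only reached when the earlier providers return nothing.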

@sarathkcm
Owner

Also, is OpenSubtitles parsing IMDb each time a request is received? I was under the impression that they have a database of parsed details, with some expiration date for the data. If not, then that's not really an efficient implementation. Since the website also displays metadata, I believe it is cached somehow. Parsing every time doesn't sound like a good idea.
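To make the "database of parsed details with an expiration date" idea concrete, here is a minimal sketch of a cache whose entries expire after a fixed time. It only illustrates the assumption made in this comment; it is not OpenSubtitles' actual implementation, and `ExpiringCache`, `fetch`, and `ttl_seconds` are hypothetical names.

```python
import time


class ExpiringCache:
    """Caches fetched values and refetches them once they are too old."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch  # hypothetical slow lookup, e.g. a page parser
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self._entries = {}  # key -> (time stored, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh enough: no parsing needed
        value = self.fetch(key)  # missing or expired: parse again
        self._entries[key] = (now, value)
        return value
```

Under this model, a stale value (such as an outdated vote count) is exactly what a client would see between the time an entry is stored and the time it expires.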

@vankasteelj
Author

"This method should be used only, when you can not use some existing libraries. IMPORTANT: on our server is executed external program for every query (parser), so don't use this function too often, better use 3rd parties libraries"

https://trac.opensubtitles.org/projects/opensubtitles/wiki/XMLRPC#GetIMDBMovieDetails

@sarathkcm
Owner

sarathkcm commented May 22, 2016

Well, I read the documentation 3-4 years ago and don't remember seeing the bit about disabling the UA. I guess I will have to worry about it now. For metadata, I can probably use other APIs after identifying the file by hash.

@sarathkcm
Owner

sarathkcm commented May 22, 2016

OK, that's there. But to prove my point, please go through the screenshots below.

I am making GetIMDbMovieDetails calls for different movies. Notice the difference between the number of votes for the film in the response and on the actual website. That is because the response is coming from cache.

Deadpool:
[Screenshots: API response and website page]

Ant-Man:
[Screenshots: API response and website page]

Angry Birds: Note that this is a new movie, too new for any subtitles to be present in OpenSubtitles, so the response is not cached. The votes in the response and on the website are the same.

[Screenshots: API response and website page]

Now see the time taken by the API to respond for another recent movie, The Nice Guys.

The first request takes over 3 seconds to process, because it is parsing IMDb and returning the results:

[Screenshot: first request]

The second time the same movie is queried, the request is processed in 0.003 seconds, because the response is coming from cache.

[Screenshot: second request]

So the responses are cached after all. This means that if a user has a movie file that can be identified by hash through OpenSubtitles, the movie is already present in OpenSubtitles. So it is highly unlikely (in my opinion) that the GetIMDbMovieDetails method will require IMDb parsing.

I am worried about the disabling of user agent part though...
