
Fix identify and script scraper bugs #2375

Merged
9 commits merged on Mar 14, 2022


WithoutPants
Collaborator

Supersedes #2231

  • continue checking the remaining sources if a scraper source fails during the identify task
  • ensure nil values are returned from scraperSource.ScrapeScene and scraper.Cache.ScrapeID
  • convert scraper output to object pointers instead of concrete objects so that null JSON values are handled correctly

Fixes an issue where the autotag scraper would output the following error if nothing could be tagged: error scraping from scraper builtin_autotag: could not convert content to scene

Fixes an issue where a script scraper returning no results would be interpreted as a found scene despite being empty.
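Why converting output to pointers helps here, shown in Go assuming the script prints a JSON `null` when it finds nothing: per the `encoding/json` documentation, `null` unmarshals into a pointer by setting it to nil, but unmarshaling `null` into a struct value has no effect and produces no error, leaving a zero-value struct that looks like an empty but found scene. `ScrapedScene`, `decodeValue` and `decodePointer` are hypothetical names for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScrapedScene is a hypothetical, cut-down stand-in for a scraper result.
type ScrapedScene struct {
	Title *string `json:"title"`
	URL   *string `json:"url"`
}

// decodeValue decodes into a concrete struct: a JSON null leaves the
// zero value in place, so "no result" is indistinguishable from an
// empty result.
func decodeValue(data []byte) (ScrapedScene, error) {
	var s ScrapedScene
	err := json.Unmarshal(data, &s)
	return s, err
}

// decodePointer decodes into a pointer: a JSON null sets it to nil,
// which the caller can treat as "nothing found".
func decodePointer(data []byte) (*ScrapedScene, error) {
	var s *ScrapedScene
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	v, _ := decodeValue([]byte("null"))
	p, _ := decodePointer([]byte("null"))
	fmt.Println(v.Title == nil, p == nil) // true true
}
```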

@WithoutPants WithoutPants added the bug Something isn't working label Mar 9, 2022
@WithoutPants WithoutPants added this to the Version 0.14.0 milestone Mar 9, 2022
@bnkai
Collaborator

bnkai commented Mar 13, 2022

The fixes above seem to work OK.
There is still an issue with JSON scrapers and identify.
I tested the following source order in identify: ThePornDB (JSON scraper) -> stashdb -> Traxxx (script scraper).
When there is no matching result from ThePornDB, we get the following:

WARN[2022-03-13 12:48:22] key 'Details': could not find json path 'data.0.description' in json object
WARN[2022-03-13 12:48:22] key 'Image': could not find json path 'data.0.background.small' in json object
WARN[2022-03-13 12:48:22] key 'Title': could not find json path 'data.0.title' in json object
WARN[2022-03-13 12:48:22] key 'URL': could not find json path 'data.0.url' in json object
WARN[2022-03-13 12:48:22] key 'Date': could not find json path 'data.0.date' in json object
DEBU[2022-03-13 12:48:22] Nothing to set for xxxxxxxx.mp4

and identify doesn't skip to the next scraper; it only does so if there is an error during the scrape (a timeout, for example).
A sample response (no match) from TPDB that causes the above is:

{
  "data": [],
  "links": {
    "first": "https:\/\/api.metadataapi.net\/scenes?query=&page=1",
    "last": "https:\/\/api.metadataapi.net\/scenes?query=&page=1",
    "prev": null,
    "next": null
  },
  "meta": {
    "current_page": 1,
    "from": null,
    "last_page": 1,
    "links": [
      {"url": null, "label": "« Previous", "active": false},
      {"url": "https:\/\/api.metadataapi.net\/scenes?query=&page=1", "label": "1", "active": true},
      {"url": null, "label": "Next »", "active": false}
    ],
    "path": "https:\/\/api.metadataapi.net\/scenes",
    "per_page": 25,
    "to": null,
    "total": 0
  }
}

This behaviour means that JSON scrapers cannot be placed above other scrapers in the source list, because identify will only move on from them to the next scraper when an error occurs.
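One possible way to handle an empty response like the sample above, sketched in Go under the assumption that a zero-length `data` array means "no match": the scraper returns nil instead of building a scene from paths such as `data.0.title` that do not exist. `searchResponse` and `firstResult` are hypothetical names; this is not how stash's JSON scraper is actually implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse models only the part of the API response needed here;
// the "data" field name matches the sample response.
type searchResponse struct {
	Data []json.RawMessage `json:"data"`
}

// firstResult returns the raw first entry of "data", or nil when the
// array is empty so the caller can move on to the next source.
func firstResult(body []byte) (json.RawMessage, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, nil // no match: not an error, but not a scene either
	}
	return resp.Data[0], nil
}

func main() {
	res, err := firstResult([]byte(`{"data":[],"meta":{"total":0}}`))
	fmt.Println(res == nil, err == nil) // true true
}
```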

Transitions from script scrapers (Traxxx) or stash-box scrapers (stashdb) to the next source in the list seem to work fine, whether due to an error or no match.

@bnkai
Collaborator

bnkai commented Mar 14, 2022

Looks ok and tests ok now.

@WithoutPants WithoutPants merged commit 9e3d56b into stashapp:develop Mar 14, 2022