Scrape scene by name #1712

Merged 26 commits on Sep 14, 2021

Conversation

WithoutPants
Collaborator

Adds the ability to query scene scrapers using keywords, in the same way as performer scraper queries.

Where there are queryable scene scrapers available, adds a button to the scene edit panel:

image

After clicking an option, presents a dialog to enter the search term:

image

Results are shown in the same manner as the stash-box tagger:

image

Selecting a scene opens the usual scrape dialog for selecting fields to save.

Scene queries are added to scraper configurations by adding sceneByName and sceneByQueryFragment sections. For example:

sceneByName:
  action: scrapeXPath
  queryURL: https://www.imdb.com/find?q={}
  scraper: sceneSearch

sceneByQueryFragment:
  action: scrapeXPath
  queryURL: "{url}"
  scraper: sceneScraper

xPathScrapers:
  sceneSearch:
    ...

  sceneScraper:
    ...

Proof of concept IMDB scraper with scene query support is attached.

imdb.yml.txt
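
The `{}` in the `sceneByName` `queryURL` above is where the search term ends up. The following is a minimal sketch of that substitution, assuming the term is URL-escaped before being inserted, as with performer scraper queries. `build_query_url` is a hypothetical helper written for illustration and is not part of stash; the real escaping rules may differ.

```python
from urllib.parse import quote


def build_query_url(query_url: str, term: str) -> str:
    """Substitute a URL-escaped search term for the `{}` placeholder.

    Hypothetical helper for illustration only; the actual substitution
    happens inside stash and may escape characters differently.
    """
    return query_url.replace("{}", quote(term, safe=""))


print(build_query_url("https://www.imdb.com/find?q={}", "the matrix 1999"))
```

The resulting page would then be handed to the scraper named in the `scraper` field (`sceneSearch` in the example).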

WithoutPants added the feature label Sep 9, 2021
WithoutPants added this to the Version 0.10.0 milestone Sep 9, 2021
@Belleyy
Contributor

Belleyy commented Sep 9, 2021

The button should stay aligned with the rest (using Firefox; not tested on other browsers):
image

When you hover over the button, the tooltip shows the raw key actions.scrape_query:
image

If you remove sceneByURL from the scraper, you are no longer able to search by name. (The scraper disappears from the dropdown and the scraper list.)
For my test, I wanted to search Javlibrary by name and then, when selecting the scene, have it use my script scraper (a different scraper) that has the sceneByURL for Javlibrary.

In the manual (Scraping.md line 105)

| `sceneByName` | `{"name": "<scene query string>"}` | Array of JSON-encoded performer fragments (including at least `name`) |

I think it should be scene, not performer.
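
For a script scraper, the table row quoted above implies a simple exchange: a `{"name": ...}` fragment goes in and an array of scene fragments comes out. Here is a minimal sketch of that exchange, assuming the input is a JSON string (a real script would typically read it from stdin and print the result to stdout). `search_scenes`, its hard-coded catalogue, and the `title`/`url` field names are hypothetical stand-ins for a real site lookup.

```python
import json


def search_scenes(name: str) -> list:
    """Hypothetical lookup: return scene fragments whose title contains `name`."""
    catalogue = [  # made-up data standing in for a real site or API
        {"title": "Example Scene One", "url": "https://example.com/scenes/1"},
        {"title": "Another Example", "url": "https://example.com/scenes/2"},
    ]
    needle = name.strip().lower()
    if not needle:
        return []  # an empty query matches nothing
    return [scene for scene in catalogue if needle in scene["title"].lower()]


def handle_scene_by_name(raw_input: str) -> str:
    """Turn a `{"name": ...}` fragment into a JSON array of scene fragments."""
    fragment = json.loads(raw_input)
    return json.dumps(search_scenes(fragment.get("name", "")))


print(handle_scene_by_name('{"name": "another"}'))
```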

@WithoutPants
Collaborator Author

Thanks. These should all be resolved now.

@gitgiggety
Contributor

Updated the SARJ-LLC scraper to support these new methods, and it does seem to work fine for those too.

The only thing I'm wondering about, though it's not really related to these changes, is whether the generated title, which is reused for the search query, should be tidied, for example by replacing ., - and _ with a space. That's more related to the scanner than to this PR. But after scanning you get the filename as the title, and this new scraping method then prefills the filename as the search query, so it would be nice if it got tidied somewhere in the process.

For the UI it might be a bit cleaner to add the search button to the Scrape with... button? Split the button in two, Scrape with... on the left, separator in between, and the search icon on the right.

As a follow-up to the discussion in #1630: it works great for at least my use case, except for having to "tidy" the filename (prefilled as the title after scanning) for the search query, but that was already the case. For some reason the SARJ-LLC API accepted ., - and _ just fine (I only had to remove unneeded parts of the filename, not modify those characters), while searching for a scene with a . still in the search term doesn't actually find it.

That raises the question of who should make these transformations. Should it be part of the scanning process? Must the user do it manually? Should it be done when prefilling the search term from the title? In the backend when calling the scraper? In the scraper itself?

IMO it should either be part of the scanning process or be done manually. Anything else might lead to weird issues where user input gets transformed, either by mangling the title into the prefilled query (which can be corrected) or in some hidden way after the search query is submitted. The only "hidden" transformation that would IMO be acceptable is a scraper-specific one. For example, if a site cannot handle spaces and needs _ instead, it would be odd to require the user to type _ instead of spaces, while changing _ into a space would be nice given that the filename will mostly be used as the prefilled input.
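
To make the separator question concrete, here is a minimal sketch of the kind of tidying described above, wherever it ends up living (scanner, prefill, or done by hand). `tidy_title` is a hypothetical function, not existing stash code.

```python
import re


def tidy_title(title: str) -> str:
    """Collapse runs of '.', '-', '_' and whitespace into a single space."""
    return re.sub(r"[\s._\-]+", " ", title).strip()


print(tidy_title("Some.Scene_Name-2021"))
```

Note that a rule this blunt also rewrites separators that are meaningful, such as the dashes in a date, which is one reason to keep any such transformation visible and correctable by the user.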

@Belleyy
Contributor

Belleyy commented Sep 10, 2021

Some ideas:

  • If the scraper gets an error or no results, it would be good to show "No scene found" or the error in the window.
    Example:
    image

  • When the search succeeds, it would be nice to show "{} scene(s) found" somewhere.

@bnkai
Collaborator

bnkai commented Sep 12, 2021

Everything looks and works OK using a couple of modified scrapers.

> As a follow-up to the discussion in #1630: it works great for at least my use case, except for having to "tidy" the filename (prefilled as the title after scanning) for the search query, but that was already the case. For some reason the SARJ-LLC API accepted ., - and _ just fine (I only had to remove unneeded parts of the filename, not modify those characters), while searching for a scene with a . still in the search term doesn't actually find it.
>
> That raises the question of who should make these transformations. Should it be part of the scanning process? Must the user do it manually? Should it be done when prefilling the search term from the title? In the backend when calling the scraper? In the scraper itself?
>
> IMO it should either be part of the scanning process or be done manually. Anything else might lead to weird issues where user input gets transformed, either by mangling the title into the prefilled query (which can be corrected) or in some hidden way after the search query is submitted. The only "hidden" transformation that would IMO be acceptable is a scraper-specific one. For example, if a site cannot handle spaces and needs _ instead, it would be odd to require the user to type _ instead of spaces, while changing _ into a space would be nice given that the filename will mostly be used as the prefilled input.

IMO we could either modify the scanner code (probably by expanding the part that removes the file extension) or add a plugin that edits the title. Using a regex or a blacklist, users could choose how to clean their titles. Both are out of scope for this PR, and IMO for the scraper code in general.
Cleaning for scraping purposes should be left manual in the scene-by-name case (for the scene-by-fragment part, XPath/JSON scrapers already use the URL replace functionality), since cleaning should be done on a per-site/API basis and is scraper specific, as you mention. Scrapers should make minor corrections if needed (site/API-specific character changes), but since users can change the input before the query, they should be the ones to remove the main noise from the title.
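
The regex/blacklist idea could look roughly like the sketch below: an ordered list of user-chosen rules applied to the title. This is only an illustration of the approach; `clean_title`, `EXAMPLE_RULES` and the noise tokens in it are hypothetical and do not correspond to an existing stash plugin or setting.

```python
import re

# Hypothetical user-supplied rules: (pattern, replacement), applied in order.
EXAMPLE_RULES = [
    (r"\b(1080p|720p|x264)\b", ""),  # example noise tokens to drop
    (r"\s+", " "),                   # collapse the whitespace left behind
]


def clean_title(title: str, rules=EXAMPLE_RULES) -> str:
    """Apply each (pattern, replacement) rule in order, case-insensitively."""
    for pattern, replacement in rules:
        title = re.sub(pattern, replacement, title, flags=re.IGNORECASE)
    return title.strip()


print(clean_title("My Scene 1080P x264 part 2"))
```

Keeping the rules as data rather than code is what would let each user decide how aggressive the cleanup should be.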

Labels
feature Pull requests that add a new feature
4 participants