Add image scraping support #370

WithoutPants · 2020-02-13T08:39:24Z

Resolves #344

Adds the ability to scrape performer images and scene cover images.

This change also introduces the subScraper xpath post-processing option. If subScraper appears in an attribute's xpath configuration, the sub-scraper runs after all other post-processes are complete. It takes the resulting value, treats it as a URL, and performs an HTTP request. The subScraper config contains a nested scraping configuration, which is applied to the fetched page. This lets you traverse to other web pages to get the attribute value you are after.

For example, from the Boobpedia scraper config in #333:

...
performerScraper:
  performer:
    # ..snip..
    Image:
      selector: //table[@class="infobox"]//tr[2]//a/@href
      # URL is a partial url, add the first part
      replace:
        - regex: ^
          with: http://www.boobpedia.com
      subScraper:
        selector: //div[@class="fullImageLink"]/a/@href
        replace:
          - regex: ^
            with: http://www.boobpedia.com

This fragment gets the URL from the xpath //table[@class="infobox"]//tr[2]//a/@href, then adds the http://www.boobpedia.com prefix with the replace post-process. The sub-scraper post-process runs next: it requests the document at the resulting URL, gets the URL from //div[@class="fullImageLink"]/a/@href on that page, and applies its own replace post-process to it.
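
The order of operations described above can be sketched in Go. This is a minimal illustration, not the code in this PR: the `attrConfig`, `replaceRule` and `pageQuery` names are hypothetical, the HTTP request and XPath evaluation are replaced by an injected function so the sketch runs offline, and the paths in `main` are made up. It also assumes a nested config is post-processed the same way as its parent.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceRule mirrors one entry of a `replace` list (regex / with).
type replaceRule struct {
	Regex string
	With  string
}

// attrConfig is a hypothetical, simplified view of an attribute config.
type attrConfig struct {
	Selector   string
	Replace    []replaceRule
	SubScraper *attrConfig
}

// pageQuery stands in for "request the page at url and evaluate the
// XPath selector against it". A real version would do HTTP and XPath.
type pageQuery func(url, selector string) (string, error)

// applyReplace runs each replace rule over value, in order.
func applyReplace(value string, rules []replaceRule) (string, error) {
	for _, r := range rules {
		re, err := regexp.Compile(r.Regex)
		if err != nil {
			return "", fmt.Errorf("bad regex %q: %w", r.Regex, err)
		}
		value = re.ReplaceAllString(value, r.With)
	}
	return value, nil
}

// postProcess applies the replace rules first and the sub-scraper last.
func postProcess(value string, cfg attrConfig, query pageQuery) (string, error) {
	replaced, err := applyReplace(value, cfg.Replace)
	if err != nil {
		return "", err
	}
	if cfg.SubScraper == nil {
		return replaced, nil
	}
	// The replaced value is treated as the URL of the next page.
	next, err := query(replaced, cfg.SubScraper.Selector)
	if err != nil {
		return "", err
	}
	// The nested config post-processes the value found on that page.
	return postProcess(next, *cfg.SubScraper, query)
}

func main() {
	prefix := []replaceRule{{Regex: "^", With: "http://www.boobpedia.com"}}
	cfg := attrConfig{
		Selector: `//table[@class="infobox"]//tr[2]//a/@href`,
		Replace:  prefix,
		SubScraper: &attrConfig{
			Selector: `//div[@class="fullImageLink"]/a/@href`,
			Replace:  prefix,
		},
	}
	// Fake page lookup with a made-up path, so the sketch runs offline.
	fake := func(url, selector string) (string, error) {
		return "/images/Example.jpg", nil
	}
	out, err := postProcess("/boobs/File:Example.jpg", cfg, fake)
	fmt.Println(out, err)
}
```

Injecting the page lookup keeps the ordering logic separate from networking, which also makes it straightforward to exercise without hitting a real site.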

The Image value is expected to be a URL itself, which the system will subsequently request and encode.
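
As a rough sketch of the request-and-encode step, the Go snippet below fetches a URL with a 30 second client timeout (matching the timeout named in the commit list) and encodes the body. The function names are hypothetical, and the base64 data URI output format is an assumption, since the description only says the image is encoded.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"time"
)

// imageTimeout matches the 30 second limit mentioned in the commits.
const imageTimeout = 30 * time.Second

// encodeImage turns raw image bytes into a base64 data URI.
// The data URI format is an assumption made for this sketch.
func encodeImage(data []byte) string {
	contentType := http.DetectContentType(data)
	return "data:" + contentType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

// fetchImage requests url and returns the encoded response body.
func fetchImage(url string) (string, error) {
	client := &http.Client{Timeout: imageTimeout}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s fetching %s", resp.Status, url)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return encodeImage(data), nil
}

func main() {
	// The 8-byte PNG signature is enough for content type detection,
	// so this demo runs without any network access.
	fmt.Println(encodeImage([]byte("\x89PNG\r\n\x1a\n")))
}
```

Setting the timeout on the client bounds the whole request, including reading the body, so a slow image host cannot stall a scrape indefinitely.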

Also adds image scraping to the stash scraper.

pkg/scraper/image.go

bnkai · 2020-02-13T21:36:00Z

Tests ok with me.
Images are fetched ok and saved with the boobpedia demo scraper.

WithoutPants · 2020-03-10T05:37:06Z

Rebased and ported UI changes to 2.5. @bnkai can you please review on v2 and v2.5 of the UI?

MrX292 · 2020-03-10T16:44:02Z

stashapp/CommunityScrapers#2 some scrapers for it

bnkai

Tested against v2 and v2.5 UI using the boobpedia and @MrX292's mofos and newfreeones xpath scrapers.
Both scene and performer images seem to work fine.

* Add sub-scraper functionality
* Add scraping of performer image
* Add scene cover image scraping
* Port UI changes to v2.5
* Fix v2.5 dialog suggest color
* Don't convert eol of UI to support pretty

WithoutPants added the feature label Feb 13, 2020

bnkai reviewed Feb 13, 2020

WithoutPants marked this pull request as ready for review March 2, 2020 23:34

WithoutPants requested a review from bnkai March 2, 2020 23:46

WithoutPants added this to the Version 0.2.0 milestone Mar 3, 2020

bnkai approved these changes Mar 3, 2020

WithoutPants added 12 commits March 10, 2020 14:35

Add sub-scraper functionality

dec65af

Add scraping of performer image

d516378

Add scene cover image scraping

7a7aa90

Fix regression in stash scrape

2780791

Add dependency

e856b1d

Add 30 second timeout to get image

0b7fecf

Fix unit test failure

85deecf

Port UI changes to v2.5

4175742

Fix v2.5 dialog suggest color

5c59b0f

Don't convert eol of UI to support pretty

b38a012

Fix formatting

40122a6

Port scene image UI change to 2.5

4e7a75f

WithoutPants force-pushed the scrape_image branch from 6ccdfe2 to 4e7a75f Compare March 10, 2020 05:36

WithoutPants requested a review from bnkai March 10, 2020 05:36

bnkai approved these changes Mar 10, 2020

WithoutPants merged commit 34d8293 into stashapp:develop Mar 11, 2020

WithoutPants deleted the scrape_image branch February 4, 2021 03:00
