Skip to content
This repository has been archived by the owner on Dec 11, 2019. It is now read-only.

Add publishers media info request API #13115

Merged
merged 3 commits into from
Mar 14, 2018
Merged

Conversation

diracdeltas
Copy link
Member

@diracdeltas diracdeltas commented Feb 13, 2018

Needed for
brave-intl/bat-publisher#11 (review)
and #11889

Resolves #13114

https://github.com/brave-intl/bat-publisher/blob/8dc0b6d016e81fe777983017ed1eebc4ce048770/getMedia.js#L116-L168 can be refactored to use request.fetchPublisherInfo so that it doesn't need to use metascraper.

request.fetchPublisherInfo calls a callback whose input is the object:

{
  error: String, // error if the request was unsuccessful, null if it was successful
  title: String, // title of the requested page
  url: String, // URL of the final page after redirects
  image: String // URL of a representative image for the page
}

Submitter Checklist:

  • Submitted a ticket for my issue if one did not already exist.
  • Used Github auto-closing keywords in the commit message.
  • Added/updated tests for this change (for new code or code which already has tests).
  • Ran git rebase -i to squash commits (if needed).
  • Tagged reviewers and labelled the pull request as needed.
  • Request a security/privacy review as needed. (Ask a Brave employee to help if you cannot access this document.)

Test Plan:

Reviewer Checklist:

  • Request a security/privacy review as needed if one was not already requested.

Tests

  • Adequate test coverage exists to prevent regressions
  • Tests should be independent and work correctly when run individually or as a suite ref
  • New files have MPL2 license header

@mrose17
Copy link
Member

mrose17 commented Feb 13, 2018

@diracdeltas - i'm still reviewing a look at this; however, the overall approach looks great! it is certainly less work than i had anticipated!

however, isn't the functionality considerably broader than the name being used. instead of fetchPublisherInfo shouldn't it be something more like getBackgroundPageWebcontents and get-background-page-webcontents ???

@diracdeltas
Copy link
Member Author

however, isn't the functionality considerably broader than the name being used. instead of fetchPublisherInfo shouldn't it be something more like getBackgroundPageWebcontents and get-background-page-webcontents

Getting the background page webcontents is only needed so that the main process has a reference to the correct webcontents that the requests will go in. I assume ledger doesn't need generic access to the webcontents but only a few properties of the page (currently: whether there was an error in the request, the page image, the page title, and the page's final URL). Those properties are passed to the fetchPublisherInfo callback, not the entire webcontents.

@mrose17
Copy link
Member

mrose17 commented Feb 13, 2018

sadly, that's not a good assumption. the purpose of the webscraper is in fact to scrape a whole bunch of tags. i can put together a list i suppose.

@diracdeltas
Copy link
Member Author

unfortunately you can't pass the entire webContents or parsed HTML object over IPC without serializing. it'd be easy to add more properties though.

@mrose17
Copy link
Member

mrose17 commented Feb 13, 2018

that's what we'll do then!

@codecov-io
Copy link

codecov-io commented Feb 13, 2018

Codecov Report

Merging #13115 into master will increase coverage by 0.23%.
The diff coverage is 79.67%.

@@            Coverage Diff             @@
##           master   #13115      +/-   ##
==========================================
+ Coverage   56.25%   56.49%   +0.23%     
==========================================
  Files         283      284       +1     
  Lines       28353    28656     +303     
  Branches     4674     4734      +60     
==========================================
+ Hits        15950    16189     +239     
- Misses      12403    12467      +64
Flag Coverage Δ
#unittest 56.49% <79.67%> (+0.23%) ⬆️
Impacted Files Coverage Δ
app/browser/api/ledger.js 59.63% <62.22%> (-0.13%) ⬇️
js/lib/request.js 28.78% <66.66%> (+11.14%) ⬆️
...extensions/brave/content/scripts/requestHandler.js 83.6% <83.6%> (ø)

@diracdeltas diracdeltas self-assigned this Feb 13, 2018
@mrose17
Copy link
Member

mrose17 commented Feb 15, 2018

@diracdeltas - thanks for the update. it turns out that for twitch, we won't be using the scraper, but i want to get this integrated in regardless (for youtube and future others); however, i'm going to need to add a bunch more rules so it is as "clever" as metascraper (which is really clever!)

@NejcZdovc NejcZdovc added this to the 0.22.x (Developer Channel) milestone Feb 26, 2018
@NejcZdovc
Copy link
Contributor

PR blocked on brave-intl/bat-publisher#21

@NejcZdovc NejcZdovc force-pushed the feature/publisher-request-api branch 2 times, most recently from a4fd9e6 to cc19157 Compare February 26, 2018 17:35
@NejcZdovc
Copy link
Contributor

@yan can you please check my last commit

})
}

// https://github.com/microlinkhq/metascraper
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach looks good to me, but what's the process to update this file in the future to include upstream changes in metascraper? please add either a comment with manual steps to update this file or an automated script

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add the metascraper functions as a separate JS file in https://github.com/brave/browser-laptop/pull/13115/files#diff-a533e12744082c16911d52f54573e441R41 so it's easier to update automatically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem is that they are using jquery library for selectors, and we use native html selectors

metascraper: $('meta[property="og:image:secure_url"]').attr('content')
we: html.querySelector('meta[property="og:image:secure_url"]').content

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move selectors to the different file, so that it will be easier to upgrade

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rules moved

@alexwykoff alexwykoff modified the milestones: 0.22.x (Developer Channel), Backlog (Prioritized) Feb 27, 2018
@alexwykoff alexwykoff added the priority/P3 Major loss of function. label Feb 27, 2018
@bsclifton bsclifton modified the milestones: Backlog (Prioritized), Completed work Feb 28, 2018
@NejcZdovc NejcZdovc modified the milestones: Completed work, 0.22.x (Developer Channel) Feb 28, 2018
@NejcZdovc NejcZdovc mentioned this pull request Mar 1, 2018
10 tasks
@NejcZdovc NejcZdovc force-pushed the feature/publisher-request-api branch 2 times, most recently from 25741fa to 37e6151 Compare March 2, 2018 15:01
@NejcZdovc
Copy link
Contributor

@diracdeltas can you please re-review?

* @param {boolean} options.binaryP - are we receiving binary payload back
* @param {string} options.rawP - are we receiving raw payload back
* @param {string} options.scrapeP - are we doping scraping
* @param {string} options.windowP - do we want to run this request in the window process
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks this is helpful. but are rawP, scrapeP, and windowP supposed to be boolean, not string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, my bad, copy-paste mistake

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param {object} params - contains params from roundtrip
* @param {string} params.url - url of the site that we want to scrap
*/
const roundTripFromWindow = (params, callback) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please document what the arguments to callback are supposed to be, either here or in roundtrip

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1436,6 +1484,11 @@ const roundtrip = (params, options, callback) => {
parts.pathname = parts.path
}

if (options.windowP) {
roundTripFromWindow({url: urlFormat(parts), verboseP: options.verboseP}, callback)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: params.verboseP is never used in roundTripFromWindow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some logging, so we use it now 😃

@diracdeltas
Copy link
Member Author

there is a unit test error in travis: https://travis-ci.org/brave/browser-laptop/jobs/353053977#L9779

@NejcZdovc
Copy link
Contributor

I see that we are using different call for travis, will adjust that values so that it will pass @diracdeltas

@NejcZdovc NejcZdovc force-pushed the feature/publisher-request-api branch from 2133473 to 6d62de8 Compare March 14, 2018 20:21
@NejcZdovc
Copy link
Contributor

@diracdeltas updated and ready for another review

mrose17
mrose17 previously approved these changes Mar 14, 2018
Copy link
Member

@mrose17 mrose17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM.

@diracdeltas
Copy link
Member Author

github won't let me approve since i opened this PR, but this lgtm :)

Copy link
Contributor

@NejcZdovc NejcZdovc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving it based on previous approvals from @mrose17 and @diracdeltas

@NejcZdovc NejcZdovc merged commit a95d7b0 into master Mar 14, 2018
NejcZdovc added a commit that referenced this pull request Mar 15, 2018
NejcZdovc added a commit that referenced this pull request Mar 15, 2018
@NejcZdovc
Copy link
Contributor

master a95d7b0
0.23 9802def
0.22 e62e1ec

@NejcZdovc NejcZdovc modified the milestones: 0.22.x (Beta Channel), 0.21.x w/ Chromium 65 (Release Channel) Mar 19, 2018
NejcZdovc added a commit that referenced this pull request Mar 19, 2018
@NejcZdovc
Copy link
Contributor

0.21 14741d4

@bsclifton bsclifton deleted the feature/publisher-request-api branch March 19, 2018 17:54
@bsclifton bsclifton modified the milestones: 0.21.x w/ Chromium 65 (Release Channel), 0.22.x (Beta Channel) Mar 19, 2018
bsclifton added a commit that referenced this pull request Mar 19, 2018
@bsclifton
Copy link
Member

bsclifton commented Mar 19, 2018

reverted from 0.21.x with 5593d21; milestone updated to 0.22.x

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants