Add publishers media info request API #13115

diracdeltas · 2018-02-13T02:29:13Z

Needed for
brave-intl/bat-publisher#11 (review)
and #11889

Resolves #13114

https://github.com/brave-intl/bat-publisher/blob/8dc0b6d016e81fe777983017ed1eebc4ce048770/getMedia.js#L116-L168 can be refactored to use request.fetchPublisherInfo so that it doesn't need to use metascraper.

request.fetchPublisherInfo calls a callback whose input is the object:

{
  error: String, // error if the request was unsuccessful, null if it was successful
  title: String, // title of the requested page
  url: String, // URL of the final page after redirects
  image: String // URL of a representative image for the page
}
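
A minimal sketch of how a caller might consume that result object. The `fetchPublisherInfo` stub below is a stand-in so the example runs on its own; the real `request.fetchPublisherInfo` signature (argument order, options) is not shown in this description and is assumed here, as are the helper name `toMediaInfo` and the example URLs.

```javascript
// Hypothetical stand-in for request.fetchPublisherInfo; the (url, callback)
// signature is an assumption made for illustration only.
const fetchPublisherInfo = (url, callback) => {
  // Pretend the page loaded and redirected to its final URL.
  callback({
    error: null,
    title: 'Example channel',
    url: 'https://example.com/channel/',
    image: 'https://example.com/avatar.png'
  })
}

// Turn the callback payload into a success or failure record.
const toMediaInfo = (result) => {
  if (!result || result.error) {
    return { ok: false, reason: (result && result.error) || 'no result' }
  }
  return { ok: true, title: result.title, url: result.url, image: result.image }
}

let lastInfo = null
fetchPublisherInfo('https://example.com/channel', (result) => {
  lastInfo = toMediaInfo(result)
})
```

Checking `error` first keeps the caller from reading `title`, `url`, or `image` on a failed request, where those fields may not be meaningful.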

Submitter Checklist:

Submitted a ticket for my issue if one did not already exist.
Used Github auto-closing keywords in the commit message.
Added/updated tests for this change (for new code or code which already has tests).
Ran git rebase -i to squash commits (if needed).
Tagged reviewers and labelled the pull request as needed.
Request a security/privacy review as needed. (Ask a Brave employee to help if you cannot access this document.)

Test Plan:

Reviewer Checklist:

Request a security/privacy review as needed if one was not already requested.

Tests

Adequate test coverage exists to prevent regressions
Tests should be independent and work correctly when run individually or as a suite
New files have MPL2 license header

mrose17 · 2018-02-13T11:38:15Z

@diracdeltas - i'm still taking a look at this; however, the overall approach looks great! it is certainly less work than i had anticipated!

however, isn't the functionality considerably broader than the name being used? instead of fetchPublisherInfo, shouldn't it be something more like getBackgroundPageWebcontents and get-background-page-webcontents?

diracdeltas · 2018-02-13T16:22:20Z

however, isn't the functionality considerably broader than the name being used. instead of fetchPublisherInfo shouldn't it be something more like getBackgroundPageWebcontents and get-background-page-webcontents

Getting the background page webcontents is only needed so that the main process has a reference to the webcontents in which the requests will run. I assume ledger doesn't need generic access to the webcontents, only a few properties of the page (currently: whether there was an error in the request, the page image, the page title, and the page's final URL). Those properties are passed to the fetchPublisherInfo callback, not the entire webcontents.

mrose17 · 2018-02-13T16:24:18Z

sadly, that's not a good assumption. the purpose of the webscraper is in fact to scrape a whole bunch of tags. i can put together a list i suppose.

diracdeltas · 2018-02-13T16:28:44Z

unfortunately you can't pass the entire webContents or parsed HTML object over IPC without serializing. it'd be easy to add more properties though.
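
To illustrate the constraint: values sent over IPC are serialized, so only plain data survives the trip. The sketch below approximates that with a JSON round trip (an assumption; the actual IPC serializer may differ in details) and shows why a handler would copy a few fields into a plain object instead of sending a live page object. `simulateIpc`, `pickPageProperties`, and `fakePage` are hypothetical names.

```javascript
// Approximate an IPC hop with a JSON round trip (assumption for illustration).
const simulateIpc = (value) => JSON.parse(JSON.stringify(value))

// Stand-in for a live page object: it carries a method and a circular reference.
const fakePage = {
  title: 'Example',
  url: 'https://example.com/',
  getImage: () => 'https://example.com/a.png'
}
fakePage.self = fakePage

// Copy only the plain, serializable properties that the receiver needs.
const pickPageProperties = (page) => ({
  error: null,
  title: page.title,
  url: page.url,
  image: page.getImage()
})

let rawFailed = false
try {
  simulateIpc(fakePage) // throws: a circular structure cannot be serialized
} catch (e) {
  rawFailed = true
}

// The plain object survives the round trip unchanged.
const received = simulateIpc(pickPageProperties(fakePage))
```

Under this approach, exposing another scraped tag amounts to adding one more field to the plain object.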

mrose17 · 2018-02-13T16:30:23Z

that's what we'll do then!

codecov-io · 2018-02-13T16:39:44Z

Codecov Report

Merging #13115 into master will increase coverage by 0.23%.
The diff coverage is 79.67%.

@@            Coverage Diff             @@
##           master   #13115      +/-   ##
==========================================
+ Coverage   56.25%   56.49%   +0.23%     
==========================================
  Files         283      284       +1     
  Lines       28353    28656     +303     
  Branches     4674     4734      +60     
==========================================
+ Hits        15950    16189     +239     
- Misses      12403    12467      +64

Flag	Coverage Δ
#unittest	`56.49% <79.67%> (+0.23%)`	⬆️

Impacted Files	Coverage Δ
app/browser/api/ledger.js	`59.63% <62.22%> (-0.13%)`	⬇️
js/lib/request.js	`28.78% <66.66%> (+11.14%)`	⬆️
...extensions/brave/content/scripts/requestHandler.js	`83.6% <83.6%> (ø)`

mrose17 · 2018-02-15T16:54:18Z

@diracdeltas - thanks for the update. it turns out that for twitch, we won't be using the scraper, but i want to get this integrated in regardless (for youtube and future others); however, i'm going to need to add a bunch more rules so it is as "clever" as metascraper (which is really clever!)

NejcZdovc · 2018-02-26T13:37:12Z

PR blocked on brave-intl/bat-publisher#21

NejcZdovc · 2018-02-26T17:38:09Z

@yan can you please check my last commit

diracdeltas · 2018-02-26T17:40:32Z

app/extensions/brave/content/scripts/requestHandler.js

+  })
+}
+
+// https://github.com/microlinkhq/metascraper


this approach looks good to me, but what's the process to update this file in the future to include upstream changes in metascraper? please add either a comment with manual steps to update this file or an automated script

you can add the metascraper functions as a separate JS file in https://github.com/brave/browser-laptop/pull/13115/files#diff-a533e12744082c16911d52f54573e441R41 so it's easier to update automatically

NejcZdovc: problem is that they are using the jquery library for selectors, and we use native html selectors

metascraper: $('meta[property="og:image:secure_url"]').attr('content')
we: html.querySelector('meta[property="og:image:secure_url"]').content

NejcZdovc: I can move the selectors to a different file, so that it will be easier to upgrade

NejcZdovc: rules moved
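
The difference between the two selector styles, and the idea of keeping the rules in their own file, can be sketched roughly as follows. The rule list, the `getImage` helper, and the fake document are hypothetical; only the `og:image:secure_url` selector comes from the comment above. Unlike the one-liner, the sketch guards against `querySelector` returning `null`.

```javascript
// Hypothetical rules table: each rule is a native selector plus the attribute to read.
// The metascraper equivalent would be $('meta[...]').attr('content').
const imageRules = [
  { selector: 'meta[property="og:image:secure_url"]', attr: 'content' },
  { selector: 'meta[property="og:image"]', attr: 'content' }
]

// Try each rule in order and return the first non-empty value, or null.
const getImage = (doc, rules = imageRules) => {
  for (const rule of rules) {
    const el = doc.querySelector(rule.selector) // null when nothing matches
    const value = el && el.getAttribute(rule.attr)
    if (value) return value
  }
  return null
}

// Minimal fake document so the sketch runs without a browser.
const makeFakeDoc = (attrsBySelector) => ({
  querySelector: (selector) => {
    const attrs = attrsBySelector[selector]
    return attrs ? { getAttribute: (name) => attrs[name] || null } : null
  }
})
```

Keeping the rules as data means an upstream change in metascraper would translate into editing the table, not the lookup logic.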

NejcZdovc · 2018-03-13T21:09:07Z

@diracdeltas can you please re-review?

diracdeltas · 2018-03-13T23:47:16Z

app/browser/api/ledger.js

+ * @param {boolean} options.binaryP - are we receiving binary payload back
+ * @param {string} options.rawP - are we receiving raw payload back
+ * @param {string} options.scrapeP - are we doping scraping
+ * @param {string} options.windowP - do we want to run this request in the window process


thanks this is helpful. but are rawP, scrapeP, and windowP supposed to be boolean, not string?

NejcZdovc: oh yeah, my bad, copy-paste mistake

diracdeltas · 2018-03-13T23:50:04Z

app/browser/api/ledger.js

+ * @param {object} params - contains params from roundtrip
+ * @param {string} params.url - url of the site that we want to scrap
+ */
+const roundTripFromWindow = (params, callback) => {


please document what the arguments to callback are supposed to be, either here or in roundtrip
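
One way the requested documentation could look, as a sketch only. The callback argument order `(err, response, payload)` and the field shapes are assumptions, not taken from this diff, and `roundTripFromWindowSketch` is a stand-in that only demonstrates the contract.

```javascript
/**
 * Hypothetical callback contract, written the way the review asks for.
 * @callback roundTripCallback
 * @param {Error|null} err - set when the request failed, otherwise null
 * @param {object} [response] - response metadata (assumed shape: {statusCode})
 * @param {object} [payload] - scraped fields (assumed shape: {title, url, image})
 */

/**
 * Stand-in for roundTripFromWindow that only demonstrates the contract.
 * @param {object} params - contains params from roundtrip
 * @param {string} params.url - URL of the site to scrape
 * @param {roundTripCallback} callback - receives the result as described above
 */
const roundTripFromWindowSketch = (params, callback) => {
  if (!params || typeof params.url !== 'string') {
    return callback(new Error('params.url is required'))
  }
  callback(null, { statusCode: 200 }, { url: params.url, title: '', image: '' })
}
```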

diracdeltas · 2018-03-13T23:51:12Z

app/browser/api/ledger.js

@@ -1436,6 +1484,11 @@ const roundtrip = (params, options, callback) => {
    parts.pathname = parts.path
  }

+  if (options.windowP) {
+    roundTripFromWindow({url: urlFormat(parts), verboseP: options.verboseP}, callback)


minor: params.verboseP is never used in roundTripFromWindow

NejcZdovc: added some logging, so we use it now 😃

diracdeltas · 2018-03-13T23:54:49Z

there is a unit test error in travis: https://travis-ci.org/brave/browser-laptop/jobs/353053977#L9779

NejcZdovc · 2018-03-13T23:59:46Z

I see that we are using a different call for travis; will adjust those values so that it will pass @diracdeltas

NejcZdovc · 2018-03-14T20:21:41Z

@diracdeltas updated and ready for another review

mrose17

this LGTM.

diracdeltas · 2018-03-14T22:47:33Z

github won't let me approve since i opened this PR, but this lgtm :)

NejcZdovc

Approving it based on previous approvals from @mrose17 and @diracdeltas

NejcZdovc · 2018-03-15T00:09:16Z

master a95d7b0
0.23 9802def
0.22 e62e1ec

NejcZdovc · 2018-03-19T17:42:15Z

0.21 14741d4

bsclifton · 2018-03-19T19:12:23Z

reverted from 0.21.x with 5593d21; milestone updated to 0.22.x

diracdeltas requested review from mrose17 and NejcZdovc February 13, 2018 02:29

diracdeltas self-assigned this Feb 13, 2018

NejcZdovc added this to the 0.22.x (Developer Channel) milestone Feb 26, 2018

NejcZdovc added the feature/rewards label Feb 26, 2018

NejcZdovc force-pushed the feature/publisher-request-api branch from 7e45c2b to 7706b50 Compare February 26, 2018 13:22

NejcZdovc force-pushed the feature/publisher-request-api branch 2 times, most recently from a4fd9e6 to cc19157 Compare February 26, 2018 17:35

NejcZdovc added the PR/needs-QA-attention ☕ label Feb 27, 2018

alexwykoff modified the milestones: 0.22.x (Developer Channel), Backlog (Prioritized) Feb 27, 2018

alexwykoff added the priority/P3 Major loss of function. label Feb 27, 2018

bsclifton modified the milestones: Backlog (Prioritized), Completed work Feb 28, 2018

NejcZdovc modified the milestones: Completed work, 0.22.x (Developer Channel) Feb 28, 2018

NejcZdovc mentioned this pull request Mar 1, 2018

Fixes null error in bat-publisher #13347

Merged

10 tasks

NejcZdovc added the PR/work-in-progress ⚒ label Mar 1, 2018

NejcZdovc force-pushed the feature/publisher-request-api branch 2 times, most recently from 25741fa to 37e6151 Compare March 2, 2018 15:01

NejcZdovc force-pushed the feature/publisher-request-api branch from 2133473 to 6d62de8 Compare March 14, 2018 20:21

mrose17 previously approved these changes Mar 14, 2018

diracdeltas and others added 3 commits March 14, 2018 16:07

Add publishers media info request API

8bb9c4d

Needed for brave-intl/bat-publisher#11 (review) and #13114

Resolve absolute image URLs in request.fetchPublisherInfo

d4ece47

Adds meta scraper like implementation

6138059

NejcZdovc dismissed mrose17’s stale review via 6138059 March 14, 2018 23:11

NejcZdovc force-pushed the feature/publisher-request-api branch from 6d62de8 to 6138059 Compare March 14, 2018 23:11

NejcZdovc approved these changes Mar 14, 2018

NejcZdovc merged commit a95d7b0 into master Mar 14, 2018

NejcZdovc added a commit that referenced this pull request Mar 15, 2018

Merge pull request #13115 from brave/feature/publisher-request-api

9802def

Add publishers media info request API

NejcZdovc added a commit that referenced this pull request Mar 15, 2018

Merge pull request #13115 from brave/feature/publisher-request-api

e62e1ec

Add publishers media info request API

NejcZdovc modified the milestones: 0.22.x (Beta Channel), 0.21.x w/ Chromium 65 (Release Channel) Mar 19, 2018

NejcZdovc added a commit that referenced this pull request Mar 19, 2018

Merge pull request #13115 from brave/feature/publisher-request-api

14741d4

Add publishers media info request API

bsclifton deleted the feature/publisher-request-api branch March 19, 2018 17:54

bsclifton modified the milestones: 0.21.x w/ Chromium 65 (Release Channel), 0.22.x (Beta Channel) Mar 19, 2018

bsclifton added a commit that referenced this pull request Mar 19, 2018

Revert "Merge pull request #13115 from brave/feature/publisher-request-api"

5593d21

This reverts commit 14741d4.

bsclifton mentioned this pull request Mar 19, 2018

Translations broken on beta channel #13499

Closed
