Add selector limit configuration
This allows developers to limit how many nodes get processed per selector.
nitriques committed Jan 27, 2019
1 parent 2d91980 commit 40a663f
Showing 4 changed files with 53 additions and 6 deletions.
27 changes: 22 additions & 5 deletions README.md
@@ -161,26 +161,43 @@ If you do not need auth, you still need to specify an empty String.
 
 An object containing the CSS selectors used to find content in the page's HTML.
 
-#### selectors.title: String
+#### selectors.title: String|Selector
 
 CSS selector for the title of the page.
 
-#### selectors.description: String
+#### selectors.description: String|Selector
 
 CSS selector for the description of the page.
 
-#### selectors.image: String
+#### selectors.image: String|Selector
 
 CSS selector for the image of the page.
 
-#### selectors.text: String
+#### selectors.text: String|Selector
 
 CSS selector for the text of the page.
 
-#### selectors[key]: String
+#### selectors[key]: String|Selector
 
 CSS selector for the "key" property. You can add custom keys as you wish.
 
+#### Selector Object
+
+Selectors can also be defined using the long form (i.e. as an object),
+which allows specifying custom properties on them.
+
+##### selectors[key].attributes: String|Array<String>
+
+Names of the attributes to read values from. Default is ['content', 'value'].
+
+##### selectors[key].selector: String
+
+The actual CSS selector to use.
+
+##### selectors[key].limit: Number
+
+The maximum number of nodes to check.
+
 ### exclusions: Object
 
 An object containing CSS selectors to find elements that must not be indexed.
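As an illustration of the README change above, a `selectors` configuration mixing the short (string) form and the long (Selector object) form might look like the sketch below. The CSS selector strings and the custom `tags` key are hypothetical; only the `selector`, `attributes` and `limit` property names come from the README.

```javascript
// Hypothetical configuration fragment: the selector strings and the
// `tags` key are made up for illustration.
const config = {
  selectors: {
    // Short form: a plain CSS selector string.
    title: 'head > title',
    // Long form: a Selector object with custom properties.
    description: {
      selector: 'meta[name="description"]',
      attributes: ['content'], // attribute(s) to read values from
      limit: 1 // check at most one matching node
    },
    // Custom key, also in long form, relying on the default attributes.
    tags: {
      selector: '.tag',
      limit: 5
    }
  }
};
```

Leaving `limit` out keeps the previous behavior, where every matching node is processed.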
1 change: 1 addition & 0 deletions app.js
@@ -67,6 +67,7 @@ config.selectors = _.map(config.selectors, (selector, key) => {
         key,
         attributes: selector.attributes,
         selector: selector.selector,
+        limit: selector.limit,
         exclude: config.exclusions && config.exclusions[key]
     };
 });
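The normalization step in the hunk above can be sketched without lodash as follows. This is only an approximation under stated assumptions: `normalizeSelectors` is a hypothetical helper name, and the expansion of a string shorthand into `{ selector: value }` is an assumption, since that part of `app.js` is not shown in this commit.

```javascript
// Sketch only: normalizeSelectors is a hypothetical helper, not a function
// that exists in app.js.
const normalizeSelectors = (selectors, exclusions) =>
  Object.entries(selectors).map(([key, value]) => {
    // Assumption: a string is shorthand for { selector: value }.
    const selector = typeof value === 'string' ? { selector: value } : value;
    return {
      key,
      attributes: selector.attributes,
      selector: selector.selector,
      limit: selector.limit, // undefined when no limit is configured
      exclude: exclusions && exclusions[key]
    };
  });

// Example input mixing both forms; the selector strings are made up.
const normalized = normalizeSelectors(
  { title: 'title', tags: { selector: '.tag', limit: 3 } },
  { tags: '.hidden' }
);
```

Carrying `limit` through as a plain property means a selector without one simply ends up with `limit: undefined`, which the numeric check in `lib/process.js` ignores.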
7 changes: 6 additions & 1 deletion lib/process.js
@@ -94,10 +94,15 @@ const parse = (record, data, config) => {
 if (record[key] === undefined) {
     record[key] = [];
     // Fetch all and filter exclusions
-    const nodes = $(selector.selector).filter((i, node) => {
+    let nodes = $(selector.selector).filter((i, node) => {
         return !selector.exclude || $(node).closest(selector.exclude).length === 0;
     });
 
+    // Check for limit
+    if (_.isNumber(selector.limit) && nodes.length > selector.limit) {
+        nodes = nodes.slice(0, selector.limit);
+    }
+
     // Populate the record
     _.each(nodes, (node) => recursiveFindValue(node, record[key], selector.attributes));
 
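The limit check added above can be tried in isolation with a plain array standing in for the cheerio result set. This is a minimal sketch: `applyLimit` is a hypothetical name, and `typeof limit === 'number'` is used in place of lodash's `_.isNumber`.

```javascript
// Hypothetical stand-in for the limit check in lib/process.js, using a plain
// array instead of a cheerio selection.
const applyLimit = (nodes, limit) => {
  // Only a numeric limit smaller than the match count truncates the list.
  if (typeof limit === 'number' && nodes.length > limit) {
    return nodes.slice(0, limit);
  }
  // No limit configured, or fewer matches than the limit: keep everything.
  return nodes;
};

const titles = ['1', '2', '3', '4', '5'];
const limited = applyLimit(titles, 3); // first three matches only
const untouched = applyLimit(titles, undefined); // no limit configured
```

Because the limit is applied after the exclusion filter, excluded nodes do not count toward it. A limit of `0` passes the numeric check and therefore yields no nodes at all.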
24 changes: 24 additions & 0 deletions test/parse.js
@@ -102,3 +102,27 @@ test('Simple parse no spaces', (t) => {
     t.equal(rec.title, 'test-test-test-test');
     t.end();
 });
+
+test('Parse with limit', (t) => {
+    const rec = {
+        date: now,
+        timestamp: now.getTime()
+    };
+    const c = _.cloneDeep(config);
+    const data = `<html><head>
+        <title>1</title>
+        <title>2</title>
+        <title>3</title>
+        <title>4</title>
+        <title>5</title>
+    </head></html>`;
+    parse(rec, data, c);
+    t.equal(rec.title.length, 5);
+    delete rec.title;
+    c.selectors[0].limit = 3;
+    parse(rec, data, c);
+    t.equal(rec.date, now);
+    t.equal(rec.timestamp, now.getTime());
+    t.equal(rec.title.length, 3);
+    t.end();
+});
