Use URL as page id? #43

Open
alranel opened this issue Feb 3, 2019 · 0 comments

alranel (Contributor) commented Feb 3, 2019

`page.name` is currently used to populate the `id` field:

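```ruby
# Document sent to Elasticsearch: the page's front matter plus id, url and text
indexer << page.data.merge({
  id: page.name,  # not guaranteed to be set
  url: page.url,
  # Plain text of the <article> element, with whitespace runs collapsed
  text: nokogiri_doc.xpath("//article//text()").to_s.gsub(/\s+/, " ")
})
```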

However, there's no guarantee that `page.name` is defined, and Elasticsearch will complain if it isn't.

Why not use `page.url` instead? It is unique enough to serve as an ID (certainly more so than `name`) and is always defined.
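A minimal sketch of the proposed change, under a few assumptions: `FakePage` is a hypothetical `Struct` standing in for a Jekyll page (only `name`, `url` and `data`), `build_document` is a hypothetical helper name, and the article text is passed in already extracted so the example runs without Jekyll or Nokogiri.

```ruby
# Hypothetical stand-in for a Jekyll page, exposing only the fields used here.
FakePage = Struct.new(:name, :url, :data)

# Build the hash handed to the indexer, keyed by the page URL instead of its name.
def build_document(page, text)
  page.data.merge({
    id: page.url,               # always defined, unlike page.name
    url: page.url,
    text: text.gsub(/\s+/, " ") # collapse whitespace runs, as in the original
  })
end

# A page whose name is nil still gets a usable id.
page = FakePage.new(nil, "/docs/install/", { "title" => "Install" })
doc = build_document(page, "Install   the\n plugin")
puts doc[:id]   # → /docs/install/
```

Assuming the indexer passes `id` through as the Elasticsearch document `_id`, re-indexing an unchanged URL would then replace the existing entry rather than add a second one.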
