-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sitemap & Google Scholar #379
Conversation
app/views/items/show.html.erb
Outdated
@@ -1,5 +1,7 @@ | |||
<% page_title(@item.title) %> | |||
|
|||
<%= render partial: 'google_scholar_metadata' %> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These meta tags will be in the <body>
. Should probably inject them into the <head>
of the page instead?
Requires some additional work to make this happen. Look into content_for
and yield
if you need help figuring out how to do this.
7e30e59
to
88e13fc
Compare
|
||
def communities | ||
@communities = Community.all | ||
raise 'sitemap should contain less than 50,000 targets' if @communities.count > 50_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll want to start serving Google a 500 page if/when any of these creeps above 50,000 without us noticing. Maybe replace raise
with logger.warn
? (actually even better would be to push a warning to Rollbar, but we don't have that set up yet. Maybe add a TODO: about that as well?)
@@ -134,6 +148,10 @@ def display_attribute_names | |||
def valid_visibilities | |||
super + [VISIBILITY_EMBARGO] | |||
end | |||
|
|||
def public | |||
Item.where(visibility: JupiterCore::VISIBILITY_PUBLIC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this concern is shared with Thesis, I don't think this quite works (Thesis.public
will return public Item
s, instead). self.where(visibility: JupiterCore::VISIBILITY_PUBLIC)
should do the appropriate thing in both classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also love to see this get exercised in the test suite. Probably just add another assertion to these two existing test cases: https://github.com/ualbertalib/jupiter/blob/master/test/models/thesis_test.rb#L5 and https://github.com/ualbertalib/jupiter/blob/master/test/models/item_test.rb#L5
<meta name="citation_issue" content=""/> | ||
<meta name="citation_firstpage" content=""/> | ||
<meta name="citation_lastpage" content=""/> | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want to output this whole set of comments into the HTML, or do we want this comment not to render at all? (Use <%# %> is the latter).
"#{unlocked_fileset.original_file.checksum.value}\" \ | ||
length=\"#{unlocked_fileset.original_file.size}\" \ | ||
type=\"#{unlocked_fileset.original_file.mime_type}\"\ | ||
/>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than have to escape all of these newlines, this is probably a great use-case for a heredoc: https://infinum.co/the-capsized-eight/multiline-strings-ruby-2-3-0-the-squiggly-heredoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually more like a run-on sentence situation. There should be no newlines or extra spaces/tabs between the values. My understanding is that heredoc would maintain newlines. Is there a solution to make something like that more readable?
One other thing I'd love to see would be an added system test exercising the sitemap. But this looks really good other than the small things noted above. |
public/robots.txt
Outdated
@@ -1 +1,2 @@ | |||
# See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |||
Sitemap: https://era.library.ualberta.ca/sitemap.xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming out of a discussion with Kenton this morning, it sounds like it would be useful to make this dynamic and serve
User-agent: *
Disallow: /
based on a config key in the Rails environment config files, so that we can tell crawlers to go away during the beta launch phase.
(Basically, set config.some_key_name in the various env files, and then do if Rails.configuration.some_key_name == true
here
should 'show url, last modified date, and content attributes' do | ||
#visit items_sitemap_path | ||
#assert_selector 'url', count: 4 | ||
#assert_selector 'loc', text: 'sitemap-items.xml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/LeadingCommentSpace: Missing space after #. (https://github.com/bbatsov/ruby-style-guide#hash-space)
end | ||
should 'show url, last modified date, and content attributes' do | ||
#visit items_sitemap_path | ||
#assert_selector 'url', count: 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/LeadingCommentSpace: Missing space after #. (https://github.com/bbatsov/ruby-style-guide#hash-space)
assert_empty schema.validate(document) | ||
end | ||
should 'show url, last modified date, and content attributes' do | ||
#visit items_sitemap_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/LeadingCommentSpace: Missing space after #. (https://github.com/bbatsov/ruby-style-guide#hash-space)
|
||
should 'not show private items' do | ||
#assert_select 'url', count: 2 | ||
assert_select 'loc', {count: 0, text: item_url(@private_item)}, 'private items shant appear in the sitemap' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/SpaceInsideHashLiteralBraces: Space inside { missing. (https://github.com/bbatsov/ruby-style-guide#spaces-operators)
Layout/SpaceInsideHashLiteralBraces: Space inside } missing. (https://github.com/bbatsov/ruby-style-guide#spaces-operators)
end | ||
|
||
should 'not show private items' do | ||
#assert_select 'url', count: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/LeadingCommentSpace: Missing space after #. (https://github.com/bbatsov/ruby-style-guide#hash-space)
assert_select 'lastmod', @item.updated_at.to_s | ||
@item.file_sets.first.unlock_and_fetch_ldp_object do |uo| | ||
assert_select "rs|ln[href=?]", url_for(controller: :file_sets, | ||
action: :download, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/AlignHash: Align the elements of a hash literal if they span more than one line.
assert_select 'loc', item_url(@item) | ||
assert_select 'lastmod', @item.updated_at.to_s | ||
@item.file_sets.first.unlock_and_fetch_ldp_object do |uo| | ||
assert_select "rs|ln[href=?]", url_for(controller: :file_sets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/StringLiterals: Prefer single-quoted strings when you don't need string interpolation or special symbols. (https://github.com/bbatsov/ruby-style-guide#consistent-string-literals)
assert_select 'lastmod' | ||
assert_select 'changefreq' | ||
assert_select 'priority' | ||
assert_select "rs|md[type=?]", 'text/html' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/StringLiterals: Prefer single-quoted strings when you don't need string interpolation or special symbols. (https://github.com/bbatsov/ruby-style-guide#consistent-string-literals)
publication_status: [CONTROLLED_VOCABULARIES[:publication_status].published], | ||
license: CONTROLLED_VOCABULARIES[:license].attribution_4_0_international, | ||
subject: ['Items']) | ||
.unlock_and_fetch_ldp_object do |uo| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/MultilineMethodCallIndentation: Align .unlock_and_fetch_ldp_object with .new_locked_ldp_object on line 39.
item_type: CONTROLLED_VOCABULARIES[:item_type].article, | ||
publication_status: [CONTROLLED_VOCABULARIES[:publication_status].published], | ||
license: CONTROLLED_VOCABULARIES[:license].attribution_4_0_international, | ||
subject: ['Items']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/AlignHash: Align the elements of a hash literal if they span more than one line.
one failing bc Item.public is returning private items.
config/environments/production.rb
Outdated
@@ -86,4 +86,7 @@ | |||
|
|||
# Which ActiveStorage service to use | |||
config.active_storage.service = (ENV['ACTIVE_STORAGE_SERVICE'] || :local).to_sym | |||
|
|||
# we can tell crawlers to go away during the beta launch phase | |||
config.allow_crawlers = ENV['RAILS_ALLOW_CRAWLERS'].present? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference:
Could have defined this in config/application.rb
once instead of doing this in every environment especially when they are the same.
test/controllers/resourcesync.xsd
Outdated
@@ -0,0 +1,147 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this resourcesync stuff living in test/controllers
? Is it being used as a file? Maybe test/fixtures/file
a better place?
test/controllers/sitemap.xsd
Outdated
@@ -0,0 +1,114 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this one? Don't see anything reference these two files in your PR either (EDIT: I see sitemap.xsd being used but not resourcesync.xsd)?
get items_sitemap_url | ||
end | ||
should 'be valid sitemap xml' do | ||
schema = Nokogiri::XML::Schema(File.open(File.join(File.dirname(__FILE__), 'sitemap.xsd'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah your using it here. Yeah should look into file_fixtures and put them in test/fixtures/files
config/environments/uat.rb
Outdated
@@ -86,4 +86,5 @@ | |||
|
|||
# Which ActiveStorage service to use | |||
config.active_storage.service = (ENV['ACTIVE_STORAGE_SERVICE'] || :local).to_sym | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/EmptyLinesAroundBlockBody: Extra empty line detected at block body end. (https://github.com/bbatsov/ruby-style-guide#empty-lines-around-bodies)
config/environments/test.rb
Outdated
@@ -42,4 +42,5 @@ | |||
|
|||
# Which ActiveStorage service to use | |||
config.active_storage.service = (ENV['ACTIVE_STORAGE_SERVICE'] || :test).to_sym | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/EmptyLinesAroundBlockBody: Extra empty line detected at block body end. (https://github.com/bbatsov/ruby-style-guide#empty-lines-around-bodies)
config/environments/staging.rb
Outdated
@@ -83,4 +83,5 @@ | |||
|
|||
# Do not dump schema after migrations. | |||
config.active_record.dump_schema_after_migration = false | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/EmptyLinesAroundBlockBody: Extra empty line detected at block body end. (https://github.com/bbatsov/ruby-style-guide#empty-lines-around-bodies)
config/environments/production.rb
Outdated
@@ -86,4 +86,5 @@ | |||
|
|||
# Which ActiveStorage service to use | |||
config.active_storage.service = (ENV['ACTIVE_STORAGE_SERVICE'] || :local).to_sym | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/EmptyLinesAroundBlockBody: Extra empty line detected at block body end. (https://github.com/bbatsov/ruby-style-guide#empty-lines-around-bodies)
config/environments/development.rb
Outdated
@@ -54,4 +54,5 @@ | |||
|
|||
# Which ActiveStorage service to use | |||
config.active_storage.service = (ENV['ACTIVE_STORAGE_SERVICE'] || :local).to_sym | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/EmptyLinesAroundBlockBody: Extra empty line detected at block body end. (https://github.com/bbatsov/ruby-style-guide#empty-lines-around-bodies)
get items_sitemap_url | ||
end | ||
should 'be valid sitemap xml' do | ||
schema = Nokogiri::XML::Schema(File.open(Rails.root + 'test/fixtures/files/sitemap.xsd')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nokogiri isn't defined in our Gemfile (might be getting this from another gem, but regardless we should add it in case this other gem removes it down the road and now our test suite fails).
@@ -58,7 +58,7 @@ def before_all | |||
get sitemapindex_url | |||
end | |||
should 'be valid sitemapindex xml' do | |||
schema = Nokogiri::XML::Schema(File.open(File.join(File.dirname(__FILE__), 'siteindex.xsd'))) | |||
schema = Nokogiri::XML::Schema(File.open(Rails.root + 'test/fixtures/files/siteindex.xsd')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can get even easier here. You should be able to just call:
file_fixture('siteindex.xsd')
http://api.rubyonrails.org/v5.1/classes/ActiveSupport/Testing/FileFixtures.html
98f94f6
to
d619391
Compare
end | ||
should 'show location and last modified' do | ||
assert_select 'loc', community_collection_url(@collection.community, @collection) | ||
assert_select 'lastmod', @collection.updated_at.to_s\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why the \
at the end of updated_at.to_s
? Happens a couple times in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?! Copy-paste remnant.
config/routes.rb
Outdated
get 'sitemap-theses.xml', to: 'sitemap#theses', defaults: { format: :xml } | ||
|
||
# Dynamic robots.txt | ||
get 'robots.:format' => 'robots#robots' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just limit it to robots.txt
here instead of allowing any format extension? I assume it doesn't matter since in the controller action for this it only has a responds_to :text
anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, doesn't matter but the only thing that makes sense is robots.txt -- I'll change it.
@@ -0,0 +1,147 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last comment I have on this PR, this file is never used anywhere? Is this something being kept around for future? Or If it's not needed just remove it?
Maybe I am wrong, looks like sitemap.xsd
imports it maybe? From looks of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is being used. It's included by the sitemap.xsd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaces #340 Closes #9