[Discussion] Sitemaps - what should be included? #4588

Closed
jaswilli opened this issue Dec 5, 2014 · 13 comments
Labels: help wanted ([triage] Ideal issues for contributors to help with), server / core (Issues relating to the server or core of Ghost)

@jaswilli
Contributor

jaswilli commented Dec 5, 2014

Tags

Currently all tags that exist in a blog are being shown in the sitemap. Does it make more sense to only show tags that are assigned to at least one published post? I'm thinking of the case where a tag has been created for a yet-to-be-published post and the author may not want the tag name to be public information.
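
To make the proposal concrete, here is a minimal sketch of "only tags assigned to at least one published post". The `SketchPost` shape and the `publicTagSlugs` helper are hypothetical names for illustration, not Ghost's actual models or API.

```typescript
// Hypothetical shape; Ghost's real post model carries many more fields.
interface SketchPost {
    status: 'draft' | 'published';
    tags: string[]; // tag slugs attached to the post
}

// Return only the tag slugs that appear on at least one published post.
function publicTagSlugs(allTags: string[], posts: SketchPost[]): string[] {
    const used: { [slug: string]: boolean } = {};
    posts.forEach(function (post) {
        if (post.status !== 'published') {
            return; // drafts must not leak their tags
        }
        post.tags.forEach(function (slug) {
            used[slug] = true;
        });
    });
    return allTags.filter(function (slug) {
        return used[slug] === true;
    });
}

const examplePosts: SketchPost[] = [
    { status: 'published', tags: ['news'] },
    { status: 'draft', tags: ['secret-launch', 'news'] }
];
const visibleTags = publicTagSlugs(['news', 'secret-launch', 'unused'], examplePosts);
```

With this input, `secret-launch` (draft only) and `unused` (no posts) would both stay out of the sitemap.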

(Static) Pages

Should all static pages be included by default? I know we've been telling people (unofficially) that static pages can be used as a "secret" share-link-to-preview feature. Now all static pages are part of the public sitemap.

Authors

Like tags, does it make sense to only show authors with at least one published post? Otherwise any and all users that have been created are going to show up in the sitemap.

Last and very minor--I don't think the index page should indicate that the sitemap has been generated by https://ghost.org/ by default. If we can't get the config.url into the XSL then there probably shouldn't be a link at all.

@JohnONolan
Member

> Does it make more sense to only show tags that are assigned to at least one published post

100% yes, totally agree

> Should all static pages be included by default?

Yes. Pages are public, they should be public. Setting a post as a page is a hack/workaround for a lack of preview links. So we should have preview links.

> Like tags, does it make sense to only show authors with at least one published post?

Yep. Agree again.

The link is standard (see Yoast's SEO plugin for WP), useful, and helpful to the Ghost project by passing an incredibly small amount of link juice back to the homepage with no impact on any end user.

@jgable
Contributor

jgable commented Dec 5, 2014

+1 everything John said.

@ErisDS
Member

ErisDS commented Dec 5, 2014

> Does it make more sense to only show tags that are assigned to at least one published post
>
> 100% yes, totally agree

For future proofing, it would probably be a good idea to also exclude tags which have hidden=true
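
A rough sketch of how the two conditions could combine into one check. The `SketchTag` shape is hypothetical: `hidden` stands for the proposed flag, and `post_count` is assumed here to count published posts only.

```typescript
// Hypothetical tag shape for illustration only.
interface SketchTag {
    slug: string;
    post_count: number; // assumed to count published posts only
    hidden?: boolean;   // the proposed flag; absent means "not hidden"
}

// A tag is listed only if it is in use and not explicitly hidden.
function belongsInSitemap(tag: SketchTag): boolean {
    return tag.post_count > 0 && tag.hidden !== true;
}
```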

@jaswilli
Contributor Author

jaswilli commented Dec 5, 2014

> Should all static pages be included by default?
> Yes. Pages are public, they should be public. Setting a post as a page is a hack/workaround for a lack of preview links. So we should have preview links.

I tend to agree. I brought it up mostly as a notice that we should probably stop telling people to do it (at least without a disclaimer).

> The link is standard (see Yoast's SEO plugin for WP), useful, and helpful to the Ghost project by passing an incredibly small amount of link juice back to the homepage with no impact on any end user.

Fair enough. It's mainly the "generated by" that got me, because (in my case) it wasn't actually generated by the server pointed to by the URL. Like I said, minor and not a big deal at all.

@dannyvankooten

Agreed on all 3 points. 👍 Just upgraded to 0.5.6 and noticed a lot of unused tags showing up in my sitemap.

@JohnONolan JohnONolan added this to the Current Backlog milestone Dec 9, 2014
@jaswilli jaswilli added the help wanted [triage] Ideal issues for contributors to help with label Dec 22, 2014
@ErisDS
Member

ErisDS commented Jan 13, 2015

@dannyvankooten you'll be able to remove all those unused tags via tag management now that 0.5.8 is out.

As for fixing this up - I think it's pretty clear that these changes are required, but in order to do them we need to add filtering abilities to the API:

  • filter out tags with 0 posts
  • filter out authors with 0 posts

Tags do already have a concept of post_count, but there's no way to filter the API based on this property. Authors don't have a concept of post_count.
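
For authors, the missing piece is the count itself. The sketch below derives a `post_count` per author from a list of posts and then drops authors with none. All names (`SketchAuthor`, `AuthoredPost`, `withPostCount`) are hypothetical, and a real implementation would more likely compute the count in the database query.

```typescript
// Hypothetical shapes for illustration only.
interface SketchAuthor { id: number; slug: string; }
interface AuthoredPost { author_id: number; status: 'draft' | 'published'; }
interface CountedAuthor extends SketchAuthor { post_count: number; }

// Attach a post_count (published posts only) to every author.
function withPostCount(authors: SketchAuthor[], posts: AuthoredPost[]): CountedAuthor[] {
    const counts: { [authorId: number]: number } = {};
    posts.forEach(function (post) {
        if (post.status === 'published') {
            counts[post.author_id] = (counts[post.author_id] || 0) + 1;
        }
    });
    return authors.map(function (author) {
        return { id: author.id, slug: author.slug, post_count: counts[author.id] || 0 };
    });
}

// Keep only the authors that have published something.
function authorsForSitemap(authors: SketchAuthor[], posts: AuthoredPost[]): CountedAuthor[] {
    return withPostCount(authors, posts).filter(function (author) {
        return author.post_count > 0;
    });
}
```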

The main question I guess would be how to do filtering based on a post_count property. As a simple version we could use a negation syntax like: ?include=post_count&post_count=-0 to mean include post_count and don't return items where the post_count is 0.

Using - for negation comes from the idea of using gmail style filtering in other places. E.g. tags="fred, -bob" means has the tag fred and doesn't have the tag bob.
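
A minimal sketch of how the negation syntax could be interpreted, assuming a leading `-` always means "not". The function names are hypothetical; nothing like this exists in the API yet.

```typescript
interface TagFilter { include: string[]; exclude: string[]; }

// Parse a Gmail-style list such as "fred, -bob".
function parseTagFilter(raw: string): TagFilter {
    const result: TagFilter = { include: [], exclude: [] };
    raw.split(',').forEach(function (part) {
        const term = part.trim();
        if (term === '' || term === '-') {
            return; // ignore empty terms
        }
        if (term.charAt(0) === '-') {
            result.exclude.push(term.slice(1));
        } else {
            result.include.push(term);
        }
    });
    return result;
}

// A post matches when it has every included tag and none of the excluded ones.
function matchesTagFilter(postTags: string[], filter: TagFilter): boolean {
    const hasAllIncluded = filter.include.every(function (tag) {
        return postTags.indexOf(tag) !== -1;
    });
    const hasNoExcluded = filter.exclude.every(function (tag) {
        return postTags.indexOf(tag) === -1;
    });
    return hasAllIncluded && hasNoExcluded;
}

// Interpret a value such as "-0" (anything but 0) or "3" (exactly 3).
function matchesCountFilter(postCount: number, raw: string): boolean {
    if (raw.charAt(0) === '-') {
        return postCount !== Number(raw.slice(1));
    }
    return postCount === Number(raw);
}
```

One trade-off of this reading: `-` can no longer express a negative number, which is harmless for counts but would matter for other numeric properties.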

Another potential syntax is json-style, post_count={gt: 0} to mean include items where post_count is greater than 0. As well as gt we could support less than (lt), equals (eq) and not equals (neq).
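
The comparison variant could be sketched like this, assuming the query value has already been parsed into an object. `Comparison` and `meetsComparison` are hypothetical names; when several operators are given, this sketch requires all of them to hold.

```typescript
type ComparisonOp = 'gt' | 'lt' | 'eq' | 'neq';
type Comparison = { [K in ComparisonOp]?: number };

// True when the value satisfies every operator present in the comparison.
function meetsComparison(value: number, comparison: Comparison): boolean {
    if (comparison.gt !== undefined && !(value > comparison.gt)) { return false; }
    if (comparison.lt !== undefined && !(value < comparison.lt)) { return false; }
    if (comparison.eq !== undefined && value !== comparison.eq) { return false; }
    if (comparison.neq !== undefined && value === comparison.neq) { return false; }
    return true;
}
```

Under this sketch, `meetsComparison(tag.post_count, { gt: 0 })` would express the sitemap rule for tags.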

I think both syntaxes have merit and slightly different purposes. We haven't implemented anything like this in the API yet, but there's likely to be need for both negation and comparison as we move forward.

Just throwing this out there to get some thoughts from other people.

@jaswilli
Contributor Author

The other area that will need attention is the addition/removal of items into the sitemap indexes that occur in the create/update/delete event hooks in the models.
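
As a rough illustration of what those hooks need to guarantee, here is a sketch of an in-memory index that is kept in sync on save and delete. `SitemapIndexSketch`, `onSaved` and `onDeleted` are hypothetical names, not the actual hook API of the models.

```typescript
type ModelStatus = 'draft' | 'published';
interface SavedItem { id: number; url: string; status: ModelStatus; updatedAt: number; }
interface SitemapEntrySketch { id: number; url: string; lastmod: number; }

class SitemapIndexSketch {
    private entries: SitemapEntrySketch[] = [];

    // Called from a hypothetical "created" or "updated" hook.
    onSaved(item: SavedItem): void {
        // Drop any stale entry first, then re-add only if the item is public.
        this.onDeleted(item.id);
        if (item.status === 'published') {
            this.entries.push({ id: item.id, url: item.url, lastmod: item.updatedAt });
        }
    }

    // Called from a hypothetical "deleted" hook.
    onDeleted(id: number): void {
        this.entries = this.entries.filter(function (entry) {
            return entry.id !== id;
        });
    }

    urls(): string[] {
        return this.entries.map(function (entry) {
            return entry.url;
        });
    }
}
```

The important property is that unpublishing and deleting both remove the entry, so the sitemap never advertises a URL that would return a 404.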

@letsjustfixit
Contributor

I think the tags of password-protected posts should also be left out :) [sorry reference removed]

@ErisDS
Member

ErisDS commented Mar 6, 2015

@letsjustfixit I appreciate you're trying to be helpful, but this comment is confusing. #4993 already states that sitemaps and RSS should not be available on password protected blogs, and we're not intending to provide a feature to password protect individual posts.

@letsjustfixit
Contributor

Note: if you have no static pages whatsoever, you still end up with:

sitemap-pages.xml 1970-01-01 00:00

Running on Ghost 0.6.2
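
A date of 1970-01-01 is what a timestamp of 0 looks like once formatted. The thread does not confirm the cause, so the sketch below only assumes that the "last modified" value for an empty sitemap defaults to 0, and shows one way to guard against it. Both helper names are hypothetical.

```typescript
// Naive version: with no timestamps the maximum falls back to 0, i.e. the Unix epoch.
function naiveLastmod(timestamps: number[]): string {
    const latest = timestamps.reduce(function (a, b) {
        return Math.max(a, b);
    }, 0);
    return new Date(latest).toISOString();
}

// Guarded version: report "no date" so the caller can omit the empty sub-sitemap.
function safeLastmod(timestamps: number[]): string | null {
    return timestamps.length === 0 ? null : naiveLastmod(timestamps);
}

const emptyLastmod = naiveLastmod([]); // "1970-01-01T00:00:00.000Z"
```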


@ErisDS ErisDS modified the milestones: Current Backlog, Next Backlog May 25, 2015
@garethbult

I'm using 0.7 and I've deleted a page, yet it's still appearing in the sitemap index and Google is reporting it as a 404. Does anyone know how I can stop it from appearing in sitemap-pages.xml?

@ErisDS ErisDS modified the milestone: Next Backlog Oct 9, 2015
@kirrg001 kirrg001 added this to the 1.0.0 Beta Ready milestone Apr 3, 2017
@kirrg001 kirrg001 added the server / core Issues relating to the server or core of Ghost label Apr 3, 2017
@kirrg001 kirrg001 self-assigned this Apr 3, 2017
@kirrg001 kirrg001 removed this from the 1.0.0 Beta Ready milestone Apr 11, 2017
@kirrg001 kirrg001 removed their assignment Aug 4, 2017
@ErisDS ErisDS self-assigned this Oct 19, 2017
@ErisDS
Member

ErisDS commented Oct 20, 2017

I'm working on a rewrite of our sitemap system at the moment, which splits the concept of keeping track of what URLs are in the system into a URL service, separate to the generation of the sitemap based on this information.

Future sitemaps will only deal with URL events and will eventually be extensible, which is a major part of #5091.

Note to self to ensure we also refactor post.published events on import as part of this.
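
The split described above could look roughly like the sketch below: a URL service owns the list of known URLs and emits events, and the sitemap only listens. The class name, the event names `url.added` and `url.removed`, and `attachSitemap` are assumptions for illustration, not the design that was eventually implemented.

```typescript
type UrlEventName = 'url.added' | 'url.removed';
type UrlListener = (url: string) => void;

// Tracks which URLs exist and announces changes; knows nothing about sitemaps.
class UrlServiceSketch {
    private listeners: { name: UrlEventName; fn: UrlListener }[] = [];
    private known: string[] = [];

    on(name: UrlEventName, fn: UrlListener): void {
        this.listeners.push({ name: name, fn: fn });
    }

    add(url: string): void {
        if (this.known.indexOf(url) !== -1) { return; } // already tracked
        this.known.push(url);
        this.emit('url.added', url);
    }

    remove(url: string): void {
        const position = this.known.indexOf(url);
        if (position === -1) { return; } // nothing to remove
        this.known.splice(position, 1);
        this.emit('url.removed', url);
    }

    private emit(name: UrlEventName, url: string): void {
        this.listeners.forEach(function (listener) {
            if (listener.name === name) { listener.fn(url); }
        });
    }
}

// The sitemap reacts to URL events only; it knows nothing about posts, tags or authors.
function attachSitemap(service: UrlServiceSketch): string[] {
    const sitemapUrls: string[] = [];
    service.on('url.added', function (url) {
        sitemapUrls.push(url);
    });
    service.on('url.removed', function (url) {
        const position = sitemapUrls.indexOf(url);
        if (position !== -1) { sitemapUrls.splice(position, 1); }
    });
    return sitemapUrls;
}
```

Rules such as "only tags with published posts" would then live in whatever decides to call `add` and `remove`, not in the sitemap generator.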

@ErisDS
Member

ErisDS commented Oct 31, 2017

Closing this now in favour of #9192, will revisit these ideas when we have a better system in place for tracking URLs.

@ErisDS ErisDS closed this as completed Oct 31, 2017