How can we reduce 404's #153

Closed
iancrowther opened this issue Sep 14, 2015 · 7 comments

@iancrowther
Contributor

Moving to "new" code is great, but is it leaving a trail of dead bookmarks and blog articles?

Example from http://apmblog.dynatrace.com/2015/04/09/node-js-is-hitting-the-big-time-in-enterprise-markets/

https://nodejs.org/static/video/

This is a 2015 article!

How can we best analyse and add redirects?

@rvagg
Member

rvagg commented Sep 15, 2015

You can see the complete current list at https://raw.githubusercontent.com/nodejs/build/master/setup/www/resources/config/nodejs.org. A lot of it was based on analysing the logs with @fhemberger, narrowing down most of the active URLs being hit and deciding what to do with each of them. There's no doubt that we ignored a bunch of much less used URLs, but we assumed that we'd got most of them.

My plan was to review after a while to see which redirects were actually being used and try to trim the list over time. I guess we should also review the 404s to see what's coming up. I could do some log grepping and get back to you on that if you like; I still have the old site's contents, so we could cross-reference against them and figure out what's real and what's not.
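A minimal Python sketch of the cross-referencing step, assuming the old site's contents sit in a local directory whose layout mirrors the URL space (the function name `split_by_old_site` is hypothetical). It splits 404'd paths into those that match a file on the old site, which are redirect candidates, and those that don't, which are likely bogus:

```python
from pathlib import Path


def split_by_old_site(paths, old_site_root):
    """Partition 404'd URL paths by whether a matching file existed on the old site.

    Assumption: the old site's files are a local tree mirroring the URL space,
    e.g. /static/images/logo.png -> <root>/static/images/logo.png.
    """
    root = Path(old_site_root)
    real, unknown = [], []
    for url_path in paths:
        # Drop any query string and the leading slash so the path joins under root.
        relative = url_path.split("?", 1)[0].lstrip("/")
        candidate = root / relative
        # A directory URL was presumably served by its index page.
        if candidate.is_dir():
            candidate = candidate / "index.html"
        (real if candidate.is_file() else unknown).append(url_path)
    return real, unknown
```

Paths in `real` are worth a redirect or a restored file; paths in `unknown` can mostly be left to 404.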

@fhemberger
Contributor

The video wasn't part of the repository; some files/directories were placed directly on the old server, so for some rare parts we simply don't know what was linked where. Maybe we can still find the video and put it online again.

As @rvagg already said, we went through all the URLs of the old website, mapping them to the new content where possible (except for content that was removed, which now rightfully returns a 404). Then we went through a list of over 4000 URLs returning non-200 responses, collected over a month on the old server. After weeding out automated scanning attempts for things like WordPress configs or admin interfaces, we tried to map those to the current website as well.
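The weeding-out step could look roughly like this in Python. The patterns and the `drop_scanner_noise` name are illustrative assumptions covering common WordPress/admin probes, not the list actually used for nodejs.org:

```python
import re

# Hypothetical probe patterns; the real filter used for nodejs.org isn't given here.
SCANNER_PATTERNS = re.compile(
    r"(wp-admin|wp-login\.php|wp-config|xmlrpc\.php|phpmyadmin|/admin(/|$))",
    re.IGNORECASE,
)


def drop_scanner_noise(paths):
    """Return only the paths that don't look like automated vulnerability scans."""
    return [p for p in paths if not SCANNER_PATTERNS.search(p)]
```

The `/admin(/|$)` form is there so that a legitimate page such as `/administration.html` isn't thrown out with the probes.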

We will continue to check the logs and add further redirects.
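A rough Python sketch of the old-to-new mapping as a lookup: exact old paths first, then prefix rules where the longest matching prefix wins. The rule tables and the `resolve` name are hypothetical; the real redirects live in the server config linked above:

```python
def resolve(path, exact, prefixes):
    """Return the redirect target for path, or None if it should stay a 404.

    exact:    {old_path: new_path} for one-off pages
    prefixes: {old_prefix: new_prefix} for whole sections that moved
    """
    if path in exact:
        return exact[path]
    # Check longer prefixes first so specific rules override broad ones.
    for old_prefix in sorted(prefixes, key=len, reverse=True):
        if path.startswith(old_prefix):
            return prefixes[old_prefix] + path[len(old_prefix):]
    return None


# Illustrative rules only, not the actual nodejs.org redirects.
EXACT = {"/about/": "/en/about/"}
PREFIXES = {"/api/": "/docs/latest/api/", "/api/assets/": "/static/legacy/"}
```

With these rules, `/api/fs.html` maps to `/docs/latest/api/fs.html`, `/api/assets/logo.svg` is caught by the more specific prefix, and anything unmatched falls through to a 404.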

@rvagg
Member

rvagg commented Sep 15, 2015

https://gist.github.com/rvagg/50a283b0a35657cdc0f7

Top 404s for the site since the server move, sorted by frequency. I've removed some of the obvious bogus ones and left off all the 1's and 2's, which are pretty much guaranteed to be bogus. If you take out ivy.xml and .jar$ then you reduce the list by ~3k, down to ~5k.
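Roughly how such a list can be produced from raw access logs, as a Python sketch. It assumes Common/Combined Log Format lines and treats "left off the 1's and 2's" as a minimum of three hits; the regexes and the `top_404s` name are assumptions, not the script actually used:

```python
import re
from collections import Counter

# Assumes Combined Log Format, e.g.
# 1.2.3.4 - - [15/Sep/2015:10:00:00 +0000] "GET /logo.png HTTP/1.1" 404 162 "-" "curl/7.43"
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# ivy.xml and *.jar requests (presumably build tools probing /dist) inflate the list.
EXCLUDE_RE = re.compile(r"(ivy\.xml|\.jar)$")


def top_404s(log_lines, min_hits=3):
    """Count 404'd paths, drop rare hits and ivy/jar probes, sort by frequency."""
    counts = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m and m.group("status") == "404":
            counts[m.group("path")] += 1
    return [
        (path, n)
        for path, n in counts.most_common()
        if n >= min_hits and not EXCLUDE_RE.search(path)
    ]
```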

Some creative grepping leaves me with this list of first-round candidates:

  • 6813 /static/images/icons-interior.png
  • 6810 /static/images/platform-icons.png same as above
  • 5802 /static/images/twitter-bird.png
  • 3395 /static/images/footer-logo-alt.png

These are mostly from external referrers, but also from http://nodejs.org/dist/v0.10.32/docs/api/assets/style.css and other 0.10.x docs. I'm not inclined to "fix" them tbh, but perhaps others have different opinions?

  • 1866 /static/images/logo.png
  • 759 /static/images/footer-logo.png
  • 734 /static/images/ryan-speaker.jpg
  • 718 /static/images/forkme.png
  • 696 /static/images/home-icons.png
  • 695 /static/images/microsoft-logo.png
  • 693 /static/images/ebay-logo.png
  • 691 /static/images/yahoo-logo.png
  • 686 /static/images/linkedin-logo.png

These mostly come from the full website mirrors shipped as docs, e.g. http://nodejs.org/dist/v0.10.1/docs/ which I'm kind of annoyed about having to support. They are candidates for "legacy" but I'm not fussed, these mirrors are pointless.

  • 4738 /dist/staging - Doesn't exist any more, we do it differently now and I'm happy for a 404 here
  • 1114 /logo.png - I can't find any internal referrers, so I don't see a reason to keep this; it's their fault for leaning on the site (www.nodeclipse.org is a primary offender here)
  • 760 /browserconfig.xml - is this a thing? I don't see it on the old site
  • 454 /download/release/v4.0.0/doc/api/ - this is an odd one: we used /doc/ for io.js but switched to /docs/ in the last release and for v4.0.0, so I don't know what's going on with this. I don't see any referrers in the latest log file, so ...?
  • 395 /static/images/roadshow-promo.png
  • 337 /api/assets/logo.svg
  • 297 /download/favicon.ico
  • 267 /tracking.js
  • 229 /main.js
  • 229 /api/fs.html>
  • 198 /industry/ - no internal referrers
  • 177 /api/all.json - this must be just for http and not https; not sure why it's not getting redirected or loaded. It should be, so I probably need to fix it
  • 168 /static/images/download-logo.png
  • 141 /dist/v0.12.4/node.pom
  • 139 /static/video/
  • 138 /xmlrpc.php
  • 131 /dist/.index/nexus-maven-repository-index.properties
  • 119 /static/images/joyent-footer.svg
  • 117 /static/images/not-invented-here.png
  • 117 /api/index.json - ditto for all.json
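Several of the notes above hinge on internal vs external referrers. A minimal Python sketch of that breakdown, assuming the (path, referer) pairs have already been pulled from the 404 log lines and that only nodejs.org and www.nodejs.org count as internal (`referrer_breakdown` is a hypothetical name):

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed set of hosts that count as "internal" referrers.
INTERNAL_HOSTS = {"nodejs.org", "www.nodejs.org"}


def referrer_breakdown(hits):
    """Map each 404'd path to (internal, external) referrer counts.

    hits is an iterable of (path, referer) pairs; "-" or "" means no referer,
    and such hits are skipped.
    """
    counts = defaultdict(lambda: [0, 0])
    for path, referer in hits:
        host = urlsplit(referer).hostname if referer not in ("", "-") else None
        if host is None:
            continue
        counts[path][0 if host in INTERNAL_HOSTS else 1] += 1
    return {path: tuple(c) for path, c in counts.items()}
```

A path showing zero internal referrers is a candidate for a deliberate 404 rather than a redirect.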

@iancrowther
Contributor Author

@fhemberger would plugging these be a good candidate for next month's Code and Learn?

Should we add a code-and-learn label to appropriate issues? I want to ensure attendees have:

  • Enough to do
  • Enough information to execute

I want to add a milestone too so that issues don't go stale, but I need to block-book a venue in order to have concrete dates.

@mikeal
Contributor

mikeal commented Oct 7, 2015

+1

@fhemberger
Contributor

@iancrowther Sorry for the late reply, this issue just slipped through.

I don't think it's a good fit for the next Code and Learn. I'll just add those files from the old repo:

6813 /static/images/icons-interior.png
6810 /static/images/platform-icons.png same as above
5802 /static/images/twitter-bird.png
3395 /static/images/footer-logo-alt.png
1866 /static/images/logo.png
759 /static/images/footer-logo.png
734 /static/images/ryan-speaker.jpg
718 /static/images/forkme.png
696 /static/images/home-icons.png
695 /static/images/microsoft-logo.png
693 /static/images/ebay-logo.png
691 /static/images/yahoo-logo.png
686 /static/images/linkedin-logo.png

and add redirects for them, and we're done. What's really bugging me is that we need to support those old docs, as @rvagg said. I'd like to have some kind of converter which strips the surrounding page layout and leaves only the text content in place, so we can convert it into something similar to the current docs. But I think that task is way too big for a Code and Learn.
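A very rough Python sketch of such a converter, using the standard library's `html.parser`. It assumes each old doc page wraps its body text in a single container element with a known id (`content` here is a placeholder, not taken from the old site) and that the markup inside it is well-formed; real pages with unclosed tags or inline scripts would need more care:

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text inside the element with a given id, ignoring the page chrome."""

    # Void elements never get an end tag, so they must not affect nesting depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, content_id="content"):
        super().__init__()
        self.content_id = content_id
        self.depth = 0  # > 0 while inside the content container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.content_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, content_id="content"):
    """Return the container's text, one chunk per line, without nav/footer."""
    parser = ContentExtractor(content_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The output is plain text, so a second pass would still be needed to turn it into the current docs' format.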

fhemberger added a commit that referenced this issue Oct 9, 2015
@fhemberger
Contributor

Redirects are in place, closing this now.

4 participants