seo: monitor and fix broken links #746

Closed
jorgeorpinel opened this issue Oct 25, 2019 · 13 comments
Labels
A: website Area: website help wanted Contributors especially welcome

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 25, 2019

Use this script or similar manually to double check which links are broken in the docs: https://github.com/iterative/dvc.org/pull/690/files#diff-a5173e320dcf100fc3ff5b32ba2ea911

The last run (see #690 (comment)) reported the following problems:

static/docs/changelog/0.18.md: 'discuss.dvc.org'
static/docs/changelog/0.35.md: 'https://plugins.jetbrains.com/plugin/11368-data-version-control-dvc-support'
static/docs/command-reference/add.md: '/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache'
static/docs/command-reference/checkout.md: '/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache'
static/docs/command-reference/config.md: '/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache'
static/docs/command-reference/config.md: 'https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html'
static/docs/command-reference/destroy.md: '/doc/user-guide/dvc-files-and-directories'
static/docs/command-reference/get-url.md: 'https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html'
static/docs/command-reference/remote/add.md: 'https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html'
static/docs/command-reference/remote/add.md: 'https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html'
static/docs/command-reference/remote/add.md: 'https://minio.io/'
static/docs/command-reference/remote/add.md: 'https://docs.microsoft.com/en-us/azure/storage/common/storage-create-storage-account'
static/docs/command-reference/remote/index.md: 'https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html'
static/docs/command-reference/remote/modify.md: 'https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html'
static/docs/command-reference/remote/modify.md: 'https://minio.io/'
static/docs/command-reference/update.md: 'https://github.com/iterative/example-get-started'
static/docs/get-started/add-files.md: '/docs/user-guide/large-dataset-optimization'
static/docs/get-started/experiments.md: '/docs/user-guide/large-dataset-optimization'
static/docs/get-started/index.md: '/chat'
static/docs/get-started/pipeline.md: '/doc/tutorial'
static/docs/tutorials/deep/define-ml-pipeline.md: 'https://data.dvc.org/tutorial/ver/data.zip'
static/docs/tutorials/deep/preparation.md: 'https://code.dvc.org/tutorial/nlp/code.zip'
static/docs/tutorials/pipelines.md: '/doc/tutorial'
static/docs/tutorials/versioning.md: '/chat'
static/docs/understanding-dvc/collaboration-issues.md: '<https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning'
static/docs/understanding-dvc/related-technologies.md: '/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache'
static/docs/understanding-dvc/resources.md: 'https://www.kaggle.com/rtatman/kerneld4769833fe'
static/docs/use-cases/share-data-and-model-files.md: 'https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html'
static/docs/use-cases/share-data-and-model-files.md: 'https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html'
static/docs/user-guide/contributing-docs.md: 'https://github.com/iterative/dvc.org/tree/master/src/Documentation/sidebar.json'
static/docs/user-guide/contributing-docs.md: 'https://github.com/iterative/dvc.org.git'
static/docs/user-guide/contributing-docs.md: 'https://nodejs.org/'
static/docs/user-guide/contributing-docs.md: 'https://marketplace.visualstudio.com/items?itemName=stkb.rewrap'
static/docs/user-guide/contributing-docs.md: 'https://raw.githubusercontent.com/iterative/dvc.org/master/static/docs/user-guide/contributing-doc.md'
static/docs/user-guide/contributing.md: 'https://github.com/iterative/dvc.git'
static/docs/user-guide/contributing.md: '/chat'
static/docs/user-guide/contributing.md: 'https://docs.aws.amazon.com/en_us/cli/latest/userguide/cli-chap-install.html'
static/docs/user-guide/contributing.md: 'https://cloud.google.com/sdk/docs/quickstarts'
static/docs/user-guide/contributing.md: 'https://github.com/ambv/black'
static/docs/user-guide/dvc-files-and-directories.md: '/docs/user-guide/large-dataset-optimization'
static/docs/user-guide/large-dataset-optimization.md: '/docs/user-guide/update-tracked-files'
static/docs/user-guide/plugins.md: 'https://plugins.jetbrains.com/plugin/11368-dvc-support-poc'
static/docs/user-guide/running-dvc-on-windows.md: '<https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc778996(v=ws.10'
static/docs/user-guide/update-tracked-files.md: '/docs/user-guide/large-dataset-optimization'

UPDATE: Scroll to #746 (comment) and below for latest pending work here.
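For reference, here is a minimal sketch of how a checker might collect link targets from a Markdown file. It is not the script linked above: the `extract_links` name and the regular expression are assumptions for illustration. The pattern tolerates one level of parentheses inside a URL, because two entries in the report above (the Wikipedia and Microsoft ones) appear to be truncated at a closing parenthesis.

```python
import re

# Hypothetical pattern: [text](target), where target may contain one level
# of balanced parentheses, e.g. .../Hyperparameter_(machine_learning)
LINK_RE = re.compile(r"\[[^\]]*\]\(((?:[^()\s]|\([^()\s]*\))+)\)")


def extract_links(markdown_text):
    """Return link targets in order of appearance."""
    return LINK_RE.findall(markdown_text)


sample = (
    "See [hyperparameters]"
    "(https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) "
    "and the [user guide](/doc/user-guide)."
)
print(extract_links(sample))
```

Links written in other Markdown forms (reference-style links, autolinks, angle-bracketed destinations) are not covered by this pattern.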

@jorgeorpinel jorgeorpinel added good first issue Good for newcomers A: docs Area: user documentation (gatsby-theme-iterative) hacktoberfest labels Oct 25, 2019
@shcheklein
Member

I would clarify that there are a lot of false positives here. We only need to fix the few that rely on redirects like dvc.org/docs/something -> dvc.org/doc/something

@shcheklein shcheklein changed the title docs: fix broken links fix broken links Oct 25, 2019
@shcheklein shcheklein changed the title fix broken links fix broken links - oct 2019 Oct 25, 2019
@taylorlee1
Contributor

Running script. Only seeing a few issues after turning on --max-redirect=10 and --method=GET. Lots of false positives w/ redirects=0 and method=HEAD.
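To illustrate the difference those settings make, here is a rough sketch of a check that uses GET and follows redirects. It is not the script from the PR: `classify` and `fetch` are hypothetical names, and the OK/KO/ERROR labels only imitate the output style seen later in this thread, so their exact meaning in the real script is an assumption.

```python
import urllib.error
import urllib.request


def classify(status, requested_url, final_url):
    """Label a result: ERROR for HTTP >= 400, KO if redirected, else OK."""
    if status >= 400:
        return "ERROR"
    if final_url != requested_url:
        return "KO"
    return "OK"


def fetch(url, method="GET", timeout=10):
    """Return (status, final_url); urllib follows redirects by default."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.url
    except urllib.error.HTTPError as error:
        # HTTP errors (e.g. 403, 404) still carry a status code and URL.
        return error.code, error.url


# Example usage (needs network access, so it is left commented out):
# url = "https://github.com/ambv/black"
# status, final_url = fetch(url)
# print(classify(status, url, final_url), status, url, final_url)
```

Keeping the classification separate from the network call makes it possible to review the labeling rules without hitting any server.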

@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 25, 2019

Thanks @taylorlee1! Yes, I'm not sure that script is completely flawless. Please use your judgment and fix the broken links you are able to find 🙂 You may open a PR with the fixes and say "Fix #746" in its description. More info in https://dvc.org/doc/user-guide/contributing/docs

@shcheklein
Member

@taylorlee1 some of the false positives with redirects=0 should be fixed, mostly the redirects we keep for backward compatibility (docs -> doc). They are not external links; they automatically turn one docs link into another.
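A small sketch of the docs -> doc rewrite described here, assuming that legacy internal links start with `/docs` and that `/doc` is the canonical prefix. The `to_canonical` helper is hypothetical. It only swaps the prefix and does not verify that the target page exists.

```python
import re

# Assumption: a legacy internal link is "/docs" followed by "/", "#",
# or the end of the string. External URLs are left untouched.
LEGACY_PREFIX_RE = re.compile(r"^/docs(?=/|#|$)")


def to_canonical(link):
    """Rewrite a legacy internal /docs link to /doc; leave others alone."""
    return LEGACY_PREFIX_RE.sub("/doc", link)


print(to_canonical("/docs/user-guide/large-dataset-optimization"))
print(to_canonical("https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html"))
```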

@taylorlee1
Contributor

static/docs/install/linux.md: 'https://docs.conda.io/en/latest/miniconda.htm'
static/docs/install/macos.md: 'https://docs.conda.io/en/latest/miniconda.htm'
static/docs/install/windows.md: 'https://docs.conda.io/en/latest/miniconda.htm'

All three errors can be fixed by using the .html suffix instead of .htm.

@taylorlee1
Contributor

I fixed the easy redirects (docs -> doc), but there are a few I am not sure about:

`KO 200 https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8 https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8?gi=48427509caa5
--> ['./static/docs/understanding-dvc/resources.md']

KO 200 https://dvc.org/chat https://discordapp.com/invite/dvwXA2N
--> ['./static/docs/get-started/index.md', './static/docs/user-guide/contributing/core.md', './static/docs/tutorials/versioning.md']

KO 200 https://towardsdatascience.com/how-to-use-data-version-control-dvc-in-a-machine-learning-project-a78245c0185 https://towardsdatascience.com/how-to-use-data-version-control-dvc-in-a-machine-learning-project-a78245c0185?gi=bcd31aa1f168
--> ['./static/docs/tutorials/community.md']

KO 200 https://www.python.org/dev/peps/pep-0008/? https://www.python.org/dev/peps/pep-0008/
--> ['./static/docs/user-guide/contributing/core.md']

KO 200 https://help.github.com/en/articles/resolving-a-merge-conflict-using-the-command-line https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-using-the-command-line
--> ['./static/docs/tutorials/deep/reproducibility.md']

KO 200 https://git-scm.com https://git-scm.com/
--> ['./static/docs/tutorials/deep/preparation.md', './static/docs/tutorials/pipelines.md', './static/docs/tutorials/versioning.md']

KO 200 https://stackoverflow.com/a/42120328/761963 https://stackoverflow.com/questions/25925752/uninstall-packages-in-mac-os-x/42120328#42120328
--> ['./static/docs/install/macos.md']

ERROR 403 https://blogs.windows.com/windowsdeveloper/2016/03/30/run-bash-on-ubuntu-on-windows/ https://blogs.windows.com/windowsdeveloper/2016/03/30/run-bash-on-ubuntu-on-windows/
--> ['./static/docs/install/windows.md', './static/docs/user-guide/running-dvc-on-windows.md']

KO 200 https://nodejs.org/ https://nodejs.org/en/
--> ['./static/docs/user-guide/contributing/docs.md', './README.md']

KO 200 https://help.github.com/en/articles/fork-a-repo https://help.github.com/en/github/getting-started-with-github/fork-a-repo
--> ['./static/docs/user-guide/contributing/docs.md']

KO 200 https://github.com/iterative/dvc.org.git https://github.com/iterative/dvc.org
--> ['./static/docs/user-guide/contributing/docs.md']

KO 200 https://github.com/ambv/black https://github.com/psf/black
--> ['./static/docs/user-guide/contributing/core.md']

KO 200 https://codeclimate.com/github/iterative/dvc.org/maintainability https://codeclimate.com/github/iterative/dvc.org
--> ['./README.md']

ERROR 403 http://studio.ml/ http://www.studio.ml/
--> ['./static/docs/understanding-dvc/related-technologies.md']

KO 200 https://towardsdatascience.com/the-data-science-workflow-43859db0415 https://towardsdatascience.com/the-data-science-workflow-43859db0415?gi=1bfdb7f61eb7
--> ['./static/docs/understanding-dvc/resources.md']

KO 200 https://plugins.jetbrains.com/plugin/11368-dvc-support-poc https://plugins.jetbrains.com/plugin/11368-data-version-control-dvc-support
--> ['./static/docs/install/plugins.md']

KO 200 https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee?gi=508f8e55e673
--> ['./static/docs/understanding-dvc/resources.md']

KO 200 https://github.com/iterative/dvc.git https://github.com/iterative/dvc
--> ['./static/docs/user-guide/contributing/core.md']

KO 200 https://stackoverflow.com https://stackoverflow.com/
--> ['./static/docs/tutorials/deep/preparation.md']

KO 200 https://github.com/iterative/dvc.org/tree/master/src/Documentation/sidebar.json https://github.com/iterative/dvc.org/blob/master/src/Documentation/sidebar.json
--> ['./static/docs/user-guide/contributing/docs.md']

KO 200 https://dvc.org/chat https://discordapp.com/invite/dvwXA2N
--> ['./README.md']
`

@shcheklein
Member

My 2c on this:

Keep redirects that just add or remove a trailing slash (/), or that append a query string (?...) after the redirect. Fixing those would be a never-ending game.

Definitely fix redirects like https://plugins.jetbrains.com/plugin/11368-dvc-support-poc that look like the site owners moved the page (it should probably be returning 301?)

https://dvc.org/chat https://discordapp.com/invite/dvwXA2N - these are set up specifically so that we can change the invite if needed.
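The first rule above could be sketched roughly as follows. `is_cosmetic_redirect` is a hypothetical helper, not part of the existing script. It treats a redirect as ignorable when scheme, host and path match once trailing slashes are stripped, and it disregards the query string and fragment entirely.

```python
from urllib.parse import urlsplit


def is_cosmetic_redirect(source, target):
    """True if target differs from source only by trailing slash,
    query string, or fragment."""
    src, dst = urlsplit(source), urlsplit(target)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    same_path = src.path.rstrip("/") == dst.path.rstrip("/")
    return same_origin and same_path


# Trailing slash only: ignorable.
print(is_cosmetic_redirect("https://git-scm.com", "https://git-scm.com/"))
# Page moved to a new path: worth fixing.
print(is_cosmetic_redirect(
    "https://github.com/ambv/black", "https://github.com/psf/black"))
```

Intentional redirects such as https://dvc.org/chat would still need an explicit allowlist, since they change the host on purpose.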

@jorgeorpinel
Contributor Author

  • We should def. keep https://dvc.org/chat (all occurrences) (actually just /chat) as mentioned by Ivan.
  • The ones that just add a query string e.g. ?gi=48427509caa5 (all the towardsdatascience.com ones) we can also ignore, also as mentioned by Ivan.

KO 200 https://www.python.org/dev/peps/pep-0008/? https://www.python.org/dev/peps/pep-0008/ ,
KO 200 https://stackoverflow.com https://stackoverflow.com/

  • No difference here. Leave them please.

KO 200 https://help.github.com/en/articles/resolving-a-merge-conflict-using-the-command-line https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-using-the-command-line ,
KO 200 https://help.github.com/en/articles/fork-a-repo https://help.github.com/en/github/getting-started-with-github/fork-a-repo ,
KO 200 https://github.com/ambv/black https://github.com/psf/black ,
KO 200 https://codeclimate.com/github/iterative/dvc.org/maintainability https://codeclimate.com/github/iterative/dvc.org ,
KO 200 https://github.com/iterative/dvc.org/tree/master/src/Documentation/sidebar.json https://github.com/iterative/dvc.org/blob/master/src/Documentation/sidebar.json

Yes, please update since we know the content has moved.

KO 200 https://git-scm.com https://git-scm.com/

Yes, please add / to any base URL that is missing it. @shcheklein I don't think any redirect removes slashes from base URLs (with no path) like this one.

KO 200 https://stackoverflow.com/a/42120328/761963 ,
KO 200 https://nodejs.org/ https://nodejs.org/en/

Keep the short versions. Easier to read in docs. Plus the extra paths may change later but not the parts we have.

ERROR 403 https://blogs.windows.com/windowsdeveloper/2016/03/30/run-bash-on-ubuntu-on-windows/

I get a 200 OK. Leave it.

KO 200 https://github.com/iterative/dvc.org.git https://github.com/iterative/dvc.org ,
KO 200 https://github.com/iterative/dvc.org/tree/master/src/Documentation/sidebar.json https://github.com/iterative/dvc.org/blob/master/src/Documentation/sidebar.json
KO 200 https://github.com/iterative/dvc.git https://github.com/iterative/dvc

Please update.

ERROR 403 http://studio.ml/ http://www.studio.ml/

I get 301. But yes, please update.

KO 200 https://plugins.jetbrains.com/plugin/11368-dvc-support-poc https://plugins.jetbrains.com/plugin/11368-data-version-control-dvc-support

I get 302. Anyway, please just use https://plugins.jetbrains.com/plugin/11368 and let them redirect accordingly.

Thanks again @taylorlee1!

@jorgeorpinel jorgeorpinel changed the title fix broken links - oct 2019 fix broken links (Nov 2019) Nov 6, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Nov 6, 2019

Another issue I just noticed is that we can't really check the anchors in links, for example [How to report a problem](/doc/user-guide/contributing/core#how-to-report-a-problem) (in https://dvc.org/doc/user-guide/contributing, related to #727 BTW), or even plain (#anchor) links, not to mention external anchored ones like [Link](https://web.site/page#anchor)... Would this be a nightmare to test with a script? As in, should we just avoid #anchored links in general (and remove them all)?

@shcheklein
Member

Don't think we need to remove them. It's more or less fine to have some of them broken and update them from time to time. We can make a script that analyzes the content of the page to see if the anchor is there. SSR would be helpful in this case, but it can be done w/o that as well.
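A minimal sketch of such a check, assuming the page HTML is already available as a string (for example from a server-side rendered or static page). `AnchorCollector` and `has_anchor` are hypothetical names. The check looks for a matching `id` attribute on any element or a `name` attribute on an `<a>` tag.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect values of id attributes and <a name=...> attributes."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def has_anchor(html_text, fragment):
    """True if the fragment (without '#') names an element in the HTML."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment in collector.anchors


page = '<h2 id="how-to-report-a-problem">How to report a problem</h2>'
print(has_anchor(page, "how-to-report-a-problem"))  # True
print(has_anchor(page, "missing-section"))  # False
```

Pages that build their headings in the browser would not expose these attributes in the raw HTML, so this approach only covers content that is present in the server response.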

@jorgeorpinel
Contributor Author

It's more or less fine to have some of them broken

Agree, as long as the page the anchor points into still exists. But finding broken ones may indicate that the original content has changed and so the link may no longer be relevant. (Unlikely)

SSR would be helpful in this case...

Only for internal links. Any external links to dynamic sites will also be hard to detect broken anchors for. (A crawler would throw false positives, which we could simply review manually when reported.)

Anyway, yeah not a huge deal, just something I realized and wanted to note here.

@jorgeorpinel jorgeorpinel changed the title fix broken links (Nov 2019) monitor and fix broken links Dec 29, 2019
@jorgeorpinel jorgeorpinel added A: website Area: website help wanted Contributors especially welcome and removed A: docs Area: user documentation (gatsby-theme-iterative) good first issue Good for newcomers labels Jan 19, 2020
@jorgeorpinel jorgeorpinel changed the title monitor and fix broken links seo: monitor and fix broken links Jan 20, 2020
@shcheklein
Member

Should be solved by @casperdcl's fix and a few commits that removed or fixed broken links.

@casperdcl
Contributor

casperdcl commented Feb 5, 2020

My implementation (#958) was quick and dirty but probably does the job. It didn't actually fix the current broken links, but it will find any future ones and keep warning about the current ones.
