
@casperdcl (Contributor) commented Jan 29, 2020

  • script to search for and check problematic (dead) links
    • check absolute links
    • check relative markdown links
    • define exclusions list
    • deal with links that have brackets ()
    • should warnings (status codes 0xx, 3xx etc) be counted as errors instead (see test: check links #958 (comment))?
    • concurrent for speed
    • hooks only for diffs for speed
  • python pre-commit hooks
    • config.yml (for local repo)
      • maybe remove in favour of husky (see below)?
    • hooks.yml (for other repos to use)
      • document developer usage of pre-commit hooks?
  • yarn tests
    • add yarn.scripts.link-check
    • add yarn.scripts.link-check-diff
    • add husky.hooks.pre-commit
      • remove due to potential windows incompatibility (see below)
    • include .github/ in all yarn.scripts checks
  • CI
    • only for the diff (otherwise takes ~60s to run)
    • add full (60s) run in a daily cron job?
      • run: yarn link-check
  • rename link-check everywhere for consistency
  • do something for Windows users
    • remove from pre-commit hooks (leave as CI-only)?
    • add .huskyrc.js checking for OS compatibility?
    • convert bash to py?
    • convert bash to js?
  • open a different issue to fix current link errors (fix broken links #974)
  • fixes ci: test to check all links #652
  • rebased from and closes Script to check links on .md files #690 (which GitHub won't let me re-open)
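
The "concurrent for speed" item could look like the following Python sketch (Python being one of the conversion options listed above). It is not the PR's script: check_all and the injected fetch_status callable are hypothetical names, and fetch_status is passed in so the sketch runs without network access. Each unique URL is checked only once, since the same URL (for example https://object-storage.example.com in the output further down) can appear in several files.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def check_all(
    urls: Iterable[str],
    fetch_status: Callable[[str], int],
    workers: int = 16,
) -> Dict[str, int]:
    """Check each unique URL once, in parallel threads.

    fetch_status is a hypothetical callable returning an HTTP status code
    (0 when no response is received); injecting it keeps this testable offline.
    """
    unique = sorted(set(urls))  # deduplicate: the same URL may occur in many files
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so statuses line up with `unique`
        statuses = list(pool.map(fetch_status, unique))
    return dict(zip(unique, statuses))


if __name__ == "__main__":
    fake = {"https://dvc.org": 200, "https://nexyjs.org/": 0}
    print(check_all(["https://dvc.org", "https://nexyjs.org/", "https://dvc.org"], fake.get))
```

Since the checks are I/O-bound, a thread pool should make the total time depend mostly on the slowest responses rather than on their sum.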

@casperdcl added the type: enhancement (Something is not clear, small updates, improvement suggestions) and A: docs (Area: user documentation (gatsby-theme-iterative)) labels Jan 29, 2020
@casperdcl self-assigned this Jan 29, 2020
@casperdcl (Contributor Author) commented Jan 29, 2020

@shcheklein I presume sorting out the current errors is for a different PR? See the output below:

dvc.org (check-links)$ pre-commit run --all-files | grep -v OK
Dead URL Checker.........................................................Failed
- hook id: dead-url
- exit code: 3
public/static/docs/understanding-dvc/related-technologies.md:
 WARNING:301:http://studio.ml/
public/static/docs/use-cases/data-registries.md:
 ERROR:406:http://millionsongdataset.com/pages/getting-dataset/#subset
public/static/docs/user-guide/contributing/docs.md:
 WARNING:000:http://localhost:3000/

 ERROR:404:https://marketplace.visualstudio.com/items?itemName=stkb.rewrap

 WARNING:000:https://nexyjs.org/
public/static/docs/command-reference/pipeline/show.md:
 ERROR:404:https://en.wikipedia.org/wiki/Less_(Unix

 ERROR:404:https://en.wikipedia.org/wiki/More_(command
ERROR:3 failures
---
public/static/docs/user-guide/running-dvc-on-windows.md:
 ERROR:404:https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc778996(v=ws.10

 ERROR:404:https://en.wikipedia.org/wiki/Less_(Unix
ERROR:1 failures
---
public/static/docs/install/macos.md:
 WARNING:000:https://support.apple.com/en-us/HT207700
public/static/docs/command-reference/remote/index.md:
 WARNING:000:https://object-storage.example.com
README.md:
 ERROR:404:https://circleci.com/gh/iterative/dvc.org
public/static/docs/user-guide/privacy.md:
 ERROR:400:https://accounts.google.com/o/oauth2/auth
ERROR:2 failures
---
public/static/docs/command-reference/remote/modify.md:
 WARNING:000:https://myendpoint.com

 WARNING:000:https://object-storage.example.com
public/static/docs/command-reference/diff.md:
 ERROR:404:https://remote.dvc.org/get-started
ERROR:1 failures
---
public/static/docs/understanding-dvc/collaboration-issues.md:
 ERROR:404:https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning
public/static/docs/command-reference/remote/add.md:
 ERROR:404:https://drive.google.com/drive/folders/0AIac4JZqHhKmUk9PDA

 WARNING:000:https://object-storage.example.com
ERROR:2 failures
---
public/static/docs/user-guide/contributing/core.md:
 WARNING:000:http://127.0.0.1:10000/devstoreaccount1;
public/static/docs/understanding-dvc/resources.md:
 ERROR:404:https://www.kaggle.com/rtatman/kerneld4769833fe
ERROR:1 failures
---
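
The truncated Wikipedia and Microsoft URLs above (for example https://en.wikipedia.org/wiki/Less_(Unix with its closing parenthesis missing) suggest the extraction stops at the first ")", which is the "deal with links that have brackets ()" item from the description. A minimal sketch of one fix, assuming inline Markdown links of the form [text](url): allow one level of balanced parentheses inside the URL. LINK_TARGET and extract_links are hypothetical names, and links with a title ([text](url "title")) are not handled.

```python
import re
from typing import List

# Target of an inline Markdown link: "](" + URL + ")".
# The URL may contain one level of balanced parentheses, e.g. Less_(Unix).
LINK_TARGET = re.compile(
    r"\]\("                                  # end of link text, start of target
    r"(https?://(?:[^()\s]|\([^()\s]*\))+)"  # URL chars or a (...) group
    r"\)"                                    # closing parenthesis of the link
)


def extract_links(markdown: str) -> List[str]:
    """Return absolute URLs found in inline Markdown links."""
    return LINK_TARGET.findall(markdown)


if __name__ == "__main__":
    text = (
        "Use [less](https://en.wikipedia.org/wiki/Less_(Unix)) or "
        "[more](https://en.wikipedia.org/wiki/More_(command))."
    )
    print(extract_links(text))
```

The two alternatives inside the repeated group start with different characters, so the pattern does not backtrack pathologically on long lines.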

@casperdcl changed the title from "Check links" to "test: check links" Jan 30, 2020
@shcheklein (Contributor)

In terms of the pre-commit hooks: we already use husky (?), and something is being installed with yarn, so we would probably need to plug into it.

@casperdcl (Contributor Author)

@shcheklein I don't quite follow - yarn is being used to run scripts https://github.com/iterative/dvc.org/blob/92421404b05e73e9b3e130299495b4a703ffe080/package.json#L6-L16

While we could add check-links there, I don't think we should, as it would check all files (which takes about a minute).

For checking patches, I thought we should use whatever the main repo does (currently pre-commit) surely?
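
Checking only the patch, as suggested here and in the "only for the diff" CI item, could be sketched as follows, assuming the base branch is origin/master and that only changed Markdown files need checking. changed_markdown_files and markdown_only are hypothetical helpers; git diff --name-only and --diff-filter are standard git options (a lowercase filter letter excludes that change type, so d skips deleted files).

```python
import subprocess
from typing import List


def markdown_only(names: str) -> List[str]:
    """Keep the Markdown paths from `git diff --name-only` output."""
    return [n for n in names.splitlines() if n.endswith(".md")]


def changed_markdown_files(base: str = "origin/master") -> List[str]:
    """List Markdown files changed relative to `base` (an assumed branch name).

    --diff-filter=d excludes deleted files, which have no links left to check.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return markdown_only(out)
```

The resulting file list would then be passed to the link checker instead of the whole docs tree.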

@shcheklein (Contributor)

@casperdcl when you run yarn for the first time, it installs the dependencies and the pre-commit hook (which runs linters, prettier, etc.).

@shcheklein (Contributor) left a comment

Good stuff! Please check a few comments. In particular, it's not clear how this integrates with the existing hooks.

@shcheklein (Contributor) left a comment

Looks great! Let me know when we are ready to merge it.

@casperdcl (Contributor Author) commented Feb 2, 2020

@shcheklein there's just one item left:

should warnings (status codes 0xx, 3xx etc) be counted as errors instead (see #958 (comment))?

It's probably not that relevant, since the 000/301 warnings (as well as the 404 errors) should be resolved in a different issue later. So yes, good to merge.
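
The open warnings-vs-errors question could come down to a single flag. A minimal sketch, assuming the thresholds implied by the output earlier in the thread (000 and 301 reported as WARNING; 400, 404 and 406 as ERROR); classify and strict are hypothetical names, not part of the PR.

```python
def classify(status: int, strict: bool = False) -> str:
    """Map an HTTP status code to OK, WARNING or ERROR.

    000 (no response) and 3xx (redirects) are warnings, other non-2xx
    codes are errors. strict=True counts warnings as errors instead.
    """
    if 200 <= status < 300:
        return "OK"
    if status < 200 or 300 <= status < 400:  # 0xx/1xx and 3xx
        return "ERROR" if strict else "WARNING"
    return "ERROR"  # 4xx and 5xx, e.g. 400, 404, 406


if __name__ == "__main__":
    for code in (200, 301, 0, 404, 406):
        print(code, classify(code), classify(code, strict=True))
```

Keeping strict off by default matches the current behaviour; turning it on in CI would make the 000/301 cases fail the build too.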

@casperdcl (Contributor Author) left a comment

good to merge now

  • complete overhaul of the CircleCI configuration, included in order to use daily cron jobs
  • .pre-commit-hooks.yaml included for others to use, but not actually used by us (yet) due to potential lack of Windows support
  • yarn/husky pre-commit hook also not used, for the same reason
  • problematic links to be fixed in a different issue/PR

@shcheklein merged commit 22a4731 into master Feb 2, 2020
@shcheklein (Contributor)

Thanks, @casperdcl!

Comment on lines +15 to +16
https://github.com/iterative/dvc.org/blob/master/public$
https://github.com/iterative/dvc/releases/download/$
@casperdcl (Contributor Author) replied:

Yes, my regexes match only up to the literal $. It seems a good compromise between blacklisting false negatives and programming false-negative detection.
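
One reading of this reply is that URL extraction stops right after a literal $, so a templated URL such as .../releases/download/$VERSION/... (a hypothetical example) is extracted as .../releases/download/$ and can then be excluded verbatim with the two entries above. A minimal sketch of that reading; URL, EXCLUDED and urls_to_check are hypothetical names, and the PR's actual regexes may differ.

```python
import re
from typing import List

# Assumption: a URL ends at whitespace, a closing bracket or quote, or just
# after a literal "$" (the start of a shell-style variable such as $VERSION).
URL = re.compile(r"https?://[^\s)\]\"'<>$]*\$?")

# The two exclusion entries from the review comment, matched literally.
EXCLUDED = {
    "https://github.com/iterative/dvc.org/blob/master/public$",
    "https://github.com/iterative/dvc/releases/download/$",
}


def urls_to_check(text: str) -> List[str]:
    """Extract URLs and drop the ones on the exclusion list."""
    return [u for u in URL.findall(text) if u not in EXCLUDED]
```

Listing the truncated prefix once avoids having to detect every possible templated URL programmatically.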

@jorgeorpinel (Contributor)

Very neat! Should we briefly document how this works somewhere? Is there a follow-up PR or issue for the few unchecked boxes in the description above? Thanks

@shcheklein (Contributor)

I think decisions were made and links were fixed already (or added to exclusions)... not sure we need to document it, tbh. It's quite internal stuff, similar to redirects. Maybe a message can be added to the script that explains how to change the files if needed. Again, not a top priority.

@jorgeorpinel (Contributor) commented Feb 4, 2020

I have one more question, @shcheklein: I see you added some excluded URLs in ba42034...32f63df#diff-01c133d99b04d1390925d3d06f3bbe77, but they are valid. Specifically:

http://millionsongdataset.com/pages/getting-dataset/#subset
https://circleci.com/gh/iterative/dvc.org
https://marketplace.visualstudio.com/items?itemName=stkb.rewrap
https://www.kaggle.com/rtatman/kerneld4769833fe

Why?

The first one is probably because of non-secure HTTP?

In general, I was just a bit worried that it would be tricky to maintain this list manually: remembering to add or remove sample links so they're not checked. But I guess if you don't add one, the CI check will fail, and if you don't remove one, nothing bad will happen.

@shcheklein (Contributor)

@jorgeorpinel some of them do not work from the CLI but do work from the browser. For some of them, I would check the script once again.

@casperdcl (Contributor Author)

Yes, for example CircleCI does weird non-standard redirection.
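
One common reason a link works in a browser but not from the CLI is that the server treats non-browser clients differently, for example based on their User-Agent or Accept headers (the 406 Not Acceptable for millionsongdataset.com earlier is an Accept-related status). Whether that applies to these particular URLs is an assumption. A minimal sketch with Python's urllib; the header values and the helper names build_request and fetch_status are hypothetical.

```python
import urllib.error
import urllib.request

# Illustrative browser-like headers (an assumption, not values from the PR).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}


def build_request(url: str) -> urllib.request.Request:
    """HEAD request that looks more like a browser than a CLI tool."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="HEAD")


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status after any redirects urlopen follows; 0 if no response."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # URLError, timeouts, DNS failures
        return 0
```

urlopen follows redirects by default, so a non-standard redirect like CircleCI's may still behave differently than in a browser, and some servers answer HEAD differently from GET.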

@casperdcl deleted the check-links branch February 5, 2020 14:15
@casperdcl mentioned this pull request Feb 5, 2020
@jorgeorpinel (Contributor) commented Feb 7, 2020

Got it.

P.S. yarn link-check doesn't work on Mac: pcregrep: command not found

UPDATE: Got pcregrep with brew install pcre, but now I'm getting a bunch of sed: illegal option -- r

UPDATE's UPDATE: Got GNU sed with brew install gnu-sed (and added it to my PATH)

@casperdcl (Contributor Author)

Yes, that's one of the reasons why this is a CI-only check by default rather than a pre-commit hook: there's no guarantee that it'll run on dev machines.
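
Since the script depends on pcregrep and GNU sed (as the macOS report above shows), a preflight check could at least fail with a clear message on dev machines. A minimal sketch; missing_tools and is_gnu_sed are hypothetical helpers, and detecting GNU sed via --version is a heuristic (GNU sed accepts the flag, the BSD sed shipped with macOS rejects it).

```python
import shutil
import subprocess
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("pcregrep", "sed"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that are not on PATH."""
    return [t for t in tools if which(t) is None]


def is_gnu_sed(sed: str = "sed") -> bool:
    """GNU sed understands --version; BSD sed does not."""
    try:
        proc = subprocess.run([sed, "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return False
    return proc.returncode == 0 and "GNU" in proc.stdout
```

A wrapper around yarn link-check could call both first and, on failure, point macOS users to brew install pcre and brew install gnu-sed, as in the report above.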

4 participants