-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement deadlink checker for vald web #2643
Conversation
Deploying vald with Cloudflare Pages
|
📝 Walkthrough📝 WalkthroughWalkthroughThe changes introduce a command-line tool for validating links within HTML files, implemented in Changes
Possibly related PRs
Suggested labels
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
[CHATOPS:HELP] ChatOps commands.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- hack/tools/deadlink/main.go (1 hunks)
🔇 Additional comments not posted (1)
hack/tools/deadlink/main.go (1)
47-50
: Validate regular expressions for accurate link extractionThe regular expressions used to extract links might not cover all valid cases or could produce incorrect matches, especially if attributes contain additional properties or are formatted differently.
Run the following script to test and verify the regular expressions against sample HTML content:
Ensure the patterns match attribute values enclosed in quotes and handle different cases properly.
✅ Verification successful
Further Validate Regular Expressions with Diverse HTML Samples
To ensure the regex patterns handle various attribute formats and edge cases, please run the following script:
Regular Expressions Validated Successfully
The regular expressions correctly extract
src
,href
, andsrcset
attributes when values are enclosed in quotes. No issues were found with the current patterns.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that regular expressions correctly extract links from HTML files. # Test: Use sample HTML content to test regex patterns. # Expect: Correct extraction of 'src', 'href', and 'srcset' attributes. echo '<a href="https://example.com">Link</a><img src="/images/logo.png" /><source srcset="image1.jpg 1x, image2.jpg 2x">' > sample.html # Patterns to test grep -Eo 'src=["'\''][^"'\'']+["'\'']' sample.html grep -Eo 'href=["'\''][^"'\'']+["'\'']' sample.html grep -Eo 'srcset=["'\''][^"'\'']+["'\'']' sample.htmlLength of output: 374
Script:
#!/bin/bash # Description: Further verify that regular expressions correctly extract links from diverse HTML files. # Create a sample HTML file with varied attribute formats cat <<EOF > sample_complex.html <a href='https://example.com/page?param=value'>Example Link</a> <img src="/images/logo.png" alt="Logo" /> <source srcset="image1.jpg 1x, image2.jpg 2x, image3.jpg 3x"> <a href="https://example.com/another-page" class="nav-link">Another Link</a> <img src='/images/banner.png' data-info="banner" /> <source srcset='image4.jpg 1x, image5.jpg 2x'> <a href=https://example.com/missing-quotes>Missing Quotes</a> EOF # Patterns to test echo "Testing src:" grep -Eo 'src=["'\''][^"'\'']+["'\'']' sample_complex.html echo "Testing href:" grep -Eo 'href=["'\''][^"'\'']+["'\'']' sample_complex.html echo "Testing srcset:" grep -Eo 'srcset=["'\''][^"'\'']+["'\'']' sample_complex.htmlLength of output: 528
471f562
to
223aac3
Compare
@coderabbitai please review |
@vankichi: I will review the changes. ✅ Actions performedReview triggered.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
Makefile (2)
52-54
: LGTM! Consider adding comments for clarity.The new variables for the deadlink checker are well-defined. The use of
?=
forDEADLINK_CHECK_PATH
andDEADLINK_IGNORE_PATH
allows for flexibility in setting these values.Consider adding brief comments explaining the purpose of each variable to improve maintainability:
-DEADLINK_CHECK_PATH ?= "" -DEADLINK_IGNORE_PATH ?= "" -DEADLINK_CHECK_FORMAT = html +# Path to check for dead links +DEADLINK_CHECK_PATH ?= "" +# Regex pattern for paths to ignore during the check +DEADLINK_IGNORE_PATH ?= "" +# Output format for the results (e.g., html, json) +DEADLINK_CHECK_FORMAT = html
Line range hint
691-706
: LGTM! Improved cross-platform support for usearch installation.The changes to the
usearch/install
target greatly improve cross-platform compatibility by providing specific installation steps for Linux and macOS. The use of conditional statements and appropriate cleanup steps is commendable.Consider adding error handling for the curl commands to make the installation process more robust:
usearch/install: ifeq ($(OS),linux) - curl -sSL https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_$(OS)_$(GOARCH)_$(USEARCH_VERSION).deb -o usearch_$(OS)_$(USEARCH_VERSION).deb + curl -sSL -f https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_$(OS)_$(GOARCH)_$(USEARCH_VERSION).deb -o usearch_$(OS)_$(USEARCH_VERSION).deb || (echo "Failed to download usearch package" && exit 1) dpkg -i usearch_$(OS)_$(USEARCH_VERSION).deb rm usearch_$(OS)_$(USEARCH_VERSION).deb ldconfig else ifeq ($(OS),macos) - curl -sSL https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_macos_$(GOARCH)_$(USEARCH_VERSION).zip -o usearch_macos_$(OS)_$(USEARCH_VERSION).zip + curl -sSL -f https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_macos_$(GOARCH)_$(USEARCH_VERSION).zip -o usearch_macos_$(OS)_$(USEARCH_VERSION).zip || (echo "Failed to download usearch package" && exit 1) unzip usearch_macos_$(OS)_$(USEARCH_VERSION).zip sudo mv libusearch_c.dylib /usr/local/lib && sudo mv usearch.h /usr/local/include rm -rf usearch_macos_$(OS)_$(USEARCH_VERSION).zip ldconfig endifThis addition of the
-f
flag to curl and the error handling will ensure that the installation fails gracefully if the download doesn't succeed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- Makefile (3 hunks)
- Makefile.d/functions.mk (1 hunks)
- hack/tools/deadlink/main.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- hack/tools/deadlink/main.go
🔇 Additional comments (1)
Makefile (1)
454-458
: LGTM! Well-structured target for deadlink checker.The new
deadlink-checker
target is correctly defined as phony and properly calls thegen-deadlink-checker
function with the necessary parameters. The comment above the target is helpful for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
Makefile (1)
Line range hint
691-703
: Improvement suggestion for usearch installationThe
usearch/install
target has platform-specific installation steps for Linux and macOS. While the implementation is correct, it could be improved for better maintainability and readability.Consider refactoring the
usearch/install
target to use a more concise and maintainable structure. Here's a suggested improvement:.PHONY: usearch/install ## install usearch usearch/install: @case "$(OS)" in \ linux) \ curl -sSL https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_$(OS)_$(GOARCH)_$(USEARCH_VERSION).deb -o usearch_$(OS)_$(USEARCH_VERSION).deb && \ dpkg -i usearch_$(OS)_$(USEARCH_VERSION).deb && \ rm usearch_$(OS)_$(USEARCH_VERSION).deb ;; \ darwin) \ curl -sSL https://github.com/unum-cloud/usearch/releases/download/v$(USEARCH_VERSION)/usearch_macos_$(GOARCH)_$(USEARCH_VERSION).zip -o usearch_macos_$(USEARCH_VERSION).zip && \ unzip usearch_macos_$(USEARCH_VERSION).zip && \ sudo mv libusearch_c.dylib /usr/local/lib && sudo mv usearch.h /usr/local/include && \ rm -rf usearch_macos_$(USEARCH_VERSION).zip ;; \ *) echo "Unsupported OS: $(OS)" && exit 1 ;; \ esac ldconfigThis refactored version:
- Uses a case statement for better readability.
- Combines commands with
&&
to ensure all steps are executed successfully.- Adds an error case for unsupported operating systems.
- Keeps the
ldconfig
command outside the case statement as it's common to both platforms.hack/tools/deadlink/main.go (1)
24-33
: Adhere to Go's naming conventions for constantsIn Go, constants are typically named using CamelCase rather than ALL_CAPS. Renaming the constants improves readability and aligns with Go's idiomatic style.
Apply this diff to rename the constants:
const ( - BASE_URL = "https://vald.vdaas.org" - PREFIX_PROP = `property="og:url" content="` - PREFIX_SRC = "src=" - PREFIX_SRCSET = "srcset=" - PREFIX_HREF = "href=" + baseURL = "https://vald.vdaas.org" + prefixProp = `property="og:url" content="` + prefixSrc = "src=" + prefixSrcset = "srcset=" + prefixHref = "href=" BASE_REGEXP = `[\w!\?/\+\-_~=;:\.,\*&@#\$%\(\)'\[\]]+` URL_REGEXP = `https?://[\w!\?/\+\-_~=;\.,\*&@#\$%\(\)'\[\]]+` )Ensure to update all references to these constants throughout the code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- Makefile (3 hunks)
- Makefile.d/functions.mk (1 hunks)
- hack/tools/deadlink/main.go (1 hunks)
🔇 Additional comments (3)
Makefile (3)
52-54
: New variables added for deadlink checkerThree new variables have been added for the deadlink checker functionality:
DEADLINK_CHECK_PATH
: Path to check for dead links.DEADLINK_IGNORE_PATH
: Regex pattern for paths to ignore during the check.DEADLINK_CHECK_FORMAT
: Output format for the results, with a default value of "html".These variables provide good flexibility for configuring the deadlink checker.
454-458
: New phony target added for deadlink checkerA new phony target
deadlink-checker
has been added to generate the deadlink checker. It uses thegen-deadlink-checker
function, passing the root directory, maintainer information, and the newly defined variables as parameters.This addition aligns well with the project's structure and naming conventions.
Line range hint
1-703
: Summary of changes: Deadlink checker implementationThe changes to this Makefile primarily focus on implementing a deadlink checker functionality. Key additions include:
- New variables for configuring the deadlink checker (lines 52-54).
- A new phony target
deadlink-checker
for generating the deadlink checker (lines 454-458).These additions are well-integrated into the existing Makefile structure and provide useful functionality for checking dead links in the project. The new variables offer good configurability for the deadlink checker.
While not directly related to the deadlink checker, an improvement suggestion was made for the
usearch/install
target to enhance its maintainability and readability.Overall, these changes appear to be a positive addition to the project's build and maintenance capabilities.
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
4466d68
to
8942da1
Compare
Signed-off-by: Yusuke Kato <kpango@vdaas.org>
c0f1802
* ✨ Implement deadlink checker for vald web Signed-off-by: vankichi <kyukawa315@gmail.com> * ♻️ refactor Signed-off-by: vankichi <kyukawa315@gmail.com> * style: format code with Gofumpt and Prettier This commit fixes the style issues introduced in 4f5a181 according to the output from Gofumpt and Prettier. Details: vdaas#2643 * ♻️ refactor Signed-off-by: vankichi <kyukawa315@gmail.com> * 🔧 add make command Signed-off-by: vankichi <kyukawa315@gmail.com> * ♻️ Fix Signed-off-by: vankichi <kyukawa315@gmail.com> --------- Signed-off-by: vankichi <kyukawa315@gmail.com> Signed-off-by: Yusuke Kato <kpango@vdaas.org> Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com> Co-authored-by: Yusuke Kato <kpango@vdaas.org>
* ✨ Implement deadlink checker for vald web Signed-off-by: vankichi <kyukawa315@gmail.com> * ♻️ refactor Signed-off-by: vankichi <kyukawa315@gmail.com> * style: format code with Gofumpt and Prettier This commit fixes the style issues introduced in 4f5a181 according to the output from Gofumpt and Prettier. Details: vdaas#2643 * ♻️ refactor Signed-off-by: vankichi <kyukawa315@gmail.com> * 🔧 add make command Signed-off-by: vankichi <kyukawa315@gmail.com> * ♻️ Fix Signed-off-by: vankichi <kyukawa315@gmail.com> --------- Signed-off-by: vankichi <kyukawa315@gmail.com> Signed-off-by: Yusuke Kato <kpango@vdaas.org> Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com> Co-authored-by: Yusuke Kato <kpango@vdaas.org>
Description
I have implemented a new command line tool to check deadlinks in the target file.
Also, I have added the make command:
make deadlink-checker
.The usage is below:
# make deadlink-checker DEADLINK_CHECK_PATH=<path> DEADLINK_IGNORE_PATH=<ignore_path> DEADLINK_CHECK_FORMAT=<format> make deadlink-checker DEADLINK_CHECK_PATH=./ DEADLINK_IGNORE_PATH=v[0-9]+ DEADLINK_CHECK_FORMAT=html
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Summary by CodeRabbit
deadlink-checker
target in the build process for easier link validation.usearch/install
target for better build accuracy.