Running website-scraper Without a package.json in the Working Directory Triggers Error #512

Closed
manuth opened this issue Sep 26, 2022 · 5 comments

@manuth

manuth commented Sep 26, 2022

Configuration

version: 5.3.0
options: options don't have an influence on this issue

Description

Running website-scraper with the current working directory set to a directory that does not contain a package.json file triggers an error.

Expected behavior: The working directory should not have any side-effect of this sort

Actual behavior: An error is thrown:

ENOENT: no such file or directory, open './package.json'.

Additional Information

This error is caused by this piece of code:

const packageJson = JSON.parse(fs.readFileSync('./package.json', 'utf8'));
const defaultRequestUserAgent = `${packageJson.name}/${packageJson.version} (https://github.com/website-scraper/node-website-scraper)`;

As this code shows, the relative path './package.json' is resolved against the current working directory, so the statement only works if a package.json file exists there.
A workaround is sadly not possible at the time of writing.
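
On the library side, one way to remove the dependency on the working directory would be to resolve the path against the module's own URL instead. The following is only a minimal sketch under that assumption, not necessarily how the maintainers fixed it. The helper name `readPackageJsonNextTo` and the temporary directories are hypothetical and exist purely for the demonstration.

```javascript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper (not part of website-scraper): read the package.json
// located next to the module identified by `moduleUrl`, ignoring process.cwd().
function readPackageJsonNextTo(moduleUrl) {
    // new URL() resolves "./package.json" against the module URL, not the cwd.
    const packageUrl = new URL("./package.json", moduleUrl);
    return JSON.parse(fs.readFileSync(packageUrl, "utf8"));
}

// Demo setup: a throwaway "package" in a temp directory stands in for the library.
const packageDir = fs.mkdtempSync(path.join(os.tmpdir(), "pkg-"));
fs.writeFileSync(
    path.join(packageDir, "package.json"),
    JSON.stringify({ name: "demo-package", version: "1.0.0" }));
const fakeModuleUrl = pathToFileURL(path.join(packageDir, "index.js"));

// Switch to a directory without a package.json, as in the report.
const emptyDir = fs.mkdtempSync(path.join(os.tmpdir(), "cwd-"));
process.chdir(emptyDir);

// The read still succeeds because it no longer depends on the cwd.
const packageJson = readPackageJsonNextTo(fakeModuleUrl);
const userAgent = `${packageJson.name}/${packageJson.version}`;
console.log(userAgent);
```

Inside a real ES module, the argument would be `import.meta.url` rather than the fabricated URL used here.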

@s0ph1e
Member

s0ph1e commented Sep 27, 2022

Hi @manuth 👋

Thank you for reporting the problem
Could you please share the steps to reproduce this issue? How do you run your application?

@manuth
Author

manuth commented Sep 27, 2022

Hey @s0ph1e
Thanks for the rapid answer 😄

Sorry for the inconvenience, I'll describe the steps for the reproduction real quick.

  1. Create an empty directory
    mkdir scrape-test && cd scrape-test
  2. Create package.json as described below
  3. Run npm install
  4. Create index.js file as described below
  5. Run the script using node ./index.js
  6. Take note that everything works as expected
  7. cd into a directory which does not contain a package.json (for example cd ..)
  8. Run the script again using node ./scrape-test/index.js
  9. Take note that running the script now triggers an error
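
The effect behind the steps above can also be shown in isolation, without installing website-scraper. This is a rough sketch: it performs the same kind of relative read as the snippet quoted in the report, first from a directory that contains a package.json and then from one that does not. The function name and the temporary directories are made up for the illustration.

```javascript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Same kind of cwd-relative read as in the code quoted in the report.
function readRelativePackageJson() {
    return JSON.parse(fs.readFileSync("./package.json", "utf8"));
}

// Directory with a package.json (stands in for scrape-test).
const projectDir = fs.mkdtempSync(path.join(os.tmpdir(), "scrape-test-"));
fs.writeFileSync(
    path.join(projectDir, "package.json"),
    JSON.stringify({ name: "scrape-test" }));

// Directory without a package.json (stands in for the parent directory).
const otherDir = fs.mkdtempSync(path.join(os.tmpdir(), "no-package-"));

// Steps 5 and 6: with the project directory as cwd, the read succeeds.
process.chdir(projectDir);
const nameInProject = readRelativePackageJson().name;

// Steps 7 to 9: from another cwd, the same call fails.
process.chdir(otherDir);
let errorCode = null;
try {
    readRelativePackageJson();
} catch (error) {
    errorCode = error.code;
}

console.log(nameInProject, errorCode); // prints "scrape-test ENOENT"
```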

Assets

package.json:

{
  "type": "module",
  "dependencies": {
    "website-scraper": "^5.3.0"
  },
  "devDependencies": {
    "@types/node": "^18.7.23",
    "@types/website-scraper": "^1.2.6"
  }
}

index.js:

import { join } from "path";
import { fileURLToPath } from "url";
import websiteScraper from "website-scraper";

(
    async () =>
    {
        await websiteScraper(
            {
                directory: join(fileURLToPath(new URL(".", import.meta.url)), "output"),
                urls: [
                    "https://nodejs.org"
                ]
            });
    })();

@s0ph1e
Member

s0ph1e commented Sep 28, 2022

Thanks @manuth 👍

I'll try to reproduce and fix it within the next 1-2 weeks

If anyone wants to contribute - PRs are welcome

@s0ph1e
Member

s0ph1e commented Oct 9, 2022

The bug was fixed and released in v5.3.1

@s0ph1e s0ph1e closed this as completed Oct 9, 2022
@manuth
Author

manuth commented Oct 9, 2022

Awesome! 😄 Thanks for the rapid fix


2 participants