Making documentation/template URLs absolute instead of relative
For correct rendering when the files are displayed on other websites
D4Vinci committed Oct 16, 2024
1 parent 4ca41c9 commit b45cb41
Showing 4 changed files with 10 additions and 10 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -43,7 +43,7 @@
- Link to documentation pull request: **

### Checklist:
-* [ ] I have read [CONTRIBUTING.md](/CONTRIBUTING.md).
+* [ ] I have read [CONTRIBUTING.md](https://github.com/D4Vinci/Scrapling/blob/main/CONTRIBUTING.md).
* [ ] This pull request is all my own work -- I have not plagiarized.
* [ ] I know that pull requests will not be merged if they fail the automated tests.
* [ ] All new Python files are placed inside an existing directory.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -2,8 +2,8 @@
Everybody is invited and welcome to contribute to Scrapling. Smaller changes have a better chance of being included in a timely manner. Adding unit tests for new features, or test cases for bugs you've fixed, helps us ensure that the Pull Request (PR) is fine.

There is a lot to do...
-- If you are not a developer, perhaps you would like to help with the [documentation](/docs)?
-- If you are a developer, most of the features I'm planning to add are listed in the [roadmap file](/ROADMAP.md), so consider reading it.
+- If you are not a developer, perhaps you would like to help with the [documentation](https://github.com/D4Vinci/Scrapling/tree/main/docs)?
+- If you are a developer, most of the features I'm planning to add are listed in the [roadmap file](https://github.com/D4Vinci/Scrapling/blob/main/ROADMAP.md), so consider reading it.

Scrapling includes a comprehensive test suite which can be executed with pytest:
```bash
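# (The rest of this hunk is collapsed in the diff view; a plain run from the
# repository root is assumed here.)
pytest
```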
10 changes: 5 additions & 5 deletions README.md
@@ -108,7 +108,7 @@ As you see, Scrapling is on par with Scrapy and slightly faster than Lxml which

Scrapling can find elements with more methods, and it returns full element `Adaptor` objects, not just the text like AutoScraper. So, to make this test fair, both libraries extract an element with text, find similar elements, and then extract the text content of all of them. As you can see, Scrapling is still 4.5 times faster at the same task.

-> All benchmarks' results are an average of 100 runs. See our [benchmarks.py](/benchmarks.py) for methodology and to run your own comparisons.
+> All benchmarks' results are an average of 100 runs. See our [benchmarks.py](https://github.com/D4Vinci/Scrapling/blob/main/benchmarks.py) for methodology and to run your own comparisons.
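
As a rough illustration of that methodology (a sketch only; `parse_page` is a hypothetical stand-in, not the actual code in benchmarks.py):

```python
import timeit

def parse_page():
    # Hypothetical stand-in for one benchmarked operation, e.g. parsing a
    # page and extracting an element's text with one of the libraries.
    pass

# Total wall-clock time for 100 runs, then averaged, mirroring the stated
# "average of 100 runs" methodology.
total = timeit.timeit(parse_page, number=100)
print(f"average: {total / 100:.6f} s per run")
```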
## Advanced Features
### Smart Navigation
@@ -217,7 +217,7 @@ To increase the complexity a little bit, let's say we want to get all books' data
{'name': 'Sharp Objects', 'price': '47.82', 'stock': 'In stock'}
...
```
-The [documentation](/docs/Examples) will provide more advanced examples.
+The [documentation](https://github.com/D4Vinci/Scrapling/tree/main/docs/Examples) will provide more advanced examples.
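
As a rough sketch of the kind of extraction that produces the output above (the `requests` fetch and the CSS selectors here are illustrative assumptions, not taken from the docs):

```python
import requests
from scrapling import Adaptor

html = requests.get('https://books.toscrape.com/').text
page = Adaptor(html)

# One node per book card; the selectors below are assumptions about the
# page's markup, not verbatim from the documentation.
for product in page.css('article.product_pod'):
    print({
        'name': product.css('h3 a::attr(title)')[0],
        'price': product.css('.price_color::text')[0],
        'stock': ''.join(product.css('.availability::text')).strip(),
    })
```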

### Handling Structural Changes
> Because [the internet archive](https://web.archive.org/) is down at the time of writing, I can't use real websites as examples here, even though I have tested this before (browsing an old version of a website, then treating the current version as the structural change)
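
The demonstration itself is further down in this hunk; as a placeholder, here is a hedged sketch of how auto-matching is typically wired up (treat the exact parameter names and constructor signature as assumptions):

```python
from scrapling import Adaptor

old_html = '<html>...</html>'  # hypothetical snapshot of the old page
new_html = '<html>...</html>'  # hypothetical snapshot after a redesign

# First scrape: save the target element's unique properties.
page = Adaptor(old_html, url='https://example.com', auto_match=True)
page.css('#prices .product', auto_save=True)

# Later, after the site's structure changed: relocate the same element
# using the saved properties instead of the (now broken) selector.
page = Adaptor(new_html, url='https://example.com', auto_match=True)
products = page.css('#prices .product', auto_match=True)
```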
@@ -366,7 +366,7 @@ Scrapling is under active development so expect many more features coming soon :
## More Advanced Usage
-There are a lot of deep details skipped here to keep this as short as possible, so to take a deep dive, head to the [docs](/docs) section. I will try to keep it as updated as possible and add complex examples. There, I will explain points like how to write your own storage system, write spiders that don't depend on selectors at all, and more...
+There are a lot of deep details skipped here to keep this as short as possible, so to take a deep dive, head to the [docs](https://github.com/D4Vinci/Scrapling/tree/main/docs) section. I will try to keep it as updated as possible and add complex examples. There, I will explain points like how to write your own storage system, write spiders that don't depend on selectors at all, and more...
Note that implementing your own storage system can be complex, as there are some strict rules, such as inheriting from the same abstract class, following the singleton design pattern used in other classes, and more. So make sure to read the docs first.
@@ -421,14 +421,14 @@ Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its own
## Contributing
Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
-Please read the [contributing file](/CONTRIBUTING.md) before doing anything.
+Please read the [contributing file](https://github.com/D4Vinci/Scrapling/blob/main/CONTRIBUTING.md) before doing anything.
## License
This work is licensed under BSD-3
## Acknowledgments
This project includes code adapted from:
-- Parsel (BSD License) - Used for [translator](/scrapling/translator.py) submodule
+- Parsel (BSD License) - Used for [translator](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/translator.py) submodule
## Known Issues
- In the auto-matching save process, only the unique properties of the first element in the selection results are saved. So if your selector matches multiple elements in different locations on the page, auto-matching will probably relocate only that first element later. This doesn't apply to combined CSS selectors (e.g., using commas to combine several selectors), as those are split apart and each selector is executed on its own.
4 changes: 2 additions & 2 deletions docs/Extending Scrapling/writing storage system.md
@@ -10,8 +10,8 @@ So first to make your storage class work, it must do the big 3:
* The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `scrapling.utils._StorageTools.element_to_dict` so we keep the same format, then saved to your database as you wish.
* The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the auto-matching will be messed up.
- The method `retrieve` takes a string (the identifier); using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists, otherwise `None` is returned
-> If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in [storage_adaptors](/scrapling/storage_adaptors.py) file
+> If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/storage_adaptors.py) file
If your class satisfies this, the rest is easy. If you plan to use the library in a threaded application, make sure your class supports that; the default class is thread-safe.
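
If it helps to see the shape of that contract, here is a minimal in-memory sketch (the abstract base class name and its constructor are assumptions; the real rules, including the singleton pattern, are in the code linked below):

```python
from typing import Optional

from lxml.html import HtmlElement
from scrapling.utils import _StorageTools
# The base class name below is an assumption -- check
# scrapling/storage_adaptors.py for the real abstract class and its
# singleton requirements before copying this.
from scrapling.storage_adaptors import StorageSystemMixin


class InMemoryStorage(StorageSystemMixin):
    _store = {}  # class-level dict standing in for a real database

    def __init__(self, url: str = None):
        self.url = url  # combined with the identifier to form a unique key

    def save(self, element: HtmlElement, identifier: str) -> None:
        # Keep the library's format by converting the element first
        key = (self.url, identifier)
        self._store[key] = _StorageTools.element_to_dict(element)

    def retrieve(self, identifier: str) -> Optional[dict]:
        # Return the saved dictionary, or None if nothing was stored
        return self._store.get((self.url, identifier))
```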

-There are some helper functions added to the abstract class if you want to use them. It's easier to see for yourself in the [code](/scrapling/storage_adaptors.py); it's heavily commented :)
+There are some helper functions added to the abstract class if you want to use them. It's easier to see for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/storage_adaptors.py); it's heavily commented :)
