Scrape Blog Posts to Jekyll

A basic script, with many hardcoded values, that scrapes an existing blog post from a DSA site, converts it to a Jekyll Markdown file, and saves all the necessary assets in the expected file structure.

TODO

- Increase flexibility
  - Could the selector text be given at runtime?
- Way better error handling
- Get the author into Liquid again
  - The byline format isn't always the same
    - I've seen "By ...", "by ...", "By: ..."
  - Some articles have no byline
- Add support for nested lists
- Remove the Sacramento DSA links from the tests
- Translating <br>s to \n adds extra lines to the Markdown
  - Analyze the collected vector before joining
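
As a starting point for the byline and <br> items above, here is a minimal sketch of two helpers. It is not the current implementation: the names `parse_byline` and `join_lines` are hypothetical, and it assumes the byline arrives as a single line of text and the converted output is collected as a vector of lines before joining.

```rust
/// Hypothetical helper: extracts the author from a byline such as
/// "By Jane Doe", "by Jane Doe", or "By: Jane Doe".
/// Returns None when the line does not look like a byline.
fn parse_byline(line: &str) -> Option<String> {
    let trimmed = line.trim();
    // Compare the first two bytes case-insensitively; `get` returns None
    // for short input or a non-ASCII boundary instead of panicking.
    let prefix = trimmed.get(..2)?;
    if !prefix.eq_ignore_ascii_case("by") {
        return None;
    }
    let rest = &trimmed[2..];
    // Require a separator after "by" so words like "Bylaws" are rejected.
    let rest = match rest.chars().next() {
        Some(':') => &rest[1..],
        Some(c) if c.is_whitespace() => rest,
        _ => return None,
    };
    let author = rest.trim();
    if author.is_empty() {
        None
    } else {
        Some(author.to_string())
    }
}

/// Hypothetical helper: inspects the collected lines before joining and
/// drops leading, trailing, and repeated blank lines.
fn join_lines(lines: &[String]) -> String {
    let mut kept: Vec<&str> = Vec::new();
    for line in lines {
        let is_blank = line.trim().is_empty();
        // Treat the start of the output as if it followed a blank line.
        let prev_blank = kept.last().map_or(true, |prev| prev.trim().is_empty());
        if is_blank && prev_blank {
            continue;
        }
        kept.push(line.as_str());
    }
    // At most one blank line can remain at the end; remove it.
    while kept.last().map_or(false, |last| last.trim().is_empty()) {
        kept.pop();
    }
    kept.join("\n")
}
```

Returning `Option<String>` from `parse_byline` lets the caller skip the author field in the front matter when an article has no byline, rather than writing an empty value.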

danielhertenstein/html-to-markdown
