Need to consolidate to a single HTML ingest #164
Comments
Discussed with @pankajastro - a single HTML extractor makes sense.
Just adding more here: a single extractor makes sense, but we still require a thin layer over it for the different sources, because different sources need different cleanup approaches. For example, in Astro SDK I'm excluding if the docs URL has
Created a draft PR for this one.
Just marked the PR as ready for review; would appreciate a review.
closes: #164. Currently we have some duplicate code in the HTML extractor. This PR removes the duplicate code and reuses the shared logic from html_utils.
Please describe the feature you'd like to see
Multiple extract functions use almost identical HTML extract logic.
Describe the solution you'd like
These should be consolidated into a single function if possible, using dynamic task mapping like the GitHub extract does.
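As a rough illustration of the idea, here is a minimal sketch using only the Python standard library. It is not the project's code: `HtmlSource`, `extract_html`, `html_to_text`, and the `keep_url` hook are hypothetical names, and the example exclusion rule is made up. The point is only that one extract function can serve every source if the per-source cleanup is passed in as a small configuration object.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script>/<style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Shared cleanup used by every source."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


@dataclass(frozen=True)
class HtmlSource:
    """Hypothetical thin per-source layer: a name plus a URL filter."""

    name: str
    keep_url: Callable[[str], bool] = lambda url: True


def extract_html(source: HtmlSource, pages: Dict[str, str]) -> List[dict]:
    """Single extract function; `pages` maps URL -> raw HTML."""
    return [
        {"source": source.name, "url": url, "content": html_to_text(html)}
        for url, html in pages.items()
        if source.keep_url(url)
    ]
```

In an Airflow DAG, a function shaped like `extract_html` could be wrapped in a task and mapped over a list of sources with dynamic task mapping (`.expand(...)`), so that each source becomes one mapped task instance. That wiring is left out here to keep the sketch runnable without Airflow.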
Are there any alternatives to this feature?
Additional context
Acceptance Criteria