Process HTML DOM elements with metafacture-fix #2
Comments
With HTML input support in metafacture/metafacture-core#312 and URL input support in metafacture/metafacture-fix#6, we can use metafacture-fix to convert the full DOM structure of a page like https://www.hoou.de/materials/tutorial-lernen-lernen to JSON. To pick out just the title and the description, in http://test.lobid.org/fix, we can use a Fix like:
With the Flux from the link above:
We get some concise JSON back:
So this basically works. However, the
Both this and #3 basically work (we get a title and a description). Maybe it makes sense to continue with the bigger picture (collecting sources from the sitemap.xml, indexing the results) instead of improving the way we extract the description at this point?
+1
Moved to https://gitlab.com/oersi/oersi-etl/-/issues/2. Closing.
For a scenario as in https://github.com/programmieraffe/oerhoernchen20#technical-background: looking at a sitemap like https://www.hoou.de/sitemap.xml and finding OER materials like https://www.hoou.de/materials/tutorial-lernen-lernen, we want to process each such resource with metafacture-fix to create JSON output that can be indexed with Elasticsearch. Fixes should be configurable in a UI like http://test.lobid.org/fix.
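To make the scenario concrete, here is a rough sketch of the two steps (picking material URLs from a sitemap, then reducing a page to a title and a description as JSON). It is plain Python with the standard library only, not metafacture-fix or Flux, and it is only meant to illustrate the data flow. The function names, the `/materials/` marker, the sample sitemap, and the sample HTML are all made up for illustration; in particular, the assumption that the description lives in a `<meta name="description">` tag has not been checked against the real pages.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample data so the sketch runs offline; not real site content.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.hoou.de/</loc></url>
  <url><loc>https://www.hoou.de/materials/tutorial-lernen-lernen</loc></url>
</urlset>"""

SAMPLE_HTML = """<html><head>
  <title>Example title</title>
  <meta name="description" content="Example description">
</head><body><p>Body text</p></body></html>"""


def material_urls(sitemap_xml, marker="/materials/"):
    """Return the <loc> URLs from a sitemap whose path contains `marker`."""
    root = ET.fromstring(sitemap_xml)
    locs = (loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text)
    return [url for url in locs if marker in url]


class TitleDescriptionParser(HTMLParser):
    """Collect the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def to_record(url, html):
    """Reduce one HTML page to a small dict that can be serialized as JSON."""
    parser = TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }


if __name__ == "__main__":
    for url in material_urls(SAMPLE_SITEMAP):
        # A real run would fetch `url` here; the sketch reuses the sample page.
        print(json.dumps(to_record(url, SAMPLE_HTML)))
```

Each printed line is one JSON document, so the output could be passed on to an indexing step. Fetching the pages and talking to Elasticsearch are left out on purpose.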