Process embedded JSON in HTML with metafacture-fix #3
Comments
With HTML input support in metafacture/metafacture-core#312 and URL input support in metafacture/metafacture-fix#6, we can access the My initial idea was to set up something like this:
That is, parse the HTML, pick out the JSON data in the first Fix (with something like the
That is, we decode an HTML document as JSON by looking for embedded JSON in the HTML.
This looks fine. However, it should then first try to get JSON-only via accept header ( |
The accept header is actually a config option of the
You can test that in http://test.lobid.org/fix with a Flux like:
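Independent of the Flux itself, the Accept-header idea from the comments above can be illustrated with a small Python sketch. This is not metafacture code: the helper names `build_json_request` and `is_json_response` are hypothetical, and the sketch simply assumes that a server offering JSON would answer an `Accept: application/json` request with a JSON content type, so that a caller could fall back to HTML parsing otherwise.

```python
import urllib.request


def build_json_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server for JSON (hypothetical helper)."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def is_json_response(content_type: str) -> bool:
    """Return True if a Content-Type value denotes JSON, including '+json' types."""
    # Drop parameters such as '; charset=utf-8' before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

A caller would send the request, check the response's `Content-Type` with `is_json_response`, and only look for embedded JSON in the HTML when that check fails.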
When we discussed this today, @dr0i objected that this is rather confusing, as we would open an HTML document with a JSON decoder. Additionally, it would pull the jsoup dependency into the metafacture-json project. Instead, we came up with a small module that only extracts the JSON from the HTML. This could be part of metafacture-html and would have no dependency on metafacture-json. It would be used like this:
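To make the extraction step concrete, here is a minimal Python sketch of what such a JSON-from-HTML module does conceptually. It is not the metafacture-html implementation (which would be Java, presumably on top of jsoup). It assumes the JSON is embedded in `<script type="application/ld+json">` elements, which the thread does not state explicitly, and the names `EmbeddedJsonExtractor` and `extract_json` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonExtractor(HTMLParser):
    """Collect JSON documents from <script> elements of a given type (assumed: JSON-LD)."""

    def __init__(self, script_type="application/ld+json"):
        super().__init__()
        self.script_type = script_type
        self._in_target = False   # True while inside a matching <script>
        self._buffer = []         # text chunks of the current script body
        self.documents = []       # parsed JSON values, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == self.script_type:
            self._in_target = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_target:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_target:
            self._in_target = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_json(html, script_type="application/ld+json"):
    """Return all JSON values embedded in matching <script> elements of the HTML."""
    parser = EmbeddedJsonExtractor(script_type)
    parser.feed(html)
    parser.close()
    return parser.documents
```

The extractor knows nothing about what happens to the JSON afterwards, which mirrors the point made above: extraction depends only on an HTML parser, and decoding the result stays a separate step.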
Deployed to http://test.lobid.org/fix: Flux:
Output:
I've used |
+1
Moved to https://gitlab.com/oersi/oersi-etl/-/issues/3. Closing.
For a scenario as in https://github.com/programmieraffe/oerhoernchen20#technical-background, looking at a resource with embedded JSON like https://www.oerbw.de/edu-sharing/components/render/4aed7529-dd02-44d0-b518-4640a8e8902f, we want to process that resource with metafacture-fix to create JSON output that can be indexed with Elasticsearch. Fixes should be configurable in a UI like http://test.lobid.org/fix.