Explicitly fail if no metadata file is located #188

RickMoynihan · 2022-08-17T13:00:53Z

Related to issue #186: when csv2rdf is run with just a -t table, it does not locate the metadata document and instead performs the default conversion.

The default conversion generates a literal RDF representation of the CSV, which is of little use to us in most cases. It would usually be better to fail with an explicit error rather than proceeding to generate data of little practical value.

I'd suggest we:

Fail hard in the above case, writing an error message to stderr.
Provide a new command line flag --proceed-without-metadata to engage the current behaviour (generating the default RDFization of the literal CSV where there is no metadata document).
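To make the proposal concrete, here is a minimal sketch in Python rather than in csv2rdf's own codebase. The `--proceed-without-metadata` flag is the hypothetical one suggested above; `build_parser`, `check_metadata` and the `located_metadata` argument are illustrative names, not part of csv2rdf.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; it models just the options discussed here.
    parser = argparse.ArgumentParser(prog="csv2rdf-sketch")
    parser.add_argument("-t", dest="tabular", help="CSV file to convert")
    # Hypothetical flag proposed in this issue; it does not exist today.
    parser.add_argument("--proceed-without-metadata", action="store_true")
    return parser


def check_metadata(args: argparse.Namespace, located_metadata) -> int:
    """Return an exit code: 0 to continue converting, 1 to stop."""
    if located_metadata is not None or args.proceed_without_metadata:
        return 0
    # Fail hard, reporting the problem on stderr.
    print(
        f"error: no metadata document located for {args.tabular}; "
        "pass --proceed-without-metadata to run the default conversion",
        file=sys.stderr,
    )
    return 1
```

A caller would pass the result to `sys.exit` before starting the conversion, so the default RDFization only runs when it was asked for explicitly.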

Robsteranium · 2022-08-22T13:54:52Z

This would contradict the spec by default - I'd be inclined to invert the behaviour of that option.

I wonder if it'd be useful to surface the result of the steps taken to locate the metadata (though idk how easy this would be to work with via the CLI).

RickMoynihan · 2023-12-06T13:44:05Z

Ok, it looks like @Robsteranium is correct, and the spec results in us using "embedded metadata", which is all optional and undefined. However, that section says the following (for the case where no explicit embedded metadata is used):

Parsing based on the default dialect for CSV, as described in 8. Parsing Tabular Data, will extract column titles from the first row of a CSV file.

So this then becomes our fallback "metadata document" which results in the useless RDF.
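As a rough illustration of that fallback, the Python sketch below builds a metadata-like structure from nothing but the header row. The `embedded_metadata` function is a hypothetical helper, and the dictionary shape is a simplified assumption about what a CSVW table description looks like, not csv2rdf's actual output.

```python
import csv
import io


def embedded_metadata(csv_text: str, url: str) -> dict:
    """Derive a minimal table description from the first row of a CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields no columns
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": url,
        # Column titles are the only information available: no datatypes,
        # no property URLs, no subject templates.
        "tableSchema": {"columns": [{"titles": [title]} for title in header]},
    }
```

With only titles to go on, every cell is treated as a plain string value, which is why the resulting RDF is so literal.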

If we're to be spec conformant we would need to:

Log a warning when only -t is supplied and we have fallen back to using embedded metadata.
Support a flag that fails if we've fallen back to implicit/embedded metadata.
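The spec-conformant variant above could look something like the following Python sketch. The `strict` parameter stands in for a flag that has not been named or implemented, and `accept_metadata_source` and `EmbeddedMetadataError` are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger("csv2rdf-sketch")


class EmbeddedMetadataError(Exception):
    """Raised when strict mode forbids the embedded-metadata fallback."""


def accept_metadata_source(source: str, strict: bool) -> str:
    """Warn on the embedded fallback by default, or fail when strict is set."""
    if source != "embedded":
        return source
    if strict:
        raise EmbeddedMetadataError(
            "no metadata document located; refusing to use embedded metadata"
        )
    # Default, spec-conformant path: continue, but tell the user.
    logger.warning("no metadata document located; using embedded metadata")
    return source
```

This keeps the default behaviour aligned with the spec while still giving automated pipelines a way to fail.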

However, after some more reflection I think it may be better to deviate from the spec in this regard and fail fast on the RDFization in this case.

I just don't think the output data is useful at all, or ever what anyone would want or expect. This feels very much like an accidental outcome of the spec.

I think we should just change the behaviour. We could add an option in the future to be spec compliant in this regard; but I honestly think nobody would ever want to enable it :-)

lkitching · 2023-12-06T16:33:54Z

While the 'embedded' output is rarely useful, it's not clear what benefit there would be in deviating from the spec here. If the aim is to guard against accidentally failing to supply a metadata document, that would be obvious from the output.

RickMoynihan · 2023-12-07T17:02:29Z

We've agreed to close this. You should normally only be RDFizing and expecting meaningful output if you have a metadata document, and if you have one, in an automated context it's always better to start explicitly from the metadata document rather than from the CSV.

RickMoynihan mentioned this issue Aug 17, 2022

If given just a file via -t csv2rdf does not locate the metadata document #186

Open

RickMoynihan added the good first issue Good for newcomers label Dec 1, 2023

RickMoynihan closed this as completed Dec 7, 2023
