Explicitly fail if no metadata file is located #188

Closed
RickMoynihan opened this issue Aug 17, 2022 · 4 comments
Labels
good first issue Good for newcomers

Comments

@RickMoynihan
Member

RickMoynihan commented Aug 17, 2022

Related to issue #186 - when running csv2rdf with just a -t table, csv2rdf does not locate the metadata document and instead performs the default conversion.

The default conversion generates a literal RDF representation of the CSV, which is of little use to us in most cases. It would usually be better to fail with an explicit error rather than proceeding to generate data of little practical value.

I'd suggest we:

  1. Fail hard in the above case, writing an error message to stderr.
  2. Provide a new command line flag --proceed-without-metadata to engage the current behaviour (generating the default RDFization of the literal CSV where there is no metadata document).
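
As a rough illustration of the two points above (csv2rdf itself is not written in Python, so this is only a sketch of the proposed behaviour, not its implementation): the `-t` and `--proceed-without-metadata` flags come from this issue, while `build_parser`, `check_metadata` and the `located_metadata` argument are hypothetical names made up for the example.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI; only the two flags discussed here.
    parser = argparse.ArgumentParser(prog="csv2rdf-sketch")
    parser.add_argument("-t", dest="tabular", required=True,
                        help="tabular (CSV) file to convert")
    parser.add_argument("--proceed-without-metadata", action="store_true",
                        help="run the default conversion when no metadata document is found")
    return parser


def check_metadata(args: argparse.Namespace, located_metadata) -> int:
    """Return an exit code: 0 to carry on converting, 1 to fail hard.

    `located_metadata` is assumed to be None when no metadata document
    could be located for the CSV.
    """
    if located_metadata is None and not args.proceed_without_metadata:
        print(f"error: no metadata document located for {args.tabular}; "
              "pass --proceed-without-metadata to run the default conversion",
              file=sys.stderr)
        return 1
    return 0
```

With this shape the default is to fail, and the current behaviour becomes opt-in.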
@Robsteranium
Contributor

This would contradict the spec by default - I'd be inclined to invert the behaviour of that option.

I wonder if it'd be useful to surface the result of the steps taken to locate the metadata (though idk how easy it'd be to work with this via the CLI).

@RickMoynihan RickMoynihan added the good first issue Good for newcomers label Dec 1, 2023
@RickMoynihan
Member Author

Ok, it looks like @Robsteranium is correct, and the spec results in us using "embedded metadata", which is all optional and undefined. However that section says the following (in the case where no explicit embedded metadata is used):

Parsing based on the default dialect for CSV, as described in 8. Parsing Tabular Data, will extract column titles from the first row of a CSV file.

So this then becomes our fallback "metadata document" which results in the useless RDF.
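
To make that fallback concrete, here is a minimal Python sketch of what "embedded metadata" amounts to under the default dialect: the first row is read as column titles and nothing else is known about the table. The function name `embedded_metadata` and the trimmed-down dict shape are assumptions for illustration; this is not csv2rdf's code, and a conforming processor would produce more than this.

```python
import csv
import io


def embedded_metadata(csv_text: str) -> dict:
    """Build a minimal metadata-like dict from the header row only."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty file has no header row, so it yields no columns.
    header = next(reader, [])
    return {"tableSchema": {"columns": [{"titles": [title]} for title in header]}}
```

With no datatypes, URI templates or property URLs to go on, the resulting RDF can only restate the cell values as literals.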

If we're to be spec conformant we would need to:

  1. Log a warning when only -t is supplied and we have fallen back to using embedded metadata.
  2. Support a flag that fails if we've fallen back to implicit/embedded metadata.
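
A sketch of that spec-conformant variant, again in Python and purely illustrative: warn by default and fail only when asked to. The `source` values, the `strict` parameter and `EmbeddedMetadataError` are hypothetical; no name for the flag has been proposed in this thread.

```python
import logging

logger = logging.getLogger("csv2rdf-sketch")


class EmbeddedMetadataError(Exception):
    """Raised when strict mode forbids falling back to embedded metadata."""


def check_metadata_source(source: str, strict: bool = False) -> None:
    # `source` is assumed to be "embedded" when only -t was supplied and
    # no metadata document was located; any other value passes through.
    if source != "embedded":
        return
    if strict:
        raise EmbeddedMetadataError("fell back to embedded metadata")
    logger.warning("no metadata document located; using embedded metadata")
```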

However after some more reflection I think it may be better to deviate from the spec in this regard, and fail fast on the RDFization in this case.

I just don't think the output data is useful at all, or ever what anyone would want or expect. This feels very much like an accidental outcome of the spec.

I think we should just change the behaviour. We could add an option in the future to be spec compliant in this regard; but I honestly think nobody would ever want to enable it :-)

@lkitching
Contributor

While the 'embedded' output is rarely useful, it's not clear what benefit there would be in deviating from the spec here. If it's to guard against accidentally failing to supply a metadata document, this would be obvious in the output.

@RickMoynihan
Member Author

RickMoynihan commented Dec 7, 2023

We've agreed to close this, because you should normally only be RDFizing and expecting meaningful output if you have a metadata document. If you do have one, then in an automated context it's always better to start explicitly from the metadata document rather than from the CSV.


3 participants