AutoIriMapper chooses wrong IRIs. #755

Closed
mrnolte opened this issue Apr 26, 2018 · 6 comments

@mrnolte
Contributor

mrnolte commented Apr 26, 2018

Hey there,
I am confused by the way the AutoIriMapper class works. According to the source code, for Manchester Syntax, for example, the first token enclosed in < and > is assumed to be the ontology IRI:

private void parseManchesterSyntaxFile(File file) {
    try (FileInputStream input = new FileInputStream(file);
        InputStreamReader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        BufferedReader br = new BufferedReader(reader)) {
        // Ontology: <URI>
        String line = br.readLine();
        while (line != null) {
            if (parseManLine(file, line) != null) {
                return;
            }
            line = br.readLine();
        }
    } catch (IOException e) {
        // if we can't parse a file, then we can't map it
        LOGGER.debug("Exception reading file", e);
    }
}


private IRI parseManLine(File file, String line) {
    for (String tok : Splitter.on(" ").split(line)) {
        if (tok.startsWith("<") && tok.endsWith(">")) {
            IRI iri = unquote(tok);
            addMapping(iri, file);
            return iri;
        }
    }
    return null;
}

This seems incorrect to me, because prefixes can be declared before the ontology IRI, using the same <...> notation. In that case the prefix IRI, not the ontology IRI, is mapped to the file. The same error seems to occur for RDF/XML; I didn't check the other formats.

Thanks for reading,
Robin Nolte

@ignazio1977
Contributor

That's peculiar. You're correct, but no one has reported this issue in years. I guess that in most ontologies the ontology is declared first, or the local file is mapped incorrectly and the ontology is resolved from its remote IRI without users noticing.

The XML parser uses the xml:base instead of the ontology IRI, but it is very common for the xml:base and the ontology IRI to be identical, or for the ontology IRI to be empty - so in general this does not show up as a defect.

The functional parser matches on Ontology(<...>) so it's not vulnerable to the same issue.
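To illustrate why matching on Ontology(<...>) is not affected by earlier prefix declarations, here is a minimal sketch. The class name and the regular expression are assumptions made for illustration; they are not taken from the OWLAPI source, whose actual pattern may differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of OWLAPI. */
final class FunctionalOntologyIriFinder {

    // Assumed pattern: the keyword "Ontology", an opening parenthesis,
    // then the first IRI in angle brackets. Prefix(...) lines cannot match.
    private static final Pattern ONTOLOGY =
        Pattern.compile("Ontology\\s*\\(\\s*<([^>]*)>");

    /** Returns the IRI that directly follows "Ontology(", if there is one. */
    static Optional<String> findOntologyIri(String content) {
        Matcher m = ONTOLOGY.matcher(content);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String doc = "Prefix(:=<http://example.org/prefix#>)\n"
            + "Ontology(<http://example.org/onto>\n)";
        System.out.println(findOntologyIri(doc).orElse("no ontology IRI found"));
    }
}
```

Because the IRI is anchored to the Ontology keyword, a prefix IRI that appears earlier in the file is never picked up.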

@mrnolte
Contributor Author

mrnolte commented Apr 30, 2018

> no one has reported this issue in years

I recently had to deal with a bunch of ontologies that import each other. The IRIs were overwritten by the mutual use of prefixes, which is why the error occurred.

The problem can easily be fixed for Manchester Syntax by requiring that line.startsWith("Ontology").
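A minimal sketch of that idea, written as a standalone helper rather than a patch to AutoIriMapper. The class and method names are hypothetical, and the sketch assumes the IRI is separated from "Ontology:" by whitespace and sits on the same line:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper, not part of OWLAPI. */
final class ManchesterOntologyIriFinder {

    /** Returns the first <...> token on a line that starts with "Ontology:". */
    static Optional<String> findOntologyIri(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("Ontology:")) {
                continue; // skips "Prefix:" declarations and everything else
            }
            for (String tok : trimmed.split("\\s+")) {
                if (tok.length() > 2 && tok.startsWith("<") && tok.endsWith(">")) {
                    // strip the enclosing angle brackets
                    return Optional.of(tok.substring(1, tok.length() - 1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Prefix: ex: <http://example.org/prefix#>",
            "Ontology: <http://example.org/onto>");
        System.out.println(findOntologyIri(lines).orElse("no ontology IRI found"));
    }
}
```

With the prefix line first, the unfiltered loop quoted above would map the prefix IRI; the filtered version only considers the Ontology: line.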

> The XML parser uses the xml:base instead of the ontology IRI, but it is very common for the xml:base and the ontology IRI to be identical, or for the ontology IRI to be empty - so in general this does not show up as a defect.

As far as I understand, the ontologyIRI attribute should be used for OWL/XML, and the rdf:about attribute of the owl:Ontology element for RDF/XML.
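A rough sketch of reading those attributes with the JDK's StAX API instead of relying on xml:base. The class name is hypothetical and this is not how AutoIriMapper is implemented. It assumes the ontology is written as an owl:Ontology (RDF/XML) or Ontology (OWL/XML) element, and it does not resolve a relative or empty rdf:about against xml:base:

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of OWLAPI. */
final class XmlOntologyIriFinder {
    static final String OWL_NS = "http://www.w3.org/2002/07/owl#";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns rdf:about (RDF/XML) or ontologyIRI (OWL/XML) of the first Ontology element. */
    static Optional<String> findOntologyIri(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            String iri = null;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && OWL_NS.equals(reader.getNamespaceURI())
                    && "Ontology".equals(reader.getLocalName())) {
                    iri = reader.getAttributeValue(RDF_NS, "about");
                    if (iri == null) {
                        // OWL/XML puts the IRI in an unqualified attribute
                        iri = reader.getAttributeValue(null, "ontologyIRI");
                    }
                    break; // only the first Ontology element is inspected
                }
            }
            reader.close();
            return Optional.ofNullable(iri);
        } catch (XMLStreamException e) {
            // same policy as the quoted code: an unparseable file is not mapped
            return Optional.empty();
        }
    }
}
```

For a document whose xml:base is http://example.org/base and whose owl:Ontology has rdf:about="http://example.org/onto", this returns the latter, which is the value the import closure needs.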

@mrnolte
Contributor Author

mrnolte commented Apr 30, 2018

> The functional parser matches on Ontology(<...>) so it's not vulnerable to the same issue.

As a workaround, I tried to convert the ontologies to Functional Syntax. Unfortunately, not one of them was mapped by the AutoIriMapper. Looking at the source code below, I couldn't find where files in Functional Syntax are parsed at all. Is the Javadoc wrong?

protected void parseIfExtensionSupported(File file) {
    String name = file.getName();
    int lastIndexOf = name.lastIndexOf('.');
    if (lastIndexOf < 0) {
        // no extension for the file, nothing to do
        return;
    }
    String extension = name.substring(lastIndexOf);
    if (".zip".equalsIgnoreCase(extension) || ".jar".equalsIgnoreCase(extension)) {
        try {
            ZipIRIMapper mapper = new ZipIRIMapper(file, "jar:" + file.toURI() + "!/");
            mapper.oboMappings().forEach(e -> oboFileMap.put(e.getKey(), e.getValue()));
            mapper.iriMappings()
                .forEach(e -> ontologyIRI2PhysicalURIMap.put(e.getKey(), e.getValue()));
        } catch (IOException e) {
            // if we can't parse a file, then we can't map it
            LOGGER.debug("Exception reading file", e);
        }
    } else if (".obo".equalsIgnoreCase(extension)) {
        oboFileMap.put(name, IRI.create(file));
    } else if (".ofn".equalsIgnoreCase(extension)) {
        parseFSSFile(file);
    } else if (".omn".equalsIgnoreCase(extension)) {
        parseManchesterSyntaxFile(file);
    } else if (fileExtensions.contains(extension.toLowerCase())) {
        parseFile(file);
    }
}

private void parseFile(File file) {
    try (FileInputStream in = new FileInputStream(file);
        BufferedInputStream delegate = new BufferedInputStream(in);
        InputStream is = DocumentSources.wrap(delegate);) {
        currentFile = file;
        // Using the default expansion limit. If the ontology IRI cannot be
        // found before 64000 entities are expanded, the file is too
        // expensive to parse.
        SAXParsers.initParserWithOWLAPIStandards(null, "64000").parse(is, this);
    } catch (SAXException | IOException e) {
        // if we can't parse a file, then we can't map it
        LOGGER.debug("Exception reading file", e);
    }
}

@ignazio1977
Contributor

> } else if (".ofn".equalsIgnoreCase(extension)) {
>     parseFSSFile(file);

parseFSSFile is where functional style syntax is used. Currently only the .ofn extension is allowed, which is probably not the most common one. I'd argue it's a bit restrictive.

@ignazio1977
Contributor

As you're using a very recent OWLAPI version and are working with a set of ontologies that you are in control of, another workaround is available: use OWLZipClosureIRIMapper. This will allow you to compress the ontologies in a zip file and add an index file that explicitly assigns an ontology IRI to each zip entry, sidestepping the AutoIRIMapper issues.

An example index file is:
File name (included in zip file): owlzip.properties

Contents

roots=D.owl, someotherfolder/B.owl
D.owl=http://test.org/complexImports/D.owl
somefolder/A.owl=http://test.org/compleximports/A.owl
someotherfolder/B.owl=http://test.org/complexontologies/B.owl
someotherfolder/C.owl=http://test.org/compleximports/C.owl

See #375 for details on the feature
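
To make the layout of the index concrete, the sketch below reads it with java.util.Properties and separates the roots entry from the entry-to-IRI mappings. This only illustrates the file format shown above. The class name is hypothetical, and it is an assumption that the format is compatible with java.util.Properties; it does not show how OWLZipClosureIRIMapper itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper, not part of OWLAPI. */
final class OwlZipIndexSketch {

    /** Parses the text of an owlzip.properties file. */
    static Properties load(String indexContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(indexContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Zip entries listed under "roots", split on commas and trimmed. */
    static List<String> roots(Properties props) {
        List<String> roots = new ArrayList<>();
        for (String part : props.getProperty("roots", "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                roots.add(name);
            }
        }
        return roots;
    }

    /** Every key except "roots" maps a zip entry to its ontology IRI. */
    static Map<String, String> mappings(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!"roots".equals(key)) {
                result.put(key, props.getProperty(key));
            }
        }
        return result;
    }
}
```

Since each zip entry is bound to its IRI explicitly, nothing has to be guessed from the file contents, which is what sidesteps the prefix problem.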

@mrnolte
Contributor Author

mrnolte commented May 7, 2018

> As you're using a very recent OWLAPI version and are working with a set of ontologies that you are in control of, another workaround is available: use OWLZipClosureIRIMapper.

Thanks, I will try that.
