Nemo v0.3.0

@mmarx mmarx released this 10 Jul 17:22
v0.3.0
e7ddd67

Version 0.3.0 of Nemo extends the fast in-memory rule reasoner with support for more arithmetic features and additional data sources and formats. This functionality is available through the command-line client nmo (run nmo --help for brief usage documentation). The online documentation of Nemo covers many of the current features.

New Features and Improvements

The following new features have been added in Nemo v0.3.0:

  • Support for arithmetic operators (+, -, *, /), so far only in rule heads
  • Extended arithmetic comparisons between variables (can now include two variables, not just variables and constants)
  • New datatype string to load plain string data (generally faster loading than the full RDF term parsing of type any, which also can hold string data, as before)
  • Support for online sources: use URLs (http: or https:) in @source declarations
  • Support for RDF files in Turtle and RDF/XML format through the existing load-rdf method (any @base declaration from the rules file will be used as default base IRI in such formats), enabled by Oxigraph's Rio RDF library

Notable bug fixes and internal improvements:

  • @output (the directive that specifies which predicate data to return) is now allowed anywhere in the file (not just at the end)

Current Functionality

At this version, Nemo therefore includes support for the following features:

  • Execution (materialization) of Datalog extended with stratified negation and existential rules (tuple-generating dependencies)
  • Loading input data from local or remote CSV/TSV and RDF (NT, Turtle, RDF/XML) files
  • Writing results to CSV files
  • Compatibility with RDF and SPARQL syntax for IRIs and literals
  • Datatypes integer (whole numbers), float64 (64-bit floating point numbers), string (Unicode strings), and any (union type that can represent any element; default for most contexts)
  • Support for built-in predicates <=, <, >=, >, and = and built-in functions +, -, *, and / (in rule heads) for types integer and float64

Nemo v0.3.0 is built for mid-sized computing tasks that can still be processed on a normal laptop in seconds or minutes (typically hundreds of thousands to hundreds of millions of facts). In such cases, Nemo is already quite fast – at least fast enough to outperform existing free rule engines on the tests we conducted so far. Example tasks and benchmark results can be found in our sister repository Nemo examples and benchmarks.

Note that the combination of existential quantifiers and stratified negation does not have a standard semantics and may lead to unexpected conclusions, which can be avoided by careful modeling (see Ellmauthaler, Krötzsch, Mennicke; AAAI 2022).

Known Limitations

The following features are not included in v0.3.0 yet and will be added in upcoming releases:

  • Support for RDF output
  • Support for arithmetic functions in rule bodies (currently only in heads)
  • Support for unary - in arithmetic expressions (current workaround: use 0-?X instead of -?X)
  • More built-in functions, especially for string data, and support for aggregates
  • More control over output formats

Moreover, the documentation, though improved, is still very limited. This will be expanded gradually.

Breaking change: CSV default type

The default type for CSV data sources has changed from any to string. Existing rules files may therefore need small changes in the rules or in the CSV files. For example, with type any as before, a quoted RDF string literal would be encoded in CSV as """Hello world!""", where the outer " delimit the CSV value and each inner "" encodes one literal " character (standard CSV escaping). The decoded CSV value "Hello world!" is then parsed as an RDF string literal, so the result is a string with the characters Hello world!. With type string, the whole decoded CSV value is taken as the contents of the string, so the same fragment produces a string with content "Hello world!" (with the quotes as first and last character). To get the same result as before, leave out the inner "" in the CSV file; they were only needed to mark the value as a string literal in the context of RDF (where many other types of terms are allowed).
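The CSV quoting step described above can be checked with any standard CSV parser. The following is a minimal sketch in Python, not Nemo code: it uses the standard csv module to decode the field, and the helper as_rdf_plain_literal is a hypothetical stand-in for what reading the value with type any amounts to in this one case.

```python
import csv
import io

# One CSV record with a single field, written as in the example above.
raw = '"""Hello world!"""\n'

# A standard CSV reader drops the outer quotes and turns each "" into one ".
field = next(csv.reader(io.StringIO(raw)))[0]


def as_rdf_plain_literal(value: str) -> str:
    """Hypothetical helper: strip the quotes that mark an RDF string literal.

    This only imitates the effect of type any for a plain quoted literal;
    it is not Nemo's actual RDF term parser.
    """
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value


string_reading = field                      # type string: contents kept verbatim
any_reading = as_rdf_plain_literal(field)   # type any: literal quotes removed

# Without the inner "", the decoded field is already the bare text.
bare_field = next(csv.reader(io.StringIO("Hello world!\n")))[0]
```

Here string_reading still carries the quotes as first and last character, while any_reading and bare_field both contain only the text Hello world!, which is why dropping the inner "" restores the old result under type string.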

One can also revert to the old behaviour by declaring type any explicitly for CSV sources, but this will in general be slower (RDF term parsing and normalization is more work than plain string reading). Finally, one could also read a CSV file that contains RDF-formatted data as strings and work with the plain strings throughout the program. This might require changes to constants that appear in the rules file (e.g., a former string constant "Hello world!" may need to become "\"Hello world!\"" to match the verbatim data when read as string). Note that this disables all normalization, e.g., "Hello world!" will not be equal to "Hello world!"^^<http://www.w3.org/2001/XMLSchema#string> as it would be in RDF.
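The effect of losing normalization can be illustrated with a small Python sketch. The function normalize_literal is hypothetical and deliberately simplified (it only drops an explicit xsd:string datatype); it is an assumption for illustration and does not reproduce Nemo's normalization.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize_literal(term: str) -> str:
    """Hypothetical, simplified normalization: drop an explicit xsd:string datatype."""
    suffix = f"^^<{XSD_STRING}>"
    if term.endswith(suffix):
        return term[: -len(suffix)]
    return term


plain = '"Hello world!"'
typed = '"Hello world!"^^<http://www.w3.org/2001/XMLSchema#string>'

# Compared as RDF terms (after normalization), both forms denote the same literal.
same_as_rdf_terms = normalize_literal(plain) == normalize_literal(typed)

# Compared verbatim, as with type string, they are simply different strings.
same_as_strings = plain == typed
```

Under these assumptions same_as_rdf_terms is true and same_as_strings is false, so rules that relied on the two spellings matching need the data or the constants to be written consistently.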

Feedback and issue reports are welcome!