make compiled files mmap-compatible #119

Draft
wants to merge 41 commits into
base: main


mr-martian
Contributor

@mr-martian mr-martian commented Jul 26, 2021

This PR adds a new binary format for transducers which is compatible with memory mapping and adds to lt-proc the ability to load it via mmap.
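To illustrate what "compatible with memory mapping" can mean in practice, here is a minimal sketch, assuming a POSIX system and a made-up file layout: a 32-bit transition count followed by fixed-width records in host byte order. `FlatTransition`, `write_flat_file`, and `MappedTransducer` are hypothetical names for illustration and are not the format or classes this PR introduces. The point is only that fixed-width records can be read in place from the mapping without a deserialisation pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-width record (16 bytes, no padding); NOT the PR's real layout.
struct FlatTransition {
  int32_t input;
  int32_t output;
  int32_t target;
  float weight;
};

// Writes the hypothetical layout: uint32 count, then `count` records.
void write_flat_file(const std::string& path, const std::vector<FlatTransition>& arcs) {
  std::FILE* f = std::fopen(path.c_str(), "wb");
  if (!f) throw std::runtime_error("cannot create " + path);
  const uint32_t count = static_cast<uint32_t>(arcs.size());
  std::fwrite(&count, sizeof count, 1, f);
  if (count > 0) std::fwrite(arcs.data(), sizeof(FlatTransition), count, f);
  std::fclose(f);
}

// Maps the file read-only and exposes the records without copying them.
class MappedTransducer {
public:
  explicit MappedTransducer(const std::string& path) {
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st;
    if (fstat(fd, &st) != 0 || static_cast<std::size_t>(st.st_size) < sizeof(uint32_t)) {
      close(fd);
      throw std::runtime_error("bad file " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    base_ = static_cast<const unsigned char*>(p);
    std::memcpy(&count_, base_, sizeof count_);
    // Reject files whose header promises more records than the file holds.
    if (sizeof(uint32_t) + static_cast<std::size_t>(count_) * sizeof(FlatTransition) > size_) {
      munmap(p, size_);
      throw std::runtime_error("truncated file " + path);
    }
    // Records start at offset 4, which satisfies their 4-byte alignment.
    arcs_ = reinterpret_cast<const FlatTransition*>(base_ + sizeof(uint32_t));
  }
  ~MappedTransducer() { munmap(const_cast<unsigned char*>(base_), size_); }
  MappedTransducer(const MappedTransducer&) = delete;
  MappedTransducer& operator=(const MappedTransducer&) = delete;

  uint32_t count() const { return count_; }
  const FlatTransition& at(uint32_t i) const {
    if (i >= count_) throw std::out_of_range("transition index");
    return arcs_[i];
  }

private:
  const unsigned char* base_ = nullptr;
  const FlatTransition* arcs_ = nullptr;
  std::size_t size_ = 0;
  uint32_t count_ = 0;
};
```

A file written this way is only portable between machines with the same endianness and float representation; a real format would need to pin those down.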

It also makes the Python bindings link to the .so rather than recompiling the repo.

TODO before merging:

  • finish "make binary files mmap-compatible" (apertium#130)
  • finish "mmap-able files" (apertium-lex-tools#79)
  • make the appropriate changes to apertium-recursive
  • test "mmap-compatible files" (apertium-separable#41)
  • make sure apertium-anaphora and lexd are still ok
  • drop old transducer execution code in favor of updated versions
    • trans_exe.h/cc superseded by transducer_exe.h/cc
    • node.h/cc and match_node.h/cc replaced by flat arrays
    • delete match_state.h/cc and rename match_state2.h/cc
    • match_exe.h/cc functionality is now part of transducer_exe.h/cc
  • drop serialiser.h and deserialiser.h and related functions
    • only used by apertium-tagger and will be contained in a single file in apertium going forward
  • drop compression.h/cc write functions and mark read functions as deprecated
  • move pattern_list.h/cc to apertium
  • lt-proc -e nno-nob.automorf.bin is currently segfaulting

@TinoDidriksen
Member

@mr-martian
Contributor Author

While we're in the business of speeding things up and breaking internal backwards compatibility, it would probably be a good idea to switch the datatype of Transducer from map<int, multimap<int, pair<int, double>>> to vector<multimap<int, pair<int, double>>>. Only code that wrote out the full type signature rather than using auto would have to change, since states are always added sequentially from 0 anyway.
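A rough sketch of the difference, assuming the multimap key is a symbol code and the pair is (target state, weight); the aliases and helper names below are made up for illustration. Code that reaches states through `.at(state)` compiles unchanged against either container, while the vector gets contiguous storage and constant-time lookup by state number.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Assumed meaning: symbol code -> (target state, weight).
using Arcs = std::multimap<int, std::pair<int, double>>;
using MapTransducer = std::map<int, Arcs>;  // current shape
using VecTransducer = std::vector<Arcs>;    // proposed shape

// States are numbered sequentially from 0, so the next id is the current size.
int new_state(MapTransducer& t) {
  const int id = static_cast<int>(t.size());
  t[id];  // default-constructs an empty arc set for the new state
  return id;
}

int new_state(VecTransducer& t) {
  t.emplace_back();
  return static_cast<int>(t.size()) - 1;
}

// Written once with auto and .at(), this works for both representations.
template <class T>
std::size_t count_arcs(const T& t, int state, int symbol) {
  const auto& arcs = t.at(state);
  return arcs.count(symbol);
}
```

One caveat with this sketch: loops that iterate over the container itself would still differ, since a map yields (state, arcs) pairs and a vector yields only the arcs.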

@TinoDidriksen
Member

> ...states are always added sequentially from 0 anyway.

And they are never removed out-of-order, leaving holes?

@mr-martian
Contributor Author

> > ...states are always added sequentially from 0 anyway.
>
> And they are never removed out-of-order, leaving holes?

There isn't any mechanism for removing states. Any operation that decreases the number of states is actually creating a copy and then swapping.
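As an illustration of the copy-then-swap pattern (not the library's actual code), here is a sketch that drops unreachable states from a vector-backed transducer. It assumes the multimap key is a symbol code and the pair is (target state, weight); `States` and `drop_unreachable` are hypothetical names. Surviving states are renumbered densely in the copy, so the result never has holes.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Assumed meaning: symbol code -> (target state, weight).
using Arcs = std::multimap<int, std::pair<int, double>>;
using States = std::vector<Arcs>;

// Builds a renumbered copy containing only states reachable from `initial`,
// then swaps it in. Nothing is erased from the original in place.
void drop_unreachable(States& states, int initial) {
  std::vector<int> new_id(states.size(), -1);  // -1 marks "not reached yet"
  std::vector<int> order;                      // old ids in discovery order
  std::queue<int> todo;
  new_id[initial] = 0;
  order.push_back(initial);
  todo.push(initial);
  while (!todo.empty()) {
    const int s = todo.front();
    todo.pop();
    for (const auto& arc : states[s]) {
      const int target = arc.second.first;
      if (new_id[target] == -1) {
        new_id[target] = static_cast<int>(order.size());
        order.push_back(target);
        todo.push(target);
      }
    }
  }
  States fresh(order.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    for (const auto& arc : states[order[i]]) {
      fresh[i].emplace(arc.first,
                       std::make_pair(new_id[arc.second.first], arc.second.second));
    }
  }
  states.swap(fresh);
}
```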
