Allow "assume sorted" option #5
Comments
By default, does rdf2smw (pre-release 0.6 version) have this option set to true? If so, that would explain why my triples were not imported as I expected. Before running the rdf2smw script, one could always sort the data using the Unix sort command.
@samuell Would it be possible to incorporate a similar Unix command into rdf2smw? Do you think doing so would dramatically impact the performance of the code?
Thanks for the interesting suggestion, @ThiviyanThanapalasingam! I think including the Unix sort command would make the software drastically more complex (because of the interfacing between Go and C code) and harder to maintain, though. But since the sort command is so widely available on Linux, Mac, and now even on Windows with the Windows Subsystem for Linux (WSL), one could enable a workflow where the user first sorts the file using
I see. Thanks for the explanation, @samuell. In that case, it would be a good idea to let the OS do the heavy lifting. The command ( If you are happy with the changes that I have proposed, I would like to contribute to this project by implementing them. Please let me know what the protocol is for contributing (i.e., do I work on the master branch and then send you a pull request?)
Thanks for the input, @ThiviyanThanapalasingam! I'll look at including that in the README shortly. Regarding contributing: awesome, that is very welcome! I think I should set up a develop branch and keep released code in master from now on. So, if you start working, you could create a new develop branch in your repo, and I'll sort out the develop branch here shortly.
Recommended for Go packages, since Go lacks an official dependency manager and most people just pull in the master branch of libraries :)
We could use a slightly different processing algorithm if we can assume the data is sorted (and for n-triples files, sorting should be much more efficient with a pure text-sorting tool anyway). Such an algorithm would require far less memory and would probably be faster.