pip install alpino-query
When running locally without installing, instead of alpino-query
use python -m alpino_query
.
Parse a tokenized sentence using the Alpino instance running on gretel.hum.uu.nl.
For example:
alpino-query parse Dit is een voorbeeldzin .
Note that the period is a separate token.
It also works when the sentence is passed as a single argument.
alpino-query parse "Dit is een voorbeeldzin ."
Mark which part of the treebank should selected for filtering. It has three inputs:
- Lassy/Alpino XML
- the tokens of the sentence
- for each token specify the properties which should be marked
For example:
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
It is also possible to mark multiple properties for a token, this is done by separating them with a comma. Each of these can also be specified to be negated. These will then be marked as 'exclude' in the tree.
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
Generates a subtree containing only the marked properties. It will also contain additional attributes to mark that properties should be excluded and/or case sensitive.
The second argument can be empty, cat
, rel
or both (i.e. catrel
or cat,rel
). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
Generates an XPath to query a treebank from the generated subtree. Second argument indicates whether a query should be generated which is order-sensitive.
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
from alpino_query import AlpinoQuery
tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]
query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml) # query.marked contains the lxml Element
query.generate_subtree(["rel", "cat"])
print(query.subtree_xml) # query.subtree contains the lxml Element
query.generate_xpath(False) # True to make order sensitive
print(query.xpath)
When querying a node this could be exclusive in multiple ways. For example:
- a node should not be a noun
node[@pos!="noun"]
- it should not have a node which is a noun
not(node[@pos="noun"])
The first statement does require the existence of a node, whereas the second also holds true if there is no node at all. When a token is only exclusive (e.g. not a noun) a query of the second form will be generated, if a token has both inclusive and exclusive properties a query of the first form will be generated.
@cat
and @rel
are always preserved for nodes which have children. The only way for this to be dropped is for when all the children are removed by specifying the na
property for the child tokens.
pip install twine
python setup.py sdist
twine upload dist/*
This was original part of the GrETEL codebase and is (still) used by its Example Based Search functionality.
- Liesbeth Augustinus and Vincent Vandeghinste: concept and initial implementation
- Bram Vanroy: GrETEL 3 improvements and design
- Sheean Spoel: rewritten in Python, moved to separate library and added some improvements
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-sa-4.0). See the LICENSE file for license rights and limitations.