Support decision trees that have a single root node with no children #7

Open
@anjsimmo

Description

For decision trees that consist of only a single root node, the traverse function called by tuple_tree_conversion assumes the root node has children:

def traverse(tree, visited):
    for child in tree:
        visit = find_to_condition(visited, child)
        if len(child.children) > 0:
            visit = traverse(child.children, visited)
    return visit

This leads to the following exception for a tree that consists of only a single root node with no children:

Traceback (most recent call last):
  File "/home/anj/src/paper-decision-trees/tree_diff2/experiment2.py", line 190, in eval_keep_regrow
    similarity = rule_set_similarity(tuple_tree_conversion(full_clf), tuple_tree_conversion(batch_tree))
  File "/home/anj/src/paper-decision-trees/tree_diff2/tree_diff/tree_ruleset_conversion.py", line 69, in tuple_tree_conversion
    expected = link_dict_keys(traverse(tree.children, visited))
  File "/home/anj/src/paper-decision-trees/tree_diff2/tree_diff/tree_ruleset_conversion.py", line 38, in traverse
    return visit
UnboundLocalError: local variable 'visit' referenced before assignment
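
The failure mode can be reproduced in isolation. The sketch below is not the project's code: last_seen and demo are hypothetical names that only mirror the shape of traverse, where a name is bound inside the loop body and returned afterwards. With an empty iterable the loop body never runs, so the name is never bound.

```python
def last_seen(items):
    # Hypothetical helper mirroring the shape of traverse:
    # `seen` is only bound inside the loop body.
    for item in items:
        seen = item
    # If `items` was empty, `seen` was never assigned.
    return seen


def demo():
    # Calling the helper with an empty list triggers the same
    # exception type as in the traceback above.
    try:
        last_seen([])
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"
```

This matches the single-node case, where tree.children is empty and traverse reaches its return statement without ever assigning visit.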

The traverse function (and associated functions) also need code cleanup:

  • The find_to_condition function called by traverse modifies the visited dictionary in place, so traverse could just return visited (which will be the same as visit).
  • The find_to_condition and link_dict_keys functions create dictionary keys through string manipulation, which is difficult to follow. At a minimum they need documentation (code comments).
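
A minimal sketch of the first cleanup point, under stated assumptions: Node and this version of find_to_condition are hypothetical stand-ins (the real find_to_condition builds its keys through string manipulation that is not shown in this issue). The only behaviour taken from the issue is that find_to_condition mutates visited in place and that nodes expose a children attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for the tree node type; only the
    # `children` attribute is taken from the issue.
    name: str
    children: list = field(default_factory=list)


def find_to_condition(visited, child):
    # Hypothetical stub: records the node and its child count.
    # Like the real function, it mutates `visited` in place.
    visited[child.name] = len(child.children)
    return visited


def traverse(tree, visited):
    # Returning `visited` directly means the function is well defined
    # even when `tree` is empty (a root node with no children).
    for child in tree:
        find_to_condition(visited, child)
        if child.children:
            traverse(child.children, visited)
    return visited
```

With this shape, traverse([], {}) returns the empty dictionary instead of raising. Whether link_dict_keys and the downstream similarity code accept an empty dictionary would still need to be checked.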

In the case of EFDT (which uses a different conversion function), rule_set_similarity throws a division by zero error when the ruleset is of zero length.

def rule_set_similarity(ruleset1: Ruleset, ruleset2: Ruleset):
    …
    l = len(ruleset1.rules)
    …
    return sum(sim_d_list) / l

This leads to a ZeroDivisionError (the code has been wrapped in a try/except block as a temporary workaround):

Warn: caught division by zero in rule_set_similarity
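
One possible replacement for the try/except workaround is an explicit guard on the ruleset length. The sketch below is an assumption-laden illustration, not the project's implementation: Ruleset is a hypothetical stand-in with only the rules attribute shown above, mean_similarity is a hypothetical helper covering just the final division, and the value returned for an empty ruleset (empty_value) is a design decision left open here.

```python
from dataclasses import dataclass, field


@dataclass
class Ruleset:
    # Hypothetical stand-in; only the `rules` attribute appears in the issue.
    rules: list = field(default_factory=list)


def mean_similarity(sim_d_list, ruleset1, empty_value=0.0):
    # Hypothetical helper for the last step of rule_set_similarity.
    l = len(ruleset1.rules)
    if l == 0:
        # Assumption: an empty ruleset maps to a caller-chosen value
        # instead of raising ZeroDivisionError.
        return empty_value
    return sum(sim_d_list) / l
```

Raising a descriptive ValueError in the empty case would be an equally reasonable choice if an empty ruleset should never reach this function.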

See the following notebook for a demonstration of the issue: https://github.com/a2i2/tree_diff/blob/re-evaluation/notebooks/Similarity%20Score%20Issues%20-%20Exception%20for%20trees%20with%20single%20node.ipynb
