This repository was archived by the owner on Jun 5, 2023. It is now read-only.

Adapt across_aggregators for coeff_of_variation in Terms Analyzer #254

Open
detobel36 opened this issue Jul 26, 2019 · 1 comment

@detobel36
Contributor

coeff_of_variation was designed with within_aggregator in mind. Its behavior with across_aggregators needs to be tested and possibly adapted.
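
To make the concern concrete, here is a minimal sketch of how the coefficient of variation (standard deviation divided by mean) could differ between the two counting modes. It is not the project's code: the `batch` layout and every function name are hypothetical, and it assumes that within_aggregator counts how often each term occurs inside one aggregator while across_aggregators counts the distinct terms per aggregator and compares those counts between aggregators.

```python
from collections import Counter
from statistics import mean, pstdev


def coeff_of_variation(counts):
    """Return the population std/mean of the counts (0.0 when the mean is 0)."""
    average = mean(counts)
    return pstdev(counts) / average if average else 0.0


def within_aggregator_cv(batch):
    """One coefficient per aggregator, from the occurrence count of each term."""
    return {
        aggregator: coeff_of_variation(list(Counter(terms).values()))
        for aggregator, terms in batch.items()
    }


def across_aggregators_cv(batch):
    """One coefficient for the whole batch, from distinct-term counts per aggregator.

    Assumption: this is how across_aggregators counts; check it against terms.py.
    """
    distinct_counts = [len(set(terms)) for terms in batch.values()]
    return coeff_of_variation(distinct_counts)


# Hypothetical batch: aggregator value -> list of target terms
batch = {
    "host-a": ["ssh", "ssh", "ssh", "http"],
    "host-b": ["dns", "dns", "dns", "dns"],
}

within = within_aggregator_cv(batch)
across = across_aggregators_cv(batch)
```

Under these assumptions the within mode yields one value per aggregator, whereas the across mode yields a single value for the whole batch, so a threshold tuned for one mode may not transfer to the other.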

@daanraman
Contributor

This section should be refactored to not make coeff_of_variation a special case (terms.py):


    def _evaluate_each_aggregator_for_outliers(self, decision_frontier, batch, aggregator_value, counted_targets):
        """
        Test each document in an aggregator to detect outliers (using the "within" method)

        :param decision_frontier: value of the decision frontier
        :param batch: all batch elements
        :param aggregator_value: the aggregator value to evaluate
        :param counted_targets: number of occurrences of each target in this batch
        :return: the list of outliers and the list of documents that were detected as outliers but are whitelisted
        """
        list_documents_need_to_be_removed = list()
        list_outliers = list()
        non_outlier_values = set()

        if self.model_settings["trigger_method"] == "coeff_of_variation":
            is_outlier = helpers.utils.is_outlier(decision_frontier, self.model_settings["trigger_sensitivity"],
                                                  self.model_settings["trigger_on"])
            if is_outlier:
                for ii, term_value in enumerate(batch[aggregator_value]["targets"]):
                    term_value_count = counted_targets[term_value]
                    outlier = self._create_outlier(non_outlier_values, term_value_count, aggregator_value,
                                                   term_value, decision_frontier, batch, ii)
                    if not outlier.is_whitelisted(self.model_whitelist_literals, self.model_whitelist_regexps):
                        list_outliers.append(outlier)
                    else:
                        self.nr_whitelisted_elements += 1
                        list_documents_need_to_be_removed.append(ii)
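
One possible direction for the refactoring, sketched under stated assumptions rather than taken from the repository: move the choice of what gets compared into a small helper, so that the loop over terms is shared by every trigger method. `comparison_operands` and `find_outlier_terms` are hypothetical names, the local `is_outlier` is only a stand-in for `helpers.utils.is_outlier` with assumed "high"/"low" semantics, and the behavior of the non-`coeff_of_variation` methods (comparing the term count to the decision frontier) is likewise an assumption.

```python
def is_outlier(value, threshold, trigger_on):
    """Stand-in for helpers.utils.is_outlier (assumed semantics)."""
    if trigger_on == "high":
        return value > threshold
    if trigger_on == "low":
        return value < threshold
    raise ValueError(f"unknown trigger_on: {trigger_on!r}")


def comparison_operands(settings, term_count, decision_frontier):
    """Pick what gets compared, so the caller needs no per-method branch."""
    if settings["trigger_method"] == "coeff_of_variation":
        # Here the frontier is the coefficient itself; compare it to the sensitivity.
        return decision_frontier, settings["trigger_sensitivity"]
    # Assumed for the other methods: compare the term count to the frontier.
    return term_count, decision_frontier


def find_outlier_terms(settings, targets, counted_targets, decision_frontier):
    """Single loop shared by every trigger method; returns (index, term) pairs."""
    flagged = []
    for ii, term in enumerate(targets):
        value, threshold = comparison_operands(
            settings, counted_targets[term], decision_frontier)
        if is_outlier(value, threshold, settings["trigger_on"]):
            flagged.append((ii, term))
    return flagged
```

With this shape, outlier creation and whitelist handling would be written once inside the shared loop, and `coeff_of_variation` would differ from the other methods only in the operands it supplies.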
