Adapt across_aggregators for coeff_of_variation in Terms Analyzer #254

detobel36 · 2019-07-26T09:59:37Z

coeff_of_variation was designed with within_aggregator in mind. Its behaviour with across_aggregators needs to be tested and possibly adapted.
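
To make the difference concrete, here is a minimal sketch of the two ways the coefficient of variation could be computed. It assumes the coefficient is the population standard deviation divided by the mean, that within_aggregator looks at the term counts inside one aggregator, and that across_aggregators looks at the number of distinct terms per aggregator. The data and the helper name are hypothetical and not taken from terms.py.

```python
from statistics import mean, pstdev


def coeff_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    avg = mean(values)
    if avg == 0:
        # Avoid dividing by zero when every value is 0.
        return 0.0
    return pstdev(values) / avg


# Hypothetical term counts, grouped by aggregator.
counts = {
    "host-a": {"ssh": 10, "http": 10, "dns": 10},
    "host-b": {"ssh": 1, "http": 29},
}

# "within": one coefficient per aggregator, computed over its term counts.
within = {
    aggregator: coeff_of_variation(list(terms.values()))
    for aggregator, terms in counts.items()
}

# "across" (assumed semantics): a single coefficient, computed over the
# number of distinct terms seen in each aggregator.
across = coeff_of_variation([len(terms) for terms in counts.values()])
```

In the "within" case each aggregator gets its own value to compare against trigger_sensitivity, while in the "across" case there is only one value for the whole batch, which is why the existing logic may not carry over unchanged.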


daanraman · 2019-12-04T11:34:54Z

This section should be refactored to not make coeff_of_variation a special case (terms.py):


    def _evaluate_each_aggregator_for_outliers(self, decision_frontier, batch, aggregator_value, counted_targets):
        """
        Test each document in an aggregator to detect outliers (using the "within" method)

        :param decision_frontier: value of the decision frontier
        :param batch: all batch elements
        :param aggregator_value: the aggregator value that must be evaluated
        :param counted_targets: count of each target value in this batch
        :return: the list of outliers and the list of documents that are detected as outliers but are whitelisted
        """
        list_documents_need_to_be_removed = list()
        list_outliers = list()
        non_outlier_values = set()

        if self.model_settings["trigger_method"] == "coeff_of_variation":
            is_outlier = helpers.utils.is_outlier(decision_frontier, self.model_settings["trigger_sensitivity"],
                                                  self.model_settings["trigger_on"])
            if is_outlier:
                for ii, term_value in enumerate(batch[aggregator_value]["targets"]):
                    term_value_count = counted_targets[term_value]
                    outlier = self._create_outlier(non_outlier_values, term_value_count, aggregator_value,
                                                   term_value, decision_frontier, batch, ii)
                    if not outlier.is_whitelisted(self.model_whitelist_literals, self.model_whitelist_regexps):
                        list_outliers.append(outlier)
                    else:
                        self.nr_whitelisted_elements += 1
                        list_documents_need_to_be_removed.append(ii)
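
One possible direction, as a rough sketch only: let every trigger method produce a (value, threshold) pair, so that a single loop handles all methods. The functions below are hypothetical stand-ins, not the ee-outliers API. `is_outlier` only mimics the behaviour implied by the call above, "stdev" is used merely as an example of a non-special method, and whitelisting is left out for brevity.

```python
def is_outlier(value, threshold, trigger_on):
    """Stand-in for helpers.utils.is_outlier (assumed behaviour)."""
    if trigger_on == "high":
        return value > threshold
    if trigger_on == "low":
        return value < threshold
    raise ValueError("trigger_on must be 'high' or 'low'")


def comparison_for(trigger_method, term_count, decision_frontier, trigger_sensitivity):
    """Return the (value, threshold) pair to compare for one term."""
    if trigger_method == "coeff_of_variation":
        # The frontier holds the coefficient itself; it is compared
        # against the configured sensitivity.
        return decision_frontier, trigger_sensitivity
    # Other methods compare the term count against the frontier.
    return term_count, decision_frontier


def find_outlier_indices(targets, counted_targets, decision_frontier, settings):
    """Return the indices of the targets flagged as outliers."""
    flagged = []
    for ii, term_value in enumerate(targets):
        value, threshold = comparison_for(
            settings["trigger_method"],
            counted_targets[term_value],
            decision_frontier,
            settings["trigger_sensitivity"],
        )
        if is_outlier(value, threshold, settings["trigger_on"]):
            flagged.append(ii)
    return flagged
```

With this shape the method-specific decision lives in one small function, and the loop that creates outliers and applies the whitelist would not need a separate branch per trigger method.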

daanraman added to triage and removed to triage labels Nov 14, 2019

olivierbuez assigned daanraman Nov 14, 2019
