Rename variables to match the configuration and introduce `elementScore` concept #92

0x4007 · 2024-08-20T19:02:10Z

I was reviewing this action run and its a lot more clear to me when looking at the clean JSON logs exactly whats going on here. I realized that the scoring is implemented incorrectly. The primary crediting strategy is supposed to be based on the amount of elements. The word counter is a separate scoring mechanism. This was added as an after thought in version one, but it seems that you prioritized word count over element count.

The problem is that the original emphasis on element count allows us to easily target and credit special and useful elements such as <a>, <img>, and <code>. Helpful comments generally have links, images and code samples.

With the following example, I would expect only one <p>
The comment word count is 7
We also need the ability to ignore words inside of specific tags. For example, <code>

Now you are keeping track of word count PER element which is more complex than it needs to be. Remember, the word count scoring strategy was added as an after thought for version one. Version one simply counts all the words in the comment (but also ignores words in specific elements, like code.) at the end of all the complex calculations.

Aside from the ignore capability, it doesn't care which element contains the words.

{
  "id": 2014495969,
  "content": "Wouldn't solve scenarios requiring keys or credentials",
  "url": "https://github.com/ubiquity/cloudflare-deploy-action/issues/6#issuecomment-2014495969",
  "type": "ISSUE_ASSIGNEE",
  "score": {
    "formatting": {
      "content": {
        "p": {
          "count": 7, // should be changed to `wordCount` and you should correctly count the amount of `p` tags to be `elementCount`
          "score": 1
        }
      },
      "wordValue": 0,
      "formattingMultiplier": 0
    },
    "reward": 0,
    "relevance": 0.3
  }
}

I would expect our p scoring to be 0 but our word scoring to be 0.1 for a grand total of 0.7.
Because this is the assignee writing on the issue (not the pull) we can provide credit; so the multiplier should not be 0.
Perhaps we could do a 4x multiplier, which yields 2.8 but then multiply by relevance to yield a sum of 0.84 for the comment.

Source

The text was updated successfully, but these errors were encountered:

0x4007 · 2024-08-20T19:02:34Z

Originally created at #83

0x4007 · 2024-08-20T19:17:35Z

Hope that you can prioritize this so that I can start fine tuning the incentives!

gentlementlegen · 2024-09-16T05:33:11Z

I think that Wouldn't solve scenarios requiring keys or credentials would be 8 not 7 because "Wouldn't = Would not" which is what the current regex would do but that's adjustable.

We also need the ability to ignore words inside of specific tags. For example, <code>

Is it safe to assume that a html element with a score of 0 would not have its content counting for the word total count? And if the html looks like

<code>
  <p>some content</p>
</code>

then the </p> wouldn't be counted for the final word count?

0x4007 · 2024-09-16T06:15:20Z

Yes as I recall in the old implementation if it came across an "ignored" tag then it would count the total amount of words inside and then subtract from the net score.

0x4007 · 2024-09-16T08:38:38Z

In normal front-end code I would express this logic as follows:

const codes = document.querySelectorAll(`code`);
let excludedWordCount = 0;
for (const code of codes){
  excludedWordCount += getWordCount(code);
}

// ...

const netWordScore = totalWordCount - excludedWordCount;
return netWordScore;

function getWordCount(element) {
  return element.textContent.split(" ").length;
}

But ultimately it appears that we will scrap this word counting logic all together pretty soon. The more robust way to credit for productive contributions is via semantic understanding of the corpus. This will be handled by embeddings most likely.

However I do want to retain the tag/element counter because there is definitely a high correlation with high quality comments and sample images, links, and code.

So definitely need to emphasize counting tags as per the original design. The word counting is an afterthought and will be removed when we figure out a good strategy with embeddings.

gentlementlegen · 2024-09-18T07:22:57Z

@0x4007 I think what is missing from your exemple is the handling of the list of regexes, so the result should actually look like

{
  "id": 2014495969,
  "content": "Wouldn't solve scenarios requiring keys or credentials",
  "url": "https://github.com/ubiquity/cloudflare-deploy-action/issues/6#issuecomment-2014495969",
  "type": "ISSUE_ASSIGNEE",
  "score": {
    "formatting": {
      "content": {
        "p": {
          "regex": {
            "\\b\\w+\\b": {
              "wordCount": 8,
              "wordValue": 0.1
            }
          },
          "score": 1,
          "elementCount": 1
        }
      },
      "multiplier": 1
    },
    "reward": 0.54,
    "relevance": 0.3
  }
}

and the result would be:
((wordCount * wordValue) + (score * elementCount) * multiplier) * relevance
((8 * 0.1) + (1 * 1) * 1) * 0.3 = 0.54
would that be correct? Or would you apply the relevance only to the wordCount * wordValue?

0x4007 · 2024-09-18T07:34:08Z

Relevance should only apply to word count.

Tag count should remain unaffected. For example every image should guarantee $5

gentlementlegen · 2024-09-19T05:35:04Z

@0x4007 Here is a first shot at it, let me know if that is what you expected: Meniole#14 (comment)

If that's ok, I'll fix the tests and put the PR ready for review.

0x4007 · 2024-09-19T05:38:07Z

It's really hard to say from those results. It's always been difficult to understand the results based on the results table unfortunately

gentlementlegen · 2024-09-19T05:52:43Z

The applied formula is

$$\sum_{i=0}^{n} \left( \sum_{j=0}^{n} \left(\text{wordCount}^{exponent} \times \text{wordValue} \times \text{relevance}\right) + \left(\text{score} \times \text{elementCount}\right) \right) \times multiplier + \text{task.reward} = \text{total}$$

So for the aforementioned example you described, the following happened:

$$((8^{0.85} \times 0.2 \times 0.3)+1*1)*1 = 1.351$$

So relevance is only applied to the words, and the elements have a fixed value not altered by the relevance.

0x4007 · 2024-09-19T06:13:18Z

So relevance is only applied to the words, and the elements have a fixed value not altered by the relevance.

Sounds good to me!

0x4007 added Time: <1 Day Priority: 2 (Medium) labels Aug 20, 2024

ubiquityos bot added the Price: 200 USD label Aug 20, 2024

0x4007 added Priority: 3 (High) and removed Priority: 2 (Medium) labels Aug 20, 2024

ubiquityos bot added Price: 300 USD and removed Price: 200 USD labels Aug 20, 2024

0x4007 assigned gentlementlegen Aug 20, 2024

0x4007 mentioned this issue Sep 12, 2024

Wrong "start time" target ubiquibot/user-activity-watcher#19

Closed

ubiquity-os bot unassigned gentlementlegen Sep 12, 2024

devpool-directory-superintendent bot mentioned this issue Sep 12, 2024

Rename variables to match the configuration and introduce elementScore concept ubiquity/devpool-directory#1489

Closed

This was referenced Sep 13, 2024

Unexpected Word Count #82

Closed

fix: word count #118

Merged

/allow Broken ubiquibot/assistive-pricing#34

Closed

gentlementlegen self-assigned this Sep 16, 2024

gentlementlegen linked a pull request Sep 18, 2024 that will close this issue

feat: element scoring and relevance adjustments #121

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rename variables to match the configuration and introduce `elementScore` concept #92

Rename variables to match the configuration and introduce `elementScore` concept #92

0x4007 commented Aug 20, 2024

0x4007 commented Aug 20, 2024

0x4007 commented Aug 20, 2024

gentlementlegen commented Sep 16, 2024

0x4007 commented Sep 16, 2024 •

edited

Loading

0x4007 commented Sep 16, 2024 •

edited

Loading

gentlementlegen commented Sep 18, 2024 •

edited

Loading

0x4007 commented Sep 18, 2024

gentlementlegen commented Sep 19, 2024

0x4007 commented Sep 19, 2024

gentlementlegen commented Sep 19, 2024

0x4007 commented Sep 19, 2024

Rename variables to match the configuration and introduce elementScore concept #92

Rename variables to match the configuration and introduce elementScore concept #92

Comments

0x4007 commented Aug 20, 2024

0x4007 commented Aug 20, 2024

0x4007 commented Aug 20, 2024

gentlementlegen commented Sep 16, 2024

0x4007 commented Sep 16, 2024 • edited Loading

0x4007 commented Sep 16, 2024 • edited Loading

gentlementlegen commented Sep 18, 2024 • edited Loading

0x4007 commented Sep 18, 2024

gentlementlegen commented Sep 19, 2024

0x4007 commented Sep 19, 2024

gentlementlegen commented Sep 19, 2024

0x4007 commented Sep 19, 2024

Rename variables to match the configuration and introduce `elementScore` concept #92

Rename variables to match the configuration and introduce `elementScore` concept #92

0x4007 commented Sep 16, 2024 •

edited

Loading

0x4007 commented Sep 16, 2024 •

edited

Loading

gentlementlegen commented Sep 18, 2024 •

edited

Loading