Rename variables to match the configuration and introduce elementScore concept
#92
Comments
Originally created at #83 |
Hope that you can prioritize this so that I can start fine-tuning the incentives! |
I think that
Is it safe to assume that a html element with a score of <code>
<p>some content</p>
</code> then the |
Yes, as I recall, in the old implementation, if it came across an "ignored" tag, it would count the total number of words inside and then subtract that from the net score. |
In normal front-end code I would express this logic as follows:
// Collect every <code> element; words inside them are excluded from the score.
const codes = document.querySelectorAll(`code`);
let excludedWordCount = 0;
for (const code of codes) {
  excludedWordCount += getWordCount(code);
}
// ...
const netWordScore = totalWordCount - excludedWordCount;
return netWordScore;
function getWordCount(element) {
  // Split on any whitespace run and drop empty strings so an empty element counts as 0.
  return element.textContent.split(/\s+/).filter(Boolean).length;
}
But ultimately it appears that we will scrap this word-counting logic altogether pretty soon. The more robust way to credit productive contributions is via semantic understanding of the corpus. This will most likely be handled by embeddings. However, I do want to retain the tag/element counter because there is definitely a high correlation between high-quality comments and sample images, links, and code. So we definitely need to emphasize counting tags as per the original design. The word counting is an afterthought and will be removed when we figure out a good strategy with embeddings. |
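The tag/element counter mentioned above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `countElements`, `scoreElements`, and the score values are hypothetical, and a regex stands in for a real DOM parser only to keep the sketch dependency-free.

```javascript
// Hypothetical helper: counts opening tags in an HTML string by tag name.
// Closing tags (</p>) are skipped because the name must follow "<" directly.
function countElements(html) {
  const counts = {};
  const openingTag = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  for (const match of html.matchAll(openingTag)) {
    const tag = match[1].toLowerCase();
    counts[tag] = (counts[tag] || 0) + 1;
  }
  return counts;
}

// Hypothetical scoring: each element earns a fixed, configured score.
// Tags missing from the configuration earn nothing.
function scoreElements(html, elementScores) {
  let total = 0;
  for (const [tag, count] of Object.entries(countElements(html))) {
    total += count * (elementScores[tag] || 0);
  }
  return total;
}

// Illustrative values only; the real configuration may differ.
const sample = '<p>see <a href="x">link</a></p><img src="a.png"><img src="b.png">';
console.log(countElements(sample));
console.log(scoreElements(sample, { img: 5, a: 1, p: 0 }));
```

Keeping the element score independent of the text inside each element is what lets links, images, and code samples be credited directly.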
@0x4007 I think what is missing from your example is the handling of the list of regexes, so the result should actually look like {
"id": 2014495969,
"content": "Wouldn't solve scenarios requiring keys or credentials",
"url": "https://github.com/ubiquity/cloudflare-deploy-action/issues/6#issuecomment-2014495969",
"type": "ISSUE_ASSIGNEE",
"score": {
"formatting": {
"content": {
"p": {
"regex": {
"\\b\\w+\\b": {
"wordCount": 8,
"wordValue": 0.1
}
},
"score": 1,
"elementCount": 1
}
},
"multiplier": 1
},
"reward": 0.54,
"relevance": 0.3
}
} and the result would be: |
Relevance should only apply to word count. Tag count should remain unaffected. For example, every image should guarantee $5. |
@0x4007 Here is a first shot at it, let me know if that is what you expected: Meniole#14 (comment) If that's ok, I'll fix the tests and put the PR ready for review. |
It's really hard to say from those results. Unfortunately, it has always been difficult to understand the results based on the results table. |
The applied formula is So for the aforementioned example you described, the following happened: So relevance is only applied to the words, and the elements have a fixed value not altered by the relevance. |
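As a rough sketch of the rule described here (relevance scales only the word portion, elements keep a fixed value), the calculation might look like the following. The function name `computeReward` is hypothetical, the parameter names are borrowed from the JSON fields quoted earlier in this thread, and the `multiplier` field is deliberately left out because the thread does not say where it applies.

```javascript
// Sketch only, under the assumptions stated above.
// Element credit is fixed; only the word credit is scaled by relevance.
function computeReward({ elementCount, elementScore, wordCount, wordValue, relevance }) {
  const elementReward = elementCount * elementScore;
  const wordReward = wordCount * wordValue * relevance;
  return elementReward + wordReward;
}

// Values taken from the JSON example in this thread: one <p>, 8 words, relevance 0.3.
const reward = computeReward({
  elementCount: 1,
  elementScore: 1,
  wordCount: 8,
  wordValue: 0.1,
  relevance: 0.3,
});
console.log(reward.toFixed(2));
```

With relevance set to 0, the element credit still survives in full, which is the behavior requested for cases like a guaranteed per-image reward.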
Sounds good to me! |
I was reviewing this action run, and it's a lot clearer to me, when looking at the clean JSON logs, exactly what's going on here. I realized that the scoring is implemented incorrectly. The primary crediting strategy is supposed to be based on the number of elements. The word counter is a separate scoring mechanism. This was added as an afterthought in version one, but it seems that you prioritized word count over element count.
The problem is that the original emphasis on element count allows us to easily target and credit special and useful elements such as <a>, <img>, and <code>. Helpful comments generally have links, images and code samples.
<p>
7
<code>
Now you are keeping track of word count PER element, which is more complex than it needs to be. Remember, the word count scoring strategy was added as an afterthought for version one. Version one simply counts all the words in the comment (but also ignores words in specific elements, like code) at the end of all the complex calculations. Aside from the ignore capability, it doesn't care which element contains the words.
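The comment-level word count described above could be sketched like this. It is an assumption-laden illustration rather than the version-one code: `countCommentWords` and its `ignoredTags` parameter are hypothetical, the word pattern is the `\b\w+\b` regex from the JSON quoted in this thread, and nested ignored elements are not handled.

```javascript
// Hypothetical helper: counts words across a whole comment's HTML,
// skipping everything inside ignored elements such as <code>.
function countCommentWords(html, ignoredTags = ["code"]) {
  let text = html;
  for (const tag of ignoredTags) {
    // Drop the ignored element together with its content (non-nested case only).
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    text = text.replace(element, " ");
  }
  // Strip the remaining tags so only text is left, then count words.
  text = text.replace(/<[^>]*>/g, " ");
  const words = text.match(/\b\w+\b/g);
  return words ? words.length : 0;
}

const comment = "<p>Run this command first</p><code>npm install --save foo</code>";
console.log(countCommentWords(comment));     // words outside <code> only
console.log(countCommentWords(comment, [])); // nothing ignored
```

Because the count runs once over the whole comment, it never needs to know which element contains a given word, only whether that element is ignored.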
p scoring to be 0 but our word scoring to be 0.1 for a grand total of 0.7.
0.
2.8 but then multiply by relevance to yield a sum of 0.84 for the comment.
Source