Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

WoI + Wow Mutator cleanup #4204

Merged
merged 11 commits into from
Nov 22, 2021
Merged

WoI + Wow Mutator cleanup #4204

merged 11 commits into from
Nov 22, 2021

Conversation

jaseweston
Copy link
Contributor

@jaseweston jaseweston commented Nov 22, 2021

  • moved to mutators.py files
  • moved "wow" and "woi" always to the front of mutator names, it was inconsistent sometimes at front or back
  • new mutator to convert to "woi" format, so then you can e.g. use all the woi mutators for wow data

d = docs[ind]
# Guarantees that checked sentences are not split in half (as we split by space).
for i in range(len(checked_sentences)):
d = d.replace(checked_sentences[i], "||CHECKED_SENTENCE_" + str(i) + "||")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: f"||CHECKED_SENTENCE_{i}||" looks slightly better

Copy link
Contributor Author

@jaseweston jaseweston Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my opinion, asking for these kind of nit changes just increases the chance of already working code suddenly becoming buggy by accident for not much gain, + taking time to check if it still works when we're trying to like do some research, but i can try..

# last chunk
for i in range(len(checked_sentences)):
d = d.replace(
"||CHECKED_SENTENCE_" + str(i) + "||", checked_sentences[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above for the string.

)

def message_mutation(self, message: Message) -> Message:
chunk_sz = self.opt.get('woi_doc_chunk_size', 500)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the mistakes that we had with setting flags before, may be we should have this as something like

if not message.get('woi_doc_chunk_size'):
    logging.warning("The --woi-doc-chunk-size flag is received as None, using the default value of X")

It has happened to me that I set a flag without knowing that it was ignored. This at least warns about issue like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, what we actually need to do is fix the bug in parlai .. i was hoping kurt would do this...

if remove[i] == 0:
new_docs.append(docs[i])
else:
# We may still keep the doc if it contains the gold checked sentence(s).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case should we mark one of the other docs to be removed instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll just have one less, i dont think it's a big deal

@jaseweston jaseweston merged commit e86cc6b into main Nov 22, 2021
@jaseweston jaseweston deleted the k2r4 branch November 22, 2021 16:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants