Output of GCM is empty #7

cesaSalaam opened this issue Aug 17, 2021 · 4 comments
Comments

cesaSalaam commented Aug 17, 2021

Hello,

I am able to run through the aligner and the pregcm stages of the toolkit but when it comes to gcm stage, the output is empty.

Below is a screenshot of the config file.
[Screenshot of the config file: Screen Shot 2021-08-17 at 1 48 02 PM]

Also, it seems that the problem is connected to run_in_try:

    def run_in_try(func, pipe, params):
        try:
            # print(params)
            ret = func(params)
        except Exception as e:
            ret = "fail"
        pipe.send(ret)
        pipe.close()

It seems to be returning fail consistently. Is there something that I am missing?
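
Since run_in_try catches every exception and only returns "fail", the actual error never reaches the console. A minimal sketch of a debugging variant follows; it assumes the function is used with a multiprocessing pipe as shown above, and broken_step is a hypothetical stand-in for whichever GCM step is failing, not a function from the toolkit.

```python
import traceback
from multiprocessing import Pipe


def run_in_try(func, pipe, params):
    """Debugging variant: same behaviour, but the hidden exception is printed."""
    try:
        ret = func(params)
    except Exception:
        # Print the full traceback to stderr so the failing line is visible.
        traceback.print_exc()
        ret = "fail"
    pipe.send(ret)
    pipe.close()


def broken_step(params):
    # Hypothetical step that fails because a value is unexpectedly None.
    return params["tree"].leaves()


if __name__ == "__main__":
    # Run in-process for simplicity; the real call site may use a subprocess.
    parent_conn, child_conn = Pipe()
    run_in_try(broken_step, child_conn, {"tree": None})
    print("result:", parent_conn.recv())
```

With the traceback printed, the "fail" result can be tied to a specific line and input instead of being guessed at.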

A file "out-cm-en-de.txt" is created, but nothing is in it.

@cesaSalaam
Author

As I dig deeper, it seems that some place is generating a NoneType.
[Screenshot: Screen Shot 2021-08-17 at 2 28 58 PM]

@cesaSalaam
Author

Hello, is anyone able to help me with this?

@mohdsanadzakirizvi
Contributor

Hey Cesa,

The GCM isn't perfect and sometimes gives empty output if either the quality of alignments or parse trees isn't good.

The quality of parse trees should be something that you can check based on which parser you're using (stanford or benepar) and how well it supports the languages you're generating the parse trees for.

The main thing that you might want to check is the quality of the alignments. The fast_align aligner that we're using is a statistical aligner, which means that if you do not have a reasonably large dataset (>10k parallel sentences), the alignments learned from it won't be of much help to the GCM.
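
A quick way to check the corpus size against that threshold is sketched below. It assumes the parallel data is stored as two line-aligned plain-text files with one sentence per line; that layout, the function names and the file names in the usage note are assumptions for illustration, not something the toolkit prescribes.

```python
from pathlib import Path

MIN_PAIRS = 10_000  # rough threshold mentioned above (>10k parallel sentences)


def count_parallel_pairs(src_path, tgt_path):
    """Count line pairs in which both the source and target sentence are non-empty."""
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        # Misaligned files would make every learned alignment unreliable.
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return sum(1 for s, t in zip(src_lines, tgt_lines) if s.strip() and t.strip())


def is_large_enough(n_pairs, minimum=MIN_PAIRS):
    """Return True when the corpus has more than `minimum` usable sentence pairs."""
    return n_pairs > minimum
```

For example, is_large_enough(count_parallel_pairs("train.en", "train.de")) returns False for a corpus that is likely too small to give useful alignments; the two file names here are hypothetical.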

Hope this helps,
Sanad

@mohdsanadzakirizvi
Contributor

Also, in the language_1 and language_2 fields, you have to put the complete name of the language, as shown in the comments in the config file.
