mark_unknown doesn't work #82

Open
dimitarsh1 opened this issue May 6, 2020 · 5 comments · May be fixed by #111
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@dimitarsh1
Hi,
when translating, setting mark_unknown to False has no effect on the output: unknown words are still prefixed with "*", and errors with "@", "#" and "/".

Furthermore, in the translate function in __init__.py it seems that the mark_unknown argument does nothing; it is never read or passed on anywhere.

Any idea how to fix this?

Thanks in advance,
Dimitar

@sushain97
Member

Yeah, this is a bug.

sushain97 added the bug and help wanted labels May 6, 2020
@dimitarsh1
Author

dimitarsh1 commented May 6, 2020 via email

@dimitarsh1
Author

OK,

so, I took some time to get what's going on in the code and what calls what. In the lttoolbox there is no support for unknown.

However, I did a quick fix the following way:
In utils.py in handle_command_with_wrapper:

```
....
text = end.decode()
input_file.write(text)
input_file.close()

if 'lt-proc' == command[0]:
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
```

replaced with
```
text = end.decode()
input_file.write(text)
input_file.close()

if 'lt-proc' == command[0] and "-n" != command[1]:  # <-- changed line
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
```
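
The guard above can be pulled out into a small predicate, which makes it easier to see which commands still go through the in-process wrapper and which fall back to the external binary. This is only a sketch: `should_use_wrapper` is a hypothetical helper name, not something in apertium-python, and the length check is an added assumption so that a bare `['lt-proc']` does not raise an IndexError.

```python
from typing import List


def should_use_wrapper(command: List[str]) -> bool:
    """Return True if the command should be handled by the lt-proc wrapper.

    Mirrors the condition from the quick fix: the command is lt-proc and
    its first argument is not "-n" (generation without unknown-word marks).
    """
    # Assumption: a command with no arguments is never sent to the wrapper.
    if len(command) < 2:
        return False
    return command[0] == 'lt-proc' and command[1] != '-n'
```

With this, `['lt-proc', '-g', 'gen.bin']` would still use the wrapper, while `['lt-proc', '-n', 'gen.bin']` would not.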

Then I also changed parse_mode_file:

from

```
def parse_mode_file(mode_path: str) -> List[List[str]]:
    """ ....
    cmd = cmd.replace('$2', '').replace('$1', '-g')
    ....
```

to

```
def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
    ....
    if not mark_unknown:
        cmd = cmd.replace('$2', '').replace('$1', '-n')
    else:
        cmd = cmd.replace('$2', '').replace('$1', '-g')
    ....
```
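
The placeholder substitution can be tried out on its own, without a real mode file. A minimal sketch, assuming the `$1` placeholder sits where the lt-proc generation flag goes; `fill_mode_placeholders` is a hypothetical name used only for illustration, and the command string in the usage note is made up.

```python
def fill_mode_placeholders(cmd: str, mark_unknown: bool = True) -> str:
    """Substitute the $1/$2 placeholders of a single mode-file command.

    $2 is dropped, and $1 becomes '-g' (mark unknown words) or
    '-n' (do not mark unknown words), as in the patched parse_mode_file.
    """
    generation_flag = '-g' if mark_unknown else '-n'
    return cmd.replace('$2', '').replace('$1', generation_flag)
```

For example, `fill_mode_placeholders('lt-proc $1 gen.bin', mark_unknown=False)` gives `'lt-proc -n gen.bin'`.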

Then in the translate/__init__.py I changed _get_commands:

```
def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
    """
    Args:
        l1 (str)
        l2 (str)

    Returns:
        List[List[str]]
    """
    if (l1, l2) not in self.translation_cmds:
        mode_path = apertium.pairs['%s-%s' % (l1, l2)]
        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)
    return self.translation_cmds[(l1, l2)]
```

and in Translator.translate I changed
`cmds = list(self._get_commands(l1, l2))` to `cmds = list(self._get_commands(l1, l2, mark_unknown))`

Also, the default value for mark_unknown is set to True everywhere.
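
One thing to watch with this change: as written, translation_cmds is still keyed on (l1, l2) only, so commands parsed with one mark_unknown value would be reused for a later call with the other value. Below is a minimal sketch of one possible way around that, keying the cache on the flag as well. `CommandCache` and the injected `parse` callable are hypothetical names for illustration and are not part of apertium-python.

```python
from typing import Callable, Dict, List, Tuple

Commands = List[List[str]]


class CommandCache:
    """Caches parsed mode-file commands per (l1, l2, mark_unknown)."""

    def __init__(self, parse: Callable[[str, bool], Commands]) -> None:
        # parse stands in for parse_mode_file(mode_path, mark_unknown).
        self._parse = parse
        self._cmds: Dict[Tuple[str, str, bool], Commands] = {}

    def get(self, l1: str, l2: str, mode_path: str,
            mark_unknown: bool = True) -> Commands:
        # Including mark_unknown in the key keeps the -g and -n
        # pipelines for the same language pair separate.
        key = (l1, l2, mark_unknown)
        if key not in self._cmds:
            self._cmds[key] = self._parse(mode_path, mark_unknown)
        return self._cmds[key]
```

The trade-off is that each language pair may be parsed twice, once per flag value, in exchange for never returning commands built for the wrong setting.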

I don't know if that's a good fix, as I haven't had the time to delve into lttoolbox and the FST code, but it seems to work for me.
System is Ubuntu 18.04; Python is 3.6.5.

Kind regards,
Dimitar

@sushain97
Member

Could you send a proper diff/patch or PR? Your comment is really hard to read.

@ygorg

ygorg commented May 24, 2023

Reformatting the comment from @dimitarsh1:

In the lttoolbox there is no support for unknown.

In handle_command_with_wrapper, change

```
if 'lt-proc' == command[0]:
```

to

```
if 'lt-proc' == command[0] and "-n" != command[1]:
```

Then changed also parse_mode_file, from

```
cmd = cmd.replace('$2', '').replace('$1', '-g')
```

to

```
# Add parameter mark_unknown
def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:

    if not mark_unknown:
        cmd = cmd.replace('$2', '').replace('$1', '-n')
    else:
        cmd = cmd.replace('$2', '').replace('$1', '-g')
```

Then in the translate/__init__.py I changed _get_commands, from

```
self.translation_cmds[(lang1, lang2)] = parse_mode_file(mode_path)
```

to

```
# Add parameter mark_unknown
def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:

        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)
```

and Translator.translate, from

```
cmds = list(self._get_commands(lang1, lang2))
```

to

```
cmds = list(self._get_commands(l1, l2, mark_unknown))
```

Also, the default value for mark_unknown everywhere is set to True.
