wildcards failing for Japanese #253

Closed
alecl opened this issue Dec 16, 2017 · 5 comments

alecl commented Dec 16, 2017

The English test case works with just [*]dog[*] to match any sentence containing the substring dog, but the equivalent Japanese trigger fails. Adding extra permutations of required and optional wildcards makes it work, but that shouldn't be necessary.

UPDATE: The workaround doesn't work in the JavaScript test runner for rsts either, even though it does work in the Java version, which is even stranger.

This works in Japanese with the Java interpreter, but not the JavaScript one, to match the character for dog (犬) anywhere in the string (start, middle, end):

        # dog contained anywhere -> tell me more about dogs
        + ([*]犬[*]|*犬*|*犬[*]|[*]犬*)
        - 犬についてもっと教えてください

This doesn't work, though it should:

        # dog contained anywhere -> tell me more about dogs
        + [*]犬[*]
        - 犬についてもっと教えてください

Test cases below

us_partial_match:
  tests:
    - source: |
        # dog -> love dogs
        + dog
        - love dogs

        # hamster -> love hamster
        + hamster
        - love hamster

        # this isn't always correct form but just trying out matching
        # lost my dog or lost my hamster should return dog or hamster
        + *[ ]lost{weight=9}
        - <star>

        # dog contained anywhere -> tell me more about dogs
        + [*]dog[*]
        - tell me more about dogs             

    # dog alone -> love dogs
    - input: "dog"
      reply: "love dogs"

    # hamster lost -> hamster
    - input: "hamster lost"
      reply: "hamster"

    # dog lost -> dog
    - input: "dog lost"
      reply: "dog"     

    # I bought a new dog today -> tell me more about dogs
    - input: "I bought a new dog today"
      reply: "tell me more about dogs"

    # brown dog -> tell me more about dogs
    - input: "brown dog"
      reply: "tell me more about dogs"

    # dog gone it -> tell me more about dogs
    - input: "dog gone it"
      reply: "tell me more about dogs"

japanese_partial_match:
  utf8: true
  tests:
    - source: |
        # dog -> love dogs
        + 犬{weight=10}
        - 愛犬

        # this isn't always correct form but just trying out matching
        # lost my dog or lost my hamster should return dog or hamster
        + *を失った{weight=9}
        - <star>

        # dog contained anywhere -> tell me more about dogs
        + [*]犬[*]
        - 犬についてもっと教えてください

    # dog alone -> love dogs
    - input: "犬"
      reply: "愛犬"

    # lost my hamster -> hamster
    - input: "ハムスターを失った"
      reply: "ハムスター"

    # lost my dog -> dog
    - input: "犬を失った"
      reply: "犬"     

    # I bought a new dog -> tell me more about dogs
    - input: "私は新しい犬を買った"
      reply: "犬についてもっと教えてください"

    # brown dog -> tell me more about dogs
    - input: "茶色の犬"
      reply: "犬についてもっと教えてください"

    # dog collar -> tell me more about dogs
    - input: "犬の首輪"
      reply: "犬についてもっと教えてください"
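
As a cross-check, the expectations in japanese_partial_match can be written as a plain-regex reference model. This is only a sketch of the intended outcomes, not of how any RiveScript interpreter works: the RULES table, the expected_reply helper, and treating an unweighted trigger as weight 0 are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the three triggers in the test source:
# (weight, regex, reply template). Unweighted triggers are assumed to be 0.
RULES = [
    (10, re.compile(r"^犬$"), "愛犬"),
    (9, re.compile(r"^(.+)を失った$"), "{star}"),
    (0, re.compile(r"^.*犬.*$"), "犬についてもっと教えてください"),
]


def expected_reply(message):
    """Return the reply of the highest-weight rule that matches, else None."""
    for _weight, pattern, template in sorted(RULES, key=lambda rule: -rule[0]):
        found = pattern.match(message)
        if found:
            # Only the wildcard rule has a capture group to fill <star>.
            star = found.group(1) if found.groups() else ""
            return template.format(star=star)
    return None


for text in ["犬", "ハムスターを失った", "犬を失った", "私は新しい犬を買った"]:
    print(text, "->", expected_reply(text))
```

An interpreter that handles [*]犬[*] as the issue expects should agree with this model on all six inputs listed in the test.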

dcsan (Contributor) commented Dec 17, 2017

@alecl we've noticed the same problems with Chinese. There are some tickets: #147, and a separate discussion at aichaos/rivescript-wd#6.

I'm interested in your format above. Is that for some type of testing?

alecl (Author) commented Dec 17, 2017

@dcsan The test format is from https://github.com/aichaos/rsts

Thank you for the pointers.

dcsan (Contributor) commented Dec 18, 2017

I tried your version with the JS interpreter and the matches seem unreliable:

user
私が犬派
star1: 私が犬派
star2: 私が

user
私が犬派
star: 私が犬派
star1: 私が犬派
star2: 私が
star3: 派
star4: undefined

犬
犬大好き
star: handle 犬大好き
star1: handle 犬大好き
star2: handle 
star3: 大好き
star4: undefined

私の犬
犬

alecl (Author) commented Dec 18, 2017

@dcsan Right. I think that's what @kirsle mentioned in aichaos/rivescript-wd#6: regex engines in different languages behave differently. It didn't work for me in the JS engine either, but I'm focused on the Java one, so I might be OK there. Longer term, it would be great to consider fixing this for all languages.
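
A minimal sketch of one way regex engines can disagree on this input, assuming (the thread does not confirm it) that a compiled trigger ends up relying on \b word boundaries around the keyword. Python's re module is used only because its re.ASCII flag switches between ASCII-only and Unicode-aware word classes in a single engine; the BOUNDED pattern and the matches helper are hypothetical and are not taken from any RiveScript interpreter.

```python
import re

DOG = "犬"
SENTENCE = "私は新しい犬を買った"  # "I bought a new dog"

# Illustrative pattern only (an assumption, not RiveScript's compiled regex):
# the keyword must have a word boundary on both sides.
BOUNDED = r"\b犬\b"


def matches(pattern, text, flags=0):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text, flags) is not None


# ASCII-only word classes: 犬 is not a word character, so no boundary exists.
ascii_alone = matches(BOUNDED, DOG, re.ASCII)        # False
# Unicode-aware word classes: 犬 is a word character, so the string edges count.
unicode_alone = matches(BOUNDED, DOG)                # True
# In an unspaced sentence the neighbours (い, を) are word characters too,
# so there is still no boundary next to 犬 even in Unicode mode.
unicode_sentence = matches(BOUNDED, SENTENCE)        # False
ascii_sentence = matches(BOUNDED, SENTENCE, re.ASCII)  # False
# A wildcard with no boundary requirement matches regardless of word classes.
plain_sentence = matches(r".*犬.*", SENTENCE)         # True
# English works because spaces create boundaries around "dog".
english = matches(r"\bdog\b", "I bought a new dog today", re.ASCII)  # True

print(ascii_alone, unicode_alone, unicode_sentence, ascii_sentence,
      plain_sentence, english)
```

If something like this is the cause, switching to Unicode-aware word classes alone would not be enough for Japanese or Chinese, because words are not separated by spaces.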


ahmader commented Dec 29, 2017

I have the same problem with Arabic, and I noticed that when the optional is placed between Unicode words, the results are different:

// hi [*] universe
+ السلام [*] والرحمة
- Matched all rules, but <star>, <star1> are undefined

// name [foo|boo] today
+ اسمي [احمد|محمد] اليوم
- Matched all rules, but <star>, <star1> are undefined

In my case, if the trigger starts or ends with the optional [*], it never matches:

// hello [*]
+ مرحبا [*]
- NEVER Matched

// [*] world
+ [*] عليكم
- NEVER Matched
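
For reference, the intended meaning of a leading, trailing, or middle [*] in a space-delimited language can be sketched with whitespace alone, without any word-boundary assertions. This is a simplified model under stated assumptions, not the interpreter's algorithm: trigger_to_regex is a hypothetical helper, it handles only literal words and [*], and it does not cover unspaced scripts such as Japanese.

```python
import re


def trigger_to_regex(trigger):
    """Translate literal words and optional [*] wildcards into a regex.

    Simplified, hypothetical model: tokens are split on whitespace and
    an optional wildcard may match nothing or any text delimited by spaces.
    """
    tokens = trigger.split()
    parts = []
    for i, token in enumerate(tokens):
        first = i == 0
        last = i == len(tokens) - 1
        if token == "[*]":
            if first and last:
                parts.append(r".*")
            elif first:
                parts.append(r"(?:.*\s+)?")    # optional text, then a space
            elif last:
                parts.append(r"(?:\s+.*)?")    # a space, then optional text
            else:
                parts.append(r"\s+(?:.*\s+)?")  # a space, optional text + space
        else:
            # A middle wildcard already consumed the separator before this word.
            if not first and tokens[i - 1] != "[*]":
                parts.append(r"\s+")
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


hello_any = trigger_to_regex("مرحبا [*]")           # hello [*]
any_world = trigger_to_regex("[*] عليكم")           # [*] world
middle = trigger_to_regex("السلام [*] والرحمة")     # hi [*] universe

print(bool(hello_any.match("مرحبا")), bool(hello_any.match("مرحبا بك")))
print(bool(any_world.match("عليكم")), bool(any_world.match("السلام عليكم")))
print(bool(middle.match("السلام عليكم والرحمة")))
```

Because the pattern depends only on whitespace, it behaves the same whether or not the engine treats Arabic letters as word characters.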
