Structure way defining of corpus data #469

Closed
vkosuri opened this issue Dec 6, 2016 · 16 comments

@vkosuri
Collaborator

vkosuri commented Dec 6, 2016

@gunthercox Is there a specific way to define corpus data for ChatterBot?

For example:

{ "statement": "response" }

If so, how will ChatterBot process text like the following?

{
  "description": "Birds of Antarctica, grouped by family",
  "source": "https://en.wikipedia.org/wiki/List_of_birds_of_Antarctica",
  "birds": [
    {
      "family": "Albatrosses",
      "members": [
        "Wandering albatross",
        "Grey-headed albatross",
        "Black-browed albatross",
        "Sooty albatross",
        "Light-mantled albatross"
      ]
    }
  ]
}
@gunthercox
Owner

Currently, ChatterBot's corpus format is essentially just a list of dialog sets. For example:

{
    "conversations": [
        [
            "...",
            "...",
            "...",
            "..."
        ],
        [
            "...",
            "...",
            "...",
            "..."
        ]
    ]
}

It would be a good idea to modify the format so that it can store more information as you suggested.
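As an illustration of the "list of dialog sets" shape described above, here is a minimal sketch that parses such a file with Python's standard `json` module. It does not use ChatterBot itself; the `load_conversations` helper and the sample statements are hypothetical and exist only for this example.

```python
import json

# Hypothetical corpus text in the "list of dialog sets" shape.
# The statements are made up for illustration.
corpus_text = """
{
    "conversations": [
        ["Hello", "Hi there!", "How are you?", "I am fine."],
        ["What is your name?", "My name is ChatterBot."]
    ]
}
"""


def load_conversations(text):
    """Parse corpus JSON and return its list of dialog sets."""
    data = json.loads(text)
    conversations = data.get("conversations", [])
    # Each dialog set is expected to be a flat list of strings.
    for dialog in conversations:
        if not all(isinstance(line, str) for line in dialog):
            raise ValueError("each dialog set must contain only strings")
    return conversations


conversations = load_conversations(corpus_text)
for index, dialog in enumerate(conversations):
    print(index, len(dialog), dialog[0])
```

Note that standard JSON does not allow a trailing comma after the last element of a list, so a file with one would fail in `json.loads`.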

@kennetham

Question: as suggested, the corpus should be in JSON format. How do you train from the corpus in the main application? Say I exported the training corpus data; how would I retrain another bot on it? There aren't any examples of that.

I basically wrote my own "adapter" that reads the corpus in JSON format and then loads it as statement pairs for training.

@gunthercox
Owner

gunthercox commented Dec 7, 2016

@kennetham Right now you can export your chat bot's knowledge as a JSON file: http://chatterbot.readthedocs.io/en/latest/corpus.html?highlight=export#exporting-your-chat-bot-s-database-as-a-training-corpus

The ability to specify a file path for a training corpus will be added in #467

@vkosuri
Collaborator Author

vkosuri commented Dec 10, 2016

@gunthercox I am planning to write a PR for the above two enhancements. Do you have an ETA for these two issues, #469 and #467?

@gunthercox
Owner

@vkosuri I haven't started working on anything to allow custom paths for corpus data (#467) yet. Feel free to start if you are interested in working on it.

For this ticket (#469) I'm in the process of researching the formats of other existing data corpora. I'm interested to see if there are any design patterns that might be beneficial to follow.

@gunthercox
Owner

gunthercox commented Dec 13, 2016

I just wanted to post a link for later reference. This is for the current work-in-progress concept for the future version of ChatterBot's dialog corpus files. I'm still considering other ideas so this document will be updated in the future.

https://github.com/gunthercox/ChatterBot/wiki/ChatterBot-Corpus-Specification

@vkosuri
Collaborator Author

vkosuri commented Dec 13, 2016

This looks good to me. Questions:

  1. How will ChatterBot respond to previous statements using the suggested model?
  2. If it can't, what other methods are there to achieve this?

@gunthercox
Owner

In this model, responses are indicated by consecutive statements in each list.
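Read literally, "consecutive statements" means each statement in a list is answered by the one that follows it. A minimal sketch of that reading, assuming a dialog set is a plain list of strings (the `response_pairs` name and the sample dialog are hypothetical, not part of ChatterBot):

```python
def response_pairs(dialog):
    """Return (statement, response) pairs from consecutive entries of one dialog set."""
    # zip stops at the shorter sequence, so the last statement has no response.
    return list(zip(dialog, dialog[1:]))


# Hypothetical dialog set, for illustration only.
dialog = ["Hello", "Hi there!", "How are you?", "I am fine."]
for statement, response in response_pairs(dialog):
    print(repr(statement), "->", repr(response))
```

Under this reading, a list of four statements yields three pairs, and a list with a single statement yields none.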

@vkosuri
Collaborator Author

vkosuri commented Dec 14, 2016

Apologies for making this conversation longer 🔢. From the statement above, can I assume that if a question has multiple answers, I need two lists, one per answer?

How do I make programmable responses? Is there a way, if ChatterBot does not find the answer in the corpus, to tell it to look for a programmable response?

@gunthercox
Owner

gunthercox commented Dec 14, 2016

No problem, any questions you have about it are helpful because it lets me consider things that I might not have thought about. If you have any other questions, please ask them. I want to get as much feedback on the design as possible before committing to it.

Also, you are correct. For representing multiple responses to the same input, the input will have to be listed multiple times. I designed it this way to avoid deep nested lists of responses which might be difficult for developers to read, and more intensive for programs to traverse.
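To show how a flat layout can still carry several responses to one input, the sketch below repeats the input in two dialog sets and groups the responses afterwards. It assumes each dialog set is a list of strings in which consecutive entries form statement and response pairs; `build_response_index` and the sample data are hypothetical, not ChatterBot code.

```python
from collections import defaultdict


def build_response_index(conversations):
    """Map each statement to every response that follows it in any dialog set."""
    index = defaultdict(list)
    for dialog in conversations:
        for statement, response in zip(dialog, dialog[1:]):
            # Skip exact duplicates so each response is listed once.
            if response not in index[statement]:
                index[statement].append(response)
    return dict(index)


# Hypothetical corpus: the same input appears in two dialog sets.
conversations = [
    ["How are you?", "I am fine."],
    ["How are you?", "Doing well, thanks."],
]
index = build_response_index(conversations)
print(index["How are you?"])
```

The file stays one level deep (a list of lists), and the grouping cost is paid once at load time rather than in the file format.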

For programmable responses, I usually recommend some form of a customized logic adapter. However, I have seen valid cases where there are, for example, wildcards in statements. Take a statement like "My favorite color is {color}", where color can be any valid color. Wildcards like this are well supported by AIML, but they are currently not well supported by ChatterBot. I will definitely look into the possibility of supporting AIML in the new corpus format, or something similar.
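One way to picture the wildcard idea is to compile a statement such as "My favorite color is {color}" into a regular expression with a named group. This is only a sketch using Python's `re` module; it is neither AIML nor an existing ChatterBot feature, and `compile_template` and `match_template` are hypothetical names.

```python
import re


def compile_template(template):
    """Turn a statement with {name} wildcards into a compiled regular expression."""
    # Splitting on a pattern with one capture group alternates
    # literal text (even indexes) and wildcard names (odd indexes).
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += "(?P<%s>.+?)" % part
    return re.compile("^" + pattern + "$", re.IGNORECASE)


def match_template(template, text):
    """Return the captured wildcard values, or None if the text does not match."""
    match = compile_template(template).match(text.strip())
    return match.groupdict() if match else None


print(match_template("My favorite color is {color}", "My favorite color is blue"))
print(match_template("My favorite color is {color}", "I like pizza"))
```

This sketch accepts any text for the wildcard; checking that the captured value is a valid color would need a separate step.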

@vkosuri
Collaborator Author

vkosuri commented Dec 15, 2016

Looking at http://www.alicebot.org/aaa.html, it is amazing, like ChatterBot. If I want to make a bot like Alice using ChatterBot, which things or algorithms need to be updated, written, or created?

@gunthercox
Owner

Were there any features that you saw in Alice bot that ChatterBot doesn't have?

@vkosuri
Collaborator Author

vkosuri commented Dec 16, 2016

Here are some that I have found; please correct me if they are already there.

Bot Properties

I think it would be a good idea to have a similar kind of feature.
https://code.google.com/archive/p/aiml-en-us-foundation-alice/wikis/BotProperties.wiki

Preprocessing statements

This is my first choice for implementation. It's an awesome feature.
https://code.google.com/archive/p/aiml-en-us-foundation-alice/wikis/PreProcessor.wiki

Template mechanism

I am assuming these statements use a template. If that is correct, could you please share your views on this?

<template>As a <bot name="age"/> year old <bot name="gender"/> I am not really interested in that discussion.</template>

Reusing of corpus data

Beyond that, I am also looking into whether part, all, or a few statements of a corpus can be reused in other corpus data.

Corpus search order

Are we following any order when searching the corpus database?
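For the template mechanism and bot properties mentioned above, the idea can be sketched as filling each `<bot name="..."/>` tag from a dictionary of properties. This is a rough approximation in plain Python, not an AIML interpreter and not ChatterBot code; `render_template` and the property values are hypothetical.

```python
import re

# Matches self-closing tags such as <bot name="age"/>.
BOT_TAG = re.compile(r'<bot\s+name="([^"]+)"\s*/>')


def render_template(template, properties):
    """Replace each <bot name="..."/> tag with the matching bot property."""
    def substitute(match):
        name = match.group(1)
        # Leave the tag untouched when the property is unknown.
        return str(properties.get(name, match.group(0)))
    return BOT_TAG.sub(substitute, template)


template = (
    'As a <bot name="age"/> year old <bot name="gender"/> '
    "I am not really interested in that discussion."
)
properties = {"age": 5, "gender": "robot"}  # hypothetical property values
print(render_template(template, properties))
```

Keeping the properties in one dictionary lets the same corpus text be reused by bots with different settings.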

@a-elhaddad
Contributor

@vkosuri did you implement AIML or add it to ChatterBot?

@vkosuri
Collaborator Author

vkosuri commented Feb 8, 2017

We have added some of the AIML corpus to chatterbot-corpus.

@lock

lock bot commented Mar 10, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 10, 2019