Added 3 Python files in reference to the task mentioned in TODO for … #30


Merged

merged 1 commit into FreedomIntelligence:master on Apr 20, 2024

Conversation

rohan12345a
Contributor

GPTModel.py: Added a new Python file GPTModel.py which implements a text classification model based on the GPT architecture. This file contains the necessary code to preprocess text data, train the GPT model, and make predictions on new text inputs.
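For a sense of the approach, a minimal sketch (assuming the Hugging Face transformers API; not the actual contents of GPTModel.py):

import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

def predict(texts):
    # Tokenize a batch of raw strings and return predicted class indices
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return logits.argmax(dim=-1).tolist()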

ensemble_method.py: Introduced a new Python file ensemble_method.py that implements an ensemble learning method. This method combines predictions from multiple base models to improve overall performance and robustness.
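As an illustration of the idea (a sketch, not the file's code), a simple majority vote over base-model predictions looks like:

import numpy as np

def majority_vote(predictions):
    # predictions: list of 1-D integer label arrays, one per base model
    stacked = np.stack(predictions)  # shape (n_models, n_samples)
    # pick the most frequent label in each column
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, stacked)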

XLNetTransformer.py: Implemented a transformer-based feature transformer in XLNetTransformer.py. This file contains a class that preprocesses text data using the XLNet tokenizer, encoding it into input IDs suitable for the XLNet model.
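Sketched below under assumed names (the class name and the xlnet-base-cased checkpoint are illustrative, not taken from the file):

from transformers import XLNetTokenizer

class XLNetFeatureTransformer:
    def __init__(self, max_length=128):
        self.tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
        self.max_length = max_length

    def transform(self, texts):
        # Encode raw strings into padded, truncated input IDs for XLNet
        enc = self.tokenizer(list(texts), padding="max_length", truncation=True,
                             max_length=self.max_length, return_tensors="pt")
        return enc["input_ids"]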

These additions address the TODO tasks specified in the issues tab, enhancing the project's functionality and providing more options for text classification and model ensemble techniques.

…Ensemble_strategy, GPTModel and XLNetTransformer.
Collaborator

@NirantK left a comment


I can't quite put my finger on why: This PR feels a bit AI Generated. Nevertheless, I've added suggestions and if AI can do that — why not.

    return predicted_labels

def main():
    data_path = "/kaggle/input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv"
Collaborator

Can you please avoid hard coding the paths?
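For example (a sketch, not part of the PR), the path could be passed in as a CLI argument instead:

import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", required=True, help="Path to IMDB Dataset.csv")
    args = parser.parse_args()
    texts, labels = load_imdb_dataset(args.data_path)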

Comment on lines +16 to +22
def clean(text):
    # strip HTML line breaks before punctuation cleanup
    for token in ["<br/>", "<br>"]:
        text = re.sub(token, " ", text)

    # collapse ASCII and full-width punctuation runs into single spaces
    text = re.sub(r"[\s+\.\!\/_,$%^*()\(\)<>+\"\[\]\-\?;:\'{}`]+|[+——!,。?、~@#¥%……&*()]+", " ", text)

    return text.lower()
Collaborator

Given that this clean is being re-used everywhere, do you want to refactor this out into a separate file?
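e.g. something along these lines (module name assumed):

# preprocessing.py holds the shared clean(); each script then just imports it:
from preprocessing import clean

texts = df['review'].apply(clean)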

Comment on lines +25 to +29
def load_imdb_dataset(data_path, nrows=100):
    df = pd.read_csv(data_path, nrows=nrows)
    texts = df['review'].apply(clean)
    labels = df['sentiment']
    return texts, labels
Collaborator

Perhaps a good idea to not tie to this to new data loaders and use/upgrade the existing ones?
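One hypothetical way to do that (the keyword-argument column names are an assumption, not existing project API): generalize the loader so it can replace per-dataset copies rather than add another one.

import pandas as pd

def load_csv_dataset(data_path, text_col="review", label_col="sentiment",
                     nrows=None, preprocess=None):
    # One configurable loader instead of a new hard-wired one per dataset
    df = pd.read_csv(data_path, nrows=nrows)
    texts = df[text_col].apply(preprocess) if preprocess is not None else df[text_col]
    return texts, df[label_col]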

@wabyking merged commit c3dfd26 into FreedomIntelligence:master on Apr 20, 2024