Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yuxuan/task 3 4 #259

Merged
merged 18 commits into from
Jul 24, 2024
Merged

Yuxuan/task 3 4 #259

merged 18 commits into from
Jul 24, 2024

Conversation

sundy1994
Copy link
Collaborator

@sundy1994 sundy1994 commented Jul 24, 2024

Basic Info

This PR is based on Helena's work on Task 3 and 4. It creates a feature dictionary and allows users to choose whether to include some of the features.

@xehu please review:

  • If the feature dictionary includes all features with correct info
  • If the dependencies/requires vect_data/.. are correct
  • In feature builder, if the default features look good

@sundy1994 sundy1994 requested a review from xehu July 24, 2024 12:32
@xehu
Copy link
Collaborator

xehu commented Jul 24, 2024

Remaining Actions in Review

  1. Connect the new feature dictionary to the way that we run tests. We should be able to verify that we have the right number(s) of features when we pull from the dictionary:
test_chat_df =  pd.read_csv("../output/chat/test_chat_level_chat.csv")
test_conv_df =  pd.read_csv("../output/conv/test_conv_level_conv.csv")
num_features_chat = test_chat_df.columns.size - 7
num_tested_chat = test_chat_df['expected_column'].nunique()
num_features_conv = test_conv_df.columns.size - 7
num_tested_conv = test_conv_df['expected_column'].nunique()

with open('test.log', 'w') as f:
    f.write(f'Tested {num_tested_chat} features out of {num_features_chat} chat level features: {num_tested_chat/num_features_chat * 100:.2f}% Coverage!\n')
    f.write(f'Tested {num_tested_conv} features out of {num_features_conv} conv level features: {num_tested_conv/num_features_conv * 100:.2f}% Coverage!\n')
    pass
  1. Update the Pull Request Template to as people to update the feature dictionary every time they make a PR.
  2. Connect the flags for vector data and sentiment data in the feature dictionary to check_embeddings.py --- in other words, ONLY check the embeddings if we have at least one feature that requires them.

Copy link
Collaborator

@xehu xehu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good!

@@ -69,6 +70,8 @@ def __init__(
output_file_path_chat_level: str,
output_file_path_user_level: str,
output_file_path_conv_level: str,
custom_features: list = [],
# excluded_features: list = [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not doing the excluded feature list, we should remove the comment-out!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

"Information Exchange",
"LIWC and Other Lexicons",
"Questions",
# TODO -- the below 3 functions are redundant because they share a function
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need to address these comments

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

"User Level Aggregates"
]

# self.custom_features = [ # fill for testing purpose, all requires vect_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

if func not in self.feature_methods_conv:
self.feature_methods_conv.append(func)

# set which chat / conversation feature methods we want to calculate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@@ -37,6 +37,14 @@
output_file_path_chat_level = "../output/chat/test_chat_level_chat.csv",
output_file_path_user_level = "../output/user/test_chat_level_user.csv",
output_file_path_conv_level = "../output/conv/test_chat_level_conv.csv",
custom_features = [ # all requires vect_data
"Function Word Accommodation",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function and Content Word Accommodation don't require vector data and should be in the default set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@@ -47,6 +55,14 @@
output_file_path_chat_level = "../output/chat/test_conv_level_chat.csv",
output_file_path_user_level = "../output/user/test_conv_level_user.csv",
output_file_path_conv_level = "../output/conv/test_conv_level_conv.csv",
custom_features = [ # all requires vect_data
"Function Word Accommodation",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function and Content Word Accommodation don't require vector data and should be in the default set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@xehu xehu merged commit b5c72c3 into main Jul 24, 2024
1 check passed
@xehu xehu deleted the yuxuan/task_3_4 branch July 24, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants