Release v0.1.1 #274

Merged
merged 44 commits into from
Aug 9, 2024
60e3845
positive bert docs
agshruti12 Jul 6, 2024
782cac1
mimicry bert docs
agshruti12 Jul 6, 2024
c2814cd
copied mimicry bert to moving mimicry as template
agshruti12 Jul 6, 2024
07d09ff
moving mimicry docs
agshruti12 Jul 7, 2024
f601576
updating function & content word accommodation docs
agshruti12 Jul 8, 2024
4f397b8
updating content word docs
agshruti12 Jul 26, 2024
91c5eb3
merged main changes
agshruti12 Jul 26, 2024
6914fa9
resolve merge
xehu Jul 26, 2024
b4b49ad
fix merge with amy's docs
xehu Jul 31, 2024
173d053
edits to content word accommodation
xehu Aug 1, 2024
53ba2f6
edits to other mimicry pages
xehu Aug 1, 2024
0679cac
technical docstrings + conceptual fixes
agshruti12 Aug 6, 2024
74ade50
removing unfinished files
agshruti12 Aug 7, 2024
32da015
update docs
xehu Aug 8, 2024
c1f5d39
Merge branch 'main' into shruti/docs
xehu Aug 8, 2024
c9d946a
upgrade some packages and check for requirements conflicts
xehu Aug 8, 2024
83c33d3
update check embeddings to resolve error in which vectors were being …
xehu Aug 8, 2024
3e204b8
reset version number
xehu Aug 8, 2024
800abed
Priya/docs 2 (#269)
PriyaDCosta Aug 8, 2024
b2ca66e
Merge branch 'release_v0_1_1' of https://github.com/Watts-Lab/team_co…
xehu Aug 8, 2024
4252fda
merge in shruti's docs
xehu Aug 8, 2024
0be41db
update Positivity (BERT) feature name
xehu Aug 8, 2024
0cde102
update feature dict.
xehu Aug 8, 2024
7dbb10d
update documentation and add flash to pyproject
xehu Aug 8, 2024
c0d0c8c
update readme
xehu Aug 8, 2024
3e28b1d
slight changes to readme
xehu Aug 8, 2024
c35c144
add sponsor info to readme
xehu Aug 8, 2024
8af37e7
fix errors in priya and shruti's documentation and ensure docs build …
xehu Aug 8, 2024
28a4f0d
correct typo in readme
xehu Aug 8, 2024
1f54436
add readme contribution link
xehu Aug 8, 2024
ce0df32
add github getting started guide
xehu Aug 8, 2024
3ae9228
update conf
xehu Aug 8, 2024
8e2ea9e
add nltk resources to ensure feature builder works
xehu Aug 8, 2024
3b6053a
add nltk resources to ensure feature builder works
xehu Aug 8, 2024
bab88b4
update positivity related features
xehu Aug 8, 2024
7ec7c7a
update feature_dict with latest writeups and update examples with new…
xehu Aug 8, 2024
a63b273
update feature_dict with latest writeups and update examples with new…
xehu Aug 8, 2024
583dcb7
update documentation to clarify that nltk needs to be downloaded in p…
xehu Aug 8, 2024
b1b587b
small commit to make readthedocs.yml python syntax consistent
xehu Aug 8, 2024
1957a35
update docs a little more
xehu Aug 8, 2024
adb5161
change heading for rtd
xehu Aug 8, 2024
e088e12
fix dependency issues with spacy and nltk
sundy1994 Aug 8, 2024
6c171e9
clarify docs
xehu Aug 9, 2024
1211d75
update docs to include nltk one-liner
xehu Aug 9, 2024
104 changes: 78 additions & 26 deletions README.md
@@ -3,7 +3,9 @@
[![License](https://img.shields.io/badge/License-MIT-blue)](#license)

# The Team Communication Toolkit
The Team Communication Toolkit is a research project and Python package that aims to make it easier for social scientists to explore text-based conversational data.
The Team Communication Toolkit is a Python package that makes it easy for social scientists to analyze and understand *text-based communication data*. Our aim is to facilitate seamless analyses of conversational data --- especially among groups and teams! --- by providing a single interface for researchers to generate and explore dozens of research-backed conversational features.

We are a research project created by the [Computational Social Science Lab at UPenn](https://css.seas.upenn.edu/) and funded by the [Wharton AI and Analytics Initiative](https://ai-analytics.wharton.upenn.edu/).

<div align="center">

@@ -13,49 +15,99 @@ The Team Communication Toolkit is a research project and Python package that aim

</div>

## Getting Started
# Getting Started

To use our tool, please ensure that you have Python >= 3.10 installed and a working version of [pip](https://pypi.org/project/pip/), which is Python's package installer. Then, in your local environment, run the following:

If you are new to this repository, welcome! Please follow the steps below to get started.
```sh
pip install team_comm_tools
```

### Step 1: Clone the Repo
First, clone this repository into your local development environment:
You will also need to ensure that spaCy and NLTK are installed in addition to the required dependencies. The spaCy model `en_core_web_sm` should download automatically upon install. If you get an error that `en_core_web_sm` is not found, run the following in your terminal:

```sh
spacy download en_core_web_sm
```
git clone https://github.com/Watts-Lab/team_comm_tools.git

Additionally, we require several NLTK dependencies. If you don't have them in your environment, run this one-liner in your terminal:

```sh
import_nltk
```
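If the `import_nltk` command is not on your path, the resources can also be fetched one at a time through NLTK's own downloader module. The sketch below only builds and prints the commands. It assumes the four resource names that this release's Read the Docs configuration downloads (`nps_chat`, `punkt`, `stopwords`, `wordnet`); whether `import_nltk` fetches exactly this set is an assumption, and `nltk_download_commands` is a hypothetical helper, not part of the package.

```python
import sys

# Assumption: these are the resources downloaded by this release's
# Read the Docs build; `import_nltk` may fetch a different set.
NLTK_RESOURCES = ["nps_chat", "punkt", "stopwords", "wordnet"]


def nltk_download_commands(resources=NLTK_RESOURCES):
    """Build one `python -m nltk.downloader <resource>` command per resource."""
    return [[sys.executable, "-m", "nltk.downloader", name] for name in resources]


for command in nltk_download_commands():
    # Print rather than execute, so nothing is downloaded by accident.
    print(" ".join(command))
```

To actually download, each command list can be passed to `subprocess.run(command, check=True)` from inside the environment where the package is installed.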

### Step 2: Download Dependencies
## Import Recommendations: Virtual Environment and Pip

**Python Version**: We require >= `python3.10` when running this repository.
**We strongly recommend using a virtual environment in Python to run the package.** We have several specific dependency requirements. One important one is that we are currently only compatible with numpy < 2.0.0 because [numpy 2.0.0 and above](https://numpy.org/devdocs/release/2.0.0-notes.html#changes) made significant changes that are not compatible with other dependencies of our package. As those dependencies are updated, we will support later versions of numpy.

We *strongly* recommend using a virtual environment to install the dependencies required for the project.
**We also strongly recommend ensuring that your version of pip is up-to-date (>=24.0).** There have been reports of users having trouble downloading dependencies (specifically, the spaCy package) with older versions of pip. If you get an error when downloading `en_core_web_sm`, we recommend updating pip.
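A quick way to check both recommendations from inside your environment is to read the installed versions with the standard library. This is a minimal sketch, not something the package ships: `installed_major_version` is a hypothetical helper, and it only compares major version numbers.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(dist_name):
    """Return the installed major version of a distribution, or None if absent."""
    try:
        raw_version = version(dist_name) or ""
    except PackageNotFoundError:
        return None
    match = re.match(r"\d+", raw_version)
    return int(match.group()) if match else None


numpy_major = installed_major_version("numpy")
pip_major = installed_major_version("pip")

if numpy_major is not None and numpy_major >= 2:
    print("numpy >= 2.0.0 is installed; the toolkit currently needs numpy < 2.0.0")
if pip_major is not None and pip_major < 24:
    print("pip is older than 24.0; consider: python -m pip install --upgrade pip")
```

Running this inside a fresh virtual environment, before and after installing the package, shows whether an incompatible numpy or an outdated pip is present.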

Running the following script will install all required packages and dependencies:

## Using the FeatureBuilder
After you install the package and its dependencies, you can use our tool in your Python script as follows:

```python
from team_comm_tools import FeatureBuilder
```
./setup.sh

*Note*: PyPI treats hyphens and underscores equally, so `pip install team_comm_tools` and `pip install team-comm-tools` are equivalent. However, Python does NOT treat them equally, and **you should use underscores when you import the package, like this: `from team_comm_tools import FeatureBuilder`**.
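The difference can be seen with a few lines of standard-library Python. The sketch below applies the name normalization described in PEP 503 (runs of `-`, `_`, and `.` collapse to a single hyphen, lowercased) and then checks which spelling is a valid Python identifier. The function name `normalize_distribution_name` is made up for illustration.

```python
import re


def normalize_distribution_name(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Both spellings refer to the same project on PyPI.
same_project = (
    normalize_distribution_name("team_comm_tools")
    == normalize_distribution_name("team-comm-tools")
)
print(same_project)  # True

# Only the underscore spelling can appear in an import statement.
print("team_comm_tools".isidentifier())  # True
print("team-comm-tools".isidentifier())  # False
```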

Once you import the tool, you will be able to declare a FeatureBuilder object, which is the heart of our tool. Here is some sample syntax:

```python
# this section of code declares a FeatureBuilder object
my_feature_builder = FeatureBuilder(
    input_df = my_pandas_dataframe,
    # this means there's a column in your data called 'conversation_id' that uniquely identifies a conversation
    conversation_id_col = "conversation_id",
    # this means there's a column in your data called 'speaker_id' that uniquely identifies a speaker
    speaker_id_col = "speaker_id",
    # this means there's a column in your data called 'message' that contains the content you want to featurize
    message_col = "message",
    # this means there's a column in your data called 'timestamp' that contains the time associated with each message; we also accept a list of (timestamp_start, timestamp_end), in case your data is formatted in that way.
    timestamp_col = "timestamp",
    # this is where we'll cache things like sentence vectors; this directory doesn't have to exist; we'll create it for you!
    vector_directory = "./vector_data/",
    # give us names for the utterance (chat), speaker (user), and conversation-level outputs
    output_file_path_chat_level = "./my_output_chat_level.csv",
    output_file_path_user_level = "./my_output_user_level.csv",
    output_file_path_conv_level = "./my_output_conversation_level.csv",
    # if true, this will combine successive turns by the same speaker.
    turns = False,
    # these features depend on sentence vectors, so they take longer to generate on larger datasets. Add them in manually if you are interested in adding them to your output!
    custom_features = [
        "(BERT) Mimicry",
        "Moving Mimicry",
        "Forward Flow",
        "Discursive Diversity"
    ],
)

# this line of code runs the FeatureBuilder on your data
my_feature_builder.featurize(col="message")
```

### Step 3: Run the Featurizer
At this point, you should be ready to run the featurizer! Navigate to the `examples` folder, and use the following command:
### Data Format
We accept input data in the format of a Pandas DataFrame. Your data needs to have three (3) required input columns and one optional column.

```
python3 featurize.py
```
This runs the `featurize.py` file, which declares a FeatureBuilder object for each dataset of interest and featurizes it using our framework. The `featurize.py` file provides an end-to-end worked example of how you can declare a FeatureBuilder and call it on data; equally, you can replace this file with any file / notebook of your choosing, as long as you import the FeatureBuilder module.
1. A **conversation ID**,
2. A **speaker ID**,
3. A **message/text input**, which contains the content that you want to get featurized;
4. (Optional) a **timestamp**. This is not necessary for generating features, but behaviors related to the conversation's pace (for example, the average delay between messages; the "burstiness" of a conversation) cannot be measured without it.
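Before handing data to the FeatureBuilder, it can help to confirm that the required columns are present. The sketch below is an illustration under stated assumptions: the column names are the ones used in the sample syntax above (your own data may use different names, which you pass to the FeatureBuilder), and `missing_required_columns` is a hypothetical helper, not part of the package. It accepts any iterable of column names, so it also works with `my_pandas_dataframe.columns`.

```python
# Assumption: column names match the sample FeatureBuilder call above.
REQUIRED_COLUMNS = ("conversation_id", "speaker_id", "message")
OPTIONAL_COLUMNS = ("timestamp",)


def missing_required_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# One dict per message; pandas.DataFrame(example_rows) turns this into a DataFrame.
example_rows = [
    {"conversation_id": "conv_1", "speaker_id": "alice",
     "message": "Shall we get started?", "timestamp": "2024-08-09 10:00:00"},
    {"conversation_id": "conv_1", "speaker_id": "bob",
     "message": "Yes, let's go.", "timestamp": "2024-08-09 10:00:07"},
]

print(missing_required_columns(example_rows[0].keys()))          # []
print(missing_required_columns(["conversation_id", "message"]))  # ['speaker_id']
```

The timestamp column is deliberately left out of the required set, since features can still be generated without it.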

## Contributing Code and Automated Unit Testing
If you would like to contribute to the repository, we have implemented a [Pull Request Template](https://github.com/Watts-Lab/team_comm_tools/blob/main/.github/pull_request_template.md) with a basic checklist that you should consider when adding code (e.g., improving documentation or developing a new feature).
### Featurized Outputs: Levels of Analysis

We have also implemented automated unit testing of all code (which runs upon every push to GitHub), allowing us to ensure that new features function as expected and do not break any previous features. The points below highlight key steps to using our automated test suite.
Notably, not all communication features are made equal, as they can be defined at different levels of analysis. For example, a single utterance ("you are great!") may be described as a "positive statement." An individual who makes many such utterances may be described as a "positive person." Finally, the entire team may enjoy a "positive conversation," an interaction in which everyone speaks positively to each other. In this way, the same concept of positivity can be applied to three levels:

1. Draft test inputs (`conversation_num`, `speaker`, `message`) and expected outputs for your feature.
1. The **utterance**,
2. The **speaker**, and
3. The **conversation**

- For example, "This is a test message." should return 5 for `num_words` at the chat level (note that `conversation_num` and `speaker` have no effect on the ultimate result, so they can be chosen arbitrarily).
- Testing a conversation level feature, say `discursive_diversity`, requires a series of chats rather than just one chat. For example, "This is a test message." (speaker 1), "This is a test message." (speaker 1), "This is a test message." (speaker 2), "This is a test message." (speaker 2), within the same conversation, should return 0. Note that the `conversation_num` for each new test should be distinct from all previous `conversation_num`, even if the feature being tested is different.
**We generate a separate output file for each level.** When you declare a FeatureBuilder, you will need to specify an output path for each level of analysis.
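To make the three levels concrete, the sketch below rolls a made-up utterance-level score up to the speaker and conversation levels by averaging. The scores and the choice of the mean are assumptions for illustration only; this is not the toolkit's own aggregation code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical utterance-level scores: (conversation_id, speaker_id, positivity).
utterances = [
    ("conv_1", "alice", 1.0),
    ("conv_1", "alice", 0.5),
    ("conv_1", "bob", 0.0),
    ("conv_2", "carol", 0.5),
]

scores_by_speaker = defaultdict(list)
scores_by_conversation = defaultdict(list)
for conversation_id, speaker_id, score in utterances:
    scores_by_speaker[(conversation_id, speaker_id)].append(score)
    scores_by_conversation[conversation_id].append(score)

# Speaker level: one value per (conversation, speaker) pair.
speaker_level = {key: mean(scores) for key, scores in scores_by_speaker.items()}
# Conversation level: one value per conversation.
conversation_level = {key: mean(scores) for key, scores in scores_by_conversation.items()}

print(speaker_level[("conv_1", "alice")])  # 0.75
print(conversation_level["conv_1"])        # 0.5
```

Each dictionary corresponds to one level of analysis, mirroring the three separate output files.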

2. Once you have test inputs, add each CHAT (and its associated conversation_num and speaker) as a separate row in either `test_chat_level.csv` or `test_conv_level.csv`, within `./tests/data/cleaned_data`. The format of the CSV is as follows: `id, conversation_num, speaker_nickname, message, expected_column, expected_value`, where `expected_column` is the feature name (i.e. num_words).
For more information, please refer to the [Introduction on our Read the Docs Page](https://conversational-featurizer.readthedocs.io/en/latest/intro.html#intro).

3. Push all your changes to GitHub, including feature development and test dataset additions. Go under the "Actions" tab in the toolbar. Notice there's a new job running called "Testing-Features". A green checkmark at the conclusion of this job indicates all new tests have passed. A red cross means some test has failed. Navigate to the uploaded "Artifact" (near the bottom of the status page) for list of failed tests and their associated inputs/outputs.
# Learn More
Please visit our website, [https://teamcommtools.seas.upenn.edu/](https://teamcommtools.seas.upenn.edu/), for general information about our project and research. For more detailed documentation on our features and examples, please visit our [Read the Docs Page](https://conversational-featurizer.readthedocs.io/en/latest/).

4. Debug and iterate!
# Becoming a Contributor
If you would like to make pull requests to this open-sourced repository, please read our [GitHub Repo Getting Started Guide](/github_repo_getting_started.md). We welcome new feature contributions or improvements to our framework.
5 changes: 5 additions & 0 deletions docs/.readthedocs.yaml
@@ -18,6 +18,11 @@ build:
pre_install: # Stuff in src/requirements.txt depends on en_core_web_sm, which in turn depends on spacy
- pip install spacy==3.7.2
- bash -c "python3 -m spacy download en_core_web_sm"
post_install: # Install NLTK resources after the install step
- python3 -m nltk.downloader nps_chat
- python3 -m nltk.downloader punkt
- python3 -m nltk.downloader stopwords
- python3 -m nltk.downloader wordnet

# Build documentation in the "docs/" directory with Sphinx
sphinx:
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/examples.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/feature_builder.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/basic_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/burstiness.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/certainty.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/discursive_diversity.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/fflow.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/get_all_DD_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/get_user_network.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/hedge.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/info_exchange_zscore.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/information_diversity.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/lexical_features_v2.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features/other_lexical_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_v2.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_v2_helper.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/question_num.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/readability.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/reddit_tags.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/temporal_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/textblob_sentiment_analysis.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/turn_taking_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/user_centroids.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/variance_in_DD.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features/word_mimicry.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/TEMPLATE.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/index.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/intro.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/assign_chunk_nums.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/calculate_chat_level_features.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/utils/calculate_user_level_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/check_embeddings.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/gini_coefficient.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/preload_word_lists.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/preprocess.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/summarize_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/zscore_chats_and_conversation.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 75f637156aeb7a84b151cb277f439962
config: 9a01a2cd3d4384710101b4a99edd7683
tags: 645f666f9bcd5a90fca523b33c5a78b7
22 changes: 17 additions & 5 deletions docs/build/html/_sources/examples.rst.txt
@@ -14,14 +14,24 @@ To use our tool, please ensure that you have Python >= 3.10 installed and a work

pip install team_comm_tools

You will also need to ensure that spaCy and NLTK are installed in addition to the required dependencies. The spaCy model ``en_core_web_sm`` should download automatically upon install. If you get an error that ``en_core_web_sm`` is not found, run the following in your terminal:

You will also need to manually install some additional required dependencies to set up the package. In your terminal, run the following:
.. code-block::

spacy download en_core_web_sm

Additionally, we require several NLTK dependencies. If you don't have them in your environment, run this one-liner in your terminal:

.. code-block::

./setup.sh

Once complete, you should see, "Installation and requirements check completed successfully." This means you are ready to go!
import_nltk

Import Recommendations: Virtual Environment and Pip
+++++++++++++++++++++++++++++++++++++++++++++++++++++

**We strongly recommend using a virtual environment in Python to run the package.** We have several specific dependency requirements. One important one is that we are currently only compatible with numpy < 2.0.0 because `numpy 2.0.0 and above <https://numpy.org/devdocs/release/2.0.0-notes.html#changes>`_ made significant changes that are not compatible with other dependencies of our package. As those dependencies are updated, we will support later versions of numpy.

**We also strongly recommend that your version of pip is up-to-date (>=24.0).** There have been reports in which users have had trouble downloading dependencies (specifically, the Spacy package) with older versions of pip. If you get an error with downloading ``en_core_web_sm``, we recommend updating pip.

Using the Package
******************
@@ -31,14 +41,16 @@ After you install it, the Team Communication Toolkit can be imported at the top
Importing the Package
++++++++++++++++++++++

At the top of your Python script, write the following:
After you install the package and its dependencies, you can use our tool in your Python script as follows:

.. code-block:: python

from team_comm_tools import FeatureBuilder

Now you have access to the :ref:`feature_builder`. This is the main class that you'll need to interact with the Team Communication Toolkit.

*Note*: PyPI treats hyphens and underscores equally, so "pip install team_comm_tools" and "pip install team-comm-tools" are equivalent. However, Python does NOT treat them equally, and **you should use underscores when you import the package, like this: from team_comm_tools import FeatureBuilder**.

Running the FeatureBuilder on Your Data
++++++++++++++++++++++++++++++++++++++++

4 changes: 3 additions & 1 deletion docs/build/html/_sources/features/index.rst.txt
@@ -40,8 +40,10 @@ Once utterance-level features are computed, we compute conversation-level featur
burstiness
information_diversity
../utils/gini_coefficient
discursive_diversity
get_all_DD_features
discursive_diversity
variance_in_DD
within_person_discursive_range
turn_taking_features

Speaker- (User) Level Features
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.. _TEMPLATE:
.. _TEMPLATE:

FEATURE NAME
============