Commit

Convokit 3.0 Mega Pull Request (#199)
* fix use of mutability in Coordination transformer.

* run black formatter

* fixed coordination with efficient implementation

* comments for changes

* metadata field deepcopy

* documentation and website update for V3.0

* get dataframe mutation fix

* fix get dataframe mutability

* modify 3.0 documentation

* revert get dataframe fixes

* pairer maximize pair mode fix

* backendMapper, config documentation

* goodbye to python3.7

* release date update

* remove all storage reference

* update release date

---------

Co-authored-by: Cristian Danescu-Niculescu-Mizil <cristiandnm@users.noreply.github.com>
seanzhangkx8 and cristiandnm authored Jul 25, 2023
1 parent 7630163 commit dbca34f
Showing 62 changed files with 20,667 additions and 7,521 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -4,13 +4,13 @@
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![pypi](https://img.shields.io/pypi/v/convokit.svg)](https://pypi.org/pypi/convokit/)
- [![py\_versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://pypi.org/pypi/convokit/)
+ [![py\_versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://pypi.org/pypi/convokit/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![license](https://img.shields.io/badge/license-MIT-green)](https://github.com/CornellNLP/ConvoKit/blob/master/LICENSE.md)
[![Slack Community](https://img.shields.io/static/v1?logo=slack&style=flat&color=red&label=slack&message=community)](https://join.slack.com/t/convokit/shared_invite/zt-1axq34qrp-1hDXQrvSXClIbJOqw4S03Q)


- This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [2.5.3](https://github.com/CornellNLP/ConvoKit/releases/tag/v2.5.2) (released 16 Jan 2022); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.
+ This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.0](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.0) (released July 17, 2023); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.

Read our [documentation](https://convokit.cornell.edu/documentation) or try ConvoKit in our [interactive tutorial](https://colab.research.google.com/github/CornellNLP/ConvoKit/blob/master/examples/Introduction_to_ConvoKit.ipynb).

10 changes: 5 additions & 5 deletions convokit/convokitConfig.py
@@ -4,13 +4,13 @@


DEFAULT_CONFIG_CONTENTS = (
- "# Default Storage Parameters\n"
+ "# Default Backend Parameters\n"
"db_host: localhost:27017\n"
"data_directory: ~/.convokit/saved-corpora\n"
- "default_storage_mode: mem"
+ "default_backend: mem"
)

- ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_storage_mode": "CONVOKIT_STORAGE_MODE"}
+ ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_backend": "CONVOKIT_BACKEND"}


class ConvoKitConfig:
@@ -52,5 +52,5 @@ def data_directory(self):
return self.config_contents.get("data_directory", "~/.convokit/saved-corpora")

@property
- def default_storage_mode(self):
- return self._get_config_from_env_or_file("default_storage_mode", "mem")
+ def default_backend(self):
+ return self._get_config_from_env_or_file("default_backend", "mem")
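
Reviewer note: the hunks above rename the config key and its environment variable, but the body of `_get_config_from_env_or_file` is not part of this diff. The sketch below is one plausible reading of that helper, assuming the environment variable takes precedence over the config file, which in turn takes precedence over the default. The standalone function `get_config_from_env_or_file`, the `file_contents` dict, and the value `"db"` are illustrative assumptions, not code from the repository.

```python
import os

# Mapping copied from the diff above: config key -> environment variable name.
ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_backend": "CONVOKIT_BACKEND"}


def get_config_from_env_or_file(config_contents, key, default):
    """Hypothetical stand-in for ConvoKitConfig._get_config_from_env_or_file.

    Assumed precedence: environment variable, then config file, then default.
    """
    env_name = ENV_VARS.get(key)
    if env_name is not None and env_name in os.environ:
        return os.environ[env_name]
    return config_contents.get(key, default)


# Stands in for the parsed config file; it has no default_backend entry.
file_contents = {"db_host": "localhost:27017"}

# Without the environment variable, the lookup falls through to the default.
os.environ.pop("CONVOKIT_BACKEND", None)
backend_without_env = get_config_from_env_or_file(file_contents, "default_backend", "mem")

# With the environment variable set, its value wins (under the assumed precedence).
os.environ["CONVOKIT_BACKEND"] = "db"
backend_with_env = get_config_from_env_or_file(file_contents, "default_backend", "mem")
os.environ.pop("CONVOKIT_BACKEND", None)
```

If the real helper resolves values in this order, anyone who exported `CONVOKIT_STORAGE_MODE` before this change would need to export `CONVOKIT_BACKEND` instead, since the old name no longer appears in `ENV_VARS`.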
16 changes: 14 additions & 2 deletions convokit/coordination/coordination.py
@@ -1,5 +1,6 @@
from collections import defaultdict
from typing import Callable, Tuple, List, Dict, Optional, Collection, Union
+ import copy

import pkg_resources

@@ -108,11 +109,22 @@ def transform(self, corpus: Corpus) -> Corpus:
utterance_thresh_func=self.utterance_thresh_func,
)

+ # Keep record of all score update for all (speakers, target) pairs to avoid redundant operations
+ todo = {}

for (speaker, target), score in pair_scores.items():
if self.coordination_attribute_name not in speaker.meta:
speaker.meta[self.coordination_attribute_name] = {}
- speaker.meta[self.coordination_attribute_name][target.id] = score

+ key = (speaker, target.id)
+ todo.update({key: score})

+ for key, score in todo.items():
+ speaker = key[0]
+ target = key[1]
+ # For avoiding mutability for the sake of DB corpus
+ temp_dict = copy.deepcopy(speaker.meta[self.coordination_attribute_name])
+ temp_dict[target] = score
+ speaker.meta[self.coordination_attribute_name] = temp_dict
assert isinstance(speaker, Speaker)

return corpus
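
Reviewer note: the change in `transform` swaps an in-place update of the nested metadata dict for a copy, modify, reassign sequence. The sketch below shows why that can matter when reads return a detached copy of the stored value, as the code comment about the DB corpus suggests. `DetachedMeta` is a toy class written for this note and is not ConvoKit's metadata implementation; how the real DB-backed metadata behaves is an assumption here.

```python
import copy


class DetachedMeta:
    """Toy metadata store (hypothetical, not ConvoKit's class).

    Reads return a copy of the stored value, the way a database-backed
    store might, so mutating the returned object does not change the store.
    """

    def __init__(self):
        self._store = {}

    def __contains__(self, key):
        return key in self._store

    def __getitem__(self, key):
        return copy.deepcopy(self._store[key])

    def __setitem__(self, key, value):
        self._store[key] = copy.deepcopy(value)


meta = DetachedMeta()
meta["coord"] = {}

# In-place mutation: this changes a temporary copy, so the store never sees it.
meta["coord"]["target_a"] = 0.5
lost = meta["coord"]

# Copy, modify, reassign: the pattern used in the hunk above.
temp = copy.deepcopy(meta["coord"])
temp["target_a"] = 0.5
meta["coord"] = temp
kept = meta["coord"]
```

With a store like this, only the reassignment goes through `__setitem__`, so only the second update is persisted. With a plain in-memory dict both forms would behave the same, which is consistent with the bug showing up only for the DB corpus.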