
mrallenchen/venmo

venmo money, venmo problems

Design:

Motivation: What is Venmo used for? Apply NLP techniques to analyze Venmo transactions.

Some specific motivating questions: Does the type of transactions change over time? For example, during football season there could be betting or fantasy football pools that generate football-related transactions. Post-COVID, has there been a change in social outings or rent/utilities payments? Can I detect tips and professional jobs on Venmo? Can I identify superusers, and what transactions are they making?
While this project does not answer all of these questions, the unanswered ones remain candidates for Future Work.

Presentation:

Slide Deck

Approach:

Data

01_Data_Collection

  • Using venmo-api, data was collected from Venmo on 8/9.
  • API capabilities:
    • Search for users by name
    • Pull transactions for each user (limited to the 50 most recent transactions, though there may be a way around this limit)
  • Because the API does not provide a feed of the most recent transactions, the following process was used to build a list of transactions:
    1. Find the most popular American names, based on information from the Social Security Administration
    2. Using searches for those names, generate a list of users (with more users for the more popular names)
    3. Pull transactions for each of the users
    • Note that this process is intended to yield data that reflects the broad population of U.S. citizens and their use of Venmo.
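Step 2 gives more popular names more users. The sketch below shows one way to split a fixed budget of user lookups across names in proportion to how common each name is, assuming frequency counts like those the SSA publishes. The function name `allocate_users` and the example counts are hypothetical, and the actual Venmo searches and transaction pulls through venmo-api are not shown.

```python
from typing import Dict


def allocate_users(name_counts: Dict[str, int], total_users: int) -> Dict[str, int]:
    """Hypothetical helper: users to sample per name, proportional to name frequency.

    Uses largest-remainder rounding so the allocations sum exactly to total_users.
    """
    total = sum(name_counts.values())
    raw = {name: total_users * count / total for name, count in name_counts.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    # Hand out the users lost to rounding down, largest fractional parts first.
    leftover = total_users - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Placeholder counts for illustration only, not real SSA figures.
    counts = {"James": 50, "Mary": 30, "Robert": 20}
    print(allocate_users(counts, 10))
```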

Note: For privacy reasons, individual transaction data is not available in the public repo. This includes raw data, processed data, and outputs within notebooks.

02_Data_Processing

  • Cleans the data using NLP techniques
  • Handles emojis
  • Custom processing so that multi-word phrases match the tokens in Google's pretrained word embeddings (e.g. french_fries instead of "french fries")
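A minimal sketch of these cleaning steps using only the standard library. Emojis are replaced by their Unicode character names, which is a stand-in for the `emoji` package the project lists (that package offers its own conversion, e.g. `emoji.demojize`). Punctuation is then stripped and known multi-word phrases are joined with underscores so they line up with vocabulary tokens such as `french_fries`. The phrase list and function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical phrase list; in practice it would be derived from the
# multi-word tokens present in the embedding vocabulary.
PHRASES = {"french fries", "fantasy football", "happy birthday"}


def emoji_to_words(text):
    """Replace each symbol character (Unicode category "So", which covers most emojis) with its name."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            token = name.lower().replace(" ", "_").replace("-", "_")
            out.append(f" {token} " if name else " ")
        else:
            out.append(ch)
    return "".join(out)


def join_phrases(text, phrases=PHRASES):
    """Join known multi-word phrases with underscores: "french fries" -> "french_fries"."""
    for phrase in sorted(phrases, key=len, reverse=True):  # longest phrases first
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    return text


def clean_note(note):
    """Convert emojis to words, lowercase, strip punctuation, then join phrases."""
    text = emoji_to_words(note).lower()
    text = re.sub(r"[^a-z0-9_\s]", " ", text)  # keep letters, digits, underscores
    text = re.sub(r"\s+", " ", text).strip()
    return join_phrases(text)


if __name__ == "__main__":
    print(clean_note("French Fries 🍕!!"))  # french_fries slice_of_pizza
```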

03_EDA

  • Initial EDA performed on the data

Modeling

Initial Modeling

Ultimate Modeling

  • This was the final approach to developing themes for each note
  • 05_Word_Embeddings_download
    • Obtain word embeddings from the word2vec google-news-300 dataset
  • 06_Word_Vector_Matrix
    • Convert each transaction note into a word-count vector (count vectorizer)
    • Apply word embeddings
    • Result in vector for each note
  • 07_Modeling_Clustering (this is the champion model)
    • Use k-means clustering to assign a theme to each note
    • Export dataset with themes
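Notebooks 06 and 07 can be sketched end to end as follows. This assumes each note vector is the count-weighted average of its words' embeddings (the README does not say exactly how the embeddings are combined) and uses a plain Lloyd's k-means. A real run would more likely use sklearn's `CountVectorizer` and `KMeans` with the 300-dimensional vectors loaded through gensim (e.g. `gensim.downloader.load("word2vec-google-news-300")`); this version sticks to the standard library. The toy 2-D embeddings and the function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical 2-D toy embeddings; the project used 300-d Google News vectors.
EMB = {
    "rent": [1.0, 0.0],
    "utilities": [0.9, 0.1],
    "pizza": [0.0, 1.0],
    "french_fries": [0.1, 0.9],
}


def note_vector(tokens, embeddings, dim):
    """Count-weighted average of in-vocabulary token embeddings (assumed combination rule)."""
    counts = Counter(t for t in tokens if t in embeddings)  # word-count vector
    total = sum(counts.values())
    vec = [0.0] * dim
    if total == 0:
        return vec  # no known words: fall back to the zero vector
    for token, n in counts.items():
        for i, x in enumerate(embeddings[token]):
            vec[i] += n * x / total
    return vec


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init from random points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    notes = [["rent"], ["utilities", "rent"], ["pizza"], ["french_fries", "pizza"]]
    vectors = [note_vector(n, EMB, dim=2) for n in notes]
    print(kmeans(vectors, k=2))  # rent/utilities notes share one label, food notes the other
```

In this sketch the cluster labels are just integers; turning them into named themes would need a further step, such as inspecting the most common words in each cluster.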

Exploratory Data Analysis

Tools:

  • Word Embeddings:
    • Google-news-300
  • Python libraries
    • venmo-api
    • emoji
    • pandas
    • sklearn
    • seaborn
    • matplotlib
    • gensim
    • pyLDA
    • re
  • Tableau
    • visualization

Data source:

  • Venmo
  • Social Security Administration (for common names)

Future Work:

  • File cleanup:
    • Add more discussion to the 04 models to better illustrate why they were not used
