venmo money, venmo problems

Design:

Motivation: What is Venmo used for? This project applies NLP techniques to analyze Venmo transactions.

Some specific motivating questions: Do the types of transactions change over time? For example, during football season there could be betting or fantasy football pools that generate football-related transactions. Post-COVID, has there been a change in social outings or in rent/utilities payments? Can I detect tips and professional jobs on Venmo? Can I identify superusers? What transactions are the superusers making?
While this project does not answer all of these questions, the unanswered ones remain candidates for Future Work.

Presentation:

Slide Deck

Approach:

Data

01_Data_Collection

  • Using venmo-api, data was collected from Venmo on 8/9.
  • API capabilities:
    • Search for users based on name
    • Pull transactions for each user (limited to the 50 latest transactions, though there may be a way around this)
  • Because the API does not allow pulling from a feed of the most recent transactions, to get a list of transactions, the following process was adopted:
    1. Find the most popular American names, based on information from the Social Security Administration
    2. Using searches for those names, generate a list of users (with more users for the more popular names)
    3. Pull transactions for each of the users
    • Note that this process is intended to produce data that reflects the broad population of U.S. citizens and their usage of Venmo.
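As an illustration of step 2, the sketch below splits a target number of users across names in proportion to each name's popularity. The README only says that more popular names get more users, so the proportional rule, the function name `allocate_user_quota`, and the sample counts are all assumptions made for this example, not the project's actual code or real Social Security Administration figures.

```python
def allocate_user_quota(name_counts, total_users):
    """Split total_users across names in proportion to each name's count.

    Uses the largest-remainder method so the quotas sum to total_users.
    """
    total = sum(name_counts.values())
    if total <= 0 or total_users <= 0:
        return {name: 0 for name in name_counts}

    # Exact (fractional) share for each name.
    exact = {name: total_users * count / total
             for name, count in name_counts.items()}
    # Start from the rounded-down share.
    quota = {name: int(share) for name, share in exact.items()}

    # Hand out the remaining slots to the names with the largest remainders.
    leftover = total_users - sum(quota.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - quota[n], reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1
    return quota


# Hypothetical popularity counts, for illustration only.
name_counts = {"James": 500, "Mary": 300, "Robert": 200}
quota = allocate_user_quota(name_counts, 10)
```

Each quota would then cap how many search results are kept for that name before pulling transactions for the resulting users.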

Note: For privacy reasons, individual transaction data is not available in the public repo. This includes raw data, processed data, and outputs within notebooks.

02_Data_Processing

  • NLP to clean the data
  • Handles Emojis
  • Custom processing so that the word embeddings from Google's pretrained file can be used (e.g. french_fries instead of "french fries")
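A minimal sketch of the kind of cleaning described above: emoji are turned into word-like tokens and known multi-word phrases are joined with underscores so they can match entries in an embedding vocabulary. This is not the project's code. It uses the standard library's `unicodedata` as a stand-in for the `emoji` package, and `KNOWN_PHRASES`, `emoji_to_words`, and `tokenize` are hypothetical names chosen for the example.

```python
import re
import unicodedata

# Hypothetical phrase list; in practice this would be derived from the
# embedding vocabulary, which stores phrases such as "french_fries".
KNOWN_PHRASES = {("french", "fries"), ("ice", "cream")}


def emoji_to_words(text):
    """Replace symbol characters (Unicode category 'So', which covers
    most emoji) with their Unicode names, e.g. a pizza emoji becomes
    'slice_of_pizza'."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            out.append(" " + name.lower().replace(" ", "_") + " ")
        else:
            out.append(ch)
    return "".join(out)


def tokenize(note):
    """Lowercase a note, convert emoji, and join known two-word phrases."""
    words = re.findall(r"[a-z_]+", emoji_to_words(note).lower())
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in KNOWN_PHRASES:
            tokens.append("_".join(pair))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


tokens = tokenize("French fries 🍕!")
```

Only two-word phrases are handled here to keep the sketch short; longer phrases would need a longest-match lookup.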

03_EDA

  • Initial EDA performed on the data

Modeling

Initial Modeling

Ultimate Modeling

  • This was the final approach to developing themes for each note
  • 05_Word_Embeddings_download
    • Obtain the word2vec google-news-300 word embeddings
  • 06_Word_Vector_Matrix
    • Convert each transaction note into a word-count vector (count vectorizer)
    • Apply the word embeddings to the word counts
    • The result is one vector for each note
  • 07_Modeling_Clustering (this is the champion model)
    • Use k-means clustering to arrive at a theme for each note
    • Export dataset with themes
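The pipeline above can be sketched end to end on toy data. The example below builds one vector per note as a count-weighted average of word embeddings, then groups the notes with a plain-Python Lloyd's iteration that stands in for a library implementation such as scikit-learn's `KMeans`. The averaging rule, the 2-dimensional `EMBEDDINGS` values, and all function names are assumptions for illustration; the project itself uses the pretrained 300-dimensional Google News vectors.

```python
import random
from collections import Counter

# Toy 2-D embeddings with made-up values, for illustration only.
EMBEDDINGS = {
    "pizza": [1.0, 0.0],
    "tacos": [0.9, 0.1],
    "rent": [0.0, 1.0],
    "utilities": [0.1, 0.9],
}


def note_vector(tokens, embeddings, dim):
    """Count-weighted average of the embeddings of the known tokens."""
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    vec = [0.0] * dim
    if total == 0:
        return vec  # no known words: fall back to the zero vector
    for token, count in counts.items():
        for i, value in enumerate(embeddings[token]):
            vec[i] += count * value
    return [v / total for v in vec]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm: assign to nearest centroid, then recompute."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


notes = [["pizza"], ["tacos", "pizza"], ["rent"], ["rent", "utilities"]]
vectors = [note_vector(n, EMBEDDINGS, 2) for n in notes]
labels = kmeans(vectors, k=2)
```

On this toy input the two food notes share one label and the two housing notes share the other; each label plays the role of a theme. Interpreting and naming the themes on real data is a separate, manual step.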

Exploratory Data Analysis

Tools:

  • Word Embeddings:
    • Google-news-300
  • Python libraries
    • venmo-api
    • emoji
    • pandas
    • sklearn
    • seaborn
    • matplotlib
    • gensim
    • pyLDA
    • re
  • Tableau
    • visualization

Data source:

  • Venmo
  • Social Security Administration (for common names)

Future Work:

  • File cleanup:
    • Add more discussion to the 04 models to better illustrate why they were not used