CoP: Data Science: Create Text Analysis Tutorial #153

Open
5 tasks
akhaleghi opened this issue Apr 1, 2022 · 3 comments
Labels
documentation (Improvements or additions to documentation), feature: guide (All issues related to guide), role: data analysis, size: 1pt (Can be done in 6 hours or less)

Comments

Contributor

akhaleghi commented Apr 1, 2022

Overview

Update the Text Analysis page with resources and an article header.

Action Items

  • Create a Google Doc in the folder provided under resources
    • Draft an introductory paragraph explaining what the tutorial resources cover and why a new data scientist would use them for working with data at Hack For LA
    • Identify resources with vetted tutorials that cover important skills within the tutorial area, and add them to the draft
  • Review the draft with the Data Science CoP
  • Add to the wiki page

Resources/Instructions

Wiki page

Text Analysis Tutorial

Location for any files you might need to upload (drafts, images, etc.)

Core tools that should be mentioned:

  • nltk
  • spaCy

Examples of resources that would be useful to include:

  • Web how-to/tutorial/walk-throughs
  • YouTube playlists or videos demonstrating tools
  • Links to blogs or platforms with subject matter experts

akhaleghi added the labels documentation, feature: guide, role: org, and size: 1pt on Apr 1, 2022
Member

bfang22 commented Apr 23, 2024

Hi - I might be able to work on this. I brainstormed an outline of topics, but would appreciate some feedback on scope and where to draw the line. Should this be more focused on text processing/analysis fundamentals or do we want to go all the way to building and evaluating predictive/classification models?

Potential tutorials:

Use Cases

  • Most frequent tokens, entities (beginner)
  • Prediction/Classification (intermediate)
    • Sentiment, Next Word, other user-defined outcomes
  • Language Inference, Translation (advanced)

Text Analysis Basics

Pre-processing

  • Stop Words
  • Lemmatization vs Stemming
  • Tokenization
  • N-grams
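
As a rough illustration of the pre-processing topics above, here is a minimal sketch of tokenization, stop-word removal, and n-grams using only the Python standard library. The stop-word list, the sample sentence, and the helper names (`tokenize`, `remove_stop_words`, `ngrams`) are assumptions made up for this example; a real tutorial would more likely lean on nltk or spaCy, which ship fuller stop-word lists and tokenizers. Stemming and lemmatization are left out because they need a library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; libraries ship fuller lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sentence, not taken from any HfLA dataset.
sample = "The grant supports arts programs for the youth of Los Angeles"
tokens = remove_stop_words(tokenize(sample))
bigrams = ngrams(tokens, 2)

print(tokens)
print(bigrams[:3])
print(Counter(tokens).most_common(3))
```

The same `Counter` call also covers the "most frequent tokens" beginner use case listed earlier.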

Analysis basics (with libraries)

  • Named Entity Recognition
  • Part of Speech Tagging
  • Topic Modeling

More advanced...

Vectorization/Encoding

  • One-hot encoding (Beginner)
  • BOW (Beginner)
  • TF-IDF (Intermediate)
  • Word embeddings (advanced)
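
To make the difference between bag-of-words and TF-IDF concrete, here is a small standard-library sketch. The toy documents and the function names are assumptions for illustration only. The TF-IDF here is the plain textbook form (term frequency times `log(N / df)`); libraries such as scikit-learn use smoothed variants by default, so their numbers will differ.

```python
import math
from collections import Counter

# Toy, already-tokenized documents (made up for this sketch).
docs = [
    ["grant", "arts", "youth"],
    ["grant", "music", "music"],
    ["youth", "theater"],
]

# Fixed, sorted vocabulary so every vector has the same column order.
vocab = sorted({term for doc in docs for term in doc})


def bag_of_words(doc):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tf_idf(doc):
    """Weight each term by its in-document frequency and its rarity across docs."""
    counts = Counter(doc)
    vector = []
    for term in vocab:
        tf = counts[term] / len(doc)                 # term frequency
        df = sum(1 for d in docs if term in d)       # document frequency
        idf = math.log(len(docs) / df)               # unsmoothed inverse doc frequency
        vector.append(tf * idf)
    return vector


print(vocab)
print(bag_of_words(docs[1]))
print([round(w, 3) for w in tf_idf(docs[1])])
```

In the second document, "music" gets a higher weight than "grant" because it occurs more often there and appears in no other document.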

Building Models

  • Logistic Regression
  • MLP
  • SVM

Hyperparameter Tuning

  • Bias/variance tradeoff
  • Learning rate, epochs, etc

Metrics

  • Accuracy, Precision, Recall, and F1
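
For reference, the four metrics above can be computed from true and predicted labels in a few lines. This is a sketch for the binary case only; the function name `classification_metrics` and the example labels are hypothetical, and undefined ratios fall back to 0.0 here as a convention chosen for the sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (a convention for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 2/3.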

Member

bfang22 commented Apr 23, 2024

Feedback from DS meeting:

  • Connect topics and tutorials to HFLA datasets and projects

  • Build tutorial/notebook using HFLA dataset

  • Review 311 dataset to see if there's textual data that's a good fit

  • Explore alternatives: affordable housing or scraping HFLA agenda issues

Member

bfang22 commented Jun 11, 2024

  • Identified 2 datasets with textual data: Los Angeles County Department of Arts and Culture's data on Community Impact Art Grants and Organizational Grants Program
  • Create tutorial (Jupyter notebook)
    • pre-processing
      • using nltk library (stopwords, tokenizers, stem)

ExperimentsInHonesty closed this as completed by moving to Filled in HfLA: Open Roles on Jun 18, 2024
ExperimentsInHonesty changed the title from "Create Text Analysis Tutorial" to "CoP: Data Science: Create Text Analysis Tutorial" on Jun 18, 2024
Projects
Status: In progress (actively working)
Status: Filled

3 participants