Conduct analysis of 99 NC website technologies #52

Open
2 of 7 tasks
ebele-oputa opened this issue Oct 18, 2021 · 22 comments

Comments

@ebele-oputa
Contributor

ebele-oputa commented Oct 18, 2021

Dependency

Overview

We need to conduct a comparative analysis of the technologies that were used to build the 99 NC websites so that we can understand the commonalities, differences and create a list of recommendations for the NCs.

Action Items

  • Read the project one sheet
  • Read about the NC system in LA
  • Analyse the spreadsheet (ongoing)
    • technology used most frequently
    • website that uses the most technologies
  • Write report
  • Review report

Resources/Instructions

@ebele-oputa
Contributor Author

@ShikaZzz Please rewrite the action items after you have talked with Rajinder.

@ebele-oputa
Contributor Author

@ShikaZzz and @sonu-k we're not yet done with data collection, so we can't start the analysis. I'll move this to the Ice Box and assign you both to the other issue.

@ExperimentsInHonesty added the "dependency" label (This issue cannot be worked on until another issue is completed) on Nov 13, 2021
@ShikaZzz

ShikaZzz commented Dec 13, 2021

Progress:

  • did:
    1. extracted data from the scraped JSON file
    2. analyzed how many websites use each technology (e.g. 44 websites use COVID-19)
    3. analyzed how many times each technology is used, counting every webpage (e.g. Google Font API is used over 600 times across all the webpages of all the NC websites)
    4. analyzed the total number of different technologies used on each website (e.g. Mission Hills uses 25 different technologies)
  • to do:
    - refine visualizations: put the count on top of each bar
    - focus on items 2, 3 (Google Tag Widget only), and 4, plus whether a website uses a given technology, and create a spreadsheet for each
    - write Python functions
    - re-scrape the 37 NCs that returned no results
    - Update Google Colab and issue board

Blockers: 37 NCs were not scraped successfully (they returned nothing)
Availability: 10 hours
ETA: by next meeting
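
The three counts described in items 2 to 4 above could be sketched roughly as follows. This is only an illustration: the layout of the scraped JSON is not shown in this thread, so the `scraped` dictionary (website, then page, then list of technology names), the site names, and the function names are all assumptions.

```python
from collections import Counter

# Hypothetical shape of the scraped data: website -> page -> technologies
# detected on that page. The real JSON layout may differ.
scraped = {
    "nc-a.example": {
        "/": ["Google Font API", "jQuery"],
        "/about": ["Google Font API"],
    },
    "nc-b.example": {
        "/": ["Google Font API", "WordPress"],
    },
}


def sites_per_technology(data):
    """Item 2: number of websites that use each technology at least once."""
    counts = Counter()
    for pages in data.values():
        # A set ensures a site is counted once per technology.
        techs_on_site = {tech for techs in pages.values() for tech in techs}
        counts.update(techs_on_site)
    return counts


def uses_per_technology(data):
    """Item 3: total detections of each technology across all webpages."""
    counts = Counter()
    for pages in data.values():
        for techs in pages.values():
            counts.update(techs)
    return counts


def distinct_technologies_per_site(data):
    """Item 4: number of different technologies found on each website."""
    return {
        site: len({tech for techs in pages.values() for tech in techs})
        for site, pages in data.items()
    }
```

With the sample data, `sites_per_technology(scraped)` counts Google Font API on two sites, while `uses_per_technology(scraped)` counts it three times because it appears on three pages.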

@ShikaZzz

ShikaZzz commented Dec 20, 2021

Progress:

  • did:
    • finished the code that generates CSV files (spreadsheets) containing:
      1. whether a technology is used on a website. The file name has the format count_xxx.csv, e.g. count_Widgets.csv
      2. how many times a technology is used on a website. The file name has the format sum_xxx.csv, e.g. sum_Widgets.csv
    • results and code; the spreadsheets are in the results folder
    • the total usage count for Google Tag Manager is in Google Sheets; see the Google Tag Manager column
    • Note: each spreadsheet has one row per NC, with columns for the technology name, website, NC name, and empowerLA.org. The files are generated this way so that it is easier to add information for an NC.

Blockers: found bugs in the scraping code; working on debugging them

Availability: 10 hours during weekend
ETA: 1 week
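
For readers who want to reproduce the count_xxx.csv and sum_xxx.csv layout described above, a minimal sketch follows. It is not the code in the results folder: the `records` structure, its field names, and the helper names are hypothetical, and only the column layout and file naming come from this comment.

```python
import csv
import io

# Hypothetical per-NC records; names and values are illustrative only.
records = [
    {"nc_name": "NC A", "website": "nc-a.example",
     "empowerla": "empowerla.org/nc-a", "uses": {"Widgets": 12}},
    {"nc_name": "NC B", "website": "nc-b.example",
     "empowerla": "empowerla.org/nc-b", "uses": {}},
]


def output_name(tech, mode):
    """File name in the count_xxx.csv / sum_xxx.csv format."""
    return f"{mode}_{tech}.csv"


def build_rows(records, tech, mode):
    """One row per NC: mode='count' gives a 1/0 flag, mode='sum' the total."""
    if mode not in ("count", "sum"):
        raise ValueError("mode must be 'count' or 'sum'")
    rows = []
    for rec in records:
        total = rec["uses"].get(tech, 0)
        value = total if mode == "sum" else int(total > 0)
        rows.append({
            tech: value,
            "website": rec["website"],
            "NC name": rec["nc_name"],
            "empowerLA.org": rec["empowerla"],
        })
    return rows


def write_rows(rows, tech, handle):
    """Write rows to any text file handle (an open file or StringIO)."""
    fieldnames = [tech, "website", "NC name", "empowerLA.org"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Usage: write to an in-memory buffer instead of disk for a quick check.
buffer = io.StringIO()
write_rows(build_rows(records, "Widgets", "count"), "Widgets", buffer)
```

To write a real file, open `output_name("Widgets", "count")` with `newline=""` and pass the handle to `write_rows`.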

@kalyaniraman

This comment was marked as resolved.

@ShikaZzz

This comment was marked as outdated.

@ExperimentsInHonesty
Member

Sent this message in Slack:

@Abe Khaleghi the #community-survey project is working with Data Science on the Neighborhood Council website technology scrape. @shika Zhou was on our team (here is the issue) but has had to step back. Please connect @kalyani Raman (PM of the Open Community Survey project) and @rajinder Mavi with each other, and identify which issue he is working from, so we can wrap up this project.

@kalyaniraman self-assigned this on Jan 24, 2022
@kalyaniraman

This comment was marked as resolved.

@kalyaniraman

This comment was marked as outdated.

@kalyaniraman
Member

Followed up on Slack: @Abe Khaleghi Hello Abe, I have been trying to reach @rajinder Mavi and have not heard back about the status of issue #52, Conduct analysis of 99 NC website technologies: where he left off and whether any work is pending. Could you please help us get feedback so the OCS team can make progress on this issue? Thank you so much for all your help.

@kalyaniraman
Member

Response after following up with Abe and Rajinder:
Rajinder responded: "My work is on my GitHub, https://github.com/rajindermavi/Webscraping. I found the technologies for each webpage using the site BuiltWith, but I do run into my IP address getting blocked."
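
One common way to reduce the chance of being blocked is to space out requests and retry with an increasing delay. The sketch below is generic and is not taken from Rajinder's repository: `fetch` is a placeholder for whatever function actually requests a page, and backing off may not be enough on its own if the site enforces a hard rate limit.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on failure wait base_delay * 2**attempt, then retry.

    `fetch` is a hypothetical callable supplied by the caller. `sleep` is
    injectable so the waiting can be replaced in tests.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a request that keeps failing is retried after waits of 1, 2, and 4 seconds before the error is raised.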

@kalyaniraman
Member

image

@akhaleghi

@rajinder could you do a few more things on this issue?

  • Please move the output spreadsheet to this folder so that we can access it on the data science team.
  • It looks like, on the spreadsheet output, the tech table with all the technologies is complete, but the tech grouping tab is not complete. Is that correct? If so, we can have someone else tackle that.

@kalyaniraman
Member

@akhaleghi @rajinder
Please provide an update:

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

@kalyaniraman
Member

Emailed Abe and Rajinder from the Open Community Survey Gmail account to get an update, and asked whether they could join the team meeting to resolve this ongoing issue. Waiting to hear back from them.

@kalyaniraman
Member

Last update received on March 3 from Rajinder
image

@kalyaniraman
Member

Added this issue to the Data Science PM meeting agenda.

@kalyaniraman
Member

@akhaleghi @rajinder
Please provide an update:

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

@akhaleghi

akhaleghi commented Mar 29, 2022

@kalyaniraman We've covered all of this on Slack. OCS: Builtwith data on 99 NCs technologies is the output. The technologies have been identified on the 99 sites but the groupings (the second and third tabs of that spreadsheet) have not. Rajinder is no longer working on this, but what's left to do seems to be just taking the list on the first tab (probably focusing initially on the first 65 items with 2 or more instances) and categorizing them into a particular group.
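
The remaining grouping step could be sketched along these lines: keep the technologies with 2 or more instances and assign each to a group. The counts, group names, and the `tech_to_group` mapping below are illustrative assumptions; the real list is on the first tab of the spreadsheet and the categories still have to be decided.

```python
from collections import defaultdict

# Illustrative data only; not taken from the actual spreadsheet.
tech_counts = {
    "Google Font API": 40,
    "jQuery": 35,
    "WordPress": 20,
    "Obscure Plugin": 1,
}
tech_to_group = {
    "Google Font API": "Fonts",
    "jQuery": "JavaScript libraries",
    "WordPress": "CMS",
}


def group_frequent(tech_counts, tech_to_group, min_instances=2):
    """Group technologies seen at least min_instances times, most used first."""
    grouped = defaultdict(list)
    ordered = sorted(tech_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for tech, count in ordered:
        if count < min_instances:
            continue
        # Anything not yet mapped is collected for manual review.
        grouped[tech_to_group.get(tech, "Uncategorized")].append(tech)
    return dict(grouped)
```

Technologies without an entry in the mapping land under "Uncategorized", which gives a to-do list for whoever picks up the categorization.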

@ExperimentsInHonesty
Member

@kalyaniraman Rajinder's GitHub handle is not @rajinder, it's @rajindermavi. He is no longer on the project, though, so only message @akhaleghi.

@ExperimentsInHonesty
Member

The corresponding issue in the Data Science community of practice is hackforla/data-science#44

@kalyaniraman
Member

We have asked the Data Science team for more details: hackforla/data-science#44 (comment)

Projects
Status: Ice Box
Development

No branches or pull requests

6 participants