Commit a1cd23f

Update PROPOSAL.md
1 parent acea94b commit a1cd23f

1 file changed

+29
-30
lines changed

Learn.md/PROPOSAL.md

@@ -2,27 +2,27 @@

 ## Finding Insights from Stack Overflow Developer Survey

-Stack Overflow is a professional community for developers, conducting an annual survey. The collected data from 2011 onwards has been available for open source on the web, with the latest dataset released in 2020. Analyzing this dataset professionally using modern tools would enable us to answer real-world questions effectively. The dataset includes responses to 275 questions.
+Stack Overflow is a professional community for developers that conducts an annual survey. The data collected from 2011 onwards is available as open-source and the latest dataset was released in 2020. Analyzing this dataset professionally using modern tools enables us to answer real-world questions effectively. The dataset includes responses to 275 questions.

-### Project Goal:
+### Project Goal

-1. **Perform Analysis on 3 years of Stack Overflow Dataset:** Extract insights from the data.
-2. **Data Analysis Goals:** Answer the following questions:
+1. **Perform Analysis on 3 Years of Stack Overflow Dataset:** Extract valuable insights from the data.
+2. **Data Analysis Goals:** Address the following questions:
 - What is the impact of higher education on the salary of surveyed developers?
 - How do education, experience, and responsibilities affect gender inequalities?
 - How does ethnicity impact participation rates?
 - Is there a difference in income between men and women?
 - How does the previous year's interest in a language affect its popularity in the current year?
 3. **Data Visualization Goals:**
-- Identify the most commonly used language.
-- Analyze the distribution of surveyors based on their developer roles.
+- Identify the most commonly used programming languages.
+- Analyze the distribution of survey respondents based on their developer roles.
 - Explore factors affecting job satisfaction.
-- Predict the growth of languages for upcoming years based on survey answers.
+- Predict the growth of programming languages for upcoming years based on survey answers.
 - Provide insights for IT environment, hiring employees, job seekers, and building a solid résumé.

 ### Data Source and Background

-The dataset is sourced from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data range from 2011 to 2020, with the focus being on the last 3 years. Respondents primarily come from the US, India, and EMEA regions, with a background in developer/coding experience. The dataset includes survey data gathered from 180 countries, with responses ranging from "Not at all important" to "Very important" and "Not at all satisfied" to "Very satisfied."
+The dataset is sourced from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data spans from 2011 to 2020, with the focus being on the last 3 years. Respondents primarily come from the US, India, and EMEA regions, with a background in developer/coding experience. The dataset includes survey data gathered from 180 countries, with responses ranging from "Not at all important" to "Very important" and "Not at all satisfied" to "Very satisfied."

 ### Data Format

@@ -37,40 +37,39 @@ The data is in CSV format, consisting of 252,199 observations and 62 variables.

 #### Techniques Expected to Use in the Project

-- ML Algorithms: Utilize algorithms like Random Forest, KNN, AUC for classification problems, logistic regression, and linear regression.
-- Data Visualization: Employ data visualization techniques for better understanding and presentation of insights.
-- Parameter Analysis: Analyze parameters to fine-tune models and improve accuracy.
+- **ML Algorithms:** Utilize algorithms like Random Forest, KNN, AUC for classification problems, logistic regression, and linear regression.
+- **Data Visualization:** Employ data visualization techniques for better understanding and presentation of insights.
+- **Parameter Analysis:** Analyze parameters to fine-tune models and improve accuracy.

 #### Project Plan

-**Week 8:** Project Base Setup
+**Week 8: Project Base Setup**
 - Source control setup on [GitHub](https://github.com/Recode-Hive/Stackoverflow-Analysis)
-- Project Management using tools like MS Project
-- Complete Data Wrangling & Basic Analysis
+- Project management using tools like MS Project
+- Complete data wrangling and basic analysis

-**Week 10:** Baseline Model Building
+**Week 10: Baseline Model Building**
 - Implement algorithms and build baseline models

-**Week 11:** Model Evaluation
+**Week 11: Model Evaluation**
 - Run tests and evaluate the performance of models

-**Week 12:** Finalization
-- Prepare video presentation summarizing the analysis and insights
+**Week 12: Finalization**
+- Prepare a video presentation summarizing the analysis and insights

 #### Additional Technical Details

-> Linear regression(RFE techniques)
-
-$$
-y = O_1X + O_2
-$$
-
-> Root Mean Squared Error Calculations
-
-$$
-rmse = \sqrt{\left(\frac{1}{n}\right)\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}
-$$
-
+> **Linear Regression (RFE techniques):**
+>
+> $$
+> y = O_1X + O_2
+> $$
+
+> **Root Mean Squared Error Calculations:**
+>
+> $$
+> rmse = \sqrt{\left(\frac{1}{n}\right)\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}
+> $$

 ## Potential Impact and Benefits

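For readers of the two formulas in the "Additional Technical Details" hunk, here is a minimal sketch of how they could be computed. It is not taken from the project's code. It assumes a single feature, reads `O_1` and `O_2` as slope and intercept, and reads `x_i` in the RMSE formula as the model's prediction for observation `i`. The function names `fit_line` and `rmse` and the sample numbers are hypothetical, and RFE feature selection is not covered.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = o1 * x + o2 for one feature (assumed reading of O_1, O_2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    o1 = sxy / sxx
    o2 = mean_y - o1 * mean_x
    return o1, o2


def rmse(actual, predicted):
    """Root mean squared error: sqrt((1/n) * sum((y_i - x_i)^2))."""
    n = len(actual)
    squared_errors = sum((y - x) ** 2 for y, x in zip(actual, predicted))
    return math.sqrt(squared_errors / n)


# Made-up illustrative data, not survey values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

o1, o2 = fit_line(xs, ys)
predictions = [o1 * x + o2 for x in xs]
error = rmse(ys, predictions)
```

On real survey data the fit would presumably use several features and a held-out test set, so this only illustrates the arithmetic behind the two expressions.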