CS109_crunchbase

We are an analytics firm that provides consulting services to investors based on our data science expertise. Unfortunately, there is no way to know ahead of time which companies will succeed or fail; however, we can try to predict success from the large amount of data available online about startups. For this project, we analyze data obtained through the CrunchBase API.

You can learn more about us and play with our visualization on our website:

http://nicodri.github.io/CS109_crunchbase/

or watch our video:

https://www.youtube.com/watch?v=M5FSEExBVDs

Table of Contents of the Process Notebooks

Data Collection Notebook

Used to pull data from the CrunchBase API.

Scraping Data

Organization-List
Excel-API
Relationships
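The Data Collection notebook pulls data from the CrunchBase API with `requests`. A minimal sketch of a paginated pull is shown here, assuming a JSON response shaped like `{"data": {"items": [...], "paging": {"number_of_pages": N}}}` and a `page` query parameter. The URL, parameter names, response keys, and the helper names `fetch_json` and `fetch_all_items` are hypothetical placeholders, not the documented CrunchBase API or the notebook's code. The `fetch` argument lets the paging loop be exercised offline with a fake.

```python
def fetch_json(url, params):
    """Fetch one JSON page over HTTP with requests."""
    import requests  # imported lazily so the paging logic below can run without network access
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_items(url, params, fetch=fetch_json):
    """Collect every item from a page-numbered JSON API.

    ASSUMPTION (hypothetical response format): each page looks like
    {"data": {"items": [...], "paging": {"number_of_pages": N}}}
    and pages are selected with a "page" query parameter.
    """
    items = []
    page = 1
    while True:
        # Copy the caller's params (e.g. an API key) and add the page number.
        payload = fetch(url, dict(params, page=page))
        data = payload["data"]
        items.extend(data["items"])
        if page >= data["paging"]["number_of_pages"]:
            return items
        page += 1
```

For example, `fetch_all_items("https://api.example.com/organizations", {"user_key": "YOUR_KEY"})` (placeholder URL and key) would return a list of dicts that `pandas.DataFrame(...)` can load for cleaning.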

Ensemble Analysis Notebook

Used to analyze the CrunchBase data by building individual models and combining them into an ensemble.

Predicting Startup Success

Data-Cleaning
Exploratory-Data-Analysis
Bring out the Models
- The Baseline Model
- K-Nearest Neighbors
- Logistic Regression
- SVM
- Naive Bayes
- Random Forests
Building an Ensemble
ROC/Profit Curves
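The outline above fits a baseline and five classifiers, combines them into an ensemble, and compares them with ROC and profit curves. A minimal scikit-learn sketch of that pattern follows, written for a recent scikit-learn on Python 3. It uses synthetic data from `make_classification` as a stand-in for the cleaned CrunchBase features (an assumption; the real features and hyperparameters live in the notebook). Soft voting, which averages the models' predicted probabilities, is one common way to build an ensemble and is not necessarily the combination rule the notebook uses. The `profit_curve` helper and its per-company `benefit`/`cost` economics are hypothetical illustrations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the CrunchBase feature matrix (ASSUMPTION: not real data).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Individual models; distance/margin-based ones get feature scaling first.
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]
# Soft voting averages predict_proba across the base models.
ensemble = VotingClassifier(estimators=base_models, voting="soft")
# Baseline: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent")

# Fit each model and score it by area under the ROC curve on held-out data.
scores = {}
for name, model in [("baseline", baseline)] + base_models + [("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def profit_curve(y_true, scores, benefit, cost):
    """Total profit from investing in the top-k scored companies, for k = 1..n.

    Hypothetical economics: every investment costs `cost`, and each
    successful company (label 1) returns `benefit`.
    """
    order = np.argsort(-np.asarray(scores))           # highest score first
    successes = np.cumsum(np.asarray(y_true)[order])  # successes among the top k
    k = np.arange(1, len(order) + 1)
    return benefit * successes - cost * k


profits = profit_curve(y_test, ensemble.predict_proba(X_test)[:, 1], benefit=10.0, cost=1.0)
best_k = int(np.argmax(profits)) + 1  # number of top-ranked companies to invest in
```

The baseline's constant probabilities give a ROC AUC of 0.5, so every other model's score can be read as its improvement over guessing the majority class.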

Similarity Graph Notebook

Used to build a similarity graph of the companies.

Similarity Graph

Formatting the Data
- Loading Data
- Dimensionality Reduction
Distance Matrix
- Closest Neighbors
- Multi-Dimensional Scaling (MDS)
Unsupervised Learning
- k-Means
- Gaussian Mixture Models
- Results
Tuned Similarity Mapping
- Competitors Graph
- Weighted Graph
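The outline above reduces the company features, builds a distance matrix, finds each company's closest neighbors, embeds the companies with MDS, clusters them, and turns the result into a graph. The sketch below strings those steps together with scikit-learn, SciPy, and networkx on random synthetic features, an assumption standing in for the notebook's real company data. The number of PCA components, `k`, the cluster count, the `company_<i>` names, and the `1 / (1 + distance)` edge weight are illustrative choices, not values taken from the notebook.

```python
import networkx as nx
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic stand-in: 30 hypothetical companies x 12 numeric features (ASSUMPTION).
features = rng.normal(size=(30, 12))
names = ["company_%d" % i for i in range(len(features))]

# 1. Dimensionality reduction: standardize, then keep 5 principal components.
reduced = PCA(n_components=5, random_state=0).fit_transform(
    StandardScaler().fit_transform(features))

# 2. Distance matrix: Euclidean distance between every pair of companies.
dist = squareform(pdist(reduced, metric="euclidean"))

# 3. Closest neighbors: the k nearest companies per row; column 0 of the
#    argsort is the company itself (distance 0), so it is skipped.
k = 3
neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]

# 4. MDS: 2-D layout that approximately preserves the Euclidean distances in `dist`.
coords = MDS(n_components=2, n_init=4, random_state=0).fit_transform(reduced)

# 5. Unsupervised learning: k-means and a Gaussian mixture on the reduced features.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(reduced)
gmm_labels = GaussianMixture(n_components=4, random_state=0).fit(reduced).predict(reduced)

# 6. Weighted graph: link each company to its k nearest neighbors, with
#    weight 1 / (1 + distance) so that closer companies get heavier edges.
graph = nx.Graph()
for i, name in enumerate(names):
    graph.add_node(name, x=coords[i, 0], y=coords[i, 1], cluster=int(kmeans_labels[i]))
    for j in neighbors[i]:
        graph.add_edge(name, names[j], weight=1.0 / (1.0 + dist[i, j]))
```

Storing the MDS coordinates and cluster labels as node attributes keeps everything a graph front end would need (position, color group, edge strength) on one networkx object.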

System Requirements

We developed the code with Python 2.7.9 on OS X.

You need the following libraries to run the code:

numpy
pandas
scikit-learn
networkx
scipy
json (part of the Python standard library)
requests

Reference

We would like to credit the following tools, which we used to build our website:

Peter Finlan for the website template
CanvasJS for the slider animation
Mapbox for the map
Alchemy.js for the node graph
