We are an analytics firm that provides consulting services for investors, drawing on our data science expertise. There is no way to know ahead of time which companies will succeed or fail. However, we can try to predict success from the large amount of data available online about startups. For this project, we analyze data obtained through the CrunchBase API.
You can learn more about us and play with our visualization on our website:
http://nicodri.github.io/CS109_crunchbase/
or watch our video:
https://www.youtube.com/watch?v=M5FSEExBVDs
Used to pull data from the CrunchBase API.
- Organization-List
- Excel-API
- Relationships
Used to analyze the CrunchBase data by building individual Models and combining them into an ensemble.
- Data-Cleaning
- Exploratory-Data-Analysis
- Bring out the Models
- The Baseline Model
- K-Nearest Neighbors
- Logistic Regression
- SVM
- Naive Bayes
- Random Forests
- Building an Ensemble
- ROC/Profit Curves
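
The model list above ends with an ensemble step. As a minimal sketch of what combining individual models can look like, the snippet below averages the predicted probabilities of four scikit-learn classifiers. It is not the notebook's actual code: the synthetic data from `make_classification`, the hyperparameters, the simple unweighted average, and the 0.5 threshold are all assumptions made for illustration, and it is written for a current scikit-learn release rather than the Python 2.7 setup described in this README.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the cleaned CrunchBase features (assumption, not real data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Individual models; the hyperparameters are illustrative only.
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model and collect its predicted probability of the positive class.
probas = []
for name, model in models.items():
    model.fit(X_train, y_train)
    probas.append(model.predict_proba(X_test)[:, 1])

# Ensemble by unweighted averaging of the probabilities, then threshold at 0.5.
ensemble_proba = np.mean(probas, axis=0)
ensemble_pred = (ensemble_proba >= 0.5).astype(int)
ensemble_accuracy = float(np.mean(ensemble_pred == y_test))
print("ensemble accuracy:", ensemble_accuracy)
```

Averaging probabilities rather than hard votes keeps a continuous score per company, which is what a ROC or profit curve needs as input.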
Used to build a similarity graph of the companies.
- Formatting the Data
- Loading Data
- Dimensionality Reduction
- Distance Matrix
- Closest Neighbors
- Multi-Dimensional Scaling (MDS)
- Unsupervised Learning
- k-Means
- Gaussian Mixture Models
- Results
- Tuned Similarity Mapping
- Competitors Graph
- Weighted Graph
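
The distance matrix and closest-neighbors steps listed above can be sketched as follows. This is an illustrative example only: the company names, the two-dimensional feature vectors, the Euclidean metric, and the helper `closest_neighbors` are hypothetical and are not taken from the project notebooks.

```python
import numpy as np


def closest_neighbors(features, k):
    """Return the pairwise Euclidean distance matrix and the k nearest neighbors per row."""
    features = np.asarray(features, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Set the diagonal to infinity so a company is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return dist, neighbors


# Hypothetical companies and feature vectors, for illustration only.
companies = ["A", "B", "C", "D"]
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])

dist, neighbors = closest_neighbors(features, k=1)

# Weighted edge list (source, target, distance) for a similarity graph.
edges = [
    (companies[i], companies[j], float(dist[i, j]))
    for i in range(len(companies))
    for j in neighbors[i]
]
print(edges)
```

An edge list in this form could be loaded into a graph library such as networkx, for example with `Graph.add_weighted_edges_from`, to draw a weighted graph.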
We developed the code with Python 2.7.9 on OS X.
You need the following libraries to run the code:
- numpy
- pandas
- scikit-learn
- networkx
- scipy
- json (part of the Python standard library)
- requests
We would like to credit the tools we used to build our website:
- Peter Finlan for the website template
- Canvasjs for the slider animation
- Mapbox for the map
- Alchemy.js for the nodes graph