Commit
feat: update experiences
elahe-dastan committed Aug 14, 2023
1 parent cb76b41 commit f3c7238
Showing 4 changed files with 51 additions and 28 deletions.
2 changes: 1 addition & 1 deletion main.tex
@@ -62,7 +62,7 @@

\mobile{(+98) 920 570 5417}
\email{elahe.dstn@gmail.com}
\homepage{https://elahe-dastan.github.io}
\homepage{elahe-dastan.github.io}
\github{elahe-dastan}
% \linkedin{}
% \gitlab{gitlab-id}
66 changes: 43 additions & 23 deletions resume/experience.tex
@@ -34,28 +34,49 @@
{
\begin{cvitems} % Description(s) of tasks/responsibilities
\item The team's goal is to provide accurate, scalable, and fast ETA (estimated time of arrival) predictions
\item PoP
\begin{itemize}
\item Used matrix factorization techniques such as ALS to estimate speeds for streets with insufficient data and fed them to the routing engines, improving our coverage from one million sharded streets to 3 million
\item Trained a regression model that improved routing-engine ETA by 2\% on the MAE metric
\end{itemize}
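The ALS bullet above can be sketched as a small alternating-least-squares loop on a partially observed street-by-hour speed matrix. This is a minimal illustration of the technique, not the production code; all names, shapes, and hyperparameters are assumptions.

```python
import numpy as np

def als(R, mask, k=2, n_iters=20, reg=0.1, seed=0):
    """Alternating least squares on a partially observed speed matrix.

    R    : (streets x hours) matrix of observed mean speeds (illustrative)
    mask : boolean matrix, True where a speed observation exists
    Returns a dense estimate R_hat = U @ V.T that also fills unobserved cells.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))  # per-street latent factors
    V = rng.normal(scale=0.1, size=(n, k))  # per-hour latent factors
    I = reg * np.eye(k)                     # ridge term keeps solves well-posed
    for _ in range(n_iters):
        for i in range(m):                  # fix V, solve each street's factor
            cols = mask[i]
            if cols.any():
                Vi = V[cols]
                U[i] = np.linalg.solve(Vi.T @ Vi + I, Vi.T @ R[i, cols])
        for j in range(n):                  # fix U, solve each hour's factor
            rows = mask[:, j]
            if rows.any():
                Uj = U[rows]
                V[j] = np.linalg.solve(Uj.T @ Uj + I, Uj.T @ R[rows, j])
    return U @ V.T
```

Speeds predicted for the masked (unobserved) cells are what would be fed to the routing engines for streets lacking data.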
\item Farsanj
\begin{itemize}
\item Developed and deployed a Golang benchmarking service that benchmarks our routing engines and models (around 90,000 rides per day) and reports the results live in Grafana dashboards
\end{itemize}
\item Nostradamus
\begin{itemize}
\item Gathered data from sources such as the company's central Clickhouse and other teams' databases, and discussed it with product managers and other teams to understand it well. Built a data-gathering pipeline and ran it periodically with Airflow, collecting over 30 million rides in 2 months
\item Cleaned the data using our domain knowledge and columns declaring confidence in the rides' ATA (Actual Time of Arrival), and applied outlier-removal algorithms such as Isolation Forest; this roughly halved the dataset
\item Engineered features, e.g. encoding time as a cyclic feature, adding Haversine distance, capturing traffic behavior in the feature vector, and discretizing geometric features for some models. Extended the feature vector from 4 to 11 features
\item Ran EDA on the data and sharded our dataset of Tehran rides into 4 smaller shards; training one model per shard reduced model size and increased accuracy
\item Trained and tested more than 5 different models, including Random Forests and fully connected neural networks. Used Keras Tuner to find the best NN architectures
\item Developed a complete Tensorflow pipeline to train NN models with different structures. It reports metrics defined jointly with the product and commercial teams, such as $R^2$ and negative error share, and saves the model along with its Tensorboard logs
\item Deployed NN models with Tensorflow Serving and Random Forest models with FastAPI on Kubernetes
\item Included preprocessing layers inside the NN model to avoid needing any middleware for data preprocessing
\item Load tested the models using K6: the NN model's p90 response time was around 10 ms and the Random Forest's around 30 ms
\item Set up online benchmarking, monitoring, tracing, etc.
\item Used Streamlit to demo my model and communicate better with the product team
\item Hierarchically clustered Iran's cities from over 40 down to 10 groups, which greatly reduced the number of models
\item Reduced MAPE on short rides (under 10 minutes) by 6\%
\item Reduced MAPE on all rides by 2\%
\item Became the first ETA provider for three customers (navigation, carpooling, and offering) in 7 cities
\end{itemize}
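Two of the feature-engineering steps above — cyclic time encoding and Haversine distance — can be sketched with the standard library alone. A minimal illustration; function names are hypothetical, not the project's actual code.

```python
import math

def cyclic_hour(hour):
    """Encode hour-of-day as a point on the unit circle, so 23:00 and
    00:00 end up close together, unlike a raw 0-23 integer feature."""
    angle = 2 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two coordinates; a cheap
    lower bound on actual route length, useful as an ETA feature."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

The sine/cosine pair replaces the raw hour in the feature vector, which is one way the vector grows from 4 to 11 features.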
\item Data Pipeline
\begin{itemize}
\item Reviewed and redesigned our data pipeline and services to improve performance; replaced the old Spark-based driver-location gathering with a Golang service that handles 40K driver locations per second, up from the former 8K
\item Upgraded the Cassandra cluster to handle 200k write ops per second, up from 73k
\item Adopted Apache Beam over Spark so that our pipeline stages could be tested
\item Moved data gathering to an event-driven design using Kafka as CMQ; our Kafka cluster handles over 80k messages per second
\item Deployed and used data tools across the pipeline: Airflow for data gathering and preprocessing, AutoML tools such as H2O to cut model training and testing time, Feast as a feature store, etc.
\end{itemize}
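The testability argument for Apache Beam above rests on its DoFn model: each pipeline stage is a small class whose processing method can be unit tested directly, without a cluster. A Beam-style sketch of that pattern, written dependency-free here; the stage name and message schema are hypothetical.

```python
import json

class ParseDriverLocation:
    """A Beam-DoFn-style pipeline stage: parse a raw driver-location
    message and drop anything malformed or out of range. Because it is
    a plain class, process() is trivially unit-testable."""

    def process(self, raw):
        try:
            msg = json.loads(raw)
            lat, lon = float(msg["lat"]), float(msg["lon"])
        except (ValueError, KeyError, TypeError):
            return  # drop malformed messages instead of failing the stage
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            yield {"driver_id": msg.get("driver_id"), "lat": lat, "lon": lon}
```

In real Beam this class would subclass `beam.DoFn` and be applied with `beam.ParDo`; the testing story is the same either way.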
\item Extracurricular
\begin{itemize}
\item Deployed our services in 3 different regions
\item Mentored interns on projects
\item Helped the team with interviews and the hiring process
\item Used ONNX to increase inference speed by 10\%
\item Contributed to the team's AI conference
\end{itemize}
\end{cvitems}
}

@@ -79,5 +79,4 @@
{Spring 2020} % Date(s)
{Under Supervision of Eng.~Alvani}


\end{cventries}
2 changes: 1 addition & 1 deletion resume/references.tex
@@ -23,7 +23,7 @@
{Snapp!}
{Email: parham.alvani@snapp.cab}{}

\cventry{Map Tech. Specialist}{MohammadReza Jafari}
\cventry{Map Engineering Manager}{MohammadReza Jafari}
{Snapp!}
{Email: mohammadreza.jafari@snapp.cab}{}

9 changes: 6 additions & 3 deletions resume/skills.tex
@@ -12,12 +12,15 @@
%---------------------------------------------------------
\cvskill
{Data Science} % Category
{NumPy, Pandas, Scikit‐learn, Pytorch, Tensorflow, Flask, Fast API, PySpark, Classic machine learning, Deep learning} % Skills
{
Predictive Modeling, Data Mining, Classic Machine Learning (XGBoost, SVM, etc.), Deep Learning,
TensorFlow, PyTorch, PySpark, Flask, Scikit-learn
} % Skills

%---------------------------------------------------------
\cvskill
{Backend Development} % Category
{Go, Python3, C, Java, C\#} % Skills
{Go, Python, C, Java, C\#} % Skills

%---------------------------------------------------------
\cvskill
@@ -37,7 +40,7 @@
%---------------------------------------------------------
\cvskill
{ML/Data Tools} % Category
{MLFlow, H2O AI, Apache Airflow, Apache Beam, Temporal.io} % Skills
{MLFlow, H2O AI, Apache Airflow, Apache Beam, Temporal.io, Kubeflow} % Skills

%---------------------------------------------------------
\cvskill
