Python version of py-stats19 #250

Open
UCLWilson opened this issue Aug 2, 2024 · 7 comments

Comments

@UCLWilson

Hi Robin @Robinlovelace,

I hope this message finds you well.

We’ve likely met at GISRUK, GIScience, and other data science conferences. I am Xiaowei, a final-year PhD student at UCL, supervised by James Haworth. My research focuses on using graph deep learning for traffic crash prediction, specifically for the UK.

I am grateful for your work on the R package for STATS19. To further support deep learning and machine learning analyses for road safety in the UK, my colleague Jinshuai, from the Data Science Institute, LSE, and I have developed a Python version of this package, named py-stats19. This project is under the supervision of Dr. James Haworth and Prof. Tao Cheng (Director of SpaceTimeLab).

Our Python package extends the R version by providing access to data from 1979 onward, with features for easily referencing specific years. It also incorporates temporal information and geometry to support spatiotemporal analysis. We are currently in the early stages of development and aim to include LLM and visualization tools to make the package more accessible and interpretable for public users, policymakers, and researchers. We are targeting a completion date by the end of this year and have already purchased a domain name for the project.

As this is our first open-source package, we would greatly appreciate any insights or support you might offer.

Thank you for your time and consideration.

Best regards,
Xiaowei

@Robinlovelace
Member

Hi @UCLWilson, thanks for your interest. We have an issue tracking the development of a Python version: #230. It's great to think about features to support, although note that the R version does provide access to the 1979-present dataset. I look forward to giving your package a go, but I cannot see any code here: https://github.com/Mayazure/py-stats19

@Robinlovelace
Member

I see this currently, do you have a different link for the source code?
image

@UCLWilson
Author

Quoting @Robinlovelace: "I see this currently, do you have a different link for the source code?"

Sorry, Robin. We have just made it public: https://github.com/Mayazure/py-stats19

Please feel free to let us know how we can help further.

@UCLWilson
Author

We are fixing some data-pulling bugs now and will push an updated version later today, sorry.

@layik
Member

layik commented Aug 2, 2024

Great! I am away, but when I have time I will try to contribute, as my Python is a little sharper than my R. As suggested in #230, it would be great to have some common code in the two packages.

Will watch your work.

@Robinlovelace
Member

Robinlovelace commented Aug 2, 2024

Just took a look. It's great to see more open code for working with road collision data, and the fact that it's a Python package should make it accessible to many people. It's also great that it allows users to set a default directory, like the R package.

One question: have you thought about using duckdb or polars in addition to, or as an alternative to, pandas?

@UCLWilson
Author

Hi @layik and @Robinlovelace,
Thank you so much for your encouraging words. As noted in the README for the Python version, your R package provided a solid foundation for our development of py-stats19. We look forward to collaborating in the future to contribute to open code and enhance road safety in the UK. Please feel free to share this information, as promoting the Python version could help others who are interested in analysis and modelling.

Regarding data processing, we initially chose to use pandas for compatibility. However, we are considering switching to Polars for improved performance after I complete my PhD thesis, which is expected around September or October. If you believe this could benefit further research, we would be happy to contribute to the STATS19 project.

If you’d like to discuss this further, please feel free to reach out to me via email.

We greatly appreciate your inspiring work and ongoing contributions to open code, which have been a great motivation for our project.

3 participants