MCMC for forensic science
Probabilistic genotyping algorithms are used to compute match probabilities in forensic science, but the field is dominated by closed-source, proprietary software.
Some examples of closed-source, proprietary software to compare against, plus a related R function:
- Peak height and size as input: STRmix (author: Buckleton), BulletProof (free for public use)
- Raw fsa files as input: TrueAllele (author: Perlin)
- seqinr::read.abif reads the binary fsa format in which many EPGs (electropherograms) are saved.
Write a new R package that implements the existing probabilistic genotyping algorithm described in https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008327
Prototype using the "Bayesian packages for general model fitting" listed here: https://cloud.r-project.org/web/views/Bayesian.html
After prototyping, we may consider implementing our own (possibly more efficient) MCMC sampler in C++ and interfacing it with R.
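As a rough illustration of the kind of sampler that could be prototyped in R before any C++ port, here is a minimal random-walk Metropolis sketch in base R. It is not the algorithm from the paper linked above: the data are simulated, and the model (a single mixture proportion `w` with Gaussian peak-height noise), along with the names `total_rfu`, `sigma`, `log_post` and `metropolis`, are assumptions made purely for illustration.

```r
# Toy data (simulated, NOT real EPG data): peak heights in RFU for alleles
# assumed to come only from contributor 1, whose share of the mixture is w.
set.seed(1)
total_rfu <- 1000   # hypothetical total signal per allele
sigma <- 50         # hypothetical measurement noise in RFU
true_w <- 0.7       # value used only to simulate the data
heights <- rnorm(20, mean = true_w * total_rfu, sd = sigma)

# Log-posterior of w under a flat prior on (0, 1) and Gaussian noise.
log_post <- function(w) {
  if (w <= 0 || w >= 1) return(-Inf)
  sum(dnorm(heights, mean = w * total_rfu, sd = sigma, log = TRUE))
}

# Random-walk Metropolis: propose w' = w + Normal(0, step^2) and accept
# with probability min(1, posterior(w') / posterior(w)).
# `init` must have a finite log-posterior.
metropolis <- function(log_post, init, n_iter = 5000, step = 0.03) {
  draws <- numeric(n_iter)
  current <- init
  current_lp <- log_post(current)
  n_accept <- 0
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)
    proposal_lp <- log_post(proposal)
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
      n_accept <- n_accept + 1
    }
    draws[i] <- current
  }
  list(draws = draws, accept_rate = n_accept / n_iter)
}

fit <- metropolis(log_post, init = 0.5)
posterior <- fit$draws[-(1:1000)]  # discard the first 1000 draws as burn-in
print(mean(posterior))             # posterior mean of w
print(fit$accept_rate)             # fraction of proposals accepted
```

The inner loop is the part that would be a natural candidate for a later C++ implementation; the `log_post` function is where a real genotyping likelihood would go.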
This project will provide a new R package which will be useful for researchers in forensic science. It will provide a reference implementation of algorithms that are currently only available in proprietary/closed-source software.
Students, please contact mentors below after completing at least one of the tests below.
- EVALUATING MENTOR: Toby Hocking toby.hocking@r-project.org is the author of many R packages related to machine learning, and has been a GSoC R mentor since 2013.
- Edoardo Serra edoardoserra@boisestate.edu
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: download data from https://lftdi.camden.rutgers.edu/provedit/files/ then use seqinr::read.abif to read and seqinr::plotabif to plot an fsa file.
- Medium: make a similar multi-panel ggplot with facet_grid.
- Hard: Demonstrate your capability in one of the "Bayesian packages for general model fitting" listed here: https://cloud.r-project.org/web/views/Bayesian.html, or in writing an R package with C++ code.
Students, please post a link to your test results here.
- EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
- Name: Sarah Abraham, Email: sarah1999abraham@gmail.com, GitHub profile: https://github.com/sarahab23, Test results: https://github.com/sarahab23/MCMC_for_Forensic_Science
- Name: Khaled Ahmed Abdelgalil, Email: khaled.abdelgalil96@gmail.com, GitHub profile: https://github.com/khaledlooda, Test results: https://github.com/khaledlooda/Tests-for-the-project-MCMC-for-Forensic-Science-R-Projects-GSOC-2020
- Name: Dhruv Aggarwal, Email: dhruvaggarwal6@gmail.com, GitHub profile: Dhruv Aggarwal, Solution to Easy Test: EasyTest, Solution to Medium Test: MediumTest, Solution to Hard Test: HardTest
- Name: Quazi Irfan, Email: quazirfan@gmail.com, GitHub profile: https://github.com/quazi-irfan, All test solutions: https://github.com/quazi-irfan/R_GSoC_2020