GSoC 2017 Project Ideas
The current proposed projects are:
- Implement efficient parallel analysis of trajectories
- Improve distance search
- Add new MD-Formats
- Start using Pytest
Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Raise an issue in the Issue Tracker or contact us via the developer Google group.
You can find the list of all available mentors for MDAnalysis here.
Implement efficient parallel analysis of trajectories
Mentors: Manuel
Intensity: Hard
Molecular simulation trajectories are very often analyzed frame-by-frame. This is frequently an embarrassingly parallel procedure, in which work can be efficiently divided simply by splitting the trajectory and letting each worker process one of the chunks. The goal of this project is to implement a parallelization framework that automates all the trajectory splitting, work distribution, and eventual result collection.
A parallelization framework should put the least burden possible on the end-user, so that minimal changes are required to turn serial code into parallel. Likewise, the parallelization framework must blend naturally with the analysis API of MDAnalysis. In this way, analyses written using analysis.base will automatically become parallelizable.
Implementing parallelization in Python code can be done in many ways. Aspects to consider when choosing one or several approaches are:
- Most users will primarily have access to SMP parallelization;
- Notwithstanding the above point, many users also typically have access to multi-node HPC clusters, and we should be able to leverage their use;
- In an analysis context, being able to write results to shared memory will improve the memory usage footprint and simplify result collection;
- GPU parallelization is attractive for its wide availability (though possibly more complex to implement in a meaningful way).
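The split, distribute, and collect steps described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the trajectory is a plain list of frames, and `split_frames`, `analyze_chunk`, `run_parallel`, and `mean_x` are hypothetical names, not part of the MDAnalysis API. A thread pool is used so the sketch runs anywhere; for CPU-bound analysis on an SMP machine a `ProcessPoolExecutor` could be passed in instead, provided the trajectory and the per-frame function can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_chunks):
    """Return contiguous (start, stop) ranges that together cover all frames."""
    base, extra = divmod(n_frames, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when n_chunks > n_frames
            ranges.append((start, stop))
        start = stop
    return ranges


def analyze_chunk(job):
    """Worker: apply the per-frame function to one block of frames."""
    trajectory, start, stop, per_frame = job
    return [per_frame(trajectory[i]) for i in range(start, stop)]


def run_parallel(trajectory, per_frame, executor, n_chunks):
    """Split the trajectory, hand chunks to the executor, collect in frame order."""
    jobs = [(trajectory, start, stop, per_frame)
            for start, stop in split_frames(len(trajectory), n_chunks)]
    results = []
    for chunk_result in executor.map(analyze_chunk, jobs):  # map preserves job order
        results.extend(chunk_result)
    return results


def mean_x(frame):
    """Toy per-frame analysis: mean x coordinate of all atoms in the frame."""
    return sum(atom[0] for atom in frame) / len(frame)


# Toy trajectory (an assumption): 10 frames, 3 atoms each, as (x, y, z) tuples.
trajectory = [[(f + a, 0.0, 0.0) for a in range(3)] for f in range(10)]

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = run_parallel(trajectory, mean_x, pool, n_chunks=4)

# The parallel run must give the same per-frame results as a serial loop.
serial = [mean_x(frame) for frame in trajectory]
assert parallel == serial
```

Keeping the executor as an argument means the same splitting and collection code could later be pointed at a different backend (processes, or a cluster scheduler) without touching the per-frame analysis.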
Improve distance search
Mentors: Manuel, Richard
Intensity: Hard
Analysis of molecular dynamics simulations typically involves calculations based upon atoms which are spatially close to each other. For example, a radial distribution function is often only interesting up to distances of around 1.6 nm. The naive approach is to calculate the distance between each pair of atoms; however, as the size of the system grows, the fraction of useful pair distances decreases while the computational cost scales as N^2.
To greatly improve the efficiency of this operation, we can first decompose the total simulation volume into smaller cells whose edge length is at least the cutoff distance. We can then calculate the distances only between atom pairs in the same or neighbouring cells. If atoms are not in neighbouring cells, we already know that the distance is too large to be interesting. A theoretical description of this algorithm can be found in Appendix F of this book.
One domain decomposition algorithm is cell grids.
In this project you would integrate the cell grid algorithm into MDAnalysis.
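The idea of binning atoms into cells and only comparing neighbouring cells can be sketched in a few lines of pure Python. This is a minimal illustration, not the implementation intended for MDAnalysis: `pairs_within_cutoff` is a hypothetical name, periodic boundary conditions are ignored, and the cell edge is assumed to equal the cutoff, so that any pair closer than the cutoff must sit in the same or an adjacent cell.

```python
import itertools
import math
from collections import defaultdict


def pairs_within_cutoff(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is below cutoff."""
    # Bin every atom into a cubic cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff),
               math.floor(y / cutoff),
               math.floor(z / cutoff))
        cells[key].append(index)

    # The 27 offsets cover a cell itself plus all of its direct neighbours.
    offsets = list(itertools.product((-1, 0, 1), repeat=3))

    pairs = []
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cells while iterating over the dict
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    # i < j keeps each pair once and skips self-pairs
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


# Three atoms on a line: only the first two are closer than 1.0.
print(pairs_within_cutoff([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)], 1.0))
```

For a roughly uniform density, each atom is compared only against the atoms in 27 cells rather than against the whole system, which is where the saving over the all-pairs approach comes from.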
Add new MD-Formats
Mentors: Richard, Jonathan
Intensity: Moderate
One of the strengths of MDAnalysis is its ability to support a wide range of different MD-formats. The formats read/written by MDAnalysis typically offer a combination of XYZ coordinate data with other format-dependent attributes (forces, for instance). But we are still missing some, such as the new TNG file format from Gromacs, H5MD, or the HALMD format. Alternatively, you can also add a format that you want to use personally in MDAnalysis. To tackle this you should definitely get acquainted with:
- MD (Molecular Dynamics) simulations, and especially the type of structural data one gets from such simulations, and
- the MDAnalysis code base, with emphasis on the I/O part. For this (and as our requirement prior to application) you should solve one of the open MDAnalysis issues. Solving issue #517 should help you get a feel for a part of the I/O code and at the same time get practice going through our git/GitHub workflow.
This project will familiarize you with working with and connecting different APIs, as well as giving insight into how modern portable data storage file formats work. It is vitally important that data is read correctly, otherwise analysis will fail at the very first step. For this reason, there will be a heavy emphasis on the testing for any code written, and so the project will also teach good practice in software testing.
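One common way to check that data is read correctly is a round-trip test: write known values, read them back, and compare. The sketch below uses the simple plain-text XYZ layout (atom count, comment line, then one `name x y z` line per atom) purely as a stand-in format. `write_xyz` and `read_xyz` are hypothetical helpers written for this example and are not the MDAnalysis reader or writer classes.

```python
import io


def write_xyz(handle, names, coords, comment=""):
    """Write one frame: atom count, comment line, then 'name x y z' per atom."""
    handle.write(f"{len(names)}\n{comment}\n")
    for name, (x, y, z) in zip(names, coords):
        # repr() of a Python float round-trips exactly through float()
        handle.write(f"{name} {float(x)!r} {float(y)!r} {float(z)!r}\n")


def read_xyz(handle):
    """Read one frame and return (names, coords, comment)."""
    n_atoms = int(handle.readline())
    comment = handle.readline().rstrip("\n")
    names, coords = [], []
    for _ in range(n_atoms):
        fields = handle.readline().split()
        if len(fields) != 4:
            raise ValueError("truncated or malformed atom line")
        names.append(fields[0])
        coords.append(tuple(float(value) for value in fields[1:]))
    return names, coords, comment


# Round trip through an in-memory file: what is read must equal what was written.
names = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # made-up values

buffer = io.StringIO()
write_xyz(buffer, names, coords, comment="water")
buffer.seek(0)
assert read_xyz(buffer) == (names, coords, "water")
```

A real format would also need tests for the failure cases, for example a file that ends before all atoms have been read, which is why the reader above raises an error instead of silently returning a short frame.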
Start using Pytest
Mentors: Richard
Intensity: Moderate
Related Issue: #884
Software testing is extremely important to the MDAnalysis project, so that users of the package can be confident in the results of their analysis. With nose, the current test package, no longer being developed, we have decided to move to py.test.
We have started a wiki page that tracks the plans for the transition to pytest. There is also a thread on the mailing list about the topic.
In this project, you will help migrate our existing test package to use py.test. You will learn about functional and unit testing as applied to a modern scientific analysis framework. In addition, you will learn how to use the py.test package, including the use of fixtures and plugins.
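As a rough idea of what tests written for py.test look like, the sketch below shows a fixture and a parametrized test. It is not taken from the MDAnalysis test suite: `center_of_geometry`, the `water` fixture, and the coordinates are made up for illustration, and the example assumes the `pytest` package is installed.

```python
import pytest

# Made-up coordinates for a three-atom molecule (illustrative values only).
WATER = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]


def center_of_geometry(coords):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


@pytest.fixture
def water():
    """Fresh copy of the coordinates for every test that requests it."""
    return list(WATER)


def test_center_of_geometry(water):
    # pytest.approx compares floats with a tolerance, element by element
    assert center_of_geometry(water) == pytest.approx((0.24, 0.31, 0.0))


@pytest.mark.parametrize("shift", [0.0, 1.5, -2.0])
def test_translation_moves_center(water, shift):
    # Shifting every atom along x must shift the center by the same amount.
    moved = [(x + shift, y, z) for x, y, z in water]
    expected = center_of_geometry(water)[0] + shift
    assert center_of_geometry(moved)[0] == pytest.approx(expected)
```

Saved as a file whose name starts with `test_`, this would be collected and run by the `pytest` command; the fixture is injected by matching the argument name `water`, and the parametrized test runs once per `shift` value.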