
Python Analysis Structure

LonelyProf edited this page Jan 10, 2024 · 6 revisions

Cluster program

The cluster.py program is self-contained. It reads in a configuration of atomic positions and produces a circular linked list for each cluster identified within the configuration. The best value of the critical separation r_cl depends on the particular physical system being considered. To illustrate, we have provided a file cluster.inp consisting of N=256 atoms at low overall density, generated by a short quench from a disordered system at high temperature to a low-temperature inhomogeneous state (not yet at equilibrium). This file should be copied into the working directory before running the program. With the default value r_cl=1.5, six well-separated clusters should be identified. The results in this case are moderately insensitive to the value of r_cl, but increasing it above 3 includes all atoms in a single cluster, while reducing it below 1.15 will start to separate isolated atoms into clusters of their own.
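The circular-linked-list construction can be sketched as follows. This is a simplified, unoptimized version in the spirit of the program (function names are illustrative, and cluster.py itself may differ in detail): two rings are spliced together whenever an atom of one lies within r_cl of an atom of the other, using the minimum image convention.

```python
import numpy as np

def in_range(r, box, j, k, r_cl_sq):
    """Minimum-image test of whether atoms j and k are closer than r_cl."""
    rjk = r[j, :] - r[k, :]
    rjk = rjk - np.rint(rjk / box) * box   # minimum image convention
    return np.sum(rjk**2) < r_cl_sq

def find_clusters(r, box, r_cl):
    """Build circular linked lists of cluster members.

    On return, repeatedly following pointer[i] traces out the cluster
    containing atom i, eventually returning to i."""
    n = r.shape[0]
    pointer = np.arange(n)        # initially, each atom is its own cluster
    r_cl_sq = r_cl ** 2
    for i in range(n - 1):
        if pointer[i] == i:       # atom i starts a new, as-yet-unmerged ring
            j = i
            while True:           # walk around the (growing) ring from i
                for k in range(i + 1, n):
                    if pointer[k] == k and in_range(r, box, j, k, r_cl_sq):
                        # splice atom k's ring into the current one
                        pointer[j], pointer[k] = pointer[k], pointer[j]
                j = pointer[j]
                if j == i:
                    break
    return pointer
```

For a configuration of two well-separated pairs, the returned array contains two two-membered rings, each atom pointing at its partner.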

Clustering algorithms are part of the standard toolkit of data analysis, and in practical applications it may be more efficient and convenient to use a packaged implementation of an algorithm such as DBSCAN.

A Python implementation of DBSCAN is available as part of the scikit-learn (sklearn) library. For systems with periodic boundaries, rather than supplying the atomic positions directly, the user should compute a distance matrix using the minimum image convention and supply that to the routine, as suggested by Turci.
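A minimal sketch of this approach, assuming scikit-learn is installed; the four-atom configuration and box size below are made up purely for illustration. Passing `metric='precomputed'` tells DBSCAN to treat the supplied matrix as distances, so the minimum image convention is respected even though sklearn knows nothing about periodic boundaries.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def minimum_image_distances(r, box):
    """All-pairs distance matrix for a cubic periodic box of side box,
    using the minimum image convention."""
    rij = r[:, np.newaxis, :] - r[np.newaxis, :, :]   # (n,n,3) separation vectors
    rij = rij - np.rint(rij / box) * box              # apply minimum images
    return np.sqrt(np.sum(rij**2, axis=-1))

# Hypothetical configuration: atoms 0, 1 and 2 form a chain that wraps
# across the periodic boundary; atom 3 sits alone in mid-box.
r = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [9.8, 0.0, 0.0],
              [5.0, 5.0, 5.0]])
d = minimum_image_distances(r, 10.0)

# eps plays the role of the critical separation r_cl; min_samples=1 makes
# every atom a core point, so DBSCAN reduces to simple distance clustering.
labels = DBSCAN(eps=1.0, min_samples=1, metric='precomputed').fit_predict(d)
```

Without the minimum image convention, atom 2 would appear 9.3σ from atom 1 and be assigned its own cluster.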

Pair distribution function

The program pair_distribution.py reads in a set of configurations and calculates the pair correlation function g(r). We limit the number of configurations to a maximum of 1000 (numbered from 000 to 999) simply so as to use a fixed naming scheme for the input configurations; in a practical application, a trajectory file would be used instead. The sum over all pairs is performed using a vectorized approach, so as to take advantage of NumPy routines. Two versions appear here as examples, both different from the methods used in our MD programs.

In the simplest version, a matrix of all pair distances is computed from the coordinate array r, and a NumPy indexing function is used to extract the upper triangle of N(N-1)/2 values which are processed together. For the cases tested, this proved significantly faster than an alternative (commented out in the code) based on S Brode and R Ahlrichs, Comput Phys Commun, 42, 51 (1986). In the alternative version, the coordinate array r is compared with a cyclically-shifted copy, to give n (that is, N) separation vectors rij which are processed all at once. Only n//2 shifts are needed to cover every distinct ij pair. There is a slight subtlety on the last shift, if n is even: both ij and ji pairs appear, and so the usual incrementing factor 2 is replaced by a factor 1. For either version of the code, the actual histogramming is conveniently performed by the built-in NumPy histogram routine.
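Both approaches can be sketched as follows. This is a simplified illustration, not the program's actual code: the function names and the normalization convention (dividing the histogram by ideal-gas shell counts) are ours.

```python
import numpy as np

def gr_triangle(r, box, nbins, r_max):
    """g(r) from one configuration via the all-pairs distance matrix,
    histogramming the upper triangle of N(N-1)/2 distances."""
    n = r.shape[0]
    rij = r[:, np.newaxis, :] - r[np.newaxis, :, :]   # (n,n,3) separation vectors
    rij = rij - np.rint(rij / box) * box              # minimum image convention
    d = np.sqrt(np.sum(rij**2, axis=-1))
    i, j = np.triu_indices(n, k=1)                    # the N(N-1)/2 distinct pairs
    h, edges = np.histogram(d[i, j], bins=nbins, range=(0.0, r_max))
    rho = n / box**3
    shell = (4.0/3.0) * np.pi * (edges[1:]**3 - edges[:-1]**3)
    g = h / (0.5 * n * rho * shell)                   # normalize by ideal-gas counts
    return 0.5 * (edges[:-1] + edges[1:]), g

def pair_distances_shifted(r, box):
    """Brode-Ahlrichs-style alternative: compare r with cyclically shifted
    copies of itself, giving n separation vectors per shift. Only n//2
    shifts are needed; when n is even, the last shift produces each pair
    twice, so half of its results are dropped (equivalent to replacing the
    incrementing factor 2 by 1)."""
    n = r.shape[0]
    out = []
    for shift in range(1, n//2 + 1):
        rij = r - np.roll(r, shift, axis=0)           # n separations at once
        rij = rij - np.rint(rij / box) * box
        dd = np.sqrt(np.sum(rij**2, axis=-1))
        if n % 2 == 0 and shift == n//2:
            dd = dd[:n//2]                            # keep each ij pair only once
        out.append(dd)
    return np.concatenate(out)
```

Either route yields the same multiset of N(N-1)/2 distances, which np.histogram then bins in a single call.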

We have tested this on a set of 500 configurations of N=256 Lennard-Jones atoms, cut (but not shifted) at Rc=2.5σ, at the usual state point ρ=0.75, T=1.0. The interval between configurations was 100 MC sweeps. This data set is provided in the file pair_distribution_data.zip in the Data repository. Using the default resolution of 0.02σ, results identical to those of the Fortran example were obtained.

g(r) test results

Interface pair correlation function

The program grint.py reads in a set of configurations and calculates the pair correlation function for a system that is inhomogeneous in the z direction. It is assumed that the configurations consist of a liquid slab, surrounded by gas, with the interfaces lying in the xy-plane. The two interfaces are located by fitting the instantaneous density profile to a difference of two tanh functions. Then the single-particle density function, relative to each of the interface locations, is calculated. To make the process as robust as possible, an initial guess at the mid-point of the liquid slab should be provided, and this is updated automatically as successive configurations are read in, so as to shift the liquid slab into the middle of the periodic box, before fitting. Also, the results of one fit are passed on as the starting point of the next one. The program handles cubic boxes only; the modifications necessary to handle non-cubic boxes are fairly easy to make, but we will not do that here. No attempt is made to correct for capillary-wave fluctuations affecting the width of the interface.
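The tanh fitting at the heart of this procedure can be sketched with SciPy's curve_fit. The profile below is synthetic (the densities, interface positions, and noise level are invented for illustration), and the function name is ours, but the functional form, a difference of two tanh functions, is the one described above.

```python
import numpy as np
from scipy.optimize import curve_fit

def slab_profile(z, rho_g, rho_l, z_lo, z_hi, d):
    """Density profile of a liquid slab surrounded by gas: rho_g outside,
    rho_l between interfaces at z_lo and z_hi, expressed as a difference
    of two tanh functions with width parameter d."""
    return rho_g + 0.5 * (rho_l - rho_g) * (
        np.tanh((z - z_lo) / d) - np.tanh((z - z_hi) / d))

# Synthetic noisy data standing in for an instantaneous density profile
z = np.linspace(-15.0, 15.0, 301)
rho = slab_profile(z, 0.02, 0.70, -5.0, 5.0, 0.5)
rho += 0.005 * np.random.default_rng(1).normal(size=z.size)

# Initial guess; in practice the previous configuration's fitted
# parameters are passed on as the starting point of the next fit.
p0 = (0.0, 0.8, -4.0, 4.0, 1.0)
popt, pcov = curve_fit(slab_profile, z, rho, p0=p0)
```

Seeding each fit with the previous result, as the program does, keeps the optimizer in the right basin even when the instantaneous profile is noisy.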

Once the two interfaces have been located and combined, the histograms for calculating the one-body and two-body densities are accumulated, in a coordinate system which has its origin at the interface. The one-body density is then fitted by a single tanh curve: this could easily be changed by the user if desired. For simplicity, in the normalization of the two-body density to give the pair correlation function, we use the fitted tanh form of the single-particle density. This is obviously an approximation, and could be replaced by a better fit, or an interpolation scheme, to evaluate the function at arbitrary z. Finally, the program writes out the average overall density profile, the single-particle density (with fit), and slices through the pair correlation function at selected values of z and c.

In testing this program, it is important to use a large enough system so that all values of z1 and z2 of interest (measured relative to the interface position) lie far from the other interface position.

The program was tested on a system of N=10000 atoms, interacting through the Lennard-Jones potential cut (but not shifted) at Rc=2.5σ, in a cubic box of side 30σ, at a temperature T=0.90. For this system, ρG ≈ 0.024, ρL ≈ 0.713 (see A Trokhymchuk, J Alejandre, J Chem Phys, 111, 8510 (1999)). A set of 100 configurations from this run, together with the output of the Fortran example, grint.f90, with default parameters, are provided in the file grint_data.zip in the Data repository. The output of grint.py is very similar to that of grint.f90. The Python version takes advantage of a curve fitting routine provided in SciPy, and the one-dimensional and multi-dimensional histogramming routines in NumPy.
