Skip to content

Python Introduction

Michael-P-Allen edited this page May 19, 2025 · 7 revisions

Brief Guide to Python Examples

The python_examples subdirectory contains Python versions of some of the example programs.

Here are some notes to assist in running the programs. Some details of tests carried out on the programs are also given. Do not expect to duplicate these results exactly, they are simply a guide as to the kind of behaviour to expect. If you find a definite programming error, please report it via the Issues tab above (you will need to be signed into GitHub to do this).

Python has some advantages over Fortran: it is an interpreted language, rather than a compiled one, which allows faster code development, and it has a natural (and readable) programming style. Variables may be created and redefined at runtime, as required. Object-oriented programming fits very well with the Python design, but it is well suited to other styles as well. It is widely used as a vehicle to introduce students to scientific programming. For an excellent introductory text, see:

Python has one major drawback, when compared with Fortran and other compiled languages: it is very slow in execution. To partially counter this, the NumPy package offers more efficient handling of array structures. The syntax of NumPy is surprisingly close to that of Fortran in some respects, especially for combining arrays. On top of this, NumPy and the SciPy package provide a huge variety of scientific libraries, and our Python examples make extensive use of these. Our codes were originally written using NumPy v1, but the transition to NumPy v2 has involved very few, minor, changes to them.

Nonetheless, any code which cannot be handled in vectorized form by NumPy will still run slowly. Several strategies are available to address these issues (for example, Cython, SWIG, and F2PY which is part of NumPy), but we do not attempt to follow these here. As always, the main aim of these examples is to illustrate ideas in the text, not to provide programs for practical use. The reader should feel free to experiment with ways to make the programs run faster!

In the past few years, the community has made the transition from Python 2 to Python 3. There are some incompatibilities between the two, and since a choice had to be made, we settled on Python 3 for these examples. We indicate this by the string

#!/usr/bin/env python3

at the top of each source file. On many systems, this will allow the files to be run directly by typing their name, for example ./sample_mean.py; otherwise it will be necessary to type, for example, python3 sample_mean.py, or just python sample_mean.py, depending on your particular installation of Python. (Of course, in most cases, it will also be necessary to input data to the program.) The examples will not work with Python 2! For an introduction to the differences between Python 2 and Python 3, see the What's New in Python 3.0 page. The most obvious changes are

  1. print(a) is a function; print a will return an error.
  2. An expression like 1/2 will return a float; if you want a truncated integer, use 1//2.
  3. Various methods, such as dict.keys(), return views, and various functions, such as zip, return iterators, instead of lists.

Anyone coming from a Fortran background should note the use of indentation in Python to indicate the range of conditional constructs and loops; there is no end if statement! Fortran experts should also be aware that indices for arrays (and other entities) follow the C-convention of numbering from 0 upwards. Arrays cannot have negative indices; rather, the notation a[-1] refers to numbering backwards from the end of the array. A more subtle point is that, in the slice notation a[i:j], the upper index j is excluded, so for example a[1:3] consists of the elements a[1] and a[2], whereas in Fortran the analogous notation a(i:j) refers to elements i through j inclusive. Yet more subtlety lies in the distinction between an assignment statement which makes a fresh copy of an object, such as an array or array slice, and an assignment which merely gives a name to a view of that object. In the second case, no new memory locations are used, and changing the value associated with one of the objects will affect both of them: they are really the same object. This is a powerful, and memory-efficient, approach, but can be confusing!

Data Input

In the Fortran examples we use a namelist to input a few parameters from standard input, but Python does not have this. Instead, to provide a keyword based syntax, we input these values using the widespread JSON format. As a minimum, the program will expect to read {} (an empty list), and on a Linux/Unix/bash system a typical command line would be

echo '{}' | ./mc_nvt_lj.py

or similar. To change the parameters, the JSON list might include the new values like this

{ "nblock":20, "nstep":1000, "temperature":2.0 }

Alternatively the list may be supplied in a file, for instance run.inp containing

{
"nblock":20,
"nstep":10000,
"temperature":2.0
}

and the program run like this

./mc_nvt_lj.py < run.inp

As indicated, the "key":value pairs may be set out on different lines if you wish. The appearance is very similar to a Python dictionary, and indeed the data is loaded and parsed into a dictionary for further processing. Note carefully that we use a colon : rather than an equals sign to separate the key from the value, and that the keys should be enclosed in double quotes "...". To avoid some fairly basic exceptions later in the program, we usually type-check the data (Python enthusiasts may disapprove of this) so integer values must not have a decimal point, while floating-point values must have one (and at least one digit following the point, for example 1.0, otherwise JSON raises an exception). String values need to be enclosed, like the keys, in double quotes. Boolean (logical) values are indicated by true or false (whereas in Python they are denoted True and False respectively). The variables which may be set in this way are typically considered one by one in our programs: those whose names correspond to keys supplied in the input file are given the input values, while the others are given values taken from another defaults dictionary.

Initial Configuration

Simulation runs for bulk liquids require a starting configuration which can usually be prepared using the initialize.py program. The default parameters produce an FCC configuration of 256 atoms at reduced density ρ=0.75, writing out just the positions (for an MC program) to a file cnf.inp. You would run the program as described above, for instance on a Linux/Unix/bash system like this

echo '{}' | ./initialize.py

If the parameter "velocities":true is supplied within the JSON list, then positions and velocities are written to the file, corresponding to a reduced temperature T = 1.0. These values of ρ and T (see below) lie in the liquid region of the Lennard-Jones phase diagram. Non-default values may, of course, be supplied for this or other models. The cnf.inp file may then be copied to the directory in which the run is carried out. Typically, runs produce a final configuration cnf.out (which may be renamed to cnf.inp as a starting point for further runs) and intermediate configurations cnf.001, cnf.002 etc during the run.

Some of the programs simulate a single chain of atoms, without periodic boundary conditions. Initial configurations for these may also be prepared using the initialize.py program, selecting "molecules":"chain", an appropriate number of atoms, for example "n":13, and "velocities":true if required. There is an option "constraints":true if the velocities should be chosen with constraints applied relative to the bonds between neighbouring atoms in the chain.

A utility program, adjust.py takes in an MC or MD configuration and scales the velocities to change the kinetic energy per atom by a specified amount, and/or the positions (and the box length) to change the density by a specified amount. You may prefer to write your own program or script to perform these types of operation.

Visualizing configurations

Our simulation configuration files have a very simple format: the first line is the number of molecules, the second line is the periodic box length (we always use cubic boxes), or occasionally the bond length (for chain molecules which are simulated without a box), and the third and subsequent lines each contain the coordinates of one atom or molecule. The first three numbers on each of these lines are always the (x,y,z) position, in simulation units (e.g. Lennard-Jones σ=1). The rest of the numbers on each line contain velocities, orientation vectors etc., as appropriate.

This format is not compatible with most molecular visualization programs, such as JMOL or VMD. However, conversion into the basic XYZ format is easily achieved using a simple program, script, or editor. For example, on most Linux/Unix systems, the awk language will be available:

awk '(NR==1) {print} (NR==2) {print "Comment line"} (NR>2) {printf "%5s%15.6f%15.6f%15.6f\n", ("Ar"),($1*3.4),($2*3.4),($3*3.4)}' cnf.inp > cnf.xyz

This produces a file which should be recognized as a set of Argon atoms with positions (in Angstroms) appropriate to their van der Waals diameter. This can be read into a molecular visualizer, which will typically have various options for representing the atoms. No attempt is made here to represent the periodicity of the system in the cnf.xyz file.

Clone this wiki locally