Coding style

fredjaya edited this page Nov 1, 2023 · 1 revision

As a project grows, consistency becomes more important. Unit tests and a consistent style are critical to producing code that can be trusted and integrated. A consistent style also means that guesses about names and interfaces will be correct more often.

Naming variables

We aim to adhere, to a large extent, to PEP 8.

  • Choose the name that people will most likely guess. Make it descriptive, but not too long: curr_record is better than c, or curr, or current_genbank_record_from_database.

  • Good names are hard to find. Don't be afraid to change names, except when they are part of interfaces that other people are also using. It may take some time working with the code to come up with reasonable names for everything: if you have unit tests, renaming is easy, especially with global search and replace, because the tests will catch any name you missed.

  • Use singular names for individual things, plural names for collections. For example, you'd expect self.name to hold something like a single string, but self.names to hold something that you could loop through, like a list or dict. Sometimes the decision can be tricky: is self.index an int holding a position, or a dict holding records keyed by name for easy lookup? If you find yourself wondering about these things, the name should probably be changed to avoid the problem: try self.position or self.look_up.

  • Don't make the type part of the name. You might want to change the implementation later. Use Records rather than RecordDict or RecordList, etc. Don't prefix the name with the type (i.e. Hungarian Notation).

  • Make the name as precise as possible. If the variable is the name of the input file, call it infile_name, not input or file (which you shouldn't use anyway, since they shadow built-in names: input is a built-in function, and file was a built-in type in Python 2), and not infile (because that looks like it should be a file object, not just its name).

  • Use result to store the value that will be returned from a method or function. Use data for input in cases where the function or method acts on arbitrary data (e.g. sequence data, or a list of numbers, etc.) unless a more descriptive name is appropriate.

  • One-letter variable names should only occur in math functions or as loop iterators with limited scope. Limited scope covers things like for k in keys: print(k), where k survives only a line or two. Loop iterators should refer to the variable that they're looping through: for k in keys, for i in items, or for key in keys, for item in items. If the loop is long or there are several one-letter variables active in the same scope, rename them.

  • Limit your use of abbreviations. A few well-known abbreviations are OK (see below), but you don't want to come back to your code in 6 months and have to figure out what sptxck2 is. It's worth it to spend the extra time typing species_taxon_check_2, but that's still a horrible name: what's check number 1? Far better to go with something like taxon_is_species_rank that needs no explanation, especially if the variable is only used once or twice.
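The naming rules above can be illustrated with a short sketch. All names here (count_records, infile_name, names, taxon_rank) are hypothetical and chosen only for illustration; they are not part of the cogent3 API. An in-memory io.StringIO stands in for a real file so the example runs without touching the disk.

```python
import io


def count_records(infile):
    """Return the number of non-empty lines in an open file object."""
    result = 0  # "result" holds the value that will be returned
    for line in infile:
        if line.strip():
            result += 1
    return result


# infile_name is just a name (a string); infile is a file-like object
infile_name = "records.txt"  # hypothetical file name, never opened here
infile = io.StringIO("rec1\n\nrec2\n")  # stands in for open(infile_name)
num_records = count_records(infile)

# plural name for the collection, singular name for each member
names = ["human", "mouse", "rat"]
name_lengths = {}
for name in names:
    name_lengths[name] = len(name)

# a self-explanatory name instead of sptxck2 or species_taxon_check_2
taxon_rank = "species"
taxon_is_species_rank = taxon_rank == "species"
```

Note that none of the names encode a type: name_lengths could later become a different mapping type without needing a rename.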

Acceptable abbreviations

The following abbreviations can be considered well-known and used with impunity within mixed-name variables, but some should not be used by themselves, as they would conflict with common functions or Python built-ins, or raise an exception. Do not use the following by themselves as variable names: dir, exp (a common math module function), in, max, and min. They can, however, be used as part of a name, e.g. matrix_exp.

Full                  Abbreviated
alignment             aln
archaeal              arch
auxiliary             aux
bacterial             bact
citation              cite
current               curr
dictionary            dict
directory             dir
end of file           eof
eukaryotic            euk
frequency             freq
expected              exp
index                 idx
input                 in
maximum               max
minimum               min
mitochondrial         mt
number                num
observed              obs
original              orig
output                out
parameter             param
phylogeny             phylo
previous              prev
probability           prob
protein               prot
record                rec
reference             ref
sequence              seq
standard deviation    stdev
statistics            stats
string                str
structure             struct
temporary             temp
taxonomic             tax
variance              var
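The following sketch shows why some abbreviations are safe only as part of a longer name. The variable and function names (values, max_value, rate_exp, shadowing_breaks_builtin) are made up for illustration.

```python
import math

values = [3.0, 1.5, 4.0]

# OK: the abbreviation is part of a longer name, so nothing is shadowed
max_value = max(values)
min_value = min(values)
rate_exp = math.exp(0.0)


def shadowing_breaks_builtin(data):
    """Show what happens when the bare name max is used as a variable."""
    max = data[0]  # hides the built-in max for the rest of this function
    try:
        return max(data)  # fails: max is now a float, which is not callable
    except TypeError:
        return "TypeError"


shadow_result = shadowing_breaks_builtin(values)

# "in" is a keyword, so using it as a name does not even compile
try:
    compile("in = 1", "<example>", "exec")
    in_is_valid_name = True
except SyntaxError:
    in_is_valid_name = False
```

Here shadow_result ends up as the string "TypeError" and in_is_valid_name as False, which is the conflict and the exception the paragraph above warns about.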

Comments

TODO: Refer to the numpy way of commenting

  • Always update the docstring when the code changes. Like outdated comments, outdated docstrings can waste a lot of time. "Correct examples are priceless, but incorrect examples are worse than worthless." Jim Fulton

Docstrings

See the numpy guidelines
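As a rough illustration of the numpy docstring layout (Parameters, Returns and Examples sections), here is a minimal sketch. The function gc_fraction is hypothetical and is not part of cogent3; consult the numpy guidelines for the full set of sections and rules.

```python
def gc_fraction(seq):
    """Return the fraction of G and C characters in a sequence.

    Parameters
    ----------
    seq : str
        Nucleotide sequence. Case is ignored.

    Returns
    -------
    float
        Number of G and C characters divided by the sequence length,
        or 0.0 for an empty sequence.

    Examples
    --------
    >>> gc_fraction("GGCA")
    0.75
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    result = (seq.count("G") + seq.count("C")) / len(seq)
    return result
```

Because the Examples section is written as a doctest, it can be checked automatically, which helps keep the docstring in step with the code.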

Testing

TODO: Update to pytest.

Sample data

The directory tests/data contains a number of sample files that are useful for demonstration purposes.

>>> from cogent3 import load_aligned_seqs, load_tree
>>> aln = load_aligned_seqs('<path/to/Cogent3>/tests/data/brca1.fasta', moltype='dna')
>>> tree = load_tree('<path/to/Cogent3>/tests/data/murphy.tree')