23 changes: 23 additions & 0 deletions news/tn-structuredf.rst
@@ -0,0 +1,23 @@
**Added:**

* Create the ClusterDataFrame class, a pandas DataFrame subclass, to store a cluster of atoms

**Changed:**

* <news item>

**Deprecated:**

* <news item>

**Removed:**

* <news item>

**Fixed:**

* <news item>

**Security:**

* <news item>
51 changes: 51 additions & 0 deletions src/diffpy/clusterrender/clusterdataframe.py
@@ -0,0 +1,51 @@
"""This module defines class ClusterDataFrame.

A local structure or cluster of atoms is represented in a DataFrame
format.
"""

import pandas as pd

# -------------------------


class ClusterDataFrame(pd.DataFrame):
    """Define a cluster of atoms in a pandas DataFrame format.

    Each row corresponds to an atom, and columns represent
    atomic properties: species, xyz coordinates, and (optionally)
    coordination shells, specifying the central atom (0) and its
    neighboring atoms (1, 2, ...).

    Methods
    -------
    _parse_structure(structure_input, site_index=0)
        Parse structure data from a structure, a file, a dictionary,
        or a DataFrame into ClusterDataFrame.

    Attributes
    ----------
    _constructor : property
        Ensures that DataFrame operations return ClusterDataFrame objects.
    """

    @property
    def _constructor(self):
        return ClusterDataFrame

    def __init__(self, structure_input, site_index=0):
        """Initialize ClusterDataFrame from a Structure object, a file,
        a dictionary, or a DataFrame.

        Parameters
        ----------
        structure_input : pymatgen.core.Structure, pathlib.Path, str,
            dict, or pd.DataFrame
            The input structure or cluster of atoms to be visualized.
        site_index : int, optional
            The index of the atom in the structure to be treated as the
            central atom.
            Default is 0.
        """
        # parse and load structure_input
        self._parse_structure(structure_input, site_index)
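As written, `__init__` never calls `DataFrame.__init__`, and pandas builds the results of many operations (slicing, boolean masks, arithmetic) by calling `_constructor` with DataFrame-style arguments, so an `__init__` that accepts only `(structure_input, site_index=0)` is likely to break once pandas creates derived objects. One common pattern, described in the pandas "Subclassing pandas data structures" guide, keeps `__init__` compatible with `DataFrame` and lists extra state in `_metadata` so pandas copies it to derived objects. The sketch below assumes that pattern; `ClusterFrameSketch` and its `site_index` attribute are hypothetical names, not part of this PR.

```python
import pandas as pd


class ClusterFrameSketch(pd.DataFrame):
    """Hypothetical minimal subclass (not the PR's ClusterDataFrame)."""

    # Attributes named here are copied by pandas (via __finalize__)
    # onto objects derived from this frame.
    _metadata = ["site_index"]

    @property
    def _constructor(self):
        # pandas calls this to build results of operations, so the
        # results keep the subclass type.
        return ClusterFrameSketch

    def __init__(self, data=None, *args, site_index=0, **kwargs):
        # Forward whatever pandas passes (data, index, columns, ...)
        # so internally created instances are valid DataFrames.
        super().__init__(data, *args, **kwargs)
        self.site_index = site_index
```

With this shape, structure-specific parsing (pymatgen objects, files, dictionaries) could live in a separate classmethod rather than in `__init__`, so pandas can still construct instances from plain frames.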
64 changes: 64 additions & 0 deletions tests/test_clusterdataframe.py
@@ -0,0 +1,64 @@
import pandas as pd
import pytest

from diffpy.clusterrender.clusterdataframe import ClusterDataFrame

"""
Contributor:

This probably works, but it is structured slightly differently from our most recent standard. I think Yuchen is writing good standard tests, and so is Caden. The standard makes tests easier for reviewers to read because the intent is stated clearly and the test that checks it sits right below. Could I ask you to check with them, or look at their code, and try to use that style? Thanks so much.

Collaborator Author:

Caden sent me an example from cmi. Will try to get to it today!

Tests for the ClusterDataFrame class.
"""
# set up test data
dict_input = {
    "species": ["C", "O", "O"],
    "x": [0.0, 1.0, -1.0],
    "y": [0.0, 0.0, 0.0],
    "z": [0.0, 0.0, 0.0],
}
df_input = pd.DataFrame(dict_input)

# basic outputs (specifying center but not coordination shells)
output_C0 = pd.DataFrame(
    {
        "species": ["C", "O", "O"],
        "x": [0.0, 1.0, -1.0],
        "y": [0.0, 0.0, 0.0],
        "z": [0.0, 0.0, 0.0],
        "shell": [0, None, None],
    }
)
output_O1 = pd.DataFrame(
    {
        "species": ["O", "C", "O"],
        "x": [0.0, -1.0, -2.0],
        "y": [0.0, 0.0, 0.0],
        "z": [0.0, 0.0, 0.0],
        "shell": [0, None, None],
    }
)

test_data = [
    # (input, expected_output) or
    # (input, site_index, expected_output)
    # basic inputs: read from dict or DataFrame
    # without any changes
    (dict_input, df_input),
    (df_input, df_input),
    # with site_index specified
    (dict_input, 0, df_input),
    (dict_input, 1, output_O1),
]


@pytest.mark.parametrize("input_test_data", test_data)
def test_clusterdataframe(input_test_data):
    """Test ClusterDataFrame initialization and parsing."""
    if len(input_test_data) == 2:
        input_structure, expected_output = input_test_data
        cdf = ClusterDataFrame(input_structure)
    else:
        input_structure, site_index, expected_output = input_test_data
        cdf = ClusterDataFrame(input_structure, site_index=site_index)

    # check if the output matches the expected DataFrame
Contributor:

We don't need comments like this; they are a bit trivial.

    pd.testing.assert_frame_equal(
        cdf.reset_index(drop=True), expected_output.reset_index(drop=True)
    )