Regarding the Principal component analysis with CPPTRAJ #1122

Open
satyajitkhatua09 opened this issue Dec 21, 2024 · 4 comments

@satyajitkhatua09

Hi Users/Developers,

I am having trouble running a PCA with CPPTRAJ: the run always terminates with a segmentation fault.

I am only considering the CA atoms of the protein and the P atoms of the nucleic acid (28,486 atoms in total). My covariance matrix is therefore 85,458 × 85,458 (3 × 28,486 = 85,458), which requires a substantial amount of memory:

Memory (bytes) = (85,458 × 85,458) × 8 bytes per double-precision float ≈ 58 GB.
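
As a quick check of this estimate, here is a minimal Python sketch. It assumes a dense, full square matrix with three coordinates per atom and 8-byte doubles; the function name `covar_matrix_memory` is illustrative and not part of CPPTRAJ.

```python
# Rough memory estimate for a full coordinate covariance matrix.
# Assumption: dense (3N x 3N) storage with 8-byte double-precision elements.

def covar_matrix_memory(n_atoms, bytes_per_element=8):
    """Return (dimension, element count, total bytes) for a 3N x 3N matrix."""
    dim = 3 * n_atoms            # x, y, z per selected atom
    n_elements = dim * dim       # full square matrix
    return dim, n_elements, n_elements * bytes_per_element

dim, n_elements, n_bytes = covar_matrix_memory(28486)
print(f"dimension: {dim} x {dim}")
print(f"elements:  {n_elements}")
print(f"memory:    {n_bytes / 1e9:.3f} GB")
```

For 28,486 atoms this gives roughly 58.4 GB, so the node needs noticeably more than that once the trajectory frames and any eigenvector output are counted.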

This prompted me to run the analysis on the ANDES HPC cluster at Oak Ridge National Laboratory. However, the run still fails with the same error message. Is there a limit on the number of atoms that can be included in a PCA in CPPTRAJ? If so, can it be changed?

@drroe
Contributor

drroe commented Jan 14, 2025

Hi, sorry for the delay here.

I haven't delved into the code yet but my suspicion is that there is an int somewhere being used in matrix indexing; max size of an int is 2147483647, which is much smaller than the size of your matrix. I'll try to get to this ASAP. Thanks for the report!
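
To illustrate the kind of failure suspected here (this is a sketch of the general mechanism, not code from the cpptraj source), the Python snippet below compares the element count of an 85,458 × 85,458 matrix with the largest 32-bit signed integer and uses `ctypes.c_int32` to mimic what a 32-bit `int` index would hold after truncation.

```python
import ctypes

INT32_MAX = 2**31 - 1            # 2147483647, the largest 32-bit signed int

dim = 3 * 28486                  # rows/columns of the covariance matrix
n_elements = dim * dim           # total elements in a full square matrix
last_index = n_elements - 1      # flat index of the final element

# ctypes.c_int32 keeps only the low 32 bits (no overflow check), which
# mimics storing the flat index in a 32-bit signed integer.
wrapped = ctypes.c_int32(last_index).value

print(f"elements:        {n_elements}")
print(f"exceeds int max: {n_elements > INT32_MAX}")
print(f"wrapped index:   {wrapped}")
```

The wrapped value comes out negative, so using it as an array offset would read or write outside the allocation, which is a typical cause of a segmentation fault. A 64-bit index type avoids the problem.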

drroe self-assigned this Jan 14, 2025
drroe added the bug label Jan 14, 2025
@drroe
Contributor

drroe commented Feb 26, 2025

Sorry for the delay on this; recent issues at my workplace have made things challenging.

I should have asked this from the beginning: what version of cpptraj are you using? If it's an older version, this issue may have been fixed. Using the latest version (6.29.10 available via GitHub) I am able to process a 28486 x 28486 atom matrix without issues:

CPPTRAJ: Trajectory Analysis. V6.29.10 (GitHub)
    ___  ___  ___  ___
     | \/ | \/ | \/ | 
    _|_/\_|_/\_|_/\_|_

| Date/time: 02/26/25 09:08:23
| Available memory: 394.050 GB

INPUT: Reading input from 'largematrix.in'
  [parm amber.parm7]
	Reading 'amber.parm7' as Amber Topology
	Radius Set: modified Bondi radii (mbondi)
  [trajin final.1.nc 1 10]
	Reading 'final.1.nc' as Amber NetCDF
  [matrix name Large covar @1-28486 @28487-56972]
    MATRIX: Calculating covariance matrix, output is by atom.
	Matrix data set is 'Large'
	Start: 1  Stop: Final frame
	Mask1 is '@1-28486'
	Mask2 is '@28487-56972'
  [run]
---------- RUN BEGIN -------------------------------------------------

PARAMETER FILES (1 total):
 0: amber.parm7, 856922 atoms, 261525 res, box: Truncated octahedron, 257753 mol, 256476 solvent

INPUT TRAJECTORIES (1 total):
 0: 'final.1.nc' is a NetCDF (NetCDF3) AMBER trajectory with coordinates, time, box, Parm amber.parm7 (Truncated octahedron box) (reading 10 of 100)
  Coordinate processing will occur on 10 frames.

BEGIN TRAJECTORY PROCESSING:
.....................................................
ACTION SETUP FOR PARM 'amber.parm7' (1 actions):
  0: [matrix name Large covar @1-28486 @28487-56972]
	Mask [@1-28486] corresponds to 28486 atoms.
	Mask [@28487-56972] corresponds to 28486 atoms.
----- final.1.nc (1-10, 1) -----
 0% 11% 22% 33% 44% 56% 67% 78% 89% 100% Complete.

Read 10 frames and processed 10 frames.
TIME: Avg. throughput= 0.1719 frames / second.

ACTION OUTPUT:
TIME: Analyses took 0.0000 seconds.

DATASETS (1 total):
	Large "Large" (double matrix, matrix(covariance)), size is 7303069764 (58.425 GB)
    Total data set memory usage is at least 58.425 GB

RUN TIMING:
TIME:		Init               : 0.0000 s (  0.00%)
TIME:		Trajectory Process : 58.1840 s ( 86.16%)
TIME:		Action Post        : 9.3472 s ( 13.84%)
TIME:		Analysis           : 0.0000 s (  0.00%)
TIME:		Data File Write    : 0.0000 s (  0.00%)
TIME:		Other              : 0.0003 s (  0.00%)
TIME:	Run Total 67.5315 s
---------- RUN END ---------------------------------------------------
TIME: Total execution time: 68.3428 seconds.
--------------------------------------------------------------------------------
To cite CPPTRAJ use:
Daniel R. Roe and Thomas E. Cheatham, III, "PTRAJ and CPPTRAJ: Software for
  Processing and Analysis of Molecular Dynamics Trajectory Data". J. Chem.
  Theory Comput., 2013, 9 (7), pp 3084-3095.

@satyajitkhatua09
Author

satyajitkhatua09 commented Feb 26, 2025 via email

@drroe
Contributor

drroe commented Feb 26, 2025

I tried V6.24.0, which is bundled with Amber24, and it also worked, so maybe try that. I still recommend the GitHub version if that's convenient for you, though, since it is updated more frequently.
