Add command to extract Neff scores for MSA #647

Merged: 4 commits merged into soedinglab:master on Jan 30, 2023


neftlon (Contributor) commented Dec 12, 2022

As discussed in #638, this PR adds a new command, profile2neff, that outputs Neff scores. It takes a profile database as input and outputs per-residue Neff scores for a query sequence.

The scores are written with a DBWriter, and each sequence's entry contains two lines: a header similar to profile2pssm's output, and a line of tab-separated Neff scores (in the range [1, 255]), one per residue. Each score is converted from its internal float representation to a char using the convertNeffToChar function from MathUtil.h.
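The two-line entry layout described above can be sketched as a small formatting helper. This is a hypothetical illustration, not the PR's code: `formatNeffEntry` is an invented name, the header text is passed in as an opaque string (the actual profile2pssm-like header is not reproduced), the scores are assumed to be already encoded into [1, 255], and the DBWriter call that would store the result is omitted.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one header line, then one line of tab-separated
// per-residue scores. Assumes scores are already encoded into [1, 255].
std::string formatNeffEntry(const std::string& header,
                            const std::vector<unsigned char>& scores) {
    std::ostringstream out;
    out << header << '\n';
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (i > 0) {
            out << '\t';
        }
        // Widen before streaming: an unsigned char written directly to an
        // ostream is printed as a character, not as a number.
        out << static_cast<unsigned int>(scores[i]);
    }
    out << '\n';
    return out.str();
}
```

The cast in the loop is the main design point: without it, a score of 65 would be printed as the letter `A`.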

neftlon (Contributor, Author) commented Dec 12, 2022

It appears that some of the CI tests are not passing. Am I missing something, or are parts of the CI pipeline broken? Can someone help me with that?

milot-mirdita (Member) commented

No idea why Windows is failing in Azure; you didn't change anything that would affect that. Cirrus is currently allowed to fail: something changed on their side and I haven't gotten around to fixing the issue.

I think you are still using the wrong function. Neff is stored as a char; you need to use convertNeffToFloat to convert it back.

neftlon (Contributor, Author) commented Dec 13, 2022

OK, good; then I will ignore these pipelines.

The Neff scores I use come from the neffM field of the Sequence class (Sequence.h). According to the following declaration, they are stored as floats:

```cpp
float *neffM;
```

Therefore I think the convertNeffToChar function is more appropriate, since it takes a Neff score stored as a float. (The convertNeffToFloat function expects an unsigned char parameter, which I don't have when using the neffM field.)

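```cpp
// From MathUtil.h; flog2/fpow2 are the project's base-2 log/exp helpers.
// Maps a float Neff to a byte: 1 + 64*log2(neff), rounded, capped at 255, floor of 1.
static char convertNeffToChar(const float neff) {
    float retVal = std::min(255.0f, 1.0f + 64.0f * flog2(neff));
    return std::max(static_cast<unsigned char>(1), static_cast<unsigned char>(retVal + 0.5));
}

// Inverse mapping: byte back to float Neff, 2^((neffToScale - 1) / 64).
static float convertNeffToFloat(unsigned char neffToScale) {
    float retNeff = fpow2((static_cast<float>(neffToScale) - 1.0f) / 64.0f);
    return retNeff;
}
```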
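To see what the byte encoding in convertNeffToChar/convertNeffToFloat does to a value, here is a minimal stand-alone sketch of the same formulas. It is an assumption-laden re-implementation, not the MathUtil.h code: `encodeNeff`/`decodeNeff` are hypothetical names, `std::log2`/`std::exp2` stand in for the project's flog2/fpow2 (so results may differ slightly), and the value is clamped to [1, 255] before the cast to avoid undefined float-to-integer conversion for Neff < 1.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical re-implementation of the encoding: 1 + 64*log2(neff),
// clamped to [1, 255] and rounded to the nearest integer. Assumes neff > 0.
unsigned char encodeNeff(float neff) {
    float scaled = 1.0f + 64.0f * std::log2(neff);
    // Clamp first: casting an out-of-range float to unsigned char is undefined.
    scaled = std::max(1.0f, std::min(255.0f, scaled));
    return static_cast<unsigned char>(scaled + 0.5f);
}

// Hypothetical inverse: 2^((code - 1) / 64).
float decodeNeff(unsigned char code) {
    return std::exp2((static_cast<float>(code) - 1.0f) / 64.0f);
}
```

One code step corresponds to a factor of 2^(1/64) in Neff, so a round trip is accurate to roughly 0.5 % relative error, and `decodeNeff(255)` is about 15.66: any larger Neff saturates at 255.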

Sorry if I am missing something here. Is there another location, or a better way, to extract the Neff scores?

(I don't know whether this is just personal preference, but I like the idea of values not being floats when writing them to an output. A fixed range of [1, 255] somehow sounds more appealing to me than a floating-point number with an obscure precision.)

milot-mirdita (Member) commented

Okay, sorry, I didn't remember the code very well. Your initial implementation without the MathUtil functions was correct; the Sequence object already handles the conversion to float. I wouldn't use convertNeffToChar here: it just spreads the possible range of Neffs (0 to 20, but more realistically 0 to 14) over the char range (0 to 255). I don't think it makes much sense to print a value from 0 to 255.
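Writing the quoted MathUtil.h formulas out makes the range argument concrete. The numbers below use the exact base-2 logarithm rather than the project's flog2 approximation, so they are a worked sketch, not output of the actual code:

$$
c(N) = \max\Bigl(1,\ \operatorname{round}\bigl(\min(255,\ 1 + 64\log_2 N)\bigr)\Bigr), \qquad N(c) = 2^{(c-1)/64}
$$

$$
c(1) = 1, \qquad c(14) = \operatorname{round}(1 + 64 \cdot 3.807) = \operatorname{round}(244.7) = 245, \qquad N(255) = 2^{254/64} \approx 15.66
$$

$$
1 + 64\log_2 20 \approx 277.6 > 255 \;\Rightarrow\; c(20) = 255
$$

So the realistic range up to about 14 occupies most of the byte range, while Neffs above roughly 15.7 all collapse onto 255; the printed integers are a log-scaled code rather than Neff values a user can read directly.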

neftlon (Contributor, Author) commented Jan 16, 2023

Sorry for the long round-trip delay; I've reverted my changes to the original implementation :)

milot-mirdita merged commit 4148e09 into soedinglab:master on Jan 30, 2023

milot-mirdita (Member) commented

Thank you. I was traveling and forgot about the PR, sorry!
