-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update web VEP input detection in JavaScript (e111) #717
Conversation
if ( | ||
data.length === 1 && | ||
data[0].match(/^([^:]+):(\d+)-(\d+)(:[-\+]?1)?[\/:]([a-z]{3,}|[ACGTN-]+)$/i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex looks complex.
- Can you please break it down and also move it to a separate method?
- If you are marking it case insensitive you don't need to capitalise ACGTN.
- Please add in more examples of regions in the comments (line 269)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jyothishnt, when you mention breaking it down, do you mean something like this?
var chrom = "([^:]+)";
var pos = "(\\d+)-(\\d+)";
var strand = "(:[-\+]?1)?";
var allele = "([a-z]{3,}|[ACGTN-]+)";
const re = new RegExp(`^${chrom}:${pos}${strand}[\/:]${allele}$`, "i");
data[0].match(re);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I had something rather like this in mind:
const parts = data[0].split(':');
const [chromosome, position, allele] = parts;
if (parts.length !== 3 || !chromosome || !position || !allele) {
return;
}
const isValidPositionFormat = /^\d+-d+$/.test(position);
const isValidAlleleFormat = /^[a-z]{3,}$/i.test(allele) || /^[ACGTN-]+$/i.test(allele);
// chromosome has already been checked in the if-statemebt above
if (isValidPositionFormat && isValidAlleleFormat) {
return 'region';
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return;
...
return 'region';
Sorry, I meant return false or return true in my snippet above. I imagined it as a method that would be called inside of the if-condition that currently contains the long regex. Something like if (this.isRegionFormat(data)) { return 'region'; }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jyothishnt @azangru ,
Thanks for reviewing this PR.
-
Breaking that function by splitting
:
has a problem because the elements are not always separated by:
. -
This function implements the same logic as
detect_format
function used in ensembl-vep and ensembl-variation. Having same logic in different places was causing issue when we are doing updates in one places and the other place was not being updated. So, we recently made changes to have to only have one function in ensembl-variation -
Fix web VEP input format detection ensembl-vep#1477
Detect input format using VEP code ensembl-variation#1026
Unfortunately we need to keep this function also (making changes in 2 places is better than 3 at least !). Saying all these things is prelude to saying it is better for us if we can keep the logic same as the other function so it is easy to update in future -
https://github.com/Ensembl/ensembl-variation/blob/86bad9e537344c88df5fb04e8ee0fce0a7d59876/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L388
Let me know if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking that function by splitting : has a problem because the elements are not always separated by :.
But your regular expression was also checking for the presence of :
, wasn't it? If a string does not have a :
, it would fail this particular check (parts.length
will not equal 3), just as the regex would.
Anyway, if you have a strong preference for the long regex, I don't have any objections. It is not the new site, after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the regex allows the second :
to be not present -
(:[-\+]?1)?[\/:]
Example here -
https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#region
Yeah, it is better to not introduce a potential breaking changes for the current sites. Thanks!
Related with:
Footnotes
ENSVAR-5945: CAID is not currently supported in REST, so the preview will simply state that the variant ID was not found. ↩