Update web VEP input detection in JavaScript (e111) #717

Merged: 3 commits, Sep 5, 2023

Conversation

@nuno-agostinho nuno-agostinho commented Aug 15, 2023

  1. Update JavaScript regex to validate VEP input formats.
  2. Preview CAID (see footnote 1) and Region input.

Related with:

Footnotes

  1. ENSVAR-5945: CAID is not currently supported in REST, so the preview will simply state that the variant ID was not found.

@nuno-agostinho nuno-agostinho changed the title Update web VEP input detection (e111) Update web VEP input detection in JavaScript (e111) Aug 15, 2023
@nakib103 nakib103 self-requested a review August 15, 2023 12:27
@nakib103 nakib103 self-assigned this Aug 15, 2023
@nuno-agostinho nuno-agostinho marked this pull request as ready for review August 15, 2023 14:01
if (
data.length === 1 &&
data[0].match(/^([^:]+):(\d+)-(\d+)(:[-\+]?1)?[\/:]([a-z]{3,}|[ACGTN-]+)$/i)
Contributor

This regex looks complex.

  1. Can you please break it down and also move it to a separate method?
  2. If you are marking it case insensitive you don't need to capitalise ACGTN.
  3. Please add in more examples of regions in the comments (line 269)
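
Point 2 can be checked quickly. The following is a minimal sketch, and the sample strings are illustrative rather than taken from the PR:

```javascript
// With the `i` flag a character class matches both cases,
// so [ACGTN-] and [acgtn-] accept exactly the same strings.
const upperClass = /^[ACGTN-]+$/i;
const lowerClass = /^[acgtn-]+$/i;

const samples = ["ACGT", "acgt", "AcGn-", "XYZ"]; // illustrative inputs
const results = samples.map((s) => [s, upperClass.test(s), lowerClass.test(s)]);

for (const [s, upper, lower] of results) {
  console.log(s, upper, lower);
}
```

Both patterns agree on every sample, so the capitalisation inside the class is a readability choice only.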

Contributor Author

Hey @jyothishnt, when you mention breaking it down, do you mean something like this?

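// Each fragment is a regex source string, so backslashes are doubled.
const chrom  = "([^:]+)";                // chromosome or contig name
const pos    = "(\\d+)-(\\d+)";          // start-end
const strand = "(:[-+]?1)?";             // optional strand: :1, :+1 or :-1
const allele = "([a-z]{3,}|[ACGTN-]+)";  // 3+ letters, or an allele string

const re = new RegExp(`^${chrom}:${pos}${strand}[/:]${allele}$`, "i");
data[0].match(re);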
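
If the pattern were moved into its own helper, as requested in the review, it might look roughly like the sketch below. The function name `parseRegion`, the constant `REGION_RE`, and the returned field names are hypothetical and do not come from the PR; only the pattern itself is the one under review.

```javascript
// Hypothetical helper: parseRegion, REGION_RE and the field names are
// assumptions made for illustration, not code from the PR.
const REGION_RE = new RegExp(
  "^([^:]+)" +                // 1: chromosome or contig name
  ":(\\d+)-(\\d+)" +          // 2, 3: start and end
  "(:[-+]?1)?" +              // 4: optional strand, e.g. ":1" or ":-1"
  "[/:]" +                    // separator before the allele
  "([a-z]{3,}|[ACGTN-]+)$",   // 5: three or more letters, or an allele string
  "i"
);

function parseRegion(input) {
  const m = input.match(REGION_RE);
  if (!m) return null;
  return {
    chrom: m[1],
    start: Number(m[2]),
    end: Number(m[3]),
    strand: m[4] ? m[4].slice(1) : undefined, // drop the leading ":"
    allele: m[5],
  };
}

console.log(parseRegion("7:140453136-140453136:1/T"));
console.log(parseRegion("not a region")); // null
```

Returning the captured parts rather than a bare boolean is a design choice for the sketch; a caller that only needs detection can test the result against `null`.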

Contributor

Interesting. I had something rather like this in mind:

const parts = data[0].split(':');
const [chromosome, position, allele] = parts;
if (parts.length !== 3 || !chromosome || !position || !allele) {
  return;
}

const isValidPositionFormat = /^\d+-\d+$/.test(position);
const isValidAlleleFormat = /^[a-z]{3,}$/i.test(allele) || /^[ACGTN-]+$/i.test(allele);

// chromosome has already been checked in the if-statement above
if (isValidPositionFormat && isValidAlleleFormat) {
  return 'region';
}

Contributor

return;
...
return 'region';

Sorry, I meant return false or return true in my snippet above. I imagined it as a method that would be called inside of the if-condition that currently contains the long regex. Something like if (this.isRegionFormat(data)) { return 'region'; }.
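
Put together, the split-based suggestion with boolean returns might look like the sketch below. The name `isRegionFormat` comes from the comment above; `detectFormat` and the `"unknown"` fallback are hypothetical stand-ins for the surrounding detection code, and the position check is written as `\d+-\d+`.

```javascript
// Boolean helper following the split-based suggestion in this thread.
function isRegionFormat(data) {
  if (data.length !== 1) return false;

  const parts = data[0].split(":");
  const [chromosome, position, allele] = parts;
  if (parts.length !== 3 || !chromosome || !position || !allele) {
    return false;
  }

  const isValidPositionFormat = /^\d+-\d+$/.test(position);
  const isValidAlleleFormat =
    /^[a-z]{3,}$/i.test(allele) || /^[acgtn-]+$/i.test(allele);

  return isValidPositionFormat && isValidAlleleFormat;
}

// Hypothetical call site; the real function checks other formats as well.
function detectFormat(data) {
  if (isRegionFormat(data)) {
    return "region";
  }
  return "unknown"; // placeholder for the remaining checks
}

console.log(detectFormat(["7:140453136-140453136:T"]));
```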

Contributor

Hi @jyothishnt @azangru ,

Thanks for reviewing this PR.

Let me know if that makes sense.

Contributor

@azangru azangru Sep 4, 2023

Breaking that function by splitting : has a problem because the elements are not always separated by :.

But your regular expression was also checking for the presence of :, wasn't it? If a string does not have a :, it would fail this particular check (parts.length will not equal 3), just as the regex would.

Anyway, if you have a strong preference for the long regex, I don't have any objections. It is not the new site, after all.

Contributor

This part of the regex allows the second : to be absent:
(:[-\+]?1)?[\/:]

Example here -
https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#region

Yeah, it is better not to introduce a potential breaking change for the current sites. Thanks!
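
To make the difference concrete, the sketch below runs the regex from the diff against a few inputs and counts the colon-separated parts of each. The inputs are illustrative strings chosen to fit the pattern, not test cases from the PR.

```javascript
// Regex copied from the diff under review.
const regionRe = /^([^:]+):(\d+)-(\d+)(:[-\+]?1)?[\/:]([a-z]{3,}|[ACGTN-]+)$/i;

const inputs = [
  "7:140453136-140453136:T",    // two colons, no strand
  "7:140453136-140453136/T",    // slash instead of the second colon
  "7:140453136-140453136:1/T",  // strand, then slash
  "7:140453136-140453136:-1:T", // strand, then colon
];

const report = inputs.map((input) => ({
  input,
  regexMatches: regionRe.test(input),
  colonParts: input.split(":").length,
}));

console.table(report);
```

All four inputs match the regex, but they split into 3, 2, 3 and 4 colon-separated parts respectively, and the third one ends in `1/T` rather than a bare allele. A check that requires exactly three parts with a plain allele in the last position would therefore accept only the first input.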

@jgtate jgtate merged commit 621dc48 into Ensembl:main Sep 5, 2023
@nuno-agostinho nuno-agostinho deleted the fix/web-vep-js-input-form branch September 5, 2023 10:37