gorule-0000057 filter lines by provided_by #1553

Closed
dustine32 opened this issue Sep 16, 2020 · 12 comments

@dustine32
Contributor

For the MOD imports project, one requirement is that we filter each MOD GPAD to keep only lines where the Provided_by column (aka Assigned_by) equals the MOD: only Provided_by=MGI lines in mgi.gpad, or only Provided_by=WB lines in wb.gpad. Provided_by=UniProt lines would be filtered out.

GOOD: MGI     MGI:1920971     enables GO:0043014      MGI:MGI:3794006|PMID:18163442   ECO:0000314                     20081211        MGI
BAD: MGI     MGI:1920971     part_of GO:0002177      MGI:MGI:3794006|PMID:18163442   ECO:0000314                     20120221        UniProt

We can handle this by extending the filter_out pattern that already exists in the mgi.yaml and wb.yaml dataset files, adding a separate filter_for or filter_in (maybe required_attributes?) section:

filter_for:
  provided_by:
    - MGI

Accepting a list of provided_by values will allow flexibility if we later want to start importing some other non-MOD-source lines like UniProt. I'll update the datasets.schema.yaml, mgi.yaml and wb.yaml files in a test branch.
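As a rough illustration of the proposed behavior, here is a minimal Python sketch of such a filter. It is not the ontobio or gocamgen implementation. It assumes the GPAD 1.x column layout, where Assigned_by (Provided_by) is the 10th tab-separated column, and the names `filter_for_provided_by`, `ASSIGNED_BY_INDEX` and `dataset_config` are hypothetical. The config dict simply mirrors the `filter_for` YAML proposed above.

```python
from typing import Iterable, Iterator, Set

# Assumption: GPAD 1.x layout, Assigned_by (Provided_by) is the 10th column.
ASSIGNED_BY_INDEX = 9


def filter_for_provided_by(lines: Iterable[str], allowed: Set[str]) -> Iterator[str]:
    """Yield header lines plus annotation lines whose Assigned_by is in `allowed`."""
    for line in lines:
        if line.startswith("!"):
            # Header/comment lines are passed through unchanged.
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        # Lines that are too short (including blank lines) are dropped.
        if len(cols) > ASSIGNED_BY_INDEX and cols[ASSIGNED_BY_INDEX] in allowed:
            yield line


# Hypothetical in-memory equivalent of the proposed filter_for section.
dataset_config = {"filter_for": {"provided_by": ["MGI"]}}

# The GOOD and BAD example lines from this issue, rebuilt with explicit tabs.
good = "\t".join(["MGI", "MGI:1920971", "enables", "GO:0043014",
                  "MGI:MGI:3794006|PMID:18163442", "ECO:0000314",
                  "", "", "20081211", "MGI", "", ""])
bad = "\t".join(["MGI", "MGI:1920971", "part_of", "GO:0002177",
                 "MGI:MGI:3794006|PMID:18163442", "ECO:0000314",
                 "", "", "20120221", "UniProt", "", ""])

allowed = set(dataset_config["filter_for"]["provided_by"])
kept = list(filter_for_provided_by([good, bad], allowed))
```

Because `allowed` is a set built from a list, adding UniProt later would only require extending the `provided_by` list in the dataset file.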

Tagging @dougli1sqrd @kltm

@dustine32 dustine32 self-assigned this Sep 16, 2020
@ukemi
Contributor

ukemi commented Sep 16, 2020

This looks correct. An alternative strategy would be to have MGI filter the file for only the annotations that will be imported, but it seems that having this general ability on the GOC end of things would be useful.

@dustine32
Contributor Author

@ukemi Yeah, upstream filtering would be a sure way of handling this. I just automatically started porting over the pre-existing filtering functionality from gocamgen but we can use it as needed.

@pgaudet
Contributor

pgaudet commented Sep 16, 2020

I agree with the upstream filtering; otherwise we need to restrict how Rule57 is applied, which seems to just move the problem elsewhere.

@pgaudet pgaudet changed the title GoRule57 filter lines by provided_by gorule-0000057 filter lines by provided_by Sep 17, 2020
@pgaudet
Contributor

pgaudet commented Sep 17, 2020

Hi @dustine32. To make it easier (or even just possible) to find any references to a rule, we've been rigorous about the format in tickets. Please use gorule-nnnnnnn.

Thanks, Pascale
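To show why the fixed format helps with searching, here is a small Python sketch that finds rule references in ticket text. It assumes `gorule-nnnnnnn` means the literal prefix `gorule-` followed by exactly seven digits, as in gorule-0000057. The names `GORULE_ID` and `find_rule_ids` are hypothetical and not part of any GO tooling.

```python
import re
from typing import List

# Assumption: a rule ID is "gorule-" followed by exactly seven digits.
GORULE_ID = re.compile(r"\bgorule-\d{7}\b")


def find_rule_ids(text: str) -> List[str]:
    """Return every well-formed rule ID mentioned in `text`, in order."""
    return GORULE_ID.findall(text)


new_title = find_rule_ids("gorule-0000057 filter lines by provided_by")
old_title = find_rule_ids("GoRule57 filter lines by provided_by")
```

With this pattern, the corrected title yields one match, while the original "GoRule57" title yields none, which is the searchability problem described above.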

@ukemi
Contributor

ukemi commented Sep 17, 2020

OK. @dustine32, we will create a GPAD2.0 file that only contains the annotations made by MGI curators using the MGI editorial interface. We will put it out on the test site for you.

@dustine32
Contributor Author

Thanks @pgaudet for correcting the title!

@ukemi Yes, having the file already filtered at its upstream location would definitely do the job. I can start translating that GPAD2.0 file once we get the ontobio GPAD parser to consume 2.0 in a short while.

@pgaudet
Contributor

pgaudet commented Mar 10, 2023

Hi @dustine32

What is the status of this? I suppose some version of this is done in the pipeline, but it is not documented here: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000057.md

@pgaudet
Contributor

pgaudet commented Jan 8, 2024

Noting that this rule is being reported in the reports: http://snapshot.geneontology.org/reports/assigned-by-gorule-report.html. Based on Dustin's answer below, should we suppress this? Or is this relevant for the production code, in which case we should include tests?

Other AI: clarify the formulation of the rule, mention 'filter_out' in the datasets.yaml files, and change status to implemented.

@pgaudet
Contributor

pgaudet commented Jan 11, 2024

Hi @dustine32

Is this specifically applying to imports, and how is this triggered?

Thanks, Pascale

@dustine32
Contributor Author

@pgaudet Yep, this was proposed for the imports project but not needed. I'll close but feel free to reopen.

@pgaudet
Contributor

pgaudet commented Feb 22, 2024

Thanks, we'll just make sure to remove it from the reports (not sure why it's even coming up).

@kltm
Member

kltm commented Feb 22, 2024

There is a reports filter list (variable in the pipeline), if something needs to be disappeared.
