Overall Tasks on Repair #19

Open
17 of 31 tasks
yogeswarl opened this issue Mar 7, 2023 · 4 comments

Comments

@yogeswarl
Member

yogeswarl commented Mar 7, 2023

No context

  1. Transformer Fine-Tuning
    • #24
    • Bert (or other one)
  2. Choice of pairing
    • query.docs
    • query.doc
    • docs.query
    • doc.query
  3. Query Set
    • msmarco.passage
    • msmarco.document
    • Aol-title
    • Aol-title-url
    • Aol-text
    • Yahoo Q & A

Information Retrieval

  1. Sparse Retrieval
    • BM25
    • qld
  2. Dense Retrieval
  3. Hybrid Retrieval
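
For reference, one common way to combine sparse and dense results is a linear interpolation of normalized scores. The sketch below is only an illustration under that assumption, not the method chosen for this project; the function names, the `alpha` weight, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no ranking signal, map everything to 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    sparse: Dict[str, float],
    dense: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight on the sparse (e.g. BM25) score
) -> List[Tuple[str, float]]:
    """Fuse two score maps; a doc missing from one list contributes 0 there."""
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy scores for a single query (made up for illustration).
sparse_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
dense_scores = {"a": 0.2, "b": 0.8, "d": 0.5}
ranking = hybrid_rank(sparse_scores, dense_scores, alpha=0.5)
```

With `alpha=1.0` this reduces to the sparse ranking and with `alpha=0.0` to the dense ranking, so the weight can be tuned on a held-out query set.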

Evaluation

  • MAP
  • MRR
  • Success
  • nDCG
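
As a reminder of what two of these metrics measure, here is a minimal sketch of MRR and Success@k over a run. It is not the evaluation code used in the pipeline; the helper names, the cutoff `k`, and the toy run and qrels are hypothetical.

```python
from typing import Callable, Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant doc in the top k, or 0 if none is found."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def success_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1 if any relevant doc appears in the top k, else 0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def mean_metric(
    run: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    metric: Callable[[List[str], Set[str], int], float],
    k: int = 10,
) -> float:
    """Average a per-query metric over all queries that have judgments."""
    scores = [metric(run.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: ranked doc ids per query, and the relevant doc ids per query.
run = {"q1": ["d3", "d1"], "q2": ["d9", "d8", "d2"]}
qrels = {"q1": {"d1"}, "q2": {"d2"}}

mrr = mean_metric(run, qrels, reciprocal_rank, k=10)
success = mean_metric(run, qrels, success_at_k, k=10)
```

Queries with judgments but no retrieved results score 0 here, which keeps the averages comparable across retrievers.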

Stats

Supervised Query Refinement

  • Acg
  • Anmt
  • Hredqs
  • transformer

Context

Paper Writeup

  • Related Work
  • Benchmark on RePair
  • ...
@yogeswarl yogeswarl added updates note down issues and update on a particular dataset or pipeline flow Storyboard Describes all tasks in the project labels Mar 7, 2023
@yogeswarl yogeswarl self-assigned this Mar 7, 2023
@hosseinfani
Member

@yogeswarl
any update?

@yogeswarl
Member Author

Sorry @hosseinfani, I forgot to update this issue page. I have completed the MRR and boxes computation for title and title-url. I am keeping them in a separate folder from the current publicly available data.
I have finished computing TCT-ColBERT for the MS MARCO originals. I still have to do the 25 predicted queries. The only problem is that the TCT index is 31 GB, with encoders of 441 MB, and the search is relatively slow compared to BM25. I am hoping to give you the MS MARCO dense retrieval results by this weekend. If this works, I will start on hybrid retrieval as well. Once these are done, I will move on to the AOL context version and run everything through the pipeline to get the full results!

@hosseinfani
Member

hosseinfani commented Apr 21, 2023

@yogeswarl
These are the remaining items. Please read them and come up with a timeline. I need you to drop by during office hours every week and report on your progress.

sparse experiment:

  • result of mrr, files, ...

colbert experiment:

  • Details of training, indexing, ... in readme
  • Details of the subset of queries we ran retrieval on, and why. What were the selection criteria?
  • results of mrr, map, files, ...

Benchmarks

  • bm25 on acg, hredqs, ...
  • colbert on acg, hredqs, ...

On paper:

  • Benchmarks on RePair’s Gold Standard Datasets
  • RELATED WORK

@yogeswarl
Member Author

Hello @DelaramRajaei and @ZahraTaherikhonakdar, this is the main task list for the work completed on RePair. Could you please add the query sets and variants to it, so we can keep track of all the work done on RePair?

Thanks

2 participants