Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite FASTA parser to Megaparsec #67

Merged
merged 23 commits into from
Oct 4, 2022

Conversation

maksbotan
Copy link
Contributor

No description provided.

src/Bio/FASTA/Parser.hs Outdated Show resolved Hide resolved
@maksbotan maksbotan force-pushed the maksbotan/fasta-parser-megaparsec branch from 3c56683 to 511f68a Compare June 21, 2022 21:37
Comment on lines 32 to 33
parseToken :: (Char -> Bool) -> Parser a
parseToken :: Parsec Void Text a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

а можешь объяснить это изменение — почему мы предикат убрали?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

а он не особо нужен, как мне кажется, мы можем прям на месте, где проверяем элемент, скормить парсеру функцию проверки на подходящий символ. Тащить её через все функции, учитывая что она одна, ну такое

src/Bio/FASTA/Parser.hs Outdated Show resolved Hide resolved
fastaLine :: ParsableFastaToken a => (Char -> Bool) -> Parser [a]
fastaLine predicate = concat <$> many1' (many1' (parseToken predicate) <* many' (char ' ')) <* eol
seqName :: Parser Text
seqName = strip . pack <$> ((symbol ">" <?> ">") *> (manyTill anySingle myEnd <?> "sequence name"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

а <?> ">" точно нужен, он сам это не выводит красиво?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

да, это уже я упоролся

src/Bio/FASTA/Parser.hs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@maksbotan maksbotan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

что тут делать дальше

  1. я запросил у Леры выгрузку фаст, которые наш старый парсер не ел, надо будет их проверить
  2. надо подключить эту ветку как зависимость в cobot-tools и попробовать его обновить на использование этого парсера

сделать это можно так — добавить в cabal.project

source-repository-package
  type: git
  location: https://github.com/biocad/cobot-io.git
  tag: ХЭШ КОММИТА С ТВОЕЙ ВЕТКИ

src/Bio/FASTA/Parser.hs Outdated Show resolved Hide resolved
Comment on lines 40 to 44
fastaP :: ParsableFastaToken a => Parser (Fasta a)
fastaP = fastaPGeneric

fastaPGeneric :: ParsableFastaToken a => Parser (Fasta a)
fastaPGeneric = many item
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

эти функции ведь теперь ничем не отличаются, может только одну оставим?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@maksbotan maksbotan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

на мой взгляд уже красиво!
теперь надо послушать что скажет Настя

let res = parseOnly fastaP ">this is my sequence\nIWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSE\n\n\nYYYYYYYYYYYYYYYYYYYYYYYY"
res `shouldBe` Right [FastaItem @Char "this is my sequence" (bareSequence "IWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSEYYYYYYYYYYYYYYYYYYYYYYYY")]
it "correctly parses incorrect sequence with several \\n between sequence parts" $ do
let res = parseOnly (fastaP @Char) ">this is my sequence\nIWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSE\n\n\nYYYYYYYYYYYYYYYYYYYYYYYY"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

хм
надо будет уточнить у Насти и остальных насколько это incorrect

можешь пожалуйста в общий чат abscan+ylab2 написать вопрос? покажи пример фасты с такой дыркой, скажи что сейчас наш парсер не будет читать, скажи почему так хотим сделать и спроси норм ли

потому что старый читал, значит это будет регрессия

toughParserTests

parseOnly :: Parsec Void Text (Fasta a) -> Text -> Either String (Fasta a)
parseOnly p s = first errorBundlePretty $ parse (p <* eof) "test.fasta" s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

а этот eof точно надо теперь? мы же запихали его в сам парсер

@maksbotan
Copy link
Contributor Author

@Gmihtt надо добавить тестовые фаста файлы в extra-source-files в package.yaml. можно через звездочку

fastaP = many (item isLetter) <* hidden space <* eof

fastaPGeneric :: ParsableFastaToken a => (Char -> Bool) -> Parser (Fasta a)
fastaPGeneric = many . item
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

добавь сюда тоже <* hidden space <* eof

@maksbotan maksbotan marked this pull request as ready for review September 26, 2022 14:53
@oddsome oddsome self-requested a review September 26, 2022 15:34
, FastaItem "Empty_ha_ha_ha" (bareSequence "")
]

correctFasta3 :: Fasta Char
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

а где correctFasta2?))))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

забыл(

@Gmihtt Gmihtt force-pushed the maksbotan/fasta-parser-megaparsec branch from f45b1cf to cd1f69d Compare October 3, 2022 10:42
@Gmihtt Gmihtt requested a review from CarrollNew October 3, 2022 11:27
cobot-io.cabal Outdated Show resolved Hide resolved
@maksbotan maksbotan force-pushed the maksbotan/fasta-parser-megaparsec branch from cd1f69d to f45b1cf Compare October 3, 2022 12:51
Copy link
Contributor Author

@maksbotan maksbotan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarrollNew пофикшено

@Gmihtt Gmihtt requested a review from CarrollNew October 3, 2022 13:47
@maksbotan maksbotan merged commit 061e1d4 into master Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants