Rewrite FASTA parser to Megaparsec #67
3c56683
to
511f68a
Compare
src/Bio/FASTA/Type.hs
Outdated
parseToken :: (Char -> Bool) -> Parser a
parseToken :: Parsec Void Text a
Can you explain this change? Why did we drop the predicate?
It isn't really needed, as far as I can tell: we can feed the parser the character-check function right at the spot where we check the element. Dragging it through every function, given that there is only one such predicate, seems like overkill.
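A minimal sketch of the idea in the reply above, assuming Megaparsec with `Text` input. The class `FastaToken` and the functions `sequenceLine` and `parseLetters` are hypothetical names made up for illustration; they are not the PR's actual `ParsableFastaToken` code. The point is only that the character check lives next to `satisfy` instead of being threaded through every signature.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bifunctor (first)
import Data.Char (isLetter)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, errorBundlePretty, parse, satisfy, some, (<?>))

type Parser = Parsec Void Text

-- Hypothetical class: each token type knows how to parse itself,
-- so no (Char -> Bool) argument is passed around.
class FastaToken a where
  parseToken :: Parser a

-- The predicate is supplied in place, where the element is checked.
instance FastaToken Char where
  parseToken = satisfy isLetter <?> "letter"

-- One or more tokens; note the signature carries no predicate.
sequenceLine :: FastaToken a => Parser [a]
sequenceLine = some parseToken

-- Helper for trying the sketch out; the result type fixes the token to Char.
parseLetters :: Text -> Either String String
parseLetters = first errorBundlePretty . parse (sequenceLine <* eof) "sketch"
```

With this shape, a different alphabet would mean another instance rather than another argument on every function.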
src/Bio/FASTA/Parser.hs
Outdated
fastaLine :: ParsableFastaToken a => (Char -> Bool) -> Parser [a]
fastaLine predicate = concat <$> many1' (many1' (parseToken predicate) <* many' (char ' ')) <* eol
seqName :: Parser Text
seqName = strip . pack <$> ((symbol ">" <?> ">") *> (manyTill anySingle myEnd <?> "sequence name"))
Is the `<?> ">"` really needed? Doesn't it already print that nicely on its own?
Yes, I got carried away there.
What to do next here:
- I have asked Lera for an export of the FASTA files that our old parser could not handle; we will need to check them.
- We need to hook this branch up as a dependency in cobot-tools and try updating it to use this parser.
One way to do that is to add the following to cabal.project:
source-repository-package
    type: git
    location: https://github.com/biocad/cobot-io.git
    tag: <COMMIT HASH FROM YOUR BRANCH>
src/Bio/FASTA/Parser.hs
Outdated
fastaP :: ParsableFastaToken a => Parser (Fasta a)
fastaP = fastaPGeneric

fastaPGeneric :: ParsableFastaToken a => Parser (Fasta a)
fastaPGeneric = many item
These two functions are now identical. Shall we keep just one?
Yes, we can.
And regarding the lexer: https://hackage.haskell.org/package/megaparsec-9.2.2/docs/Text-Megaparsec-Char-Lexer.html
Looks good to me already!
Now we need to hear what Nastya says.
let res = parseOnly fastaP ">this is my sequence\nIWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSE\n\n\nYYYYYYYYYYYYYYYYYYYYYYYY"
res `shouldBe` Right [FastaItem @Char "this is my sequence" (bareSequence "IWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSEYYYYYYYYYYYYYYYYYYYYYYYY")]
it "correctly parses incorrect sequence with several \\n between sequence parts" $ do
let res = parseOnly (fastaP @Char) ">this is my sequence\nIWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEGITWTLDQSSE\n\n\nYYYYYYYYYYYYYYYYYYYYYYYY"
Hm.
We need to check with Nastya and the others how "incorrect" this really is.
Could you please post a question in the shared abscan+ylab2 chat? Show an example FASTA with a gap like this, say that our parser will no longer read it, explain why we want to make this change, and ask whether that is acceptable.
The old parser did read such files, so this would be a regression.
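To make the regression concrete, here is a rough sketch of a sequence-body parser that still tolerates several line breaks between sequence parts, which is the behaviour the old parser is described as having. It assumes Megaparsec over `Text`; `seqLine` and `sequenceBody` are hypothetical names and this is not the code from the PR.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isLetter)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void (Void)
import Text.Megaparsec (Parsec, sepEndBy1, some, takeWhile1P, takeWhileP)
import Text.Megaparsec.Char (eol)

type Parser = Parsec Void Text

-- One line of residues, with optional trailing spaces.
seqLine :: Parser Text
seqLine = takeWhile1P (Just "letter") isLetter <* takeWhileP Nothing (== ' ')

-- Sequence parts separated (and optionally ended) by one or more line
-- breaks, so a gap such as "\n\n\n" between parts is accepted.
sequenceBody :: Parser Text
sequenceBody = T.concat <$> sepEndBy1 seqLine (some eol)
```

A stricter variant would replace `some eol` with a single `eol`; whether the lenient or the strict form is wanted is exactly the question to raise in the chat.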
test/FastaParserSpec.hs
Outdated
toughParserTests

parseOnly :: Parsec Void Text (Fasta a) -> Text -> Either String (Fasta a)
parseOnly p s = first errorBundlePretty $ parse (p <* eof) "test.fasta" s
Is this `eof` really needed now? We already moved it into the parser itself.
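For context on the question above: `eof` consumes no input, so applying it a second time in a test wrapper is harmless but redundant once the parser already ends with it. A small sketch under assumed names (`lettersP` is a hypothetical stand-in for `fastaP`, not the PR's parser):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bifunctor (first)
import Data.Char (isLetter)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, errorBundlePretty, hidden, parse, takeWhile1P)
import Text.Megaparsec.Char (space)

type Parser = Parsec Void Text

-- Stand-in parser that already insists on end of input itself.
lettersP :: Parser Text
lettersP = takeWhile1P (Just "letter") isLetter <* hidden space <* eof

-- Wrapper with an extra eof: it succeeds on the same inputs as the one below.
parseWithEof :: Parser a -> Text -> Either String a
parseWithEof p = first errorBundlePretty . parse (p <* eof) "test.fasta"

-- Wrapper without the extra eof.
parseOnly :: Parser a -> Text -> Either String a
parseOnly p = first errorBundlePretty . parse p "test.fasta"
```

Dropping the wrapper's `eof` is safe only as long as every parser passed to it ends with `eof` on its own.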
@Gmihtt we need to add the test FASTA files to
src/Bio/FASTA/Parser.hs
Outdated
fastaP = many (item isLetter) <* hidden space <* eof

fastaPGeneric :: ParsableFastaToken a => (Char -> Bool) -> Parser (Fasta a)
fastaPGeneric = many . item
Add `<* hidden space <* eof` here as well.
, FastaItem "Empty_ha_ha_ha" (bareSequence "")
]

correctFasta3 :: Fasta Char
And where is correctFasta2? :)
Forgot it :(
f45b1cf
to
cd1f69d
Compare
cd1f69d
to
f45b1cf
Compare
@CarrollNew fixed
No description provided.