Have dedicated HTML parsers for specific pages #27

Open
wtimme opened this issue Jul 8, 2023 · 0 comments
wtimme (Collaborator) commented Jul 8, 2023

The current solution with the generic HTML parser works great for sites such as "Ultimate G...". For some sites, such as https://www.chords-and-tabs.net, it works less well: I have to clean up the parsed text file quite a bit.

Therefore, I propose the following solution: the app should contain multiple HTML parsers, each dedicated to one site. The current "generic" parser could be used as a fallback. The page-specific parsers could be unit-tested to ensure that they work as expected. In addition, this would let people request or contribute parsers for the sites they use without affecting the "generic" parser that we have right now.

The interface for the parsers could look as follows:

interface WebPageChordParser {
    // Returns true if the parser supports the given URL.
    fun supportsURL(url: String): Boolean

    // Attempts to convert the given HTML text to a plaintext document
    // which contains just the chords.
    fun convertHtmlToText(htmlText: String): String?

    // Attempts to determine the BPM of the song from the given HTML.
    fun extractBPMFromHtml(htmlText: String): Int?
}
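
To make the proposal more concrete, a site-specific parser could look roughly like the sketch below. Everything apart from the `WebPageChordParser` interface is an assumption for illustration: the class name `ChordsAndTabsParser` is hypothetical, and so is the guess that the page keeps its chord sheet in a `<pre>` element and shows the tempo as a number followed by "BPM". The real markup of the site would have to be checked before writing an actual parser.

```kotlin
import java.net.URI

interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical parser for one site. The <pre> container and the "BPM" label
// are assumptions about the page markup, not verified against the real site.
class ChordsAndTabsParser : WebPageChordParser {
    private val preRegex = Regex(
        "<pre[^>]*>(.*?)</pre>",
        setOf(RegexOption.IGNORE_CASE, RegexOption.DOT_MATCHES_ALL)
    )
    private val tagRegex = Regex("<[^>]+>")
    private val bpmRegex = Regex("""(\d{2,3})\s*BPM""", RegexOption.IGNORE_CASE)

    // Matches the bare domain and any subdomain; malformed URLs are rejected.
    override fun supportsURL(url: String): Boolean {
        val host = runCatching { URI(url).host }.getOrNull()?.lowercase() ?: return false
        return host == "chords-and-tabs.net" || host.endsWith(".chords-and-tabs.net")
    }

    // Takes the first <pre> block, strips nested tags, and decodes a few entities.
    // Returns null when no <pre> block is found or it contains only whitespace.
    override fun convertHtmlToText(htmlText: String): String? {
        val match = preRegex.find(htmlText) ?: return null
        val text = match.groupValues[1]
            .replace(tagRegex, "")
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&amp;", "&")
            .trim('\n', '\r')
        return text.ifBlank { null }
    }

    // Returns the first two- or three-digit number followed by "BPM", if any.
    override fun extractBPMFromHtml(htmlText: String): Int? =
        bpmRegex.find(htmlText)?.groupValues?.get(1)?.toIntOrNull()
}
```

Because the parser takes plain strings and has no Android dependencies, it could be unit-tested against saved HTML snippets from the site. A regex is used here only to keep the sketch dependency-free; a real implementation might prefer a proper HTML parser.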

Each parser (the generic one, too) would implement this interface. The WebSearchViewModel could then be provided with a list of these WebPageChordParser objects and iterate over them, asking each one whether it supports the given URL. If no parser supports the given URL, the generic parser would take over as a fallback.
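
The selection logic described above could be sketched as follows. `ChordParserRegistry` and `StubParser` are hypothetical names introduced only for this example, and how the WebSearchViewModel would actually receive the parsers (constructor argument, dependency injection, and so on) is left open.

```kotlin
interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical helper: returns the first site-specific parser that claims
// the URL, or the generic parser when none of them does.
class ChordParserRegistry(
    private val siteParsers: List<WebPageChordParser>,
    private val genericParser: WebPageChordParser
) {
    fun parserFor(url: String): WebPageChordParser =
        siteParsers.firstOrNull { it.supportsURL(url) } ?: genericParser
}

// Hypothetical stand-in for real parsers, used only to demonstrate the lookup.
// A null urlFragment means "supports every URL", like the generic parser would.
class StubParser(
    private val urlFragment: String?,
    private val label: String
) : WebPageChordParser {
    override fun supportsURL(url: String): Boolean =
        urlFragment == null || url.contains(urlFragment)

    override fun convertHtmlToText(htmlText: String): String? = "[$label] $htmlText"

    override fun extractBPMFromHtml(htmlText: String): Int? = null
}
```

With a registry built as `ChordParserRegistry(listOf(siteParser), genericParser)`, calling `parserFor(url)` returns the site parser for URLs it supports and the generic parser otherwise. Keeping the generic parser outside the list means its position cannot accidentally shadow a site-specific parser.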
