Have dedicated HTML parsers for specific pages #27

wtimme · 2023-07-08T17:35:11Z

The current generic HTML parser works well for sites such as "Ultimate G...". For other sites (https://www.chords-and-tabs.net, for example), it works less well: I have to clean up the parsed text file quite a bit.

Therefore, I propose the following solution: The app should contain multiple HTML parsers, each dedicated to one site. The current "generic" parser would remain as a fallback. The page-specific parsers could be unit-tested to ensure that they work as expected. In addition, this would let people request or contribute parsers for the sites they use without affecting the "generic" parser that we have right now.

The interface for the parsers could look as follows:

interface WebPageChordParser {
    // Returns whether this parser supports the given URL.
    fun supportsURL(url: String): Boolean

    // Attempts to convert the given HTML text to a plaintext document
    // which contains just the chords.
    fun convertHtmlToText(htmlText: String): String?

    // Attempts to determine the BPM of the song from the given HTML.
    fun extractBPMFromHtml(htmlText: String): Int?
}
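
To make the idea more concrete, here is a rough sketch of what one page-specific parser might look like. The class name `ChordsAndTabsParser`, the assumption that the chords live in a `<pre>` element, and the "120 BPM" style label are all hypothetical and chosen for illustration only; I have not checked them against the site's real markup. HTML entities are not decoded in this sketch.

```kotlin
import java.net.URI

// Interface as proposed above, repeated so the sketch compiles on its own.
interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical parser for chords-and-tabs.net. The <pre> element and the
// "NNN BPM" label are assumptions for illustration, not the site's real markup.
class ChordsAndTabsParser : WebPageChordParser {
    private val preBlock = Regex("<pre[^>]*>(.*?)</pre>", RegexOption.DOT_MATCHES_ALL)
    private val anyTag = Regex("<[^>]+>")
    private val bpmLabel = Regex("""(\d{2,3})\s*BPM""", RegexOption.IGNORE_CASE)

    override fun supportsURL(url: String): Boolean {
        // Malformed URLs are treated as unsupported instead of throwing.
        val host = runCatching { URI(url).host }.getOrNull() ?: return false
        return host.removePrefix("www.") == "chords-and-tabs.net"
    }

    override fun convertHtmlToText(htmlText: String): String? {
        // Take the content of the first <pre> block, or give up with null.
        val body = preBlock.find(htmlText)?.groupValues?.get(1) ?: return null
        // Drop nested tags (for example chord <span>s) and keep the text.
        return anyTag.replace(body, "").trim().ifEmpty { null }
    }

    override fun extractBPMFromHtml(htmlText: String): Int? =
        bpmLabel.find(htmlText)?.groupValues?.get(1)?.toIntOrNull()
}
```

Because the parser only takes strings and returns strings, a unit test could feed it a saved HTML snippet from the site and compare the result against the expected chord sheet.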

Each parser (the generic one, too) would implement this interface. The WebSearchViewModel could then be provided with a list of these WebPageChordParser objects and iterate over them, asking each one whether it supports the given URL. If no parser supports the given URL, the generic parser would take over as a fallback.
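
The selection step described above could be sketched roughly like this. `selectParser` and `stubParser` are hypothetical names that do not exist in the app; the stub only stands in for real parsers so that the lookup can be shown on its own. How WebSearchViewModel would actually receive the list is left open here.

```kotlin
// Interface as proposed above, repeated so the sketch compiles on its own.
interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical helper that builds a stub parser for one host.
// Real parsers would be classes of their own with real parsing logic.
fun stubParser(host: String, output: String): WebPageChordParser =
    object : WebPageChordParser {
        override fun supportsURL(url: String): Boolean = url.contains(host)
        override fun convertHtmlToText(htmlText: String): String? = output
        override fun extractBPMFromHtml(htmlText: String): Int? = null
    }

// Returns the first dedicated parser that claims the URL,
// or the generic parser if none of them does.
fun selectParser(
    url: String,
    dedicated: List<WebPageChordParser>,
    generic: WebPageChordParser
): WebPageChordParser =
    dedicated.firstOrNull { it.supportsURL(url) } ?: generic
```

Keeping the generic parser out of the list and passing it separately means its position in the list cannot accidentally shadow a dedicated parser.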

wtimme mentioned this issue Aug 2, 2023

Use artist and title from UG for the filename #32

Merged
