Releases: atecce/investigations

Recovery

07 Dec 00:19

Mainly a bugfix which addresses an issue where the database connection closed on an error.

Go conventions

06 Jul 15:52

Restructured to standard Go directory layout. Ready to run out of the box as expected.

Concurrency

15 Jun 05:12

Now scrapes lyrics.net concurrently at the lowest depth of the tree (the songs in the album).
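A minimal sketch of what fanning out over the songs in one album might look like, using sync.WaitGroup and a buffered channel as a cap on in-flight requests. This is not the code from this release: fetchLyrics, scrapeAlbum, and maxInFlight are hypothetical names, and the fetch is a stand-in for the real HTTP request and parse.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchLyrics is a hypothetical stand-in for the real request to the site.
func fetchLyrics(song string) string {
	return "lyrics for " + song
}

// scrapeAlbum fetches every song in an album concurrently, keeping at most
// maxInFlight fetches running at once. maxInFlight must be at least 1.
func scrapeAlbum(songs []string, maxInFlight int) []string {
	results := make([]string, len(songs))
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup

	for i, song := range songs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight fetches are running
		go func(i int, song string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this fetch ends
			results[i] = fetchLyrics(song) // each goroutine writes its own index
		}(i, song)
	}

	wg.Wait() // return only after every song has been fetched
	return results
}

func main() {
	songs := []string{"Track 1", "Track 2", "Track 3"} // made-up album
	for _, lyrics := range scrapeAlbum(songs, 2) {
		fmt.Println(lyrics)
	}
}
```

Keeping the concurrency at the song level means the artist and album levels can stay sequential, and the cap keeps the number of live goroutines bounded.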

Initial

05 Jun 13:53
Compare
Choose a tag to compare

Performs a depth-first scrape of lyrics from the website http://www.lyrics.net. The -s flag takes a regular expression and starts the scrape at the first artist that matches it, so that an interrupted run can be resumed. That said, this version has been running for two days on an Amazon server without encountering any errors. Has a "communicate" method which robustly handles non-OK status codes.

Looking to add concurrency with goroutines next. Having a lot of fun learning about them, but stalled for the time being: the program spawns too many goroutines and panics, and I am still learning the sync.WaitGroup API.

This is a rewrite of what was probably already working in Python a month ago, and then working again in a second implementation using Scrapy. I was dissatisfied with Python's multiprocessing capabilities and took this as an opportunity to teach myself Go, since concurrency is designed into the core of the language. I am now back to where I was, and nothing but improvement lies ahead.

Plan to implement scrapers for other websites as well.