A library for reading public RSS feeds using Spark Streaming.
Run a demo via:
# compile scala, run tests, build fat jar
sbt assembly
# run on spark
spark-submit --class RSSDemo target/scala-2.11/streaming-rss-html-assembly-0.0.1.jar http://somehost/somepath/to/rss
Add to your own project by adding this dependency in your build.sbt:
libraryDependencies ++= Seq(
//...
"com.github.catalystcode" %% "streaming-rss-html" % "1.0.2",
//...
)
Currently, this RDDInputDStream polls the given RSS feed at the specified rate. Scraping any HTML content found in the feed entries is left to the caller.
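Since HTML handling is left to the caller, here is a minimal sketch of one way to reduce an entry's HTML fragment to plain text on the caller's side. HtmlText and strip are hypothetical names used for illustration only and are not part of this library. The regex approach is deliberately naive (it does not decode entities or handle script/style blocks), so a real HTML parser is likely the better choice for production use.

```scala
// Hypothetical caller-side helper; NOT part of streaming-rss-html.
// Assumption: the caller already holds an entry's HTML as a String.
object HtmlText {
  // Matches anything that looks like a tag, e.g. <p>, </b>, <a href="...">.
  private val Tag = "<[^>]*>".r
  // Matches runs of whitespace so they can be collapsed to one space.
  private val Whitespace = "\\s+".r

  /** Naively removes tags and collapses whitespace. Entities such as &amp; are left as-is. */
  def strip(html: String): String = {
    val withoutTags = Tag.replaceAllIn(html, " ")
    Whitespace.replaceAllIn(withoutTags, " ").trim
  }
}
```

For example, HtmlText.strip("<p>Hello <b>world</b></p>") yields "Hello world". Because it is a pure function on strings, it could be applied inside a map over whatever records the stream emits.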
To publish a release to Sonatype:
- Configure your credentials via the SONATYPE_USER and SONATYPE_PASSWORD environment variables.
- Update version.sbt
- Run sbt, then from the sbt shell, do this:
sonatypeOpen "enter staging description here"
publishSigned
sonatypeRelease