scrala is a web crawling framework for Scala, inspired by Scrapy.
FROM gaocegege/scrala:latest
# COPY the build.sbt and the src into the container
docker run -v <your src>:/app/src -v <your ivy2 directory>:/root/.ivy2 gaocegege/scrala
Step 1. Add the JitPack repository at the end of the resolvers in your build.sbt:
resolvers += "jitpack" at "https://jitpack.io"
Step 2. Add the dependency:
libraryDependencies += "com.github.gaocegege" % "scrala" % "0.1.5"
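Putting the two steps together, a minimal build.sbt might look like the sketch below. The project name and Scala version are illustrative placeholders, not values prescribed by scrala:

```scala
// Minimal build.sbt sketch; name and scalaVersion are assumed placeholders
name := "my-crawler"

scalaVersion := "2.11.8" // assumption: use the Scala version scrala was built against

// JitPack resolver, required to fetch scrala (Step 1)
resolvers += "jitpack" at "https://jitpack.io"

// scrala dependency (Step 2)
libraryDependencies += "com.github.gaocegege" % "scrala" % "0.1.5"
```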
git clone https://github.com/gaocegege/scrala.git
cd ./scrala
sbt assembly
You will get the jar in ./target/scala-<version>/.
import com.gaocegege.scrala.core.spider.impl.DefaultSpider
import com.gaocegege.scrala.core.common.response.impl.HttpResponse
class TestSpider extends DefaultSpider {
  // the URL(s) the crawl starts from
  def startUrl = List[String]("http://www.gaocegege.com/resume")

  // parse the start page: follow every link and hand each response to printIt
  def parse(response: HttpResponse): Unit = {
    val links = response.getContentParser.select("a")
    for (i <- 0 until links.size()) {
      request(links.get(i).attr("href"), printIt)
    }
  }

  // callback invoked for each followed link: print the page title
  def printIt(response: HttpResponse): Unit = {
    println(response.getContentParser.title)
  }
}
object Main {
  def main(args: Array[String]): Unit = {
    val test = new TestSpider
    test.begin
  }
}
Just like in Scrapy, all you need to do is define a startUrl
to tell the spider where to start, and override parse(...)
to parse the response of the startUrl. The request(...)
function plays the role of yield scrapy.Request(...)
in Scrapy.
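scrala's internals are not shown here, but the request(url, callback) pattern described above can be sketched in plain Scala: requests are queued as (url, callback) pairs, and an engine later drains the queue, invoking each callback on the fetched response. Everything below (Response, request, fetch, crawl) is an illustrative assumption, not scrala's actual API:

```scala
import scala.collection.mutable

// Illustrative sketch of the request(url, callback) pattern; not scrala's real internals
object CallbackSketch {
  // Hypothetical stand-in for scrala's response type
  case class Response(url: String, body: String)

  // Pending (url, callback) pairs, analogous to `yield scrapy.Request(...)` in Scrapy
  private val queue = mutable.Queue[(String, Response => Unit)]()

  // Registering a request only enqueues it; the engine fetches it later
  def request(url: String, callback: Response => Unit): Unit =
    queue.enqueue((url, callback))

  // Fake fetcher; a real engine would perform an HTTP GET here
  private def fetch(url: String): Response = Response(url, s"<html>$url</html>")

  // Drain the queue, invoking each callback on its fetched response
  def crawl(): Unit =
    while (queue.nonEmpty) {
      val (url, cb) = queue.dequeue()
      cb(fetch(url))
    }
}
```

A callback may itself call request(...), which is how a spider chains page parses, exactly as parse(...) hands links to printIt in the example above.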
You can find an example project in the ./example/ directory.
scrala is under active development; feel free to contribute documentation, test cases, pull requests, issues, or anything else. I'm a newcomer to Scala, so the code may be hard to read; I'd be glad to see someone familiar with Scala coding conventions do code reviews for the repo :)