Syncing up code moves from IA #6

Merged 29 commits on Dec 12, 2013
Commits
28c2995
ZipNum: fixes: cached locations: don't throw RunTimeException if requ…
ikreymer Oct 3, 2013
41d3a58
CDX: Add CloseableCompositeIterator which iterates in sequence, optim…
ikreymer Oct 4, 2013
b4f639d
ZIP & SLR improvements: store connectedUrl immediately on connection,…
ikreymer Oct 6, 2013
612cdeb
ADD: CloseableIteratorWrapper utility class for wrapping regular iter…
ikreymer Oct 14, 2013
c5dbc67
ZipNumBlock Loader: * add info on which shard failed to RuntimIOExcep…
ikreymer Oct 14, 2013
1fc6019
ZIPNUMLoader: attempt better error msgs by propagating full error in …
ikreymer Oct 14, 2013
936fb49
FIX: ApacheHttp31SLR save the connected url even on error!
ikreymer Oct 15, 2013
0474441
FIX: ApacheSLR: turn off cookies when using manual cookie
ikreymer Oct 17, 2013
f82aead
HttpSLR: add optional error header which can be saved
ikreymer Oct 17, 2013
647adec
MultiCDXInputSource: Make comparator public
ikreymer Oct 18, 2013
7d4fbf4
FIX: MultiCDXInputSource: add optimization for output lazy initing of…
ikreymer Oct 18, 2013
2d2d30e
FEATURE: add getTotalCount() to cdx input sources
ikreymer Oct 23, 2013
baf1ad8
log RuntimeIOException
ikreymer Oct 24, 2013
154386c
add ThreadLocalHttpConnectionManager (from heritrix-commons)
ikreymer Nov 1, 2013
cd43101
fix path for ThreadLocalHttpConnectionManager
ikreymer Nov 1, 2013
1c58cba
fix package
ikreymer Nov 1, 2013
da37977
move ThreadLocalHttpConnectionManager to original package org.archive…
ikreymer Nov 1, 2013
68dab84
FEATURE: add support for httpcore 4.3 for SeekableLineReader!
ikreymer Nov 2, 2013
54bdd7d
apach43SLR: better reading of entire buffer
ikreymer Nov 2, 2013
1e8c1af
canonicalizer: add ExtractRule and RewriteRule
ikreymer Nov 4, 2013
0ebbad2
FIX: add cdxlinefactory doesn't require custom format
ikreymer Nov 16, 2013
5077ad8
Extract outlinks/hopinfo from warc/metadata records
vinaygoel Nov 23, 2013
329ff22
reference pom.xml for building with CDH4
vinaygoel Nov 25, 2013
fc24be8
moving org.archive.net.PublicSuffixes to ia-web-commons
nlevitt Dec 6, 2013
a54dd8e
moving a bunch of stuff from heritrix-commons to ia-web-commons so th…
nlevitt Dec 6, 2013
908c7a1
Merge branch 'master' of github.com:internetarchive/ia-web-commons
anjackson Dec 6, 2013
a732a9e
Removed older PublicSuffix code.
anjackson Dec 11, 2013
4c195ca
Merge branch 'master' of github.com:iipc/iipc-web-commons
anjackson Dec 12, 2013
7653cc0
Added link to parent project.
anjackson Dec 12, 2013
4 changes: 3 additions & 1 deletion README.md
@@ -3,4 +3,6 @@ OpenWayback Web Commons

[![Build Status](https://travis-ci.org/iipc/iipc-web-commons.png?branch=master)](https://travis-ci.org/iipc/iipc-web-commons/)

-This repository contains common utility code for the OpenWayback project.
+This repository contains common utility code for the [OpenWayback][1] project.
+
+[1]: https://github.com/iipc/openwayback
229 changes: 229 additions & 0 deletions pom-cdh4.xml
@@ -0,0 +1,229 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.archive</groupId>
<artifactId>ia-web-commons</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>ia-web-commons</name>
<url>http://maven.apache.org</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<build.time>${maven.build.timestamp}</build.time>
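<!-- HH (24-hour clock) rather than hh (12-hour clock), so that morning and afternoon builds get distinct timestamps -->
<maven.build.timestamp.format>yyyyMMddHHmmss</maven.build.timestamp.format>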
</properties>

<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>14.0.1</version>
</dependency>

<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20090211</version>
</dependency>
<dependency>
<groupId>org.htmlparser</groupId>
<artifactId>htmlparser</artifactId>
<version>1.6</version>
</dependency>

<dependency>
<groupId>org.mozilla</groupId>
<artifactId>juniversalchardet</artifactId>
<version>1.0.3</version>
</dependency>

<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.2.0</version>
<exclusions>
<exclusion>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
</exclusion>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<groupId>javax.servlet.jsp</groupId>
<artifactId>jsp-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
</exclusion>
<exclusion>
<groupId>tomcat</groupId>
<artifactId>jasper-runtime</artifactId>
</exclusion>
<exclusion>
<groupId>tomcat</groupId>
<artifactId>jasper-compiler</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>

<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.11.1</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.5</version>
</dependency>

<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.4</version>
</dependency>

<dependency>
<groupId>org.gnu.inet</groupId>
<artifactId>libidn</artifactId>
<version>1.15</version>
</dependency>
<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>mg4j</artifactId>
<version>1.0.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.3</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<finalName>ia-web-commons</finalName>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>

</build>
<repositories>
<repository>
<id>internetarchive</id>
<name>Internet Archive Maven Repository</name>
<url>http://builds.archive.org:8080/maven2</url>
<layout>default</layout>

<releases>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</snapshots>
</repository>

<repository>
<id>cloudera</id>
<name>Cloudera Hadoop</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
<layout>default</layout>

<releases>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</snapshots>
</repository>

</repositories>

<distributionManagement>
<repository>
<id>repository</id>
<!--Pass as command-line system property to maven-->
<url>${repository.url}</url>
</repository>
</distributionManagement>

</project>
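
The `maven.build.timestamp.format` property in the POM above is interpreted as a `java.text.SimpleDateFormat` pattern, so the choice between `hh` (12-hour clock) and `HH` (24-hour clock) affects whether two builds on the same day can end up with the same timestamp. The following is a minimal sketch of that difference, not part of this pull request: the class name `TimestampFormatDemo` and its helper methods are hypothetical, and the time zone is fixed to UTC only to make the output deterministic.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; it is not part of ia-web-commons.
public class TimestampFormatDemo {

    // Formats a date with the given SimpleDateFormat pattern.
    // UTC is an assumption made here so the output does not depend on the local zone.
    public static String format(String pattern, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    // Builds a Date for the given UTC wall-clock time (month is 1-based).
    public static Date utc(int year, int month, int day, int hour, int minute, int second) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.clear(); // zero out milliseconds and any other leftover fields
        cal.set(year, month - 1, day, hour, minute, second);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date morning = utc(2013, 12, 12, 3, 4, 5);
        Date afternoon = utc(2013, 12, 12, 15, 4, 5);

        // With hh, 03:04:05 and 15:04:05 format identically.
        System.out.println(format("yyyyMMddhhmmss", morning));   // 20131212030405
        System.out.println(format("yyyyMMddhhmmss", afternoon)); // 20131212030405

        // With HH, the afternoon build is distinguishable.
        System.out.println(format("yyyyMMddHHmmss", afternoon)); // 20131212150405
    }
}
```

If the timestamp is only used as an informational label, either pattern works; if it is meant to sort or uniquely identify builds, the 24-hour form avoids the collision shown above.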
18 changes: 17 additions & 1 deletion pom.xml
@@ -162,7 +162,23 @@
<artifactId>dsiutils</artifactId>
<version>2.0.12</version>
<scope>compile</scope>
-</dependency>
+</dependency>
+<dependency>
+<groupId>org.apache.httpcomponents</groupId>
+<artifactId>httpcore</artifactId>
+<version>4.3</version>
+</dependency>
+<dependency>
+<groupId>joda-time</groupId>
+<artifactId>joda-time</artifactId>
+<version>1.6</version>
+</dependency>
+<dependency>
+<groupId>fastutil</groupId>
+<artifactId>fastutil</artifactId>
+<version>5.0.7</version>
+<scope>compile</scope>
+</dependency>
</dependencies>

<build>