What's new in URLFrontier 2.0

@jnioche jnioche released this 28 Apr 10:03
· 163 commits to master since this release

This is the first release named 2.x and a major step towards URL Frontier 2, which is being funded through the NGI0 Discovery Fund.

The main goal of this release was to introduce the concept of a distributed frontier in the API and to provide an implementation of the service that can work in a distributed fashion. For the latter, we implemented a service based on Apache Ignite. Ignite handles the detection of nodes, replication and failure management, as well as key-value storage. In addition, we used Apache Lucene for ordering and accessing the URLs within a queue.

The main changes to the API are the addition of the listNodes endpoint, which returns the list of nodes in the Frontier cluster, and the addition of a local field to most of the messages used by the API. The local field determines whether the corresponding action (e.g. GetStats) is applied to the cluster as a whole (the default) or only to the targeted node.
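
To illustrate the cluster-wide versus node-local semantics described above, here is a minimal in-memory sketch in Python. It is not the URL Frontier gRPC API: the `Node` and `Cluster` classes, the method names `list_nodes` and `get_stats`, and the addresses are all hypothetical stand-ins, and the "stats" are reduced to a simple URL count. It only assumes what the paragraph states, namely that an action applies to the whole cluster by default and to the targeted node when `local` is set.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for one Frontier node (not the real service class)."""
    address: str
    # Maps a queue key to the number of URLs held in that queue on this node.
    queue_sizes: Dict[str, int] = field(default_factory=dict)

    def local_url_count(self) -> int:
        # Count only the URLs stored on this node.
        return sum(self.queue_sizes.values())


@dataclass
class Cluster:
    """Hypothetical model of a Frontier cluster, used only for illustration."""
    nodes: List[Node]

    def list_nodes(self) -> List[str]:
        # Mirrors the idea of listNodes: report every node in the cluster.
        return [node.address for node in self.nodes]

    def get_stats(self, target: Node, local: bool = False) -> int:
        # local=True: restrict the action to the node that received the request.
        if local:
            return target.local_url_count()
        # Default: aggregate over the whole cluster.
        return sum(node.local_url_count() for node in self.nodes)


cluster = Cluster([
    Node("node-a:7071", {"example.com": 3, "example.org": 2}),
    Node("node-b:7071", {"example.net": 4}),
])
target = cluster.nodes[0]

print(cluster.list_nodes())
print(cluster.get_stats(target, local=True))
print(cluster.get_stats(target))
```

In this sketch the same request sent to the same node yields a different count depending only on the `local` flag, which is the behaviour the new field is meant to control.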

The other two implementations of the service (Memory and RocksDB) work as previously.

The next releases will focus on robustness and resilience.