
Releases: andrejev/Scalable.OR

Advanced Practical Magnus

08 Jul 15:19
Pre-release

Create a more stable version

The idea is to draw a small sample from the large data set and use that sample to build the OpenRefine program. Scalable.OR then executes this program not on the sample but on the entire data set. If one or more errors occur during the Spark execution, the affected lines of the large data set should be appended to the sample automatically. Because the sample then also contains the faulty lines, the OpenRefine program can be adjusted to handle them as well.
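
The feedback loop described above can be illustrated with a minimal sketch. It is plain Python and does not use the Scalable.OR, Spark, or OpenRefine APIs; the function names (`run_and_collect_failures`, `extend_sample`, `parse_price`) and the example data are hypothetical and only stand in for the OpenRefine program and the large data set.

```python
from typing import Callable, Iterable, List, Tuple


def run_and_collect_failures(
    rows: Iterable[str], transform: Callable[[str], object]
) -> Tuple[List[object], List[str]]:
    """Apply transform to every row; keep the rows that raise an error."""
    cleaned, failed = [], []
    for row in rows:
        try:
            cleaned.append(transform(row))
        except Exception:
            # Assumption: any exception marks the row as "faulty".
            failed.append(row)
    return cleaned, failed


def extend_sample(sample: List[str], failed: Iterable[str]) -> List[str]:
    """Append failed rows to the sample, skipping rows already present."""
    seen = set(sample)
    extended = list(sample)
    for row in failed:
        if row not in seen:
            extended.append(row)
            seen.add(row)
    return extended


def parse_price(row: str) -> float:
    """Hypothetical stand-in for a program that was built on the sample only."""
    name, price = row.split(",")
    return float(price)


# Hypothetical data: the sample covers only the first two, well-formed rows.
full_data = ["apple,1.50", "pear,2.00", "plum,n/a", "kiwi,0.75", "fig"]
sample = full_data[:2]

# Run on the entire data set, then feed the faulty rows back into the sample.
cleaned, failed = run_and_collect_failures(full_data, parse_price)
sample = extend_sample(sample, failed)
```

After this run, `sample` also holds the two rows that broke `parse_price`, so the transformation can be revised against them before the next full run.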