Create a more stable version

The idea is that a small sample of the large data set is created. With the sample, the OpenRefine program can be created. This program is then executed in Scalable.OR, not on the sample. but on the entire data set. If one or multiple errors occur during the execution of Spark, the affected lines of the large data set should be automatically appended to the sample. As the sample now also contains the new, faulty lines, the OpenRefine program can be adjusted to handle these lines, too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Advanced Practical Magnus

Create a more stable version