Advanced Practical Magnus

Pre-release
@MagnusBieneck MagnusBieneck released this 08 Jul 15:19

Create a more stable version

The idea is to create a small sample of the large data set and to build the OpenRefine program against that sample. The program is then executed in Scalable.OR on the entire data set, not on the sample. If one or more errors occur during the Spark execution, the affected lines of the large data set should be appended to the sample automatically. Because the sample then also contains the new, faulty lines, the OpenRefine program can be adjusted to handle these lines, too.
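
The feedback loop described above can be illustrated with a minimal sketch in plain Python. It does not use the Scalable.OR, OpenRefine, or Spark APIs; the function names `draw_sample`, `run_program`, and `extend_sample` are hypothetical, and the "program" is assumed to be an ordinary function that raises an exception on a line it cannot handle.

```python
import random


def draw_sample(rows, size, seed=0):
    """Draw a small random sample from the full data set (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))


def run_program(program, rows):
    """Apply the program to every row and collect the rows that raise an error."""
    results, failed = [], []
    for row in rows:
        try:
            results.append(program(row))
        except Exception:
            failed.append(row)
    return results, failed


def extend_sample(sample, failed_rows):
    """Append each failed row to the sample unless it is already present."""
    seen = set(sample)
    for row in failed_rows:
        if row not in seen:
            sample.append(row)
            seen.add(row)
    return sample


# Stand-in for the large data set; "x" is a faulty line.
rows = ["1", "2", "x", "4"]
sample = draw_sample(rows, size=2)

# Stand-in for the OpenRefine program: parse each line as an integer.
results, failed = run_program(int, rows)

# The faulty line is now part of the sample, so the program can be adjusted.
sample = extend_sample(sample, failed)
```

After the last step, the sample contains the line that failed on the full data set, so the next revision of the program can be developed and checked against it before running on the entire data set again.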