ENH: added functions to set format and storage options of spark dataframes

Adds functions to set the file format and storage options of Spark dataframes that are written when calling save on a Spark name-matching object.
Example usage:
   nm_obj.write().format('parquet').options(**options_dict).save(path)
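The chained call above works because each setter stores its argument and returns `self`. A minimal sketch of that pattern follows, assuming a hypothetical stand-in class `SketchWriter` and a hypothetical helper `save_call`; neither is part of emm, and no Spark session is needed to run it. The `parquet` default is taken from the docstring in this commit, and the forwarding shape follows the `sdf.write.save(path, format=..., **kwargs)` call that the docstring describes.

```python
class SketchWriter:
    """Hypothetical stand-in for the writer class; not the emm implementation."""

    def __init__(self):
        # Assumed default, based on the "default is parquet" docstring.
        self.file_format = "parquet"
        self.store_kws = {}

    def format(self, file_format: str):
        # Store the format and return self so calls can be chained.
        self.file_format = file_format
        return self

    def options(self, **kwargs):
        # Replace (not merge) the stored options, mirroring the commit.
        self.store_kws = kwargs
        return self

    def save_call(self, path: str) -> dict:
        # Hypothetical helper: the keyword arguments that would be
        # forwarded to sdf.write.save(...) for each dataframe.
        return {"path": path, "format": self.file_format, **self.store_kws}


writer = SketchWriter().format("csv").options(header=True, sep=";")
call = writer.save_call("/tmp/example")
print(call)
```

Because `options` assigns rather than updates, a second call to `options` discards the keyword arguments from the first, so all storage options should be passed in a single call.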
mbaak committed Apr 19, 2024
1 parent f616907 commit e43e6ed
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions emm/helper/spark_custom_reader_writer.py
@@ -214,6 +214,30 @@ def _get_metadata_to_save(self):
        }
        return json.dumps(metadata, separators=[",", ":"])

    def format(self, file_format: str):
        """Set the file format of ground truth datasets that are saved
        Args:
            file_format: storage format of spark dataframes, default is parquet.
        Returns:
            self
        """
        self.file_format = file_format
        return self

    def options(self, **kwargs):
        """Set the other file storage options of ground truth datasets that are saved
        Args:
            kwargs: storage kw-args, passed on to: sdf.write.save(path, format=self.file_format, **self.store_kws)
        Returns:
            self
        """
        self.store_kws = kwargs
        return self


class SparkCustomReader(MLReader):
    """Spark Custom class reader"""