Refactor py #12

Open · wants to merge 64 commits into dev
Conversation

@ypriverol (Member) commented Sep 25, 2024

PR Type

enhancement, tests


Description

  • Implemented new classes and methods for feature selection, including univariate, multivariate, and machine learning approaches.
  • Added comprehensive tests for the new feature selection methods and data handling classes.
  • Updated utility functions and constants to support new feature selection functionalities.
  • Introduced example scripts for data conversion and pipeline execution.
  • Updated package configuration and dependencies to reflect the new project scope.
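For orientation, the snippet below is a minimal end-to-end sketch of the new API, assembled from the tests added in this PR; the import paths are assumptions based on the file layout in the walkthrough below.

import pandas as pd

from fslite.fs.fdataframe import FSDataFrame
from fslite.fs.univariate import FSUnivariate
from fslite.utils.datasets import get_tnbc_data_path

# Load the bundled TNBC example dataset (44 samples x 500 features).
df = pd.read_csv(get_tnbc_data_path(), sep="\t")
fs_df = FSDataFrame(df=df, sample_col="Sample", label_col="label")

# Univariate ANOVA filter keeping the top 80th percentile of features.
fs_univariate = FSUnivariate(
    fs_method="anova", selection_mode="percentile", selection_threshold=0.8
)
fs_df_filtered = fs_univariate.select_features(fs_df)

assert fs_df.count_features() == 500
df_filtered = fs_df_filtered.to_pandas()  # export the selection as a pandas DataFrame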

Changes walkthrough 📝

Relevant files

Enhancement (9 files)

ml.py (fslite/fs/ml.py): Add ML feature selection and cross-validation classes
  • Introduced FSMLMethod class for ML feature selection.
  • Added MLCVModel class for ML model creation with cross-validation.
  • Implemented feature selection and evaluation methods.
  +399/-0

fdataframe.py (fslite/fs/fdataframe.py): Implement FSDataFrame class for feature selection
  • Introduced FSDataFrame class for feature selection.
  • Implemented methods for handling sparse and dense matrices.
  • Added feature scaling and selection methods.
  +311/-0

multivariate.py (fslite/fs/multivariate.py): Add multivariate feature selection methods
  • Added FSMultivariate class for multivariate feature selection.
  • Implemented correlation and variance-based selection methods.
  +219/-0

univariate.py (fslite/fs/univariate.py): Implement univariate feature selection methods
  • Introduced FSUnivariate class for univariate feature selection.
  • Implemented various univariate selection methods.
  +217/-0

constants.py (fslite/fs/constants.py): Define constants and utilities for feature selection
  • Defined constants for feature selection methods.
  • Added utility functions for method validation.
  +156/-0

loom2parquetchunks.py (examples/loom2parquetchunks.py): Add script for loom to parquet conversion
  • Added script to convert loom files to parquet chunks.
  • Implemented metadata handling and chunk processing (a combined sketch of the two conversion scripts follows this walkthrough).
  +118/-0

utils.py (fslite/fs/utils.py): Update utilities for feature selection
  • Updated utility functions for feature selection.
  • Replaced the pyspark imputer with sklearn's SimpleImputer.
  +33/-16

methods.py (fslite/fs/methods.py): Define abstract class for feature selection methods
  • Introduced abstract class FSMethod for feature selection.
  • Defined error classes for invalid methods and data.
  +78/-0

loom2parquetmerge.py (examples/loom2parquetmerge.py): Add script for merging parquet files
  • Added script to merge parquet files incrementally.
  • Implemented batch processing to manage memory usage.
  +62/-0

Tests (4 files)

test_univariate_methods.py (fslite/tests/test_univariate_methods.py): Add tests for univariate feature selection
  • Added tests for univariate feature selection methods.
  • Included tests for various selection techniques.
  +146/-0

test_fsdataframe.py (fslite/tests/test_fsdataframe.py): Add tests for FSDataFrame class
  • Added tests for FSDataFrame initialization and scaling.
  • Included memory usage tests for large datasets.
  +121/-0

test_multivariate_methods.py (fslite/tests/test_multivariate_methods.py): Add tests for multivariate feature selection
  • Added tests for multivariate feature selection methods.
  • Included tests for correlation and variance methods.
  +122/-0

generate_big_tests.py (fslite/tests/generate_big_tests.py): Add script to generate large test datasets
  • Added script to generate large test datasets.
  • Implemented chunk processing for memory efficiency.
  +56/-0

Documentation (1 file)

fs_pipeline_example.py (fslite/pipeline/fs_pipeline_example.py): Add example pipeline for feature selection
  • Added example pipeline for feature selection.
  • Demonstrated univariate, multivariate, and ML methods.
  +69/-0

Configuration changes (1 file)

setup.py: Update package setup and dependencies
  • Updated package name and dependencies.
  • Changed project references from fsspark to fslite.
  +13/-10
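Both example scripts follow the same chunked-write / incremental-merge pattern. The sketch below illustrates that pattern with loompy and pyarrow; the file names, chunk size, and column naming are hypothetical, not the scripts' actual code.

import loompy
import pyarrow as pa
import pyarrow.parquet as pq

CHUNK = 5000  # hypothetical number of cells per chunk

# Step 1: export a loom matrix (genes x cells) to parquet, one column chunk at a time.
with loompy.connect("input.loom") as ds:  # hypothetical input file
    n_genes, n_cells = ds.shape
    writer = None
    for start in range(0, n_cells, CHUNK):
        stop = min(start + CHUNK, n_cells)
        chunk = ds[:, start:stop].T  # cells x genes
        table = pa.table({f"g{i}": chunk[:, i] for i in range(n_genes)})  # placeholder column names
        if writer is None:
            writer = pq.ParquetWriter("chunks.parquet", table.schema)
        writer.write_table(table)
    writer.close()

# Step 2: merge part-files incrementally, reading record batches to bound memory.
parts = ["chunks.parquet"]  # hypothetical list of part-files
schema = pq.ParquetFile(parts[0]).schema_arrow
with pq.ParquetWriter("merged.parquet", schema) as merged:
    for path in parts:
        for batch in pq.ParquetFile(path).iter_batches(batch_size=10_000):
            merged.write_table(pa.Table.from_batches([batch]))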

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    ypriverol and others added 21 commits September 23, 2024 09:44
Co-authored-by: codiumai-pr-agent-pro[bot] <151058649+codiumai-pr-agent-pro[bot]@users.noreply.github.com>

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Performance Concern
    The FSDataFrame class uses a memory threshold to decide between sparse and dense matrix storage. This approach may not be optimal for all datasets and could lead to performance issues with very large datasets.

    Code Smell
    The MLCVModel class has a complex initialization process with many parameters. Consider refactoring to use a builder pattern or configuration object to simplify the interface.
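One possible shape for that refactor is a small configuration object passed as a single argument; the dataclass below is purely illustrative, and its parameter names are hypothetical rather than the PR's actual API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MLCVConfig:
    # Hypothetical grouping of MLCVModel's constructor parameters.
    estimator_name: str = "random_forest"
    cv_folds: int = 5
    scoring: str = "accuracy"
    grid_params: Optional[dict] = None
    random_state: int = 42

# The model would then accept one object instead of a long signature:
# model = MLCVModel(config=MLCVConfig(estimator_name="svm", cv_folds=10))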

    Potential Bug
    The memory usage test in test_memory_fsdataframe uses memory_usage which may not accurately measure the memory usage of the FSDataFrame class, especially for large datasets.
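If memory_usage proves unreliable, Python's tracemalloc is one alternative; it reports interpreter-tracked allocations, and NumPy has registered its array buffers with tracemalloc since 1.13. The helper below is a sketch, not the PR's test.

import tracemalloc

def peak_memory_mib(fn, *args, **kwargs):
    # Run fn and return (result, peak traced allocation in MiB).
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024 ** 2

# Example: _, peak = peak_memory_mib(FSDataFrame, df=df, sample_col="Sample", label_col="label")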


    codiumai-pr-agent-pro bot commented Sep 25, 2024


    PR Code Suggestions ✨

    Enhancement
    Use parameterized tests to reduce code duplication in univariate method tests

    Consider using parameterized tests to reduce code duplication across the different
    test functions for univariate methods.

    fslite/tests/test_univariate_methods.py [8-58]

    -def test_univariate_filter_corr():
    +import pytest
    +
    +@pytest.mark.parametrize("fs_method, selection_mode, selection_threshold, expected_features", [
    +    ("u_corr", None, 0.3, 211),
    +    ("anova", "percentile", 0.8, 4),
    +    # Add more test cases for other univariate methods
    +])
    +def test_univariate_filter(fs_method, selection_mode, selection_threshold, expected_features):
         """
    -    Test univariate_filter method with 'u_corr' method.
    -    :return: None
    +    Test univariate_filter method with different parameters.
         """
    -
    -    # import tsv as pandas DataFrame
         df = pd.read_csv(get_tnbc_data_path(), sep="\t")
    -
    -    # create FSDataFrame instance
         fs_df = FSDataFrame(df=df, sample_col="Sample", label_col="label")
     
    -    # create FSUnivariate instance
    -    fs_univariate = FSUnivariate(fs_method="u_corr", selection_threshold=0.3)
    -
    -    fsdf_filtered = fs_univariate.select_features(fs_df)
    -
    -    assert fs_df.count_features() == 500
    -    assert fsdf_filtered.count_features() == 211
    -
    -    # Export the filtered DataFrame as Pandas DataFrame
    -    df_filtered = fsdf_filtered.to_pandas()
    -    df_filtered.to_csv("filtered_tnbc_data.csv", index=False)
    -
    -
    -# test the univariate_filter method with 'anova' method
    -def test_univariate_filter_anova():
    -    """
    -    Test univariate_filter method with 'anova' method.
    -    :return: None
    -    """
    -
    -    # import tsv as pandas DataFrame
    -    df = pd.read_csv(get_tnbc_data_path(), sep="\t")
    -
    -    # create FSDataFrame instance
    -    fs_df = FSDataFrame(df=df, sample_col="Sample", label_col="label")
    -
    -    # create FSUnivariate instance
         fs_univariate = FSUnivariate(
    -        fs_method="anova", selection_mode="percentile", selection_threshold=0.8
    +        fs_method=fs_method,
    +        selection_mode=selection_mode,
    +        selection_threshold=selection_threshold
         )
     
         fsdf_filtered = fs_univariate.select_features(fs_df)
     
         assert fs_df.count_features() == 500
    -    assert fsdf_filtered.count_features() == 4
    +    assert fsdf_filtered.count_features() == expected_features
     
    -    # Export the filtered DataFrame as Pandas DataFrame
         df_filtered = fsdf_filtered.to_pandas()
    -    df_filtered.to_csv("filtered_tnbc_data.csv", index=False)
    +    df_filtered.to_csv(f"filtered_tnbc_data_{fs_method}.csv", index=False)
     
    Suggestion importance[1-10]: 8

    Why: Parameterized tests significantly reduce code duplication and improve test maintainability, making this a valuable enhancement.

    Add a more informative docstring to the package's __init__.py file

    Consider adding a more descriptive comment or docstring to provide information about
    the package's purpose and functionality.

fslite/__init__.py [1-2]

    -# eam
    -# 18.07.22
    +"""
    +fslite: A Python package for memory-efficient, high-performance feature selection on big and small datasets.
     
    +Author: eam
    +Date: 18.07.22
    +"""
    +
    Suggestion importance[1-10]: 8

    Why: Adding a descriptive docstring improves code readability and provides context about the package's purpose, which is beneficial for users and maintainers.

    Use a more specific exception type for invalid scoring methods

    Consider using a more specific exception type instead of ValueError for invalid
    scoring methods. This can help in better error handling and debugging.

    fslite/fs/ml.py [99-102]

     if score_func not in score_func_mapping:
    -    raise ValueError(
    +    raise InvalidMethodError(
             f"Invalid score_func '{score_func}'. Valid options are: {list(score_func_mapping.keys())}"
         )
     
    Suggestion importance[1-10]: 7

    Why: Using a more specific exception type like InvalidMethodError can improve error handling and debugging, making it easier to identify the source of the error.

    Use an Enum class for feature selection methods to improve type safety and maintainability

    Consider using an Enum class for the feature selection methods instead of a nested
    dictionary. This would provide better type safety and make the code more
    maintainable.

    fslite/fs/constants.py [7-31]

    +from enum import Enum, auto
    +
    +class FSMethod(Enum):
    +    ANOVA = auto()
    +    U_CORR = auto()
    +    F_REGRESSION = auto()
    +    MUTUAL_INFO_REGRESSION = auto()
    +    MUTUAL_INFO_CLASSIFICATION = auto()
    +
     FS_METHODS = {
         "univariate": {
             "title": "Univariate Feature Selection",
             "description": "Univariate feature selection refers to the process of selecting the most relevant features for "
             "a machine learning model by evaluating each feature individually with respect to the target "
             "variable using univariate statistical tests. It simplifies the feature selection process by "
             "treating each feature independently and assessing its contribution to the predictive "
             "performance of the model.",
             "methods": [
                 {
    -                "name": "anova",
    +                "name": FSMethod.ANOVA,
                     "description": "Univariate ANOVA feature selection (f-classification)",
                 },
    -            {"name": "u_corr", "description": "Univariate Pearson's correlation"},
    -            {"name": "f_regression", "description": "Univariate f-regression"},
    +            {"name": FSMethod.U_CORR, "description": "Univariate Pearson's correlation"},
    +            {"name": FSMethod.F_REGRESSION, "description": "Univariate f-regression"},
                 {
    -                "name": "mutual_info_regression",
    +                "name": FSMethod.MUTUAL_INFO_REGRESSION,
                     "description": "Univariate mutual information regression",
                 },
                 {
    -                "name": "mutual_info_classification",
    +                "name": FSMethod.MUTUAL_INFO_CLASSIFICATION,
                     "description": "Univariate mutual information classification",
                 },
             ],
         },
     
    Suggestion importance[1-10]: 7

    Why: Using an Enum class can improve type safety and maintainability, but the current dictionary structure is functional and the improvement is not critical.

    Add import statements and function definitions to the io.py module

    Consider adding import statements or function definitions to make the io.py module
    more useful and functional.

    fslite/utils/io.py [1]

    +from typing import Union
    +import pandas as pd
    +import pyarrow as pa
    +from pyspark.sql import DataFrame, SparkSession
     
    +def import_table(file_path: str, sep: str = '\t', n_partitions: int = 5) -> DataFrame:
    +    # Implementation here
    +    pass
     
    +def import_table_as_psdf(file_path: str, sep: str = '\t', n_partitions: int = 5) -> pd.DataFrame:
    +    # Implementation here
    +    pass
    +
    Suggestion importance[1-10]: 7

    Why: The suggestion adds useful functionality to the module, making it more practical and complete, although the specific implementations are not provided.
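For reference, a pandas-only implementation of the first stub could look like the sketch below, assuming the Spark-era n_partitions argument is dropped along with pyspark; the actual loader in this PR may differ.

import pandas as pd

def import_table(file_path: str, sep: str = "\t") -> pd.DataFrame:
    # BGZF (.bgz) files are gzip-compatible, so force gzip for them;
    # otherwise let pandas infer compression from the file extension.
    compression = "gzip" if file_path.endswith(".bgz") else "infer"
    return pd.read_csv(file_path, sep=sep, compression=compression)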

    Use a custom exception class for invalid data errors instead of ValueError

    Consider using a more specific exception type for InvalidDataError instead of
    ValueError. This would make it easier to catch and handle specific errors related to
    invalid data.

    fslite/fs/methods.py [72-78]

    -class InvalidDataError(ValueError):
    +class InvalidDataError(Exception):
         """
    -    Error raised when an invalid feature selection method is used.
    +    Error raised when invalid data is provided for feature selection.
         """
     
         def __init__(self, message):
             super().__init__(f"Invalid data frame: {message}")
     
    Suggestion importance[1-10]: 6

    Why: Using a custom exception class improves error handling specificity, but the current use of ValueError is not incorrect.

    Add specific information about the datasets used in the experiments

    Consider adding more specific information about the datasets used in the
    experiments, such as their names, sizes, and characteristics.

    docs/EXPERIMENTS.md [1-4]

     ## Experiments and Benchmarks
     
     This document contains the experiments and benchmarks that were conducted to evaluate the performance of fslite. 
     The experiments were conducted on the following datasets:
     
    +1. Dataset A: [Brief description, size, characteristics]
    +2. Dataset B: [Brief description, size, characteristics]
    +3. Dataset C: [Brief description, size, characteristics]
    +
    +Each dataset was chosen to represent different scenarios and challenges in feature selection tasks.
    +
    Suggestion importance[1-10]: 6

    Why: Providing detailed information about the datasets enhances the documentation's usefulness and helps users understand the context and applicability of the experiments.

    Best practice
    Use numpy's isclose() function for more robust floating-point comparison when checking sparsity

    Consider using numpy's built-in np.isclose() function instead of a direct comparison
    when checking for sparsity. This can help avoid potential floating-point precision
    issues.

    fslite/fs/fdataframe.py [118-143]

    -if sparsity > sparse_threshold:
    +if np.isclose(sparsity, sparse_threshold, atol=1e-8) or sparsity > sparse_threshold:
         if dense_matrix_size < memory_threshold * available_memory:
             # Use dense matrix if enough memory is available
             logging.info(
                 f"Data is sparse (sparsity={sparsity:.2f}) but enough memory available. "
                 f"Using a dense matrix."
             )
             self.__matrix = numerical_df.to_numpy(dtype=np.float32)
             self.__is_sparse = False
         else:
             # Use sparse matrix due to memory constraints
             logging.info(
                 f"Data is sparse (sparsity={sparsity:.2f}), memory insufficient for dense matrix. "
                 f"Using a sparse matrix representation."
             )
             self.__matrix = sparse.csr_matrix(
                 numerical_df.to_numpy(dtype=np.float32)
             )
             self.__is_sparse = True
     else:
         # Use dense matrix since it's not sparse
         logging.info(
             f"Data is not sparse (sparsity={sparsity:.2f}), using a dense matrix."
         )
         self.__matrix = numerical_df.to_numpy(dtype=np.float32)
         self.__is_sparse = False
     
    Suggestion importance[1-10]: 8

    Why: This suggestion addresses potential floating-point precision issues, which can be crucial for ensuring correct behavior in numerical computations.

    8
    Use a constant for the testdata directory path to improve maintainability

    Consider using a constant for the 'testdata' directory path to avoid repetition and
    make it easier to update if the directory structure changes.

    fslite/utils/datasets.py [7-22]

    +TESTDATA_DIR = Path(__file__).parent.parent / "testdata"
    +
     def get_tnbc_data_path() -> str:
         """
         Return path to example dataset (TNBC) with 44 samples and 500 features.
    -
         """
    -    tnbc_path = Path(__file__).parent.parent / "testdata/TNBC.tsv.gz"
    -    return tnbc_path.__str__()
    -
    +    return str(TESTDATA_DIR / "TNBC.tsv.gz")
     
     def get_tnbc_data_missing_values_path() -> str:
         """
         Return path to example dataset (TNBC) with missing values.
    +    """
    +    return str(TESTDATA_DIR / "TNBC_missing.tsv")
     
    -    """
    -    tnbc_path = Path(__file__).parent.parent / "testdata/TNBC_missing.tsv"
    -    return tnbc_path.__str__()
    -
    Suggestion importance[1-10]: 7

    Why: Defining a constant for the directory path enhances maintainability and reduces the risk of errors if the path changes, but the current implementation is not problematic.

    Update code examples to use keyword arguments for clarity

Consider updating the code examples to use keyword arguments for better readability and
consistency with modern Python practices.

    docs/README.data.md [42-44]

    -sdf = import_table('data.tsv.bgz',
    +sdf = import_table(file_path='data.tsv.bgz',
                        sep='\t',
                        n_partitions=5)
     
    Suggestion importance[1-10]: 5

    Why: Using keyword arguments improves code readability and clarity, but the change is minor and does not significantly impact functionality.

    Maintainability
    Use a dictionary mapping for univariate methods to improve maintainability and extensibility

    Consider using a dictionary mapping for univariate methods instead of multiple
    if-elif statements. This can make the code more maintainable and easier to extend
    with new methods in the future.

    fslite/fs/univariate.py [145-162]

    -if univariate_method == "anova":
    -    selected_features = self.univariate_feature_selector(
    -        df, score_func="f_classif", **kwargs
    -    )
    -elif univariate_method == "f_regression":
    -    selected_features = self.univariate_feature_selector(
    -        df, score_func="f_regression", **kwargs
    -    )
    -elif univariate_method == "u_corr":
    -    selected_features = univariate_correlation_selector(df, **kwargs)
    -elif univariate_method == "mutual_info_classification":
    -    selected_features = self.univariate_feature_selector(
    -        df, score_func="mutual_info_classif", **kwargs
    -    )
    -elif univariate_method == "mutual_info_regression":
    -    selected_features = self.univariate_feature_selector(
    -        df, score_func="mutual_info_regression", **kwargs
    -    )
    +method_mapping = {
    +    "anova": lambda: self.univariate_feature_selector(df, score_func="f_classif", **kwargs),
    +    "f_regression": lambda: self.univariate_feature_selector(df, score_func="f_regression", **kwargs),
    +    "u_corr": lambda: univariate_correlation_selector(df, **kwargs),
    +    "mutual_info_classification": lambda: self.univariate_feature_selector(df, score_func="mutual_info_classif", **kwargs),
    +    "mutual_info_regression": lambda: self.univariate_feature_selector(df, score_func="mutual_info_regression", **kwargs),
    +}
     
    +selected_features = method_mapping.get(univariate_method, lambda: [])()
    +
    Suggestion importance[1-10]: 6

    Why: This change enhances code maintainability and readability by reducing the complexity of method selection, making it easier to add new methods in the future.

    Performance
    Consider using a more efficient correlation computation method for large datasets

    Consider using a more efficient method to compute the correlation matrix for large
    datasets. The current implementation might not scale well for very large feature
    sets.

    fslite/fs/multivariate.py [131-139]

     # Compute correlation matrix
     if corr_method == "pearson":
         corr_matrix = np.corrcoef(f_matrix, rowvar=False)
     elif corr_method == "spearman":
         corr_matrix, _ = spearmanr(f_matrix)
     else:
         raise ValueError(
             f"Unsupported correlation method '{corr_method}'. Use 'pearson' or 'spearman'."
         )
     
    +# For large datasets, consider using a more efficient method
    +# For example, you could use pandas' corr() method with a custom function
    +# that computes correlation for chunks of the data at a time
    +
    Suggestion importance[1-10]: 5

    Why: While the suggestion is valid for improving performance with large datasets, it lacks a concrete implementation, making it less immediately actionable.
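One concrete chunked variant for the Pearson case: standardize the columns once, then accumulate Z.T @ Z block by block so each matrix-product temporary stays small. The function below is a sketch, not part of this PR.

import numpy as np

def pearson_corr_blocked(f_matrix: np.ndarray, block: int = 256) -> np.ndarray:
    # Standardize columns with the population std so that
    # (Z.T @ Z) / n_samples reproduces np.corrcoef(f_matrix, rowvar=False).
    n_samples = f_matrix.shape[0]
    z = (f_matrix - f_matrix.mean(axis=0)) / f_matrix.std(axis=0)
    n_features = z.shape[1]
    corr = np.empty((n_features, n_features), dtype=np.float64)
    for start in range(0, n_features, block):
        stop = min(start + block, n_features)
        # Each iteration materializes only a (block x n_features) slab.
        corr[start:stop, :] = z[:, start:stop].T @ z / n_samples
    return corr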


    💡 Need additional feedback? Start a PR chat


    CI Failure Feedback 🧐

    Action: build-linux

    Failed stage: Test with pytest [❌]

    Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    324:  #
    325:  #     $ conda activate base
    326:  #
    327:  # To deactivate an active environment, use
    328:  #
    329:  #     $ conda deactivate
    330:  ##[group]Run conda install flake8
    331:  conda install flake8
    332:  # stop the build if there are Python syntax errors or undefined names
    333:  flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
    334:  # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
    ...
    
    428:  Executing transaction: ...working... done
    429:  ============================= test session starts ==============================
    430:  platform linux -- Python 3.10.0, pytest-7.4.4, pluggy-1.5.0
    431:  rootdir: /home/runner/work/fslite/fslite
    432:  collected 13 items
    433:  fslite/tests/test_fsdataframe.py ...                                     [ 23%]
    434:  fslite/tests/test_multivariate_methods.py ....                           [ 53%]
    435:  fslite/tests/test_univariate_methods.py .F....                           [100%]
    436:  =================================== FAILURES ===================================
    ...
    
    452:  def get_handle(
    453:  path_or_buf: FilePath | BaseBuffer,
    454:  mode: str,
    455:  *,
    456:  encoding: str | None = None,
    457:  compression: CompressionOptions | None = None,
    458:  memory_map: bool = False,
    459:  is_text: bool = True,
    460:  errors: str | None = None,
    ...
    
    478:  supported for compression modes 'gzip', 'bz2', 'zstd' and 'zip'.
    479:  .. versionchanged:: 1.4.0 Zstandard support.
    480:  memory_map : bool, default False
    481:  See parsers._parser_params for more information. Only used by read_csv.
    482:  is_text : bool, default True
    483:  Whether the type of the content passed to the file/buffer is string or
    484:  bytes. This is not the same as `"b" not in mode`. If a string content is
    485:  passed to a binary file/buffer, a wrapper is inserted.
    486:  errors : str, default 'strict'
    487:  Specifies how encoding and decoding errors are to be handled.
    488:  See the errors argument for :func:`open` for a full list
    489:  of options.
    490:  storage_options: StorageOptions = None
    491:  Passed to _get_filepath_or_buffer
    492:  Returns the dataclass IOHandles
    493:  """
    494:  # Windows does not default to utf-8. Set to utf-8 for a consistent behavior
    495:  encoding = encoding or "utf-8"
    496:  errors = errors or "strict"
    497:  # read_csv does not know whether the buffer is opened in binary/text mode
    498:  if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
    499:  mode += "b"
    500:  # validate encoding and errors
    501:  codecs.lookup(encoding)
    502:  if isinstance(errors, str):
    503:  codecs.lookup_error(errors)
    ...
    
    526:  ioargs.mode = ioargs.mode.replace("t", "")
    527:  elif compression == "zstd" and "b" not in ioargs.mode:
    528:  # python-zstandard defaults to text mode, but we always expect
    529:  # compression libraries to use binary mode.
    530:  ioargs.mode += "b"
    531:  # GZ Compression
    532:  if compression == "gzip":
    533:  if isinstance(handle, str):
    534:  # error: Incompatible types in assignment (expression has type
    ...
    
    552:  # "Union[str, BaseBuffer]", "str", "Dict[str, Any]"
    553:  handle = get_bz2_file()(  # type: ignore[call-overload]
    554:  handle,
    555:  mode=ioargs.mode,
    556:  **compression_args,
    557:  )
    558:  # ZIP Compression
    559:  elif compression == "zip":
    560:  # error: Argument 1 to "_BytesZipFile" has incompatible type
    ...
    
    564:  handle, ioargs.mode, **compression_args  # type: ignore[arg-type]
    565:  )
    566:  if handle.buffer.mode == "r":
    567:  handles.append(handle)
    568:  zip_names = handle.buffer.namelist()
    569:  if len(zip_names) == 1:
    570:  handle = handle.buffer.open(zip_names.pop())
    571:  elif not zip_names:
    572:  raise ValueError(f"Zero files found in ZIP file {path_or_buf}")
    573:  else:
    574:  raise ValueError(
    ...
    
    576:  f"Only one file per ZIP: {zip_names}"
    577:  )
    578:  # TAR Encoding
    579:  elif compression == "tar":
    580:  compression_args.setdefault("mode", ioargs.mode)
    581:  if isinstance(handle, str):
    582:  handle = _BytesTarFile(name=handle, **compression_args)
    583:  else:
    584:  # error: Argument "fileobj" to "_BytesTarFile" has incompatible
    ...
    
    591:  if "r" in handle.buffer.mode:
    592:  handles.append(handle)
    593:  files = handle.buffer.getnames()
    594:  if len(files) == 1:
    595:  file = handle.buffer.extractfile(files[0])
    596:  assert file is not None
    597:  handle = file
    598:  elif not files:
    599:  raise ValueError(f"Zero files found in TAR archive {path_or_buf}")
    600:  else:
    601:  raise ValueError(
    602:  "Multiple files found in TAR archive. "
    603:  f"Only one file per TAR archive: {files}"
    604:  )
    605:  # XZ Compression
    606:  elif compression == "xz":
    607:  # error: Argument 1 to "LZMAFile" has incompatible type "Union[str,
    ...
    
    620:  handle = zstd.open(
    621:  handle,
    622:  mode=ioargs.mode,
    623:  **open_args,
    624:  )
    625:  # Unrecognized Compression
    626:  else:
    627:  msg = f"Unrecognized compression type: {compression}"
    628:  raise ValueError(msg)
    ...
    
    632:  # Check whether the filename is to be opened in binary mode.
    633:  # Binary mode does not support 'encoding' and 'newline'.
    634:  if ioargs.encoding and "b" not in ioargs.mode:
    635:  # Encoding
    636:  handle = open(
    637:  handle,
    638:  ioargs.mode,
    639:  encoding=ioargs.encoding,
    640:  errors=errors,
    641:  newline="",
    642:  )
    643:  else:
    644:  # Binary mode
    645:  >               handle = open(handle, ioargs.mode)
    646:  E               FileNotFoundError: [Errno 2] No such file or directory: '../../examples/GSE156793.parquet'
    647:  /usr/share/miniconda/lib/python3.10/site-packages/pandas/io/common.py:882: FileNotFoundError
    648:  =========================== short test summary info ============================
    649:  FAILED fslite/tests/test_univariate_methods.py::test_univariate_filter_big_corr - FileNotFoundError: [Errno 2] No such file or directory: '../../examples/GSE156793.parquet'
    650:  ======================== 1 failed, 12 passed in 15.33s =========================
    651:  ##[error]Process completed with exit code 1.
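
The failure comes from a dataset that is not available in CI ('../../examples/GSE156793.parquet'). One common remedy, sketched below rather than taken from this PR, is to skip the test when the file is missing and to resolve the path relative to the test file instead of the working directory.

from pathlib import Path

import pytest

# Resolve relative to this test file, not the CI runner's working directory.
BIG_PARQUET = Path(__file__).parents[2] / "examples" / "GSE156793.parquet"

@pytest.mark.skipif(
    not BIG_PARQUET.exists(),
    reason="large example dataset is not available in CI",
)
def test_univariate_filter_big_corr():
    ...  # existing test body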
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.
