1 change: 1 addition & 0 deletions README.md
@@ -124,6 +124,7 @@ Your Python dependencies can be packaged as .py files, .zip archives (containing
Your entry point script will define logic using the `Client` object, which wraps the data access layers.

You should only need the following methods:
* `read_file(file_name)` – Returns a file handle for the provided file_name
* `read_dlo(name)` – Read from a Data Lake Object by name
* `read_dmo(name)` – Read from a Data Model Object by name
* `write_to_dlo(name, spark_dataframe, write_mode)` – Write to a Data Lake Object by name with a Spark dataframe
179 changes: 179 additions & 0 deletions docs/file_reader_refactoring.md
@@ -0,0 +1,179 @@
# DefaultFileReader Class Refactoring

## Overview

The `DefaultFileReader` class has been refactored to improve testability, readability, and maintainability. This document outlines the changes made and how to use the new implementation.

## Key Improvements

### 1. **Separation of Concerns**
- **File path resolution** is now handled by dedicated methods
- **File opening** is separated from path resolution
- **Configuration management** is centralized and configurable

### 2. **Enhanced Testability**
- **Dependency injection** through constructor parameters
- **Mockable methods** for unit testing
- **Clear interfaces** between different responsibilities
- **Comprehensive test coverage** with isolated test cases

### 3. **Better Error Handling**
- **Custom exception hierarchy** for different error types
- **Descriptive error messages** with context
- **Proper exception chaining** for debugging

### 4. **Improved Configuration**
- **Configurable defaults** that can be overridden
- **Environment-specific settings** support
- **Clear configuration contract**

### 5. **Enhanced Readability**
- **Comprehensive docstrings** for all methods
- **Clear method names** that describe their purpose
- **Logical method organization** from public to private
- **Type hints** throughout the codebase

## Class Structure

### DefaultFileReader
The main class that provides the file reading framework:

```python
import io
from pathlib import Path

from datacustomcode.file.base import BaseDataAccessLayer


class DefaultFileReader(BaseDataAccessLayer):
    # Configuration constants
    DEFAULT_CODE_PACKAGE = 'payload'
    DEFAULT_FILE_FOLDER = 'files'
    DEFAULT_CONFIG_FILE = 'config.json'

    def __init__(self, code_package=None, file_folder=None, config_file=None):
        """Initialize with custom or default configuration."""
        ...

    def file_open(self, file_name: str) -> io.TextIOWrapper:
        """Main public method for opening files."""
        ...

    def get_search_locations(self) -> list[Path]:
        """Get all possible search locations."""
        ...
```

## Exception Hierarchy

```text
FileReaderError (base)
├── FileNotFoundError (file not found in any location)
└── FileAccessError (permission, I/O errors, etc.)
```
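
A minimal sketch of what this hierarchy might look like, assuming the exceptions are defined alongside `DefaultFileReader` in `datacustomcode.file.reader.default` (the actual definitions are not shown in this diff):

```python
class FileReaderError(Exception):
    """Base exception for file reader errors."""


class FileNotFoundError(FileReaderError):
    """Raised when a file is not found in any search location.

    Note: this shadows the builtin FileNotFoundError, so callers
    should import it explicitly from the reader module.
    """


class FileAccessError(FileReaderError):
    """Raised for permission and other I/O errors while opening a file."""
```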

## Usage Examples

### Basic Usage
```python
from datacustomcode.file.reader.default import DefaultFileReader

# Use default configuration
reader = DefaultFileReader()
with reader.file_open('data.csv') as f:
    content = f.read()
```
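
Because `file_open` returns a standard `io.TextIOWrapper`, the `with` statement closes the handle automatically on exit.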

### Custom Configuration
```python
from datacustomcode.file.reader.default import DefaultFileReader

# Custom configuration
reader = DefaultFileReader(
    code_package='my_package',
    file_folder='data',
    config_file='settings.json'
)
```

### Error Handling
```python
# Assuming the custom exceptions are exported alongside DefaultFileReader
# (not confirmed by this diff).
from datacustomcode.file.reader.default import FileAccessError, FileNotFoundError

try:
    with reader.file_open('data.csv') as f:
        content = f.read()
except FileNotFoundError as e:
    print(f"File not found: {e}")
except FileAccessError as e:
    print(f"Access error: {e}")
```

## File Resolution Strategy

The file reader uses a two-tier search strategy:

1. **Primary Location**: `{code_package}/{file_folder}/{filename}`
2. **Fallback Location**: `{config_file_parent}/{file_folder}/{filename}`

This allows for flexible deployment scenarios where files might be in different locations depending on the environment.
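
A sketch of how this two-tier search could work in practice; these are illustrative standalone functions under stated assumptions, not the actual methods from this PR:

```python
from pathlib import Path


def get_search_locations(code_package: str, file_folder: str, config_file: Path) -> list[Path]:
    """Build the ordered search locations: primary first, then fallback."""
    return [
        Path(code_package) / file_folder,   # 1. {code_package}/{file_folder}
        config_file.parent / file_folder,   # 2. {config_file_parent}/{file_folder}
    ]


def resolve_file_path(file_name: str, locations: list[Path]) -> Path:
    """Return the first existing candidate, or fail with all searched paths."""
    for location in locations:
        candidate = location / file_name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{file_name!r} not found in any of: {locations}")
```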

## Testing

### Unit Tests
The refactored class includes comprehensive unit tests covering:
- Configuration initialization
- File path resolution
- Error handling scenarios
- File opening operations
- Search location determination

### Mocking
The class is designed for easy mocking in tests:
```python
from pathlib import Path
from unittest.mock import patch

from datacustomcode.file.reader.default import DefaultFileReader

# patch() requires a full dotted path; patch.object avoids spelling it out.
with patch.object(DefaultFileReader, '_resolve_file_path') as mock_resolve:
    mock_resolve.return_value = Path('/test/file.txt')
    # Test file opening logic
```

### Integration Tests
Integration tests verify the complete file resolution and opening flow using temporary directories and real file operations.
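
A hedged sketch of one such test using pytest's `tmp_path` and `monkeypatch` fixtures; it assumes the default search locations resolve relative to the working directory, which this diff does not confirm:

```python
from datacustomcode.file.reader.default import DefaultFileReader


def test_file_open_finds_file_in_primary_location(tmp_path, monkeypatch):
    # Build {code_package}/{file_folder}/{filename} using the documented defaults.
    files_dir = tmp_path / "payload" / "files"
    files_dir.mkdir(parents=True)
    (files_dir / "data.csv").write_text("a,b\n1,2\n")

    # Run from the temp directory so relative search paths resolve against it.
    monkeypatch.chdir(tmp_path)

    reader = DefaultFileReader()
    with reader.file_open("data.csv") as f:
        assert f.read() == "a,b\n1,2\n"
```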

## Migration Guide

### From Old Implementation
The old implementation had these issues:
- Hardcoded configuration values
- Mixed responsibilities in single methods
- Limited error handling
- Difficult to test

### To New Implementation
1. **Update imports**: Use `DefaultFileReader` from `datacustomcode.file.reader.default`
2. **Error handling**: Catch specific exceptions instead of generic ones (see the sketch after this list)
3. **Configuration**: Use constructor parameters for custom settings
4. **Testing**: Leverage the new mockable methods
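
For step 2, the change at call sites might look like this; the 'before' shape is an assumption about typical old code, not taken from this repository:

```python
# Before: a generic catch that also swallows unrelated failures
try:
    with reader.file_open('data.csv') as f:
        content = f.read()
except Exception as e:
    print(f"Something went wrong: {e}")

# After: specific exceptions from the reader module
try:
    with reader.file_open('data.csv') as f:
        content = f.read()
except FileNotFoundError as e:
    print(f"File not found: {e}")
except FileAccessError as e:
    print(f"Access error: {e}")
```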

## Benefits

### For Developers
- **Easier debugging** with clear error messages
- **Better IDE support** with type hints and docstrings
- **Simplified testing** with dependency injection
- **Clearer code structure** with separated responsibilities

### For Maintainers
- **Easier to extend** with new file resolution strategies
- **Better error tracking** with custom exception types
- **Improved test coverage** with isolated test cases
- **Clearer documentation** with comprehensive docstrings

### For Users
- **More reliable** with proper error handling
- **More flexible** with configurable behavior
- **Better debugging** with descriptive error messages
- **Consistent interface** across different implementations

## Future Enhancements

The refactored structure makes it easy to add:
- **Additional file resolution strategies** (URLs, cloud storage, etc.)
- **File format detection** and automatic handling
- **Caching mechanisms** for frequently accessed files
- **Async file operations** for better performance
- **File validation** and integrity checking

## Conclusion

The refactored `DefaultFileReader` class provides a solid foundation for file reading operations while maintaining backward compatibility. The improvements in testability, readability, and maintainability make it easier to develop, test, and maintain file reading functionality in the Data Cloud Custom Code SDK.
6 changes: 3 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -100,7 +100,7 @@ loguru = "^0.7.3"
numpy = "*"
pandas = "*"
pydantic = "^1.8.2 || ^2.0.0"
pyspark = "3.5.1"
pyspark = "3.5.6"
python = ">=3.10,<3.12"
pyyaml = "^6.0"
salesforce-cdp-connector = "*"
10 changes: 10 additions & 0 deletions src/datacustomcode/client.py
@@ -24,9 +24,12 @@
from pyspark.sql import SparkSession

from datacustomcode.config import SparkConfig, config
from datacustomcode.file.reader.default import DefaultFileReader
from datacustomcode.io.reader.base import BaseDataCloudReader

if TYPE_CHECKING:
    import io

    from pyspark.sql import DataFrame as PySparkDataFrame

    from datacustomcode.io.reader.base import BaseDataCloudReader
@@ -112,6 +115,7 @@ class Client:
    _instance: ClassVar[Optional[Client]] = None
    _reader: BaseDataCloudReader
    _writer: BaseDataCloudWriter
    _file: DefaultFileReader
    _data_layer_history: dict[DataCloudObjectType, set[str]]

    def __new__(
@@ -154,6 +158,7 @@ def __new__(
        writer_init = writer
        cls._instance._reader = reader_init
        cls._instance._writer = writer_init
        cls._instance._file = DefaultFileReader()
        cls._instance._data_layer_history = {
            DataCloudObjectType.DLO: set(),
            DataCloudObjectType.DMO: set(),
@@ -212,6 +217,11 @@ def write_to_dmo(
        self._validate_data_layer_history_does_not_contain(DataCloudObjectType.DLO)
        return self._writer.write_to_dmo(name, dataframe, write_mode, **kwargs)

    def read_file(self, file_name: str) -> io.TextIOWrapper:
        """Read a file from the local file system."""

        return self._file.read_file(file_name)

    def _validate_data_layer_history_does_not_contain(
        self, data_cloud_object_type: DataCloudObjectType
    ) -> None:
14 changes: 14 additions & 0 deletions src/datacustomcode/file/__init__.py
@@ -0,0 +1,14 @@
# Copyright (c) 2025, Salesforce, Inc.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
19 changes: 19 additions & 0 deletions src/datacustomcode/file/base.py
@@ -0,0 +1,19 @@
# Copyright (c) 2025, Salesforce, Inc.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations


class BaseDataAccessLayer:
"""Base class for data access layer implementations."""
14 changes: 14 additions & 0 deletions src/datacustomcode/file/reader/__init__.py
@@ -0,0 +1,14 @@
# Copyright (c) 2025, Salesforce, Inc.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.