Closed
16 commits
222 changes: 15 additions & 207 deletions .gitignore
@@ -1,207 +1,15 @@
# Byte-compiled / optimized / DLL files
Review comment (Contributor): This should not remove the existing .gitignore contents.

__pycache__/
*.py[codz]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/

# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/
__pycache__/
*.py[cod]
*.py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/

.env
.venv/
.DS_Store
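
The reviewer's concern is that this change drops most of the existing ignore patterns. A minimal sketch of how that could be checked, assuming a plain textual comparison of patterns is enough (it does not evaluate which paths each pattern matches). The helper names `patterns` and `dropped_patterns` are hypothetical, and the sample strings are short excerpts from the diff above, not the full files.

```python
def patterns(text):
    """Return the set of active patterns, skipping blank lines and comments."""
    result = set()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            result.add(stripped)
    return result


def dropped_patterns(old_text, new_text):
    """Patterns present in the old .gitignore but absent from the new one."""
    return sorted(patterns(old_text) - patterns(new_text))


# Short excerpts of the old and new .gitignore (illustrative only).
old = """\
__pycache__/
*.py[codz]
# C extensions
*.so
.pytest_cache/
.ruff_cache/
.env
"""
new = """\
__pycache__/
*.py[cod]
*.so
.env
.venv/
.DS_Store
"""

print(dropped_patterns(old, new))
```

Run against the real before and after files, a non-empty result would list the entries a reviewer may want restored.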
36 changes: 3 additions & 33 deletions README.md
@@ -17,46 +17,16 @@ Lance-Ray combines the distributed computing capabilities of Ray with the effici

## Installation

### Basic Installation
```bash
# Clone the repository
git clone https://github.com/lancedb/lance-ray.git
# Install from source
git clone https://github.com/lance-ray/lance-ray.git
cd lance-ray

# Install UV (if not already installed)
pip install uv

# Install in editable mode
uv pip install -e .
```

### Development Installation (with all dependencies)
```bash

# Clone the repository
git clone https://github.com/lancedb/lance-ray.git
cd lance-ray

# Install UV (if not already installed)
pip install uv

# Install with development dependencies
# Or install with development dependencies
uv pip install -e ".[dev]"
```

### Windows Specific Instructions
```bash
# If 'uv' command is still not recognized (especially on Windows),
# try restarting your terminal or use:
# Basic installation
python -m uv pip install -e .

# Development installation
python -m uv pip install -e ".[dev]"

```


## Requirements

- Python >= 3.10
35 changes: 35 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,35 @@
# Contributing to lance-ray

## Development setup

Install the latest development version with all dependencies:

```bash
git clone https://github.com/<your-username>/lance-ray.git
cd lance-ray
uv pip install -e ".[dev]"
```
## Requirements

- Python >= 3.8

- Ray >= 2.40.0

- PyLance >= 0.30.0

- lance-namespace >= 0.0.5

- PyArrow >= 17.0.0

- Pandas >= 2.2.0

- NumPy >= 2.0.0
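
A quick way to compare a local environment against the minimums listed above, sketched with only the standard library. The distribution names passed to `importlib.metadata` are assumptions (for example, that PyLance is published as `pylance`), and `version_tuple`, `installed_version`, and `check` are hypothetical helpers, not part of lance-ray. The version comparison is deliberately simple and ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above. The distribution names are
# assumed to match the PyPI names and may need adjusting.
MINIMUMS = {
    "ray": "2.40.0",
    "pylance": "0.30.0",
    "lance-namespace": "0.0.5",
    "pyarrow": "17.0.0",
    "pandas": "2.2.0",
    "numpy": "2.0.0",
}


def version_tuple(text):
    """Turn '2.40.0' or '2.40.0rc1' into (2, 40, 0); any suffix is ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '2.40' and '2.40.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check(minimums, lookup=installed_version):
    """Return {name: problem} for every package that is missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif version_tuple(found) < version_tuple(minimum):
            problems[name] = f"{found} < {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check(MINIMUMS).items():
        print(f"{name}: {problem}")
```

Running the script prints one line per package that is missing or older than its minimum, and prints nothing when the environment satisfies the whole list.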


## Running Tests

To run all tests using [pytest](https://docs.pytest.org/):

```bash
uv run pytest
```
57 changes: 57 additions & 0 deletions docs/examples.md
@@ -0,0 +1,57 @@
# Examples

## Basic Usage
Review comment (Contributor): Make sure all the examples in the README are covered here. This seems to be missing the advanced example, and the basic one is not as detailed. For example, the filter example does not include `print(f"Filtered count: {filtered_ds.count()}")` as the README does.


```python

import ray

import pandas as pd

from lance_ray import read_lance, write_lance

ray.init()

# Write a DataFrame to Lance
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

write_lance("example.lance", df)

# Read the dataset back

ds = read_lance("example.lance")

print(ds.take(3))

# Read only specific columns

ds = read_lance("example.lance", columns=["a"])

print(ds.take(3))

# Read with a filter expression

filtered_ds = read_lance("example.lance", filters="a > 1")

print(filtered_ds.take(3))

print(f"Filtered count: {filtered_ds.count()}")

## Advanced Usage

# Process data in parallel using Ray tasks
@ray.remote
def process_partition(partition):
    return [x * 2 for x in partition["a"]]

# Split the dataset into 2 partitions

ds = read_lance("example.lance")

partitions = ds.split(2)

# Process each partition in parallel
results = ray.get([process_partition.remote(p) for p in partitions])

print(results)
```