Merge Arrow into Main for Release #37

Merged: 25 commits, Jul 20, 2023
25 commits
f38157d
Threadpool executor (#22)
justinGilmer Jun 6, 2023
354a13a
Threaded arrow (#23)
justinGilmer Jun 6, 2023
5217656
Add 3.10 python to the testing matrix (#21)
justinGilmer Jun 6, 2023
c7ac177
Update pre-commit.yaml
justinGilmer Jun 6, 2023
1351e2b
Fix missing logging import, rerun pre-commit (#24)
justinGilmer Jun 6, 2023
d59839d
Add basic doc string to endpoint object (#25)
justinGilmer Jun 6, 2023
f41f711
Update benchmark scripts.
justinGilmer Jun 12, 2023
c07a5be
Multistream read bench insert bench (#26)
justinGilmer Jun 13, 2023
0a88f69
Add insert benchmarking methods (#27)
justinGilmer Jun 13, 2023
038c142
Fix arrow inserts (#28)
justinGilmer Jun 15, 2023
5e0ce6c
Start integration test suite
andrewchambers Jun 22, 2023
9cc9b03
Add more streamset integration tests.
andrewchambers Jun 22, 2023
caa919a
Add support for authenticated requests without encryption.
andrewchambers Jun 23, 2023
d89d28c
Optimize logging calls (#30)
justinGilmer Jun 23, 2023
744a6b8
Add more arrow tests and minor refactoring.
andrewchambers Jun 26, 2023
75d12f9
Merge branch 'staging' of github.com:PingThingsIO/btrdb-python into s…
andrewchambers Jun 26, 2023
a40ef4f
More integration test cases
andrewchambers Jul 1, 2023
79e74ea
Restructure tests.
andrewchambers Jul 1, 2023
e8d6869
Mark new failing tests as expected failures for now.
andrewchambers Jul 1, 2023
78901f1
Disable gzip compression, it is very slow.
andrewchambers Jul 6, 2023
0654465
Reenable test, server has been fixed.
andrewchambers Jul 12, 2023
915bf5f
Update pandas testing and fix flake8 issues (#31)
justinGilmer Jul 19, 2023
0beef06
Update docs for arrow (#35)
justinGilmer Jul 19, 2023
0c646d4
Only enable arrow-endpoints when version >= 5.30 (#36)
justinGilmer Jul 20, 2023
c529017
Update arrow notes, small doc changes. (#38)
justinGilmer Jul 20, 2023
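
Commit 0c646d4 gates the Arrow endpoints on the server version. A minimal sketch of that kind of check follows, assuming the server reports a dotted version string. `parse_version`, `arrow_enabled`, and `MIN_ARROW_VERSION` are hypothetical names used for illustration and are not taken from the library.

```python
from typing import Tuple

# Assumed threshold, taken from the commit title ("version >= 5.30").
MIN_ARROW_VERSION = (5, 30)


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '5.30.2'."""
    parts = version.strip().lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def arrow_enabled(server_version: str) -> bool:
    """Hypothetical gate: True when the server is new enough for Arrow endpoints."""
    return parse_version(server_version) >= MIN_ARROW_VERSION
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.4" sorts after "5.30", while as tuples (5, 4) is correctly older than (5, 30).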
39 changes: 39 additions & 0 deletions .github/workflows/pre-commit.yaml
@@ -0,0 +1,39 @@
name: pre-commit

on:
  pull_request:
    branches:
      - master
      - staging
    types:
      - opened
      - reopened
      - ready_for_review
      - synchronize

env:
  SKIP: pytest-check

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          fetch-depth: 0 # get full git history
      - uses: actions/setup-python@v3
        with:
          cache: 'pip'
      - name: Install pre-commit
        run: |
          pip install pre-commit
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v21
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Run pre-commit
        uses: pre-commit/action@v2.0.3
        with:
          extra_args: --files ${{ steps.changed-files.outputs.all_changed_files }}
12 changes: 6 additions & 6 deletions .github/workflows/release.yaml
@@ -14,13 +14,13 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        python-version: [3.7, 3.8, 3.9]
+        python-version: [3.7, 3.8, 3.9, '3.10']
         os: [ubuntu-latest, macos-latest, windows-latest]

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version }} ${{ matrix.os }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
@@ -39,7 +39,7 @@ jobs:
     if: startsWith(github.ref, 'refs/tags/')
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - name: Create Release
         id: create_release
         uses: actions/create-release@v1
@@ -59,9 +59,9 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - name: Set up Python
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: '3.8'
       - name: Install dependencies
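
The new matrix entry is quoted as '3.10' while the older entries are bare. An unquoted 3.10 is read by YAML as a float and resolves to 3.1, so it would no longer name Python 3.10. The sketch below mimics that numeric round-trip with Python's `float`. It illustrates the pitfall only and does not call a YAML parser; `as_unquoted_scalar` is a hypothetical helper.

```python
def as_unquoted_scalar(text: str) -> str:
    """Mimic a numeric-looking scalar being read as a float and printed back."""
    return str(float(text))


# Only the last entry changes meaning when it is not quoted.
for entry in ["3.7", "3.8", "3.9", "3.10"]:
    print(entry, "->", as_unquoted_scalar(entry))
```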
10 changes: 10 additions & 0 deletions .gitignore
@@ -118,3 +118,13 @@ dmypy.json

 # Pyre type checker
 .pyre/
+
+# arrow parquet files
+*.parquet
+
+.idea
+.idea/misc.xml
+.idea/vcs.xml
+.idea/inspectionProfiles/profiles_settings.xml
+.idea/inspectionProfiles/Project_Default.xml
+/.idea/
35 changes: 35 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,35 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
        exclude: ^(setup.cfg|btrdb/grpcinterface)
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black-jupyter
        args: [--line-length=88]
        exclude: btrdb/grpcinterface/.*\.py
  - repo: https://github.com/pycqa/isort
    rev: 5.11.5
    hooks:
      - id: isort
        name: isort (python)
        args: [--profile=black, --line-length=88]
        exclude: btrdb/grpcinterface/.*\.py
  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: [--config=setup.cfg]
        exclude: ^(btrdb/grpcinterface|tests|setup.py|btrdb4|docs|benchmarks)
  - repo: local
    hooks:
      - id: pytest-check
        name: pytest-check
        entry: pytest
        language: system
        pass_filenames: false
        always_run: true
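
pre-commit treats each `exclude` value as a Python regular expression and applies it with `re.search` to the file path relative to the repository root. The sketch below checks two of the patterns from this config against a few sample paths. The sample paths and the `is_excluded` helper are made up for illustration.

```python
import re

# Patterns copied from the config above.
TRAILING_WS_EXCLUDE = re.compile(r"^(setup.cfg|btrdb/grpcinterface)")
BLACK_EXCLUDE = re.compile(r"btrdb/grpcinterface/.*\.py")


def is_excluded(pattern: "re.Pattern[str]", path: str) -> bool:
    """Return True when the pattern matches anywhere in the path (re.search)."""
    return pattern.search(path) is not None


# Hypothetical repository-relative paths.
sample_paths = [
    "setup.cfg",
    "docs/setup.cfg",
    "btrdb/stream.py",
    "btrdb/grpcinterface/btrdb_pb2.py",
    "btrdb/grpcinterface/README.md",
]

for path in sample_paths:
    print(
        path,
        "trailing-whitespace skipped:", is_excluded(TRAILING_WS_EXCLUDE, path),
        "black skipped:", is_excluded(BLACK_EXCLUDE, path),
    )
```

The leading `^` anchors the first pattern to the start of the path, so `docs/setup.cfg` is still checked. The second pattern has no anchor and matches any path containing `btrdb/grpcinterface/` followed later by `.py`.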
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -20,4 +20,4 @@ global-exclude *.py[co]
 global-exclude .ipynb_checkpoints
 global-exclude .DS_Store
 global-exclude .env
-global-exclude .coverage.*
+global-exclude .coverage.*
80 changes: 80 additions & 0 deletions benchmarks/benchmark_stream_inserts.py
@@ -0,0 +1,80 @@
from time import perf_counter
from typing import Dict, List, Tuple, Union

import pyarrow

import btrdb


def time_stream_insert(
    stream: btrdb.stream.Stream,
    data: List[Tuple[int, float]],
    merge_policy: str = "never",
) -> Dict[str, Union[int, float, str]]:
    """Insert raw data into a single stream, where data is a List of tuples of int64 timestamps and float64 values.

    Parameters
    ----------
    stream : btrdb.stream.Stream, required
        The stream to insert data into.
    data : List[Tuple[int, float]], required
        The data to insert into stream.
    merge_policy : str, optional, default = 'never'
        How should the platform handle duplicated data?
        Valid policies:
        `never`: the default, no points are merged
        `equal`: points are deduplicated if the time and value are equal
        `retain`: if two points have the same timestamp, the old one is kept
        `replace`: if two points have the same timestamp, the new one is kept
    """
    prev_ver = stream.version()
    tic = perf_counter()
    new_ver = stream.insert(data, merge=merge_policy)
    toc = perf_counter()
    run_time = toc - tic
    n_points = len(data)
    result = {
        "uuid": stream.uuid,
        "previous_version": prev_ver,
        "new_version": new_ver,
        "points_to_insert": n_points,
        "total_time_seconds": run_time,
        "merge_policy": merge_policy,
    }
    return result


def time_stream_arrow_insert(
    stream: btrdb.stream.Stream, data: pyarrow.Table, merge_policy: str = "never"
) -> Dict[str, Union[int, float, str]]:
    """Insert raw data into a single stream, where data is a pyarrow Table of timestamps and float values.

    Parameters
    ----------
    stream : btrdb.stream.Stream, required
        The stream to insert data into.
    data : pyarrow.Table, required
        The table of data to insert into stream.
    merge_policy : str, optional, default = 'never'
        How should the platform handle duplicated data?
        Valid policies:
        `never`: the default, no points are merged
        `equal`: points are deduplicated if the time and value are equal
        `retain`: if two points have the same timestamp, the old one is kept
        `replace`: if two points have the same timestamp, the new one is kept
    """
    prev_ver = stream.version()
    tic = perf_counter()
    new_ver = stream.arrow_insert(data, merge=merge_policy)
    toc = perf_counter()
    run_time = toc - tic
    n_points = data.num_rows
    result = {
        "uuid": stream.uuid,
        "previous_version": prev_ver,
        "new_version": new_ver,
        "points_to_insert": n_points,
        "total_time_seconds": run_time,
        "merge_policy": merge_policy,
    }
    return result
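
The docstrings in this file list four merge policies. The sketch below restates those rules as a pure-Python function over in-memory (time, value) tuples, assuming the semantics exactly as the docstrings word them. `apply_merge_policy` is a hypothetical helper for illustration; it says nothing about how the server implements merging and does not touch a real stream.

```python
from typing import List, Tuple

Point = Tuple[int, float]


def apply_merge_policy(
    existing: List[Point], incoming: List[Point], policy: str = "never"
) -> List[Point]:
    """Combine stored and newly inserted points under one of the four policies."""
    if policy == "never":
        # No points are merged: duplicates are kept side by side.
        merged = existing + incoming
    elif policy == "equal":
        # Drop an incoming point only if both its time and value already exist.
        seen = set(existing)
        merged = list(existing)
        for point in incoming:
            if point not in seen:
                seen.add(point)
                merged.append(point)
    elif policy == "retain":
        # On a timestamp collision the old point is kept.
        old_times = {t for t, _ in existing}
        merged = existing + [p for p in incoming if p[0] not in old_times]
    elif policy == "replace":
        # On a timestamp collision the new point is kept.
        new_times = {t for t, _ in incoming}
        merged = [p for p in existing if p[0] not in new_times] + incoming
    else:
        raise ValueError(f"unknown merge policy: {policy!r}")
    # Stable sort by time keeps existing points ahead of incoming ones on ties.
    return sorted(merged, key=lambda p: p[0])


stored = [(1, 1.0), (2, 2.0)]
new_points = [(2, 9.0), (3, 3.0)]
for name in ("never", "equal", "retain", "replace"):
    print(name, apply_merge_policy(stored, new_points, name))
```

With a colliding timestamp of 2, `retain` keeps the stored value 2.0 and `replace` keeps the new value 9.0, which is the distinction the benchmark's `merge_policy` argument selects.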