Add column API mvp #100

Merged 57 commits on Apr 13, 2024

Commits on Apr 22, 2022

  1. Add namespace stub

    ezmiller committed Apr 22, 2022
    da07f71
  2. Add super naive column fn

    ezmiller committed Apr 22, 2022
    9eae7fc

Commits on Apr 23, 2022

  1. 2c93f87

Commits on Apr 30, 2022

  1. Add some simple column fns

    ezmiller committed Apr 30, 2022
    cfaffc6

Commits on Jun 11, 2022

  1. 04c9b41
  2. 2eeeee8
  3. e5ef843
  4. bcc3582
  5. c0a7cb9
  6. Polishing up existing column fns

    * added some docstrings
    * re-organized a little
    ezmiller committed Jun 11, 2022
    fb07581

Commits on Jun 21, 2022

  1. d433c63
  2. 6a08d61
  3. f86911f
  4. Add tests for zeros and ones

    ezmiller committed Jun 21, 2022
    186e764
  5. 851819f

Commits on Jun 25, 2022

  1. 1d2cef3

Commits on Jun 26, 2022

  1. Add column exploration html

    ezmiller committed Jun 26, 2022
    c81d13c
  2. 2788e3e

Commits on Jul 5, 2022

  1. 14ca935
  2. Use dtype alias in ns

    ezmiller committed Jul 5, 2022
    2d1d07e
  3. e5c8322
  4. Fix comment syntax

    ezmiller committed Jul 5, 2022
    e42a3d9

Commits on Jul 17, 2022

  1. 35ec106

Commits on Aug 12, 2022

  1. Merge pull request #71 from scicloj/ethan/add-initial-column-api-fns

    Add initial column API functions
    ezmiller authored Aug 12, 2022
    3b4cec4

Commits on Aug 19, 2022

  1. ac62b31

Commits on Nov 14, 2022

  1. Update type interface to use type hierarchy in tablecloth.api.util (#76)

    * Add ->general-types function
    
    * Add a general type :logical
    
    * Use type hierarchy in tablecloth.api.utils for `typeof` functions
    
    * Add column dev branch to pr workflow
    
    * Add tests for typeof
    
    * Fix tests for typeof
    
    * Return the concrete type from `typeof`
    
    * Simplify `concrete-types` fn
    
    * Optimize ->general-types by using static lookup
    
    * Adjust fns listing types
    
    * We decided that the default meaning of type points to the "concrete"
    type, and not the general type.
    * So `types` now returns the set of concrete types and `general-types`
    returns the general types.
    
    * Revert "Adjust fns listing types"
    
    This reverts commit d93e34f.
    
    * Fix `typeof` test to test for concrete types
    
    * Reorganize `typeof?` tests
    
    * Reword docstring for `typeof?` slightly
    
    * Update column api template and add missing `typeof?`
    
    * Add comment to `general-types-lookup`
    
    * Improve `->general-types` docstring
    
    * Add `general-types` fn that returns sets of general types
    
    * Adjust util `types` fn to return concrete types
    ezmiller authored Nov 14, 2022
    6e7413b

Commits on Jan 23, 2023

  1. 704156a

Commits on Feb 10, 2023

  1. Lift tech.v3.datatype.functional operations (#90)

    * Add ->general-types function
    
    * Add a general type :logical
    
    * Use type hierarchy in tablecloth.api.utils for `typeof` functions
    
    * Add column dev branch to pr workflow
    
    * Add tests for typeof
    
    * Fix tests for typeof
    
    * Return the concrete type from `typeof`
    
    * Simplify `concrete-types` fn
    
    * Optimize ->general-types by using static lookup
    
    * Adjust fns listing types
    
    * We decided that the default meaning of type points to the "concrete"
    type, and not the general type.
    * So `types` now returns the set of concrete types and `general-types`
    returns the general types.
    
    * Revert "Adjust fns listing types"
    
    This reverts commit d93e34f.
    
    * Fix `typeof` test to test for concrete types
    
    * Reorganize `typeof?` tests
    
    * Reword docstring for `typeof?` slightly
    
    * Update column api template and add missing `typeof?`
    
    * Add comment to `general-types-lookup`
    
    * Improve `->general-types` docstring
    
    * Add `general-types` fn that returns sets of general types
    
    * Adjust util `types` fn to return concrete types
    
    * Save changes to column api.clj
    
    * Save ongoing experiments with lifting
    
    * Save ongoing work on lifting
    
    * Adjust lift-ops-1 to handle any number of args with rest arg
    
    * Working `rearrange-args` fn
    
    * Save work actually writing lifted fns
    
    * Saving first attempt to write operators
    
    * Add `percentiles` test
    
    * Adjust `rearrange-args` to take new-args in option map
    
    * Unify two lift functions
    
    * Add in docstrings when present
    
    * Move lift utils into utils ns
    
    * Rename lifting namespaces
    
    * Lift some more fns
    
    * Make exclusions for ns header helper an arg
    
    * Add new operators and tests
    
    * Add ops with lhs rhs arg pattern
    
    * Lift '*
    
    * Add require to operators ns for utils
    
    * Update test to make it more complete
    
    * Lift `equals
    
    * Make test more accurate
    
    * Reorganize tests
    
    * Fix grammar
    
    * Lift 'shift
    
    * Uncomment 'or test
    
    * Lift 'normalize op
    
    * Lift 'magnitude
    
    * Lifting bit manipulation ops
    
    * lift ieee-remainder
    
    * Lifting more functions
    
    * Add excludes
    
    * Lift a bunch of new functions
    
    * Alphabetize some lists
    
    * More alphabetization
    
    * Clean up
    
    * Instead of using `col` as arg, conform to using `x` and `y`
    
    * Temporarily disable failing test fix in 7.000-beta23
    
    * Disable the correct test
    
    * Just some minor cleanup in op tests
    
    * Some more cleanup/reorg in op tests
    
    * Update generated operators namespace with switch from col -> x etc
    
    * Lift 'descriptive-statistics
    
    * Fix messed up test layout
    
    * Lift 'quartiles
    
    * Lift 'fill-range and a bunch of reduce operations
    
    * Lift 'mean-fast 'sum-fast 'magnitude-squared
    
    * Lift correlation fns
    
    kendalls, pearsons, and spearmans
    
    * Lift cumulative ops
    
    * cleanup
    ezmiller authored Feb 10, 2023
    5dc5065
  2. 27c4f89

Commits on Feb 11, 2023

  1. Bring column exploration doc up-to-date (#95)

    * Upgrade to latest clay version
    
    * Show using tablecloth.column.api.operators ns
    
    * Cleanup whitespace
    ezmiller authored Feb 11, 2023
    ff85c22

Commits on Mar 5, 2023

  1. Add method for subsetting (#96)

    * Export tech.ml.dataset `select` fn for column api
    
    * Update docstring exported to api
    
    * Update column-exploration with basic illustration of select
    
    * Add `slice`
    
    * clean up tests a bit
    
    * Improve `slice` docstring slightly
    
    * Export `slice` to column api
    
    * Add stuff about `slice` to column exploration doc
    
    * Move accessing & subsetting section above basic ops
    
    * Update column_exploration.html
    
    * Update comment block
    ezmiller authored Mar 5, 2023
    905c68e

Commits on Mar 10, 2023

  1. Add iteration support by wrapping tech.v3.dataset.column/column-map (#97)
    
    * Export tech.ml.dataset `select` fn for column api
    
    * Update docstring exported to api
    
    * Update column-exploration with basic illustration of select
    
    * Add `slice`
    
    * clean up tests a bit
    
    * Improve `slice` docstring slightly
    
    * Export `slice` to column api
    
    * Add stuff about `slice` to column exploration doc
    
    * Move accessing & subsetting section above basic ops
    
    * Update column_exploration.html
    
    * Update comment block
    
    * Add column-map wrapper over tech.v3.dataset.column/column-mapping
    
    * Accepts columns in the first position to support use with pipes
    * If `col` is a vector of columns, then map-fn is run on all
    
    * Fix arg name
    
    * Clean up
    
    * Add iteration to column exploration and reorganize
    
    * Add column-map to column api_template
    
    * Add example of using column-map with multiple columns
    
    * Update column_exploration html doc
    
    * Update column_exploration html doc
    ezmiller authored Mar 10, 2023
    1ca9911

Commits on Mar 12, 2023

  1. Add sorting support for column (#99)

    * Add rough version of `sort-column` with some tests
    
    * Add basic docstring
    
    * Add support for `:asc` and `:desc` to sort-column
    
    * Add note to handle missing values
    
    * Make slight improvement to sort-column docstring
    ezmiller authored Mar 12, 2023
    ece8388

Commits on Apr 14, 2023

  1. Improve support for missing values for column api (#101)

    * Export tech.ml.dataset `select` fn for column api
    
    * Update docstring exported to api
    
    * Update column-exploration with basic illustration of select
    
    * Add `slice`
    
    * clean up tests a bit
    
    * Improve `slice` docstring slightly
    
    * Export `slice` to column api
    
    * Add stuff about `slice` to column exploration doc
    
    * Move accessing & subsetting section above basic ops
    
    * Update column_exploration.html
    
    * Update comment block
    
    * Add column-map wrapper over tech.v3.dataset.column/column-mapping
    
    * Accepts columns in the first position to support use with pipes
    * If `col` is a vector of columns, then map-fn is run on all
    
    * Fix arg name
    
    * Clean up
    
    * Add iteration to column exploration and reorganize
    
    * Add column-map to column api_template
    
    * Add example of using column-map with multiple columns
    
    * Update column_exploration html doc
    
    * Update column_exploration html doc
    
    * Export tech.v3.dataset.column's missing fns
    
    * Remove `set-missing`
    
    I think this may be more of an internal fn
    
    * Add `count-missing` function
    
    * Add test for `sort-column` for missing values
    
    * Activate test that will now pass due to tmd upgrade
    
    * Add sort-column to api-template
    
    * Add sort-column section to column_exploration doc
    
    * Add more missing apidoc
    
    * move fns to their own namespace to mirror main tc api
    * add `drop-missing` and `replace-missing`
    
    * Add details about missing api to column exploration
    
    * Add an example of using count to column exploration
    
    * Add a few simple tests for missing ns
    
    * Fix docstrings
    ezmiller authored Apr 14, 2023
    b9c5019
  2. Add proof of concept

    ezmiller committed Apr 14, 2023
    a99b96f

Commits on Apr 23, 2023

  1. Consolidate tablecloth.column.api/operators args (#106)

    * Consolidate ops args to x y z
    
    * Fix lift op for comparison ops
    
    * Update lift-op fn to handle multiple arg lookups
    
    The case that required this was the comparison ops. We
    want (> x y z) from (> lhs rhs) (> lhs mid rhs). We
    can't universally map y to rhs because it would be
    wrong for the 3-arity option.
    ezmiller authored Apr 23, 2023
    1790609

Commits on Sep 29, 2023

  1. Lift column ops to the dataset level (#107)

    * Readme: Replace `lein test` with `lein midje`
    
    * Add proof of concept for lifting
    
    * Clean up
    
    * Fix magnitude arguments
    
    * Fix typo breaking lift operation for `magnitude
    
    * Save prototype working example that handles optional arguments
    
    * Clean up
    
    * Reorganize codegen utilities
    
    * moved hopefully common utilities up into 'tablecloth.utils.codegen
    * retooled those helpers in that ns to be a bit more accessible (WIP)
    
    * Clean up
    
    * Clean up
    
    * Rejigger codegen for column ops to take just fn-sym arglists
    
    * Try lifting all column ops to ds (no tests yet)
    
    * Exclude ops that do not potentially return column
    
    * Do not lift options that do not return columns
    
    * Add docstrings for some codegen
    
    Also regenerated operators to make sure tests pass.
    
    * Add docstring to ds col ops
    
    * version bump and small fix
    
    * Modify ds-level lift op to also return fn that returns column
    
    This is a breaking change for the column api lifting until I adapt
    the lift-op to the changes made in the codegen where the argument
    is supplied in data rather than within a fn.
    
    * example added for replace-missing
    
    * Add tests for ops that take inf number of cols
    
    * Add tests for ops returning ds taking max of three cols
    
    * Add tests for ops returning ds and taking two columns max
    
    * Test for ops returning ds and max of one column
    
    * Add more functions to test for ops taking one col
    
    * Clean up
    
    * Lifted ops taking one column and returning a scalar
    
    * Lift functions taking two columns and returning a scalar
    
    * Clean up
    
    * Clean up
    
    * bump to 7.000-beta-50
    
    * fixes #108
    
    * hashing in joins enabled for every case
    
    * 7.000-beta-51
    
    * Clean up
    
    * Lift functions taking 1 col and returning scalar
    
    * Adjust column api lift ops to new declarative syntax
    
    * Adjust lift plan for tablecloth.column.api for tmd v7
    
    * Remove mention of tech.ml.datatype
    
    * Add missing word
    
    * Bump tmd version to 7.006 for fix to fns that were erroring
    
    fns are: quartiles-1, quartiles-3 and median
    
    * Fixing more tests
    
    * Comment some code to keep around for a spell
    
    * Remove special lift op for 'round
    
    Its arguments were fixed.
    
    * Cleanup
    
    * 7.007
    
    ---------
    
    Co-authored-by: Teodor Heggelund <git@teod.eu>
    Co-authored-by: genmeblog <38646601+genmeblog@users.noreply.github.com>
    Co-authored-by: GenerateMe <generateme.blog@gmail.com>
    Co-authored-by: adham-omran <git@adham-omran.com>
    5 people authored Sep 29, 2023
    e0479aa

Commits on Nov 3, 2023

  1. Ethan/lift scalar ops to ds as aggregators (#118)

    * Fix indentation
    
    * Save rough working example
    
    Not fully tested
    
    * Fix tests for new aggregator form of ops that return scalar
    ezmiller authored Nov 3, 2023
    eee60d3

Commits on Nov 18, 2023

  1. f267bd2

Commits on Dec 25, 2023

  1. e207ea3

Commits on Jan 13, 2024

  1. 66e14a2

Commits on Feb 4, 2024

  1. Add column API documentation (#120)

    * Add a sample notebook file
    
    * Save draft work on column api doc
    
    * Add doc entry for tcc/select boolean select
    
    This appears to be broken now, but it shouldn't be.
    
    * Export column api operators in column api ns
    
    * Add in some documentation of operations
    
    * Hide namespace expression from generated doc
    
    * Fix circular dependency
    
    * Update generated docs
    
    * Update text in column operations section
    
    * More updates to the docs
    
    * Remove "Functionality" header in TOC
    
    This way Dataset is an entry, and I can add Column after that.
    
    * Add Column API documentation
    
    * Add an indication of column op signature to docs
    
    * Export lifted column operators in dataset api template
    
    * Add documentation for column operations on datasets
    
    * Some minor changes
    
    * Rename the two headers for Dataset and Column, adding API onto the
    end.
    * A few small fixes.
    
    * Remove the `Functions` section
    
    This is essentially replaced by the Column API that lifts these
    functions into Tablecloth
    
    * Try to remove cyclical dependency
    
    * Revert "Try to remove cyclical dependency"
    
    This reverts commit fcb16c4.
    
    * Fix circular dependency
    
    * Actually fix cyclical dependency
    
    * Undo added line
    ezmiller authored Feb 4, 2024
    ea430d4
  2. 2c436cc

Commits on Feb 24, 2024

  1. 26ae38e
  2. 224e03f
  3. Add preview-branch to docs preview action

    Default was gh-pages, we use master.
    ezmiller committed Feb 24, 2024
    e2a2b82
  4. b937011
  5. 4e8dd2c

Commits on Mar 22, 2024

  1. 5625351

Commits on Mar 23, 2024

  1. 515eb73

Commits on Mar 29, 2024

  1. e105dba
  2. 7120166
  3. 7af5cbc

Commits on Apr 6, 2024

  1. 840ebe0
  2. 7275dbf

Commits on Apr 12, 2024

  1. Remove draft notebook

    ezmiller committed Apr 12, 2024
    3df66a3
  2. 8a25972