Releases: zinggAI/zingg
Zingg 0.4.0
What's Changed
- 0.3.4 by @sonalgoyal in #422
- GitBook: [#60] descriptions by @sonalgoyal in #423
- docs: Grammatical and Lexical fixes on README.md by @siddharth2798 in #418
- Fixing typo in generatingdocumentation @ 0.3.4 by @Akash-R-7 in #424
- 0.3.4 by @sonalgoyal in #425
- GitBook: [#62] trainMatch for verification by @sonalgoyal in #426
- adding multiple match type options to docs by @Akash-R-7 in #430
- 0.3.4 by @sonalgoyal in #431
- unifying datatype in generics by @Akash-R-7 in #437
- adding session template by @Akash-R-7 in #438
- matchType updated by @Bagotia16 in #434
- Spelling typos in python api docs by @Akash-R-7 in #433
- SnowFrame and its dependency by @Akash-R-7 in #439
- SnowPark and Spark core classes by @Akash-R-7 in #442
- Fix compilation/setup issues : Removed TestSnowFrame.java by @abhay447 in #452
- MATCH TYPE "NULL_OR___BLANK is incorrect by @jgransac in #454
- python postgres entity resolution example by @SuchandraDatta in #458
- Bump jackson-databind from 2.12.6.1 to 2.12.7.1 in /client by @dependabot in #455
- merge main changes to generics by @sonalgoyal in #453
- Generics by @vikasgupta78 in #477
- Generics by @vikasgupta78 in #478
- unused hash classes being moved by @vikasgupta78 in #480
- documenter related changes by @vikasgupta78 in #505
- merge issues by @vikasgupta78 in #507
- documenter junits and package name correction for matcher by @vikasgupta78 in #508
- junit should not use ZINGG_HOME by @vikasgupta78 in #515
- fixing junits by @vikasgupta78 in #524
- clean up session, context data issue #534 by @vikasgupta78 in #535
- junit not working by @vikasgupta78 in #529
- issue #427 difference between python and json config by @vikasgupta78 in #543
- fixing float, long similarity and hash issue by @vikasgupta78 in #537
- Documenter should have model stats issue #299 by @vikasgupta78 in #541
- Git auto build issue fixed by @vikasgupta78 in #547
- fixing findAndLabel phase and generateDocs naming convention for col names by @vikasgupta78 in #546
- Added profiles for latest Spark versions by @morazow in #542
- issue #548 bump up release version by @vikasgupta78 in #549
- Revert "issue #548 bump up release version" by @sonalgoyal in #551
- Added new Exasol dialect description into connectors page by @morazow in #552
- documentation for issue #450 by @vikasgupta78 in #550
- added serialVersionUID by @vikasgupta78 in #554
- ddl schema definition instead of json for csv schema #473 by @vikasgupta78 in #557
- issue #473 and #285 by @vikasgupta78 in #559
- issue 561, null pointer due to insufficient data by @vikasgupta78 in #564
- Issue 415 source column in the input leads to an error by @vikasgupta78 in #574
- label updater giving exception #580 by @vikasgupta78 in #584
- license url change by @vikasgupta78 in #585
- Added readme on how to run the python test suite by @vikasgupta78 in #588
- multi platform image for docker and include vi by @vikasgupta78 in #590
- fromDDL instead of JSON for schema by @vikasgupta78 in #591
- issue #558 linker not working by @vikasgupta78 in #562
- Issue #595 code refactor by @vikasgupta78 in #596
- issue #593 doc update for recommend phase by @vikasgupta78 in #594
- Running using Databricks Connect #582 by @vikasgupta78 in #583
- 0.3.5 by @sonalgoyal in #599
- Issue 603 and 604 ZFrame changes by @vikasgupta78 in #605
- issue #607 refactor matcher by @vikasgupta78 in #609
- filter methods in zframe by @vikasgupta78 in #610
- Date sim func by @vikasgupta78 in #613
- Added jacoco plugin in pom.xml & improved coverage by @Manan-S0ni in #612
- doc_change by @Manan-S0ni in #614
- Issue #601 linker has inconsistent results by @vikasgupta78 in #617
- license information not coming downstream by @vikasgupta78 in #618
- Session wrapper and license interface by @vikasgupta78 in #619
- additional methods by @vikasgupta78 in #620
- obviousDupeString added in json by @vikasgupta78 in #623
- Obvious dupes condition in config by @vikasgupta78 in #624
- cache obv dupe pairs for performance by @vikasgupta78 in #626
- Obvious dupes performance optimisation by @vikasgupta78 in #628
- Array type support by @vikasgupta78 in #631
- python file changes for array by @vikasgupta78 in #632
- phase findAndLabel and trainMatch should be calling respective implementations #636 by @vikasgupta78 in #639
- ftd phase should create different model class #635 by @vikasgupta78 in #637
- refactoring by @vikasgupta78 in #633
- using ZinggWithSpark in python causes error #640 by @vikasgupta78 in #641
- ensuring order of features by @vikasgupta78 in #646
- Update settingUpZingg.md by @gnanaprakash-ravi in #642
- Instructions on prerequisite software needed to compile / run zingg by @vikasgupta78 in #647
- Deleted output120k.csv by @gnanaprakash-ravi in #657
- rm instances of dblogger by @gnanaprakash-ravi in #660
- 0.4.0 by @gnanaprakash-ravi in #663
- try to access jvm by @gnanaprakash-ravi in #672
- refactoring to remove static methods in Arguments issue #655 by @vikasgupta78 in #656
- Pipe util changes by @vikasgupta78 in #675
- refactor getting/writing cluster data by @vikasgupta78 in #676
- 0.4.0 by @gnanaprakash-ravi in #677
- renamed to runIncremental by @vikasgupta78 in #679
- rm same title as amazonS3.md by @gnanaprakash-ravi in #678
- UTs for clients.py by @gnanaprakash-ravi in #683
- issue #680 Update python api for read and write arguments to json by @vikasgupta78 in #681
- Fix broken docs links by @gnanaprakash-ravi in #685
- code fix for the client/parseArguments function by @gnanaprakash-ravi in #691
- UTs for Arguments class by @gnanaprakash-ravi in #686
- log level change issue #649 by @vikasgupta78 in #693
- Specify types inside diamond operators by @gnanaprakash-ravi in #698
- Issue #692 obv dupe changes for ftd and matcher by @vikasgupta78 in #694
- Renaming ObviousDupes to DeterministicMatching by @gnanaprakash-ravi in #712
- rename obv dupe to Det match by @vikasgupta78 in #713
- refactoring label updater by @vikasgupta78 in #717
- issue #720 generateDocs giving error for boolean by @vikasgupta78 in #721
- added sort column issue #724 by @vikasgupta78 in #725
- Reorder schema in config by @gnanaprakash-ravi in https://github....
Array Support Snapshot Release - Please use the official 0.4.0 release instead of this
Pre-release
This is a snapshot release with array support as a data type for Zingg users. Examples are at https://github.com/zinggAI/zingg/blob/0.4.0/test/InMemPipeTestImages.py
zingg-0.3.4-SNAPSHOT-spark-3.1.2
Lots of goodies in this release: Python interface, stop words, new match types
What's Changed
- Extract Stop words by @navinrathore in #186
- CI: Executing maven compile at each commit by @edmondo1984 in #189
- Introduce CodeQL pipeline on each commit by @edmondo1984 in #190
- New InMemoryPipe has been added by @navinrathore in #209
- Changes to support databases that use Jdbc driver by @navinrathore in #214
- Data processing for Stop Words removal by @navinrathore in #191
- MatchType 'DONOT USE' updated in the docs and TC added by @navinrathore in #222
- Added new format 'bigquery' by @navinrathore in #233
- Documentation for BigQuery connector by @navinrathore in #236
- Renamed ZINGG_ARGS_EXTRA to ZINGG_EXTRA_SPARK_CONF and ZINGG_EXTRA to ZINGG_EXTRA_JARS. by @navinrathore in #239
- Setting-Up Zingg Development Environment by @Aditya-R-Chakole in #240
- Documenter refactoring and handling error 'Path does not exist' by @navinrathore in #223
- Bump jackson-databind from 2.10.0 to 2.12.6.1 in client/pom.xml by @navinrathore in #195
- Bump poi-scratchpad from 3.16 to 5.2.1 in /client by @dependabot in #167
- From original data, select fields whose definition is provided in config to be written in output by @navinrathore in #217
- Removed dependencies of snowflake, mysql, cassandra, elastic, apache-poi, log4j from pom by @navinrathore in #248
- Exception handling in PipeUtil::read() by @navinrathore in #229
- Z columns doc to have different template. by @navinrathore in #257
- GenerateDocs becomes independent of Data by @navinrathore in #270
- Working with StopWord file if its header does not include the column 'StopWord' by @navinrathore in #274
- specify path for ZINGG_HOME by @chetan453 in #280
- Updated installation document to install maven using sudo apt by @navinrathore in #282
- blockSize - a new config parameter for max size of the block by @navinrathore in #272
- moved getRecords & setRecords from InMemoryPipe to Pipe by @chetan453 in #286
- Revert "moved getRecords & setRecords from InMemoryPipe to Pipe" by @sonalgoyal in #288
- Revert "Revert "moved getRecords & setRecords from InMemoryPipe to Pipe"" by @sonalgoyal in #294
- resolved errors by @chetan453 in #293
- Match type pin code by @RavirajBaraiya in #290
- Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh by @navinrathore in #253
- Match type email by @RavirajBaraiya in #291
- Documenter testcases by @navinrathore in #281
- More blocking functions by @navinrathore in #292
- Checking if default zinggDir exists else create it by @navinrathore in #297
- moved config files for junits by @chetan453 in #298
- support for python phases in zingg.sh by @navinrathore in #301
- release 0.3.4 by @navinrathore in #305
- Python User script support in zingg.sh by @navinrathore in #311
- python unit at compilation time by @RavirajBaraiya in #318
- Updates in Python classes and 'assessModel' python phase by @navinrathore in #313
- env variables can be defined in zingg.conf in addition to spark properties by @navinrathore in #303
- new phase PeekModel by @RavirajBaraiya in #319
- added api,python,config dirs into distribution package by @navinrathore in #323
- added csv for testPeekModel by @RavirajBaraiya in #324
- new python phase exportModel by @RavirajBaraiya in #325
- API changes issue part 2,4,5 by @RavirajBaraiya in #326
- rename matchtype dont use to dont_use by @RavirajBaraiya in #328
- modified TestDSUtil by @RavirajBaraiya in #331
- Python API - Specialized Pipes for SnowFlake, BigQuery etc. by @navinrathore in #327
- formatting of help message by @RavirajBaraiya in #332
- Documenter changes issue #335 by @RavirajBaraiya in #343
- pip package artifacts by @RavirajBaraiya in #344
- Revert "release 0.3.4" by @sonalgoyal in #345
- zingg pip package artifacts modification by @RavirajBaraiya in #356
- Null Pointer check in "Range" hash functions by @navinrathore in #350
- proper handling of case when zingg config file does not exist by @navinrathore in #358
- python api dir deleted and moved FebrlExample.py by @RavirajBaraiya in #362
- Removed test involving reading generated file by @navinrathore in #368
- To fix Databricks UserWarning: DataFrame constructor is internal... by @navinrathore in #373
- getUnmarkedRecords() - updated to the version with correct functionality and fixed its name by @navinrathore in #372
- inmemorypipe accepts pandas df by @navinrathore in #376
- fixed the broken link for pipes.md by @shefalika-thapa in #381
- Tests for getAs() for Double, Integer types by @navinrathore in #375
- python examples by @RavirajBaraiya in #382
- python api doc by @RavirajBaraiya in #385
- Added specific pipes property constants by @navinrathore in #374
- testWriteArgumentObjectToJSONFile class modification by @RavirajBaraiya in #387
- Double similarity function - null pointer exception by @navinrathore in #369
- manofest.in changes- only add febrl and amazonGoogle example by @RavirajBaraiya in #394
- modification for jar and deps issue #308 by @RavirajBaraiya in #398
- added stopword functionality in zingg FieldDefinition by @RavirajBaraiya in #392
- recommender phase issue #336 by @RavirajBaraiya in #399
- TCs for String Similarity Distance function by @navinrathore in #371
- added python script to run all with febrl example python unittest by @RavirajBaraiya in #395
- Csvpipe by @RavirajBaraiya in #402
- modification according to pipes changes by @RavirajBaraiya in #405
- Pipe by @RavirajBaraiya in #406
- testGetAs changes by @RavirajBaraiya in #410
- Pipes changes issue #401 by @RavirajBaraiya in #411
- Revert "Pipe" by @sonalgoyal in #412
- setStopword changes by @RavirajBaraiya in #413
- Removed Format type by @RavirajBaraiya in #409
New Contributors
- @edmondo1984 made their first contribution in #189
- @Aditya-R-Chakole made their first contribution in #240
- @chetan453 made their first contribution in #280
- @RavirajBaraiya made their first contribution in #290
- @shefalika-thapa made their first contribution in #381
Full Changelog: v0.3.3...v0.3.4
zingg-0.3.3-SNAPSHOT-spark-3.1.2
What's Changed
- Document for Exporting labeled data as training samples #117 by @navinrathore in #130
- data and scripts for amazon-google dataset by @navinrathore in #131
- Revert "data and scripts for amazon-google dataset " by @sonalgoyal in #135
- Migration to GitBook Document by @navinrathore in #139
- Updated links in README.md by @navinrathore in #140
- Betterment of documentation pages by @navinrathore in #143
- New Datasets iTunes-amazon, beerAdvo-rateBeer and respective models by @navinrathore in #144
- prediction score: set precision to 2 with ~ floor mode by @navinrathore in #145
- In TrainMatcher class, ZinggOptions set to TRAIN_MATCH by @navinrathore in #149
- Introduced new phase findLabel by @navinrathore in #147
- Documentation for Working with Docker Image by @navinrathore in #160
- Rectified the broken link for Docker doc by @navinrathore in #163
- Added docker page under installation by @navinrathore in #166
- New option/config showconcise by @navinrathore in #170
- Added detailed documentation for Link phase by @navinrathore in #172
- 0.3.3 release by @navinrathore in #174
Full Changelog: v0.3.2...v0.3.3
zingg-0.3.2-SNAPSHOT-spark-3.1.2
What's Changed
- Giving same cluster id to all records linked from multiple sources #108 by @navinrathore in #109
- zingg.sh path included in PATH env var in Dockerfile #111 by @navinrathore in #112
- locale set to C.UTF-8 in Dockerfile #116 by @navinrathore in #122
- Documentation broken links #118 by @navinrathore in #123
- z_source column is added at the end of dataset in alignDupes by @navinrathore in #119
- Labeller - if it is not known that it is a match or not, similarity score should not be printed #114 by @navinrathore in #125
- new febrl models by @navinrathore in #121
- Blocking tree are saved in parquet file #82 by @navinrathore in #120
- unionByName() with allowMissingColumns=true for training data by @navinrathore in #127
- handling null value of datatype Double #95 by @navinrathore in #126
Full Changelog: v0.3.1...v0.3.2
zingg-0.3.1-SNAPSHOT-spark-3.1.2
What's Changed
- Create CODE_OF_CONDUCT.md by @sonalgoyal in #25
- Fixed Broken Links and added favicon.ico (#57) by @navinrathore in #61
- Added page for running azure on cloud #6 by @navinrathore in #62
- Added Google Analytics for telemetry #41 by @navinrathore in #68
- Zingg version Parametrization by @navinrathore in #76
- Graceful message and exit for wrong value of phase by @navinrathore in #84
- Support for linking three or more dataset #70 by @navinrathore in #92
- Use environment variables in config files #60 by @navinrathore in #80
- Model doc generation by @sonalgoyal in #99
- Display how many pairs've been labelled matches and non matches #87 by @navinrathore in #94
- logging location of the file being read #83 by @navinrathore in #97
- new phase updateLabel implementation #58 by @navinrathore in #102
- 0.3.1 by @sonalgoyal in #103
Full Changelog: v0.3.0...v0.3.1
zingg-0.3.0-SNAPSHOT-spark-3.x.x-blockingDebug
Release for #46 for null pointer exception during blocking tree application
zingg-0.3.0-SNAPSHOT-spark-3.0.3
Zingg 0.3.0 with Spark 3.0.3