Complete implementation of remaining Spark transformers #8

MLnick · 2018-06-08T17:02:17Z

These require MurmurHash3 to be added as a built-in PFA function (refer to related Hadrian issue):

HashingTF
FeatureHasher
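
For context, both transformers rely on the hashing trick: each term is hashed, and the hash is reduced modulo the number of features to get a vector index. Below is a minimal pure-Python sketch, assuming the standard MurmurHash3 x86 32-bit variant. The names `murmur3_32` and `hashing_tf`, the seed and the feature count are illustrative assumptions, and the resulting indices are not guaranteed to match what Spark or a PFA built-in would produce.

```python
from typing import Dict, Iterable

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Scramble one 32-bit block before it is folded into the hash state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86 32-bit, returned as an unsigned integer."""
    h = seed & MASK32
    n = len(data)
    body_len = n - (n % 4)

    # Process the input four bytes at a time, little-endian.
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Fold in the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))

    # Final avalanche step.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hashing_tf(terms: Iterable[str], num_features: int, seed: int = 0) -> Dict[int, float]:
    """Map terms to a sparse {index: count} vector using the hashing trick."""
    counts: Dict[int, float] = {}
    for term in terms:
        index = murmur3_32(term.encode("utf-8"), seed) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


# Hypothetical usage: three tokens hashed into a 16-slot feature space.
vector = hashing_tf(["spark", "pfa", "spark"], num_features=16)
```

Because the index depends only on the hash function, the seed and the feature count, a scoring engine needs the same hash available natively to reproduce the vector, which is why a built-in MurmurHash3 is a prerequisite here.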

Paxanator · 2018-11-18T23:21:33Z

Hey @MLnick, I'm looking into picking up one of these transformers to start learning more about aardpfark, starting with OneHotEncoder. It looks like OneHotEncoder relies on a StringIndexer to determine the length of its output, but the transformer itself doesn't require one in Spark (i.e. the data tells the OneHotEncoder how to transform it, as opposed to the transformer being fit).

As of 2.3 it seems this has been addressed with OneHotEncoderEstimator, which has a fit method and returns a OneHotEncoderModel with categorySizes:
https://spark.apache.org/docs/latest/ml-features.html#onehotencoderestimator

Should support be added for 2.3 (I can try to upgrade), using that instead?
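
To make the distinction concrete, here is a rough pure-Python sketch of the fitted approach: the category sizes are learned once and then fix the output length, so the encoded vector no longer depends on metadata supplied by an upstream indexer. This is not the Spark API. `fit_category_sizes` and `one_hot` are hypothetical helpers, the "largest index plus one" rule is an assumption about how sizes would be derived, and `drop_last` is meant to mirror Spark's `dropLast` parameter.

```python
from typing import List, Sequence


def fit_category_sizes(columns: Sequence[Sequence[int]]) -> List[int]:
    """Learn one size per column as the largest category index seen plus one."""
    return [max(col) + 1 for col in columns]


def one_hot(index: int, size: int, drop_last: bool = True) -> List[float]:
    """Encode a category index as a dense vector whose length is fixed by `size`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside the fitted range [0, {size})")
    # With drop_last, the final category is represented by the all-zeros vector.
    length = size - 1 if drop_last else size
    vec = [0.0] * length
    if index < length:
        vec[index] = 1.0
    return vec


# Hypothetical training data: two already-indexed categorical columns.
sizes = fit_category_sizes([[0, 1, 2, 1], [0, 1]])

# At scoring time only the fitted sizes are needed, not the training data.
encoded = one_hot(1, sizes[0])
```

With the sizes stored in the model, an exported scoring function can declare a fixed-length output array up front.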

MLnick · 2018-11-20T20:01:59Z

Hi @Paxanator thanks for your interest in Aardpfark!

Yes, I agree: OneHotEncoder as of Spark 2.3 would be the best way forward for this transformer. Let me know if you need any assistance.

I'll take a look at upgrading the Spark version; hopefully it shouldn't be much of a problem.

Paxanator · 2018-11-21T02:52:06Z

Thank you for putting the library together! I'll wait for the Spark version bump before trying to tackle it.
