-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Lib][ML] Replace std::vector usage with Vector #116
Conversation
@@ -27,6 +27,4 @@ set(lib-objs $<TARGET_OBJECTS:aggregator-objs>) | |||
# Visible to parent directory | |||
set(lib-objs ${lib-objs} PARENT_SCOPE) | |||
|
|||
add_subdirectory(ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No errors if removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no cpp under ml
dir to be compiled as the previous vector_linalg.cpp
is substituted by Vector class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, a little scary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And could some thoughts in issue #117 be applied here?
data_loader.load_info(husky::Context::get_param("test"), test_set); | ||
int num_features = data_loader.get_num_feature(); | ||
auto& format = husky::lib::ml::kTSVFormat; | ||
// int num_features = std::stoi(husky::Context::get_param("num_features")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove comments like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment still there?
@Kelvin-Ng Okay. |
e6b817a
to
998f9a5
Compare
Hi @kygx-legend, this PR is updated. Please check. |
Thanks for the contribution! I'll check this tomorrow carefully. |
|
||
namespace husky { | ||
namespace lib { | ||
namespace ml { | ||
typedef std::vector<double> vec_double; | ||
typedef std::vector<std::pair<int, double>> vec_sp; | ||
|
||
// indicate format | ||
const int kLIBSVMFormat = 0; | ||
const int kTSVFormat = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use enum
instead?
template <typename FT, typename LT> | ||
class FeatureLabelBase { | ||
template <typename FeatureT, typename LabelT, bool is_sparse> | ||
class FeatureLabel : public DataPoint<FeatureT, LabelT, is_sparse> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming here is a bit confusing. I think it is better to have class FeatureLabelSomeSuffix : public FeatureLabel
, class DataPointSomeSuffix : public DataPoint
etc. Actually I want to change the class name from DataPoint to LabeledPoint, so I think class LabeledPointSomeSuffix : public LabeledPoint
@@ -47,35 +47,36 @@ | |||
#include "lib/ml/linear_regression.hpp" | |||
#include "lib/ml/scaler.hpp" | |||
#include "lib/ml/sgd.hpp" | |||
#define is_sparse false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?!
Even if this is true, use const bool is_sparse = false
instead.
using husky::lib::ml::ParameterBucket; | ||
using SparseFeatureLabel = husky::lib::ml::FeatureLabel<double, double, is_sparse>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is_sparse
being a constant set to false (although I think it should not be), how can this be sparse?
data_loader.load_info(husky::Context::get_param("train"), train_set); | ||
data_loader.load_info(husky::Context::get_param("test"), test_set); | ||
int num_features = data_loader.get_num_feature(); | ||
auto& format = husky::lib::ml::kTSVFormat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to let the user choose the file format in config file?
class FeatureLabelBase { | ||
// extends LabeledPoint, using Vector to represent features | ||
template <typename FeatureT, typename LabelT, bool is_sparse> | ||
class FeatureLabel : public LabeledPoint<Vector<FeatureT, is_sparse>, LabelT> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name looks confusing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is your suggestion on the naming convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LabeledPointWithId? But this is a bit long...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or LabeledPointObj? The suffix is long so it is unavoidable that the subclass name becomes long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not a good name because anything is an object...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't prefer abbreviate. It's more confusing. Long is okay if it's meaningful enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does LabeledPointImpl mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix Impl is not so suitable because the original class LabeledPoint is already a complete class but not an interface.
How about LabeledPointHObj, where HObj is the short form of Husky object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would further propose that we set a naming convension that:
All 'Husky object' version of a general class Foo should be named FooHObj.
What do you think about that?
Seems this Linear Regression only works for sparse data? It would be better if it can be more general purpose, and support more data formats (by making it configurable). Otherwise you have to rename some parts or write enough comments to avoid confusion because these things are in |
@ddmbr The truth is the opposite. The Linear Regression works only for dense data. The so-called |
@ddmbr Yes I am working on it. |
OK. But anyway please use a less confusing name. |
da62adb
to
c111d98
Compare
@ddmbr Now the sparsity and format of data are configurable. |
Fix issue husky-team#115 1. Change parameters to Vector<T, false> 2. Modify FeatureLabel and unify data and gradient vector type 3. LinearRegression and LogisticRegression object can now be properly moved / copied.
Not an easy PR. |
Fix issue #115