[Lib][ML] Replace std::vector usage with Vector #116

Merged on Nov 28, 2016 (1 commit)

TatianaJin
Member

Fix issue #115

  1. Change parameters to Vector<T, false>
  2. Modify FeatureLabel and unify data and gradient vector type

@@ -27,6 +27,4 @@ set(lib-objs $<TARGET_OBJECTS:aggregator-objs>)
# Visible to parent directory
set(lib-objs ${lib-objs} PARENT_SCOPE)

add_subdirectory(ml)
Member

No errors if removing this?

Member Author

There is no .cpp file left under the ml directory to compile, since the previous vector_linalg.cpp has been replaced by the Vector class.

Member

All right, a little scary.

Member

@kygx-legend left a comment

And could some thoughts in issue #117 be applied here?

data_loader.load_info(husky::Context::get_param("test"), test_set);
int num_features = data_loader.get_num_feature();
auto& format = husky::lib::ml::kTSVFormat;
// int num_features = std::stoi(husky::Context::get_param("num_features"));
Member

Please remove comments like this.

Member

Comment still there?

@Kelvin-Ng
Contributor

I have made changes on DataPoint according to issue #117 in pull request #125. Please update your code accordingly.

@TatianaJin
Member Author

@Kelvin-Ng Okay.

@TatianaJin force-pushed the dev branch 2 times, most recently from e6b817a to 998f9a5 (November 21, 2016 12:07)
@TatianaJin
Member Author

Hi @kygx-legend, this PR is updated. Please check.

@kygx-legend added this to the v0.1.0 milestone on Nov 21, 2016
@kygx-legend
Member

Thanks for the contribution! I'll check this tomorrow carefully.


namespace husky {
namespace lib {
namespace ml {
typedef std::vector<double> vec_double;
typedef std::vector<std::pair<int, double>> vec_sp;

// indicate format
const int kLIBSVMFormat = 0;
const int kTSVFormat = 1;
Member

Can we use enum instead?
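
One way the enum suggestion could look, as a minimal sketch. The names `DataFormat` and `to_string` are hypothetical and not taken from the PR; only the two enumerator names mirror the constants in the diff above.

```cpp
#include <string>
#include <type_traits>

namespace format_example {

// Hypothetical scoped enum standing in for the two `const int` format flags.
enum class DataFormat { kLIBSVMFormat, kTSVFormat };

// A scoped enum does not convert implicitly to int, so a format can no
// longer be mixed up with an unrelated integer argument.
static_assert(!std::is_convertible<DataFormat, int>::value,
              "DataFormat should not convert implicitly to int");

// Returns a short name for the format. Listing every enumerator in the
// switch lets compilers warn when a new format is added but not handled.
inline std::string to_string(DataFormat format) {
    switch (format) {
    case DataFormat::kLIBSVMFormat:
        return "libsvm";
    case DataFormat::kTSVFormat:
        return "tsv";
    }
    return "unknown";
}

}  // namespace format_example
```

Call sites would then pass `DataFormat::kTSVFormat` rather than a bare integer.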

template <typename FT, typename LT>
class FeatureLabelBase {
template <typename FeatureT, typename LabelT, bool is_sparse>
class FeatureLabel : public DataPoint<FeatureT, LabelT, is_sparse> {
Contributor

I think the naming here is a bit confusing. It would be better to have class FeatureLabelSomeSuffix : public FeatureLabel, class DataPointSomeSuffix : public DataPoint, etc. Actually, I want to rename DataPoint to LabeledPoint, so this would become class LabeledPointSomeSuffix : public LabeledPoint.

@@ -47,35 +47,36 @@
#include "lib/ml/linear_regression.hpp"
#include "lib/ml/scaler.hpp"
#include "lib/ml/sgd.hpp"
#define is_sparse false
Contributor

Why?!
Even if this is true, use const bool is_sparse = false instead.
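
A minimal sketch of the suggested alternative. `FeatureLabel` here is a simplified stand-in, and `kUseSparse` and `ExampleFeatureLabel` are hypothetical names. Besides being untyped, a macro named `is_sparse` would also rewrite any template parameter called `is_sparse` in headers included after the `#define`.

```cpp
namespace constant_example {

// Simplified stand-in for a class templated on sparsity.
template <typename FeatureT, typename LabelT, bool is_sparse>
struct FeatureLabel {
    static constexpr bool kIsSparse = is_sparse;
};

// A typed, scoped compile-time constant. Unlike `#define is_sparse false`,
// it leaves the template parameter name `is_sparse` above untouched.
constexpr bool kUseSparse = false;

// Neutral alias name, so it stays accurate whichever value kUseSparse has.
using ExampleFeatureLabel = FeatureLabel<double, double, kUseSparse>;

static_assert(!ExampleFeatureLabel::kIsSparse,
              "the alias is dense while kUseSparse is false");

}  // namespace constant_example
```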

using husky::lib::ml::ParameterBucket;
using SparseFeatureLabel = husky::lib::ml::FeatureLabel<double, double, is_sparse>;
Contributor

When is_sparse is a constant set to false (although I think it should not be), how can this be sparse?

data_loader.load_info(husky::Context::get_param("train"), train_set);
data_loader.load_info(husky::Context::get_param("test"), test_set);
int num_features = data_loader.get_num_feature();
auto& format = husky::lib::ml::kTSVFormat;
Contributor

Better to let the user choose the file format in config file?
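
A rough sketch of how a config value could select the format. `parse_format`, the `DataFormat` enum, and the accepted strings "libsvm" and "tsv" are all assumptions for illustration; the string would presumably come from something like `husky::Context::get_param` with a key of the project's choosing.

```cpp
#include <stdexcept>
#include <string>

namespace config_example {

// Hypothetical enum for the supported input formats.
enum class DataFormat { kLIBSVMFormat, kTSVFormat };

// Maps a config string to a format and rejects anything unrecognized,
// so a typo in the config file fails loudly instead of picking a default.
inline DataFormat parse_format(const std::string& name) {
    if (name == "libsvm") {
        return DataFormat::kLIBSVMFormat;
    }
    if (name == "tsv") {
        return DataFormat::kTSVFormat;
    }
    throw std::invalid_argument("unknown data format: " + name);
}

}  // namespace config_example
```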

class FeatureLabelBase {
// extends LabeledPoint, using Vector to represent features
template <typename FeatureT, typename LabelT, bool is_sparse>
class FeatureLabel : public LabeledPoint<Vector<FeatureT, is_sparse>, LabelT> {
Contributor

The name looks confusing...

Member Author

What is your suggestion on the naming convention?

Contributor

LabeledPointWithId? But this is a bit long...

Member Author

Or LabeledPointObj? The suffix is long so it is unavoidable that the subclass name becomes long.

Contributor

I think this is not a good name because anything is an object...

Member

Actually, I don't like abbreviations; they are more confusing. A long name is okay if it's meaningful enough.

Member Author

What does LabeledPointImpl mean?

Contributor

The suffix Impl is not really suitable, because the original class LabeledPoint is already a complete class rather than an interface.

How about LabeledPointHObj, where HObj is the short form of Husky object?

Member

I'm fine with this.

Contributor

Then I would further propose that we set a naming convention:
every 'Husky object' version of a general class Foo should be named FooHObj.

What do you think about that?
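
To make the proposed FooHObj convention concrete, here is a minimal sketch. Both classes are simplified stand-ins rather than the PR's code, and the `KeyT` / `id()` members reflect an assumption about what a Husky object needs to expose.

```cpp
#include <utility>

namespace naming_example {

// General-purpose class: just features and a label, usable on its own.
template <typename FeatureT, typename LabelT>
struct LabeledPoint {
    LabeledPoint() = default;
    LabeledPoint(FeatureT features, LabelT label)
        : x(std::move(features)), y(std::move(label)) {}

    FeatureT x;
    LabelT y;
};

// "Husky object" version of LabeledPoint, hence LabeledPointHObj.
// It adds only the key needed to store the point in an object list.
template <typename FeatureT, typename LabelT>
class LabeledPointHObj : public LabeledPoint<FeatureT, LabelT> {
 public:
    using KeyT = int;  // assumed key type, for illustration only

    LabeledPointHObj() = default;
    LabeledPointHObj(KeyT key, FeatureT features, LabelT label)
        : LabeledPoint<FeatureT, LabelT>(std::move(features), std::move(label)),
          key_(key) {}

    const KeyT& id() const { return key_; }

 private:
    KeyT key_ = 0;
};

}  // namespace naming_example
```

The suffix then signals at a glance which classes carry an id and which are plain value types.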

@ddmbr
Member

ddmbr commented Nov 23, 2016

It seems this Linear Regression only works for sparse data? It would be better if it were more general-purpose and supported more data formats (by making it configurable). Otherwise you would have to rename some parts or write enough comments to avoid confusion, because these things are in lib/.

@Kelvin-Ng
Contributor

@ddmbr The truth is the opposite. The Linear Regression works only for dense data. The so-called SparseFeatureLabel is actually dense. That's so confusing.

@TatianaJin
Member Author

@ddmbr Yes I am working on it.
@Kelvin-Ng Actually it can be both sparse and dense. I wrote SparseFeatureLabel when I started with is_sparse=true, and later changed it to false to test whether dense data works.

@Kelvin-Ng
Contributor

OK. But anyway please use a less confusing name.

@TatianaJin force-pushed the dev branch 2 times, most recently from da62adb to c111d98 (November 23, 2016 05:54)
@TatianaJin
Member Author

@ddmbr Now the sparsity and format of data are configurable.

Fix issue husky-team#115

1. Change parameters to Vector<T, false>
2. Modify FeatureLabel and unify data and gradient vector type
3. LinearRegression and LogisticRegression object can now be properly moved / copied.
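
A minimal sketch of how copy and move support could be checked at compile time. `Model` is a hypothetical stand-in, not the PR's LinearRegression class, and the sketch does not show how the PR itself achieved this.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace copy_move_example {

// Hypothetical model holding only value-type members, so the
// compiler-generated copy and move operations are correct as they are.
struct Model {
    std::vector<double> params;
    int num_features = 0;
};

// These checks fail to compile if a later change (for example a reference
// or const member) silently removes copy or move support.
static_assert(std::is_copy_constructible<Model>::value, "Model must be copyable");
static_assert(std::is_copy_assignable<Model>::value, "Model must be copy-assignable");
static_assert(std::is_move_constructible<Model>::value, "Model must be movable");
static_assert(std::is_move_assignable<Model>::value, "Model must be move-assignable");

}  // namespace copy_move_example
```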
@zzxx-husky
Collaborator

Not an easy PR.
