Adding Factorization Machines #383
Conversation
Can you take a look at the difference between the cpumathnative code locally and here?
We have that #ifdef section defined in stdafx.h in the upper directory, and it works for cpumathnative, so I hope that if you remove the #ifdef section and replace it with an include of stdafx.h, it will work. #Resolved
#define EXPORT_API(ret) extern "C" __declspec(dllexport) ret __stdcall
#endif

EXPORT_API(void) CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,
Native [](start = 47, length = 6)
Native in the name of the function is kinda excessive #Pending
Keeping it for readability in the other files where they are invoked. There are non-native functions in the FFM interface that have the same signatures and call into those.
In reply to: 196861195 [](ancestors = 196861195)
#define EXPORT_API(ret) extern "C" __declspec(dllexport) ret __stdcall
#endif

EXPORT_API(void) CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,
CalculateIntermediateVariables [](start = 17, length = 30)
Any chance you can add comments to this code? @wschin #Resolved
The two functions are Algorithm 1 & 2 at https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf.
In reply to: 196861456 [](ancestors = 196861456)
[DllImport(NativePath), SuppressUnmanagedCodeSecurity]
public static extern void CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices,
float* /*const*/ featureValues, float* /*const*/ linearWeights, float* /*const*/ latentWeights, float* latentSum, float* response);
/*const*/ [](start = 51, length = 9)
Are these comments here for a reason? #Resolved
I added those /*const*/ comments following another learner's implementation, which also uses C++ and C# together. The comments mean "although they are not constant pointers, you should not touch their values." #Resolved
if(WIN32)
else()
set_property(SOURCE segment.cpp APPEND_STRING PROPERTY COMPILE_FLAGS " -msse4.1")
segment.cpp [](start = 24, length = 11)
Copypaste!
There is no segment.cpp in this project. #Resolved
Are you trying to sneak in files from another PR? BLASPHEMY!!!111 You can always remove this file, or just create a new branch, port the files from this PR, and recreate the PR. It's up to you. Refers to: docs/code/EntryPoints.md:1 in 6eef331. [](commit_id = 6eef331, deletion_comment = False)
[assembly: LoadableClass(FieldAwareFactorizationMachineTrainer.Summary, typeof(FieldAwareFactorizationMachineTrainer), typeof(FieldAwareFactorizationMachineTrainer.Arguments),
new[] { typeof(SignatureBinaryClassifierTrainer), typeof(SignatureTrainer) }, FieldAwareFactorizationMachineTrainer.UserName, FieldAwareFactorizationMachineTrainer.LoadName,
FieldAwareFactorizationMachineTrainer.ShortName, DocName = "trainer/FactorizationMachine.md")]
FactorizationMachine [](start = 72, length = 20)
shall we bring it as well? #Pending
Hmm. That file has about 8 examples, all using maml, which will need to be translated to ML.NET examples. Leaving it for a follow-up PR immediately after this one.
In reply to: 196880773 [](ancestors = 196880773)
{
internal unsafe static class FieldAwareFactorizationMachineInterface
{
internal const string NativePath = "FactorizationMachineNative.dll";
.dll [](start = 70, length = 4)
FactorizationMachineNative is enough #Resolved
namespace Microsoft.ML.Runtime.FactorizationMachine
{
internal sealed class FieldAwareFactorizationMachineUtils
Can I suggest you break this file into 3 files: Trainer/Predictor/Utils?
If you look at other learners, they also follow the one class, one file format. (It's OK though to have the entry point in the same file as the trainer.) #Resolved
#include "../Stdafx.h"

#define UNUSED(x) (void)(x)
#define DEBUG_ONLY(x) (void)(x)
I don't see any usage of this, can we just remove them? #Resolved
/// for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks [3].
/// [1] http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
/// [2] http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
/// [3] fast-ffm.tex in FactorizationMachine project folder
Please cite https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf for [3]. Thanks. #Resolved
add_library(FactorizationMachineNative SHARED ${SOURCES} ${RESOURCES})

install_library_and_symbols (FactorizationMachineNative)
What does this mean? I didn't find this function in CMake reference. Just curious. :) #Resolved
It is a custom function we wrote (we use it for all native libraries in the .NET Core repos). See
machinelearning/src/Native/CMakeLists.txt
Line 169 in 93ecbb7
function(install_library_and_symbols targetName)
Its reason for existing is to strip out symbols during Release builds and put them in a separate file. #Resolved
badExampleCount++;
continue;
}
FieldAwareFactorizationMachineInterface.CalculateIntermediateVariables(fieldCount, _latentDimAligned, count,
Could you add a note here saying that this method implements Algorithm 1 in [3]? #Resolved
featureFieldBuffer, featureIndexBuffer, featureValueBuffer, linearWeights, latentWeightsAligned, latentSum, ref modelResponse);
var slope = CalculateLossSlope(label, modelResponse);
FieldAwareFactorizationMachineInterface.CalculateGradientAndUpdate(_lambdaLinear, _lambdaLatent, _learningRate, fieldCount, _latentDimAligned, weight, count,
featureFieldBuffer, featureIndexBuffer, featureValueBuffer, latentSum, slope, linearWeights, latentWeightsAligned, linearAccSqGrads, latentAccSqGradsAligned);
Similarly, could you add a note here saying that this method implements Algorithm 2 in [3]? #Resolved
loss = 0;
exampleCount = 0;
badExampleCount = 0;
while (cursor.MoveNext())
while [](start = 20, length = 5)
This while-loop is the Algorithm 3 in [3]. Want to add a note? #Resolved
decimal range = f2 / precision;

if (range < safeRange)
range = safeRange;
I am not going to merge this until we settle that it is acceptable.
const float * qfprimef = pq + fprime * m * d + f * d;
for (int k = 0; k + 4 <= d; k += 4)
{
// inter-field interaction
My comment line 79 means that we only handle inter-field interactions because f != f'. #Resolved
for (int k = 0; k + 4 <= d; k += 4)
{
__m128 _qff = _mm_load_ps(qff + k);
_tmp = _mm_add_ps(_tmp, _mm_mul_ps(_qff, _qff));
Here is where we compute intra-field interactions. #Resolved
// update linear term w_j
float g = weight * (lambdaLinear * pw[j] + slope * px[i]);
phw[j] += g * g;
pw[j] -= learningRate / sqrt(phw[j]) * g;
Here we accumulate the gradient of linear term. #Resolved
__m128 _g = _mm_mul_ps(_lambdav, _v);

// accumulate the squared gradient
if (fprime != f)
More precisely, accumulate the gradient of latent vectors. #Resolved
One extra space before the comment message? #Resolved
// y += <q_f,f', q_f',f>, f != f'
// We only handle inter - field interactions because f != f'.
for (int fprime = f + 1; fprime < m; fprime++)
Sounds like intra-field interactions are ignored. Maybe change it to "This loop handles ..." #Resolved
- Whis-->This
- There should be no space between inter, -, and field.
#Resolved
_g = _mm_add_ps(_g, _mm_mul_ps(_sx, _mm_sub_ps(_q, _mm_mul_ps(_v, _x))));
_g = _mm_mul_ps(_wei, _g);

const __m128 _h = _mm_add_ps(_mm_load_ps(hvjfprime + k), _mm_mul_ps(_g, _g));
_mm_add_ps(_mm_load_ps(hvjfprime + k), _mm_mul_ps(_g, _g)); [](start = 34, length = 59)
My fault. "// Accumulate the gradient of latent vectors" should be here. #Resolved
One extra space before the comment message? #Resolved
__m128 _q = _mm_load_ps(qfprimef + k);
__m128 _g = _mm_mul_ps(_lambdav, _v);

// Accumulate the gradient of latent vectors.
Accumulate the gradient of latent vectors. [](start = 7, length = 42)
// Calculate loss function's gradient. #Resolved
__m128 _v = _mm_load_ps(vjfprime + k);
__m128 _q = _mm_load_ps(qfprimef + k);
__m128 _g = _mm_mul_ps(_lambdav, _v);
// Calculate L2-norm regularization's gradient #Resolved
// Update linear term w_j.
float g = weight * (lambdaLinear * pw[j] + slope * px[i]);
phw[j] += g * g;
phw[j] += g * g; [](start = 8, length = 16)
"// Accumulate the gradient of the linear term" should be here. #Resolved
float g = weight * (lambdaLinear * pw[j] + slope * px[i]);
phw[j] += g * g;

// Accumulate the gradient of the linear term.
// Accumulate the gradient of the linear term. [](start = 8, length = 46)
// Perform ADAGRAD update rule to adjust linear term #Resolved
_g = _mm_mul_ps(_wei, _g);

const __m128 _h = _mm_add_ps(_mm_load_ps(hvjfprime + k), _mm_mul_ps(_g, _g));
_v = _mm_sub_ps(_v, _mm_mul_ps(_lr, _mm_mul_ps(_mm_rsqrt_ps(_h), _g)));
_v = _mm_sub_ps(_v, _mm_mul_ps(_lr, _mm_mul_ps(_mm_rsqrt_ps(_h), _g))); [](start = 16, length = 71)
// Perform ADAGRAD update rule to adjust latent vector #Resolved
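For readers unfamiliar with AdaGrad, the 4-wide SSE update above follows the standard scalar rule. A minimal sketch (hypothetical names, not the PR's code):

```cpp
#include <cmath>

// Scalar form of the AdaGrad update that the SSE code performs four lanes at
// a time: accumulate the squared gradient, then shrink the step by 1/sqrt(h).
void AdagradStep(float& w, float& h, float g, float learningRate)
{
    h += g * g;                              // accumulated squared gradient
    w -= learningRate / std::sqrt(h) * g;    // scaled gradient step
}
```

The SSE version uses `_mm_rsqrt_ps` (a fast reciprocal square root approximation) instead of a true division by `sqrt`, trading a little precision for speed.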
Looks good to me now. Just leave three minor comments about extra spaces before comments. Thanks a lot.
#include <cstring>
#include <limits>
#include <pmmintrin.h>
#include "../Stdafx.h"
stdafx.h is usually a pre-compiled header, should be included first. #Resolved
EXPORT_API(void) CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,
_In_ float * linearWeights, _In_ float * latentWeights, _Inout_ float * latentSum, _Out_ float * response)
{
// The number of all possible fields
poor choice for variable names. Anything more mnemonic? #Pending
We discussed this before. Those symbols are close to my pseudo-code in the PDF, so anyone who reads the document first can get familiar with them. On the other hand, even with meaningful names, it's almost impossible for users to understand SSE code. #Resolved
EXPORT_API(void) CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,
_In_ float * linearWeights, _In_ float * latentWeights, _Inout_ float * latentSum, _Out_ float * response)
{
// The number of all possible fields
do we do any input validation here? At least asserts would be nice. #Pending
The only possible check is checking the signs of some parameters and indices (i.e., m, d, c, pf's elements, pi's elements). Note that adding checks in Release mode is not recommended, because the two functions are highly optimized and purely computation-oriented. #Resolved
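A Debug-only version of that sign check can be wrapped in asserts, so Release builds pay nothing. A sketch with hypothetical names mirroring the quoted parameters (not the PR's code):

```cpp
// Validate the preconditions the native kernels assume. Returning bool lets
// callers write assert(ValidateInputs(...)), which compiles away in Release
// builds and leaves the hot path untouched.
bool ValidateInputs(int fieldCount, int latentDim, int count,
                    const int* fieldIndices, const int* featureIndices)
{
    if (fieldCount <= 0 || latentDim <= 0 || count < 0)
        return false;
    for (int i = 0; i < count; ++i)
    {
        if (fieldIndices[i] < 0 || fieldIndices[i] >= fieldCount)
            return false;
        if (featureIndices[i] < 0)
            return false;
    }
    return true;
}
```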
__m128 _y = _mm_setzero_ps();
__m128 _tmp = _mm_setzero_ps();

for (int i = 0; i < c; i++)
In C++ it is preferred to use ++i by default, as it is faster. #Resolved
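For built-in int the two forms compile to the same code; the difference shows up for class-type iterators, where post-increment must materialize a copy of the old value. An illustrative (hypothetical) example:

```cpp
// Post-increment must return the pre-increment value, forcing a copy;
// pre-increment mutates in place and returns a reference.
struct CountingIter
{
    int pos = 0;
    static int copies;                        // counts copy-constructions
    CountingIter() = default;
    CountingIter(const CountingIter& o) : pos(o.pos) { ++copies; }
    CountingIter& operator++() { ++pos; return *this; }                           // ++i: no copy
    CountingIter operator++(int) { CountingIter tmp(*this); ++pos; return tmp; }  // i++: copies
};
int CountingIter::copies = 0;
```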
for (int fprime = 0; fprime < m; fprime++)
{
const int vBias = j * m * d + fprime * d;
Overall I see many repeated computations in the tight loop. Not sure if the compiler will optimize them all out, but it seems like it might be worth at least trying to see if this can be refactored, both for readability and potential speed gains. #Pending
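The kind of repetition being pointed at, using the vBias expression quoted above as an example, is loop-invariant hoisting. A hypothetical before/after sketch:

```cpp
// j * m * d does not depend on fprime, so it can be hoisted out of the loop.
// Compilers often do this themselves, but hoisting by hand makes it explicit.
void ComputeBiases(int* out, int j, int m, int d)
{
    const int rowBase = j * m * d;           // hoisted loop invariant
    for (int fprime = 0; fprime < m; ++fprime)
        out[fprime] = rowBase + fprime * d;  // was: j * m * d + fprime * d
}
```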
@glebuk , we have discussed this before. From my experience designing LIBFFM, the compiler is not able to do so even if we finely tune the compiler configuration. We have also concluded that it's very difficult to refactor them. Maybe some of them are similar, but you will see that a refactoring easily leads to extra loops, temporary variables, or function pointers. Again, please don't expect readability of SSE and training-algorithm code. I personally prefer to use the document as a solution. Honestly, although our SSE math library has some sort of inline documentation, it is still not readable to me due to the lack of math formulations. #Resolved
I made several innocuous-seeming changes for pre/post increment on loops and optimization of tight loops through intermediate calculations.
Ran them through the tests to make sure there is no regression, and the test run aborted with an empty string as the error message.
Will log an "enhancement" issue about the optimizations, and take that on after enabling the test.
In reply to: 197932033 [](ancestors = 197932033)
// Calculate the stochastic gradient and update the model.
// The /*const*/ comment on the parameters of the function means that their values should not get altered by this function.
EXPORT_API(void) CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, int count,
_In_ int* /*const*/ fieldIndices, _In_ int* /*const*/ featureIndices, _In_ float* /*const*/ featureValues, _In_ float* /*const*/ latentSum, float slope,
Is it possible to actually enforce this constness, instead of just via a comment, with either a & reference or const? #Pending
Thanks for suggesting. Will take a look at the next iteration on this file, right after enabling the tests.
In reply to: 197935966 [](ancestors = 197935966)
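For what it's worth, a sketch of what that enforcement could look like on the native side (hypothetical, heavily simplified signature; the real export takes many more parameters): const-qualifying the pointers makes the compiler reject writes, while the C# DllImport side is unaffected by the qualifier.

```cpp
// Sketch: replacing the /*const*/ comments with real const qualifiers.
// Writes through fieldIndices or featureValues now fail to compile.
extern "C" void CalculateGradientAndUpdateSketch(
    float lambdaLinear, int count,
    const int* fieldIndices,     // read-only, enforced by the compiler
    const float* featureValues,  // read-only, enforced by the compiler
    float* linearWeights)        // genuinely mutable output
{
    for (int i = 0; i < count; ++i)
        linearWeights[fieldIndices[i]] -= lambdaLinear * featureValues[i];
    // fieldIndices[0] = 0;      // would be a compile-time error
}
```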
EXPORT_API(void) CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, int count,
_In_ int* /*const*/ fieldIndices, _In_ int* /*const*/ featureIndices, _In_ float* /*const*/ featureValues, _In_ float* /*const*/ latentSum, float slope,
_Inout_ float* linearWeights, _Inout_ float* latentWeights, _Inout_ float* linearAccumulatedSquaredGrads, _Inout_ float* latentAccumulatedSquaredGrads)
{
if latentSum is an array it should be called latentSums (plural) Same throughout. #Resolved
No, latentSum is a meaningful name. It's actually a sum of latent vectors detailed in my document. #Resolved
internal const string NativePath = "FactorizationMachineNative";
public const int CbAlign = 16;

private static bool Compat(AlignedArray a)
Hi @sfilipi and @wschin , could I ask, was the usage of AlignedArray here intentional, in the sense that without an aligned array something bad happens? Or was it just copying over the practices in some existing code? If the latter, could we just as easily have used fixed with simple arrays, and whatnot? Relevant to discussions I've had with @eerhardt ... we want to get rid of AlignedArray.
In a few small benchmark performance tests I've run, AlignedArray performs worse than using a regular array.
FFM's core computation is done by SSE code, which requires the memory blocks to be aligned. The main computation doesn't call any member functions of AlignedArray; we just use AlignedArray as something like the new operator. Thus, the performance is possibly as good as a regular array.
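For context on why the alignment matters: the aligned SSE loads (_mm_load_ps) fault on pointers that are not 16-byte aligned. On the native side, the equivalent of that "AlignedArray as new" usage would be an over-aligned allocation. A sketch using C++17 std::aligned_alloc (an assumption for illustration, not the repo's allocator):

```cpp
#include <cstdlib>
#include <cstddef>

// Allocate a float buffer whose address is 16-byte aligned, as required by
// _mm_load_ps/_mm_store_ps. aligned_alloc requires the size to be a multiple
// of the alignment, so round the byte count up. Free with std::free.
float* AllocAligned16(std::size_t n)
{
    std::size_t bytes = (n * sizeof(float) + 15) & ~static_cast<std::size_t>(15);
    return static_cast<float*>(std::aligned_alloc(16, bytes));
}
```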
From my investigation, I concluded that the performance penalty being paid was where we are moving the array elements around in memory to manually align them:
MoveData(ifltMin);
In my experience, doing this copying is worse for performance than just using unaligned reads.
Another way to fix this issue is to use both a preamble and a postamble to get aligned. The idea is to do the operation one element at a time while you are not aligned at the beginning. Then, once you hit the alignment, do the vectorized operation as usual. Then, at the end, do one element at a time for the last elements that don't fill out a full vector. #Pending
using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Runtime.Model;

[assembly: LoadableClass(typeof(FieldAwareFactorizationMachinePredictor), null, typeof(SignatureLoadModel), "Field Aware Factorization Machine", FieldAwareFactorizationMachinePredictor.LoaderSignature)]
A bit long at 203 chars. Maybe break onto two lines. #Resolved
#include "../Stdafx.h"
All code should have a copyright header. See the other .cpp files. #Resolved
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[assembly: InternalsVisibleTo("Microsoft.ML.StandardLearners, PublicKey=00240000048000009400000006020000002400005253413100040000010001004b86c4cb78549b34bab61a3b1800e23bfeb5b3ec390074041536a7e3cbd97f5f04cf0f857155a8928eaa29ebfd11cfbbad3ba70efea7bda3226c6a8d370a4cd303f714486b6ebc225985a638471e6ef571cc92a4613c00b8fa65d61ccee0cbe5f36330c9a01f4183559f1bef24cc2917c6d913e3a541333a1d05d9bed22b38cb")]
Is this necessary? #Pending
AlignedArray.Items is internal, but gets accessed in FactorizationMachines.
Will get rid of it when we move off AlignedArray.
In reply to: 198213276 [](ancestors = 198213276)
// Update linear term w_j.
float g = weight * (lambdaLinear * pw[j] + slope * px[i]);

// Accumulate the gradient of the linear term" should be here.
term" ---> term #Resolved
const int f = pf[i];
const int j = pi[i];

// Update linear term w_j.
// Calculate gradient of linear term w_j #Resolved
__m128 _v = _mm_load_ps(vjfprime + k);
__m128 _q = _mm_load_ps(qfprimef + k);

// Calculate L2-norm regularization's gradient
Adding . at the end? #Resolved
Fixing the aftermath of a rebase.
updateMe.bat (Outdated)
::Update the https://github.com/sfilipi/machinelearning-1 from https://github.com/dotnet/machinelearning
git fetch upstream
please don't check in this file :) #Resolved
These are important for some applications. * LightGBM (#392) * LightGBM and test. * add test baselines and nuget source for lightGBM binaries. * Add entrypoint for lightGBM. * add unsafe flag for release build. * update nuget version. * make lightgbm test single threaded. * install gcc on OS machines to resolve dependencies on openmp thatis needed by lightgbm native code. * PR comments. Leave BREW and GCC in bash script to verify macOS tests work. * remove brew and gcc from build script. * PR feedback. * disable test on macOS. * disable test on macOS. * PR feedback. * Adding Factorization Machines (#383) * Adding Factorization Machines * ONNX API documentation. (#419) * ONNX API documentation. * Bring ensembles into codebase (#379) Introduce Ensemble codebase * enable macOS tests for LightGBM. (#422) * Create a shorter temp file name for model loading. (#397) Create a shorter temp file name for model loading, as well as remove the potential for a race condition among multiple openings by using the creation of a lock file. * removing extraneous character that broke the linux build, and with it unecessary cmake version requirement (#425) * EvaluatorUtils to handle label column of type key without text key values (#394) * Fix EvaluatorUtils to handle label column of type key without text key values. * Removing non source files from solution (#362)
* Adding Factorization Machines
Adds Factorization Machines and a test for them.
Addresses #381
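For context on what the new learner computes, here is a hedged sketch (not ML.NET's actual code, which is C# backed by native routines and is field-aware, i.e. it learns a separate latent vector per field): a plain degree-2 factorization machine scores an example `x` as `w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j`, and the pairwise term can be evaluated in O(n·k) time instead of O(n²·k) via the identity `sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]`. All names below (`fm_predict`, `w0`, `w`, `V`) are illustrative, not from the PR:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine score.

    x  : (n,)   feature vector
    w0 : scalar bias
    w  : (n,)   linear weights
    V  : (n, k) latent factors, one k-dim vector per feature

    Uses the linear-time reformulation of the pairwise interaction term:
    0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    """
    linear = w0 + w @ x
    s = V.T @ x                    # per-factor sums, shape (k,)
    s_sq = (V ** 2).T @ (x ** 2)   # per-factor sums of squares, shape (k,)
    pairwise = 0.5 * float(np.sum(s * s - s_sq))
    return linear + pairwise
```

The O(n·k) form is also what makes stochastic-gradient training of FMs cheap, since the per-factor sums `s` can be cached and reused in the gradient of every feature.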