[WiP] SRHTs #122

aryamanjeendgar · 2024-10-01T17:39:17Z

This is a draft PR for integrating SRHTs into RandBLAS

CLAassistant · 2024-10-01T17:39:22Z

All committers have signed the CLA.

rileyjmurray

It's exciting to see progress here!

I left several comments. Please resolve them, but then take a step back. Part of what you're doing is adding FHT support. This already takes effort without thinking about randomization.

Building up your suite of FHT kernels

Make a folder RandBLAS/trig/ and a file RandBLAS/trig/hadamard.hh. Have your various FHT implementations go here. You already have one for applying to column-major data from the left and storing the result in a column-major sketch. You could easily write implementation that takes in row-major data and writes to a row-major sketch (although the only way you can parallelize this is with BLAS1, I think).

It would be great to have implementations for both the Hadamard transform itself and for the transpose of the Hadamard transform (equivalently, the inverse of the Hadamard transform). If you do this then w.l.o.g. you can always assume you're applying the transformation from the left.

Once you have those functions working, add in the ability to implicitly scale the rows of the input matrix by a vector of coefficients. In sketching we end up setting this vector to a Rademacher random vector, but from an implementation standpoint these functions don't need to care where the vector comes from.

Once you have those cases sorted out, you can allow conflicting layouts for the input matrix and the output matrix (i.e., input is row-major and output is column-major). This is the trick I use for resolving transposition in the sparse-times-dense matrix kernels.

Note: if you feel like you need to allocate a temporary matrix for workspace in order to do anything useful, you can definitely try that.

Writing tests

Make a folder test/test_matmul_cores/test_trig/ and a file test/test_matmul_cores/test_trig/test_hadamard.cc. This will handle tests only for your FHT.

You can take some inspiration from

RandBLAS/test/test_matmul_wrappers/test_sketch_sparse.cc

Lines 53 to 190 in 36da117

    
           // Adapted from test::linop_common::test_left_apply_transpose_to_eye. 
        
           template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>> 
        
           void test_left_transposed_sketch_of_eye( 
        
               // B = S^T * eye, where S is m-by-d, B is d-by-m 
        
               DenseSkOp &S, Layout layout 
        
           ) { 
        
               auto [m, d] = dimensions(S); 
        
               auto I = eye<SpMat>(m); 
        
               std::vector<T> B(d * m, 0.0); 
        
               bool is_colmajor = (Layout::ColMajor == layout); 
        
               int64_t ldb = (is_colmajor) ? d : m; 
        
               int64_t lds = (is_colmajor) ? m : d; 
        
               lsksp3( 
        
                   layout, Op::Trans, Op::NoTrans, d, m, m, 
        
                   (T) 1.0, S, 0, 0, I, 0, 0, (T) 0.0, B.data(), ldb 
        
               ); 
        
               std::vector<T> S_dense(m * d, 0.0); 
        
               to_explicit_buffer(S, S_dense.data(), layout); 
        
               test::comparison::matrices_approx_equal( 
        
                   layout, Op::Trans, d, m, 
        
                   B.data(), ldb, S_dense.data(), lds, 
        
                   __PRETTY_FUNCTION__, __FILE__, __LINE__ 
        
               ); 
        
           } 
        
           // Adapted from test::linop_common::test_left_apply_submatrix_to_eye. 
        
           template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>> 
        
           void test_left_submat_sketch_of_eye( 
        
               // B = alpha * submat(S0) * eye + beta*B, where S = submat(S) is d1-by-m1 offset by (S_ro, S_co) in S0, and B is random. 
        
               T alpha, DenseSkOp &S0, int64_t d1, int64_t m1, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0 
        
           ) { 
        
               auto [d0, m0] = dimensions(S0); 
        
               randblas_require(d0 >= d1); 
        
               randblas_require(m0 >= m1); 
        
               bool is_colmajor = layout == Layout::ColMajor; 
        
               int64_t ldb = (is_colmajor) ? d1 : m1; 
        
               // define a matrix to be sketched, and create workspace for sketch. 
        
               auto I = eye<SpMat>(m1); 
        
               auto B = std::get<0>(random_matrix<T>(d1, m1, RNGState(42))); 
        
               std::vector<T> B_backup(B); 
        
               // Perform the sketch 
        
               lsksp3( 
        
                   layout, Op::NoTrans, Op::NoTrans, d1, m1, m1, 
        
                   alpha, S0, S_ro, S_co, I, 0, 0, beta, B.data(), ldb 
        
               ); 
        
               // Check the result 
        
               T *expect = new T[d0 * m0]; 
        
               to_explicit_buffer(S0, expect, layout); 
        
               int64_t ld_expect = (is_colmajor) ? d0 : m0;  
        
               auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect); 
        
               auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb); 
        
               int64_t offset = row_stride_s * S_ro + col_stride_s * S_co; 
        
               #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s] 
        
               #define MAT_B(_i, _j) B_backup[       (_i)*row_stride_b + (_j)*col_stride_b] 
        
               for (int i = 0; i < d1; ++i) { 
        
                   for (int j = 0; j < m1; ++j) { 
        
                       MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j); 
        
                   } 
        
               } 
        
               test::comparison::matrices_approx_equal( 
        
                   layout, Op::NoTrans, 
        
                   d1, m1, 
        
                   B.data(), ldb, 
        
                   &expect[offset], ld_expect, 
        
                   __PRETTY_FUNCTION__, __FILE__, __LINE__ 
        
               ); 
        
               delete [] expect; 
        
           } 
        
           // Adapted from test::linop_common::test_right_apply_transpose_to_eye. 
        
           template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>> 
        
           void test_right_transposed_sketch_of_eye( 
        
               // B = eye * S^T, where S is d-by-n, so eye is order n and B is n-by-d 
        
               DenseSkOp &S, Layout layout 
        
           ) { 
        
               auto [d, n] = dimensions(S); 
        
               auto I = eye<SpMat>(n); 
        
               std::vector<T> B(n * d, 0.0); 
        
               bool is_colmajor = Layout::ColMajor == layout; 
        
               int64_t ldb = (is_colmajor) ? n : d; 
        
               int64_t lds = (is_colmajor) ? d : n; 
        
               rsksp3(layout, Op::NoTrans, Op::Trans, n, d, n, (T) 1.0, I, 0, 0, S, 0, 0, (T) 0.0, B.data(), ldb); 
        
               std::vector<T> S_dense(n * d, 0.0); 
        
               to_explicit_buffer(S, S_dense.data(), layout); 
        
               test::comparison::matrices_approx_equal( 
        
                   layout, Op::Trans, n, d,  
        
                   B.data(), ldb, S_dense.data(), lds, 
        
                   __PRETTY_FUNCTION__, __FILE__, __LINE__ 
        
               ); 
        
           } 
        
           // Adapted from test::linop_common::test_right_apply_submatrix_to_eye. 
        
           template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>> 
        
           void test_right_submat_sketch_of_eye( 
        
               // B = alpha * eye * submat(S) + beta*B : submat(S) is n-by-d, eye is n-by-n, B is n-by-d and random 
        
               T alpha, DenseSkOp &S0, int64_t n, int64_t d, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0 
        
           ) { 
        
               auto [n0, d0] = dimensions(S0); 
        
               randblas_require(n0 >= n); 
        
               randblas_require(d0 >= d); 
        
               bool is_colmajor = layout == Layout::ColMajor; 
        
               int64_t ldb = (is_colmajor) ? n : d; 
        
               auto I = eye<SpMat>(n); 
        
               auto B = std::get<0>(random_matrix<T>(n, d, RNGState(11))); 
        
               std::vector<T> B_backup(B); 
        
               rsksp3(layout, Op::NoTrans, Op::NoTrans, n, d, n, alpha, I, 0, 0, S0, S_ro, S_co, beta, B.data(), ldb); 
        
               T *expect = new T[n0 * d0]; 
        
               to_explicit_buffer(S0, expect, layout); 
        
               int64_t ld_expect = (is_colmajor)? n0 : d0; 
        
               auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect); 
        
               auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb); 
        
               int64_t offset = row_stride_s * S_ro + col_stride_s * S_co; 
        
               #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s] 
        
               #define MAT_B(_i, _j) B_backup[       (_i)*row_stride_b + (_j)*col_stride_b] 
        
               for (int i = 0; i < n; ++i) { 
        
                   for (int j = 0; j < d; ++j) { 
        
                       MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j); 
        
                   } 
        
               } 
        
               test::comparison::matrices_approx_equal( 
        
                   layout, Op::NoTrans, n, d, B.data(), ldb, &expect[offset], ld_expect, 
        
                   __PRETTY_FUNCTION__, __FILE__, __LINE__ 
        
               ); 
        
               delete [] expect; 
        
           }

to see what your tests could like. But don't let this line of thinking hobble you. Feel free to take a different approach to get started. I can give feedback and we iterate a bit.

RandBLAS/trig_skops.hh

rileyjmurray · 2024-10-02T23:59:46Z

@aryamanjeendgar, for our reference, here's the code you mentioned from the test suite of FFHT:

        void fht(double *buf, int log_n) {
        int n = 1 << log_n;
        for (int i = 0; i < log_n; ++i) {
                int s1 = 1 << i;
                int s2 = s1 << 1;
                for (int j = 0; j < n; j += s2) {
                for (int k = 0; k < s1; ++k) {
                        double u = buf[j + k];
                        double v = buf[j + k + s1];
                        buf[j + k] = u + v;
                        buf[j + k + s1] = u - v;
                }
                }
        }
        }

Step 1 to figuring out how we'll change it: replace the bit manipulations with standard integer arithmetic (or just give explanatory comments). You can also create a GitHub issue to facilitate the discussion, or make an Overleaf project.

rileyjmurray

Please do a detailed read of my notes in GitHub Issue #99.

I didn't review the whole PR since my plane is landing and I need to put away my laptop.

Ping @aryamanjeendgar

RandBLAS/trig_skops.hh

rileyjmurray · 2024-10-11T21:22:11Z

RandBLAS/trig_skops.hh

+    template<typename T, SignedInteger sint_t>
+    void applyDiagonalRademacher(
+                                bool left,
+                                blas::Layout layout,
+                                int64_t rows,
+                                int64_t cols,
+                                T* A,
+                                sint_t* diag
+                                )
+    {
+        if(left && layout == blas::Layout::ColMajor) {
+            for(int col=0; col < cols; col++) {
+                if(diag[col] > 0)
+                    continue;
+                blas::scal(rows, diag[col], &A[col * rows], 1);
+            }
+        }
+        else if(left && layout == blas::Layout::RowMajor) {
+            for(int col=0; col < cols; col++) {
+                if(diag[col] > 0)
+                    continue;
+                blas::scal(rows, diag[col], &A[col], cols);
+            }
+        }
+        else if(!left && layout == blas::Layout::ColMajor) {
+            for(int row = 0; row < rows; row++) {
+                if(diag[row] > 0)
+                    continue;
+                blas::scal(cols, diag[row], &A[row], rows);
+            }
+        }
+        else {
+            for(int row = 0; row < rows; row++) {
+                if(diag[row] > 0)
+                    continue;
+                blas::scal(cols, diag[row], &A[row * cols], 1);
+            }
+        }
+    }


This is fine for now, but it can and should be much more efficient. It's also probably better suited to RandBLAS/util.hh.

RandBLAS/trig_skops.hh

rileyjmurray

This is looking pretty good! Please start writing unit tests.

RandBLAS/trig_skops.hh

aryamanjeendgar · 2024-10-17T09:56:43Z

A description of the tests:

All of the tests right now (unfortunately) use Eigen --- I use Eigen extensively to be able to consistently produce RowMajor and ColMajor random matrices, computing matrix norms and products in all of the tests --- additionally, I use Eigen in the tests for the permutation matrices since Eigen has a convenient out-of-the-box PermutationMatrix class.

I also ended up using std::random for generating a simple "random" buffer ƒrom which I instantiate the Eigen matrices.

Much of this infrastructure (with norm computation, scaling, computing products etc.) can be ported (with a lot more LoC XD) to BLAS with the exception of the permutation tests which tests against Eigen's PermutationMatrix .

The correctness tests are fairly straightforward to understand:

For testing permutation: I permute the last row/column of the input matrix (it is swapped with the leading row/column), and then check if the result of my implementation and Eigens' matches up.
For testing Hadamard: Checked against explicit application of the Hadamard matrix
For testing Diagonal scaling: Scale the input matrix by all $-1$'s via my code and see if it's equal to a negated copy of the initial matrix

The tests are simple, but they cover all of the potential code paths in my code (since the code paths are chosen around two inputs: sketching direction + layout of the input matrix)

Let me know how we should proceed next, @rileyjmurray!

UPDATE: The latest commit also adds in invSRHT tests for checking that $\left(\mathbf{\Pi RHT}\right)^{-1}\left(\mathbf{\Pi RHT}\left(A\right)\right) = A$

rileyjmurray

You don't need the RandBLAS:: namespace qualifications within RandBLAS itself.
Don't explicitly reference r123::Philox4x32. Reference RandBLAS::DefaultRNG (But, again, without the namespace qualification).
I don't like people having to explicitly pass in log_n to the various FHT functions. Do you have a good reason for that?
Please don't use a single monolithic "correctness" test.
Tests should template numerical precision, not hard-code double. The tolerance for the test would ideally be based on some known error bounds for the operations you're trying to perform. The error bounds you're working with should be very simple since every part of an RHT is an orthogonal matrix, and there are special bounds for those.
There should be functions in RandBLAS proper (not just tests) for inverting an RHT.

This is just a partial review. I have to jump to a meeting.

RandBLAS/trig_skops.hh

test/test_matmul_cores/test_trig.cc

rileyjmurray

Woooo progress!

aryamanjeendgar · 2025-02-16T08:34:26Z

@rileyjmurray, the recent commit should address all of your comments in the review!

rileyjmurray · 2025-02-16T12:41:04Z

RandBLAS/trig_skops.hh

+        float rand_value = r123::u01fixedpt<float>(r.v[0]);
+
+        buff[i] = rand_value < 0.5 ? -1 : 1;


Conversion to float isn't necessary. Just use sint_t rand_value = (r.v[0] % 2 == 0) -1 : 1;.

rileyjmurray · 2025-02-16T12:43:25Z

RandBLAS/trig_skops.hh

+    auto [ctr, key] = seed_state;
+
+    for (int64_t i = 0; i < n; ++i) {
+        typename DefaultRNG::ctr_type r = rng(ctr, key);


This might raise a compiler error if state_t isn't RNGState<DefaultRNG>.

rileyjmurray · 2025-02-16T12:46:15Z

RandBLAS/trig_skops.hh

+template<typename T, SignedInteger sint_t = int64_t>
+void permute_rows_to_top(
+    blas::Layout layout,
+    int64_t rows,
+    int64_t cols,
+    sint_t* selected_rows,
+    int64_t d, // size of `selected_rows`
+    T* A
+) {
+    int64_t top = 0;  // Keeps track of the topmost unselected row
+
+    //TODO: discuss precise semantics of `selected_rows` in this function
+    if(layout == blas::Layout::ColMajor) {
+        for (int64_t i=0; i < d; i++) {
+            if (selected_rows[i] != top) {
+                // Use BLAS swap to swap the entire rows
+                // Swapping row 'selected' with row 'top'
+                blas::swap(cols, &A[top], rows, &A[selected_rows[i]], rows);
+            } // else, continue;
+        }
+    }
+    else {
+        // For `RowMajor` ordering
+        for (int64_t i=0; i < d; i++) {
+            if (selected_rows[i] != top) {
+                blas::swap(cols, &A[cols * selected_rows[i]], 1, &A[cols * top], 1);
+            } // else, continue;
+        }
+    }
+}


This is missing the leading dimension parameter for A.

rileyjmurray · 2025-02-16T12:46:31Z

RandBLAS/trig_skops.hh

+template<typename T, SignedInteger sint_t = int64_t>
+void permute_cols_to_left(
+    blas::Layout layout,
+    int64_t rows,
+    int64_t cols,
+    sint_t* selected_cols,
+    int64_t d, // size of `selectedRows`
+    T* A
+) {
+int64_t left = 0;  // Keeps track of the topmost unselected column
+
+if(layout == blas::Layout::ColMajor) {
+    for (int64_t i=0; i < d; i++) {
+        if (selected_cols[i] != left) {
+            // Use BLAS::swap to swap entire columns at once
+            // Swapping col 'selected' with col 'top'
+            blas::swap(rows, &A[rows * selected_cols[i]], 1, &A[rows * left], 1);
+        }
+    }
+}
+else {
+    // For `RowMajor` ordering
+    for (int64_t i=0; i < d; i++) {
+        if (selected_cols[i] != left) {
+            blas::swap(rows, &A[selected_cols[i]], cols, &A[left], cols);
+        }
+    }
+}
+}


As above, this is missing lda.

rileyjmurray · 2025-02-16T12:46:40Z

RandBLAS/trig_skops.hh

+int64_t left = 0;  // Keeps track of the topmost unselected column
+
+if(layout == blas::Layout::ColMajor) {
+    for (int64_t i=0; i < d; i++) {
+        if (selected_cols[i] != left) {
+            // Use BLAS::swap to swap entire columns at once
+            // Swapping col 'selected' with col 'top'
+            blas::swap(rows, &A[rows * selected_cols[i]], 1, &A[rows * left], 1);
+        }
+    }
+}
+else {
+    // For `RowMajor` ordering
+    for (int64_t i=0; i < d; i++) {
+        if (selected_cols[i] != left) {
+            blas::swap(rows, &A[selected_cols[i]], cols, &A[left], cols);
+        }
+    }
+}


rileyjmurray · 2025-02-16T12:47:38Z

RandBLAS/trig_skops.hh

+template <typename T>
+void fht_dispatch(
+    bool left, // Pre-multiplying?
+    blas::Layout layout,
+    int64_t num_rows,
+    int64_t num_cols,
+    int64_t log_n,
+    T* A,
+    int64_t workspace_ld=0, // leading dimension of the workspace buffer
+    T* workspace_buf=nullptr
+) {
+if(left && layout == blas::Layout::ColMajor)
+    fht_left_col_major(A, workspace_buf, workspace_ld, log_n, num_rows, num_cols);
+else if(left && layout == blas::Layout::RowMajor)
+    fht_left_row_major(A, workspace_buf, log_n, num_rows, num_cols);
+else if(!left && layout == blas::Layout::ColMajor)
+    fht_right_col_major(A, workspace_buf, log_n, num_rows, num_cols);
+else
+    fht_right_row_major(A, workspace_buf, workspace_ld, log_n, num_rows, num_cols);
+}
+}


This kernel (and all kernels it calls) need to accept an lda parameter.

rileyjmurray · 2025-02-16T12:49:31Z

RandBLAS/trig_skops.hh

+    int64_t workspace_ld=0, // leading dimension of the workspace buffer
+    T* workspace_buf=nullptr


If workspace_buf is nullptr then this can segfault. Either make it a required argument or have special logic to handle when it's nullptr.

rileyjmurray · 2025-02-16T12:51:03Z

RandBLAS/trig_skops.hh

+                  //NOTE: I am also going to assume that the user is kind enough to
+                  // populate this fully with zeros


I'd prefer that we not make this assumption. We should zero things out as the first step of applying the RHT.

rileyjmurray · 2025-02-16T12:54:21Z

RandBLAS/trig_skops.hh

+template <typename T>
+struct PaddedMatrix {
+    int64_t n_rows;
+    int64_t n_cols;
+    blas::Layout layout;
+    T* data; // Data-matrix
+    int64_t lda;
+    T* work; // Extra space of sz: (2 ** n_closest - lda) * cols/rows
+                  // Memory **has** to be managed by the user
+                  //NOTE: I am also going to assume that the user is kind enough to
+                  // populate this fully with zeros
+    //NOTE: We could (should?) explore the possibility of managing the buffer by ourselves
+    int64_t work_ld;
+    bool left;
+
+    // Trivial constructor
+    PaddedMatrix(
+        bool left,
+        blas::Layout layout,
+        int64_t rows,
+        int64_t cols,
+        T* data,
+        int64_t lda,
+        T* work
+    ) : left(left), layout(layout), n_rows(rows), n_cols(cols), data(data), lda(lda), work(work) {};
+};


This needs more comprehensive documentation. Compare to SparseSkOp. Add web docs, per DevNotes.md in the screenshot below.

rileyjmurray · 2025-02-16T12:56:44Z

RandBLAS/trig_skops.hh

+/*
+ * A `pure` description of the \PiRHT as a «data-oblvious» transform
+ */
+template <typename T, SignedInteger sint_t = int64_t, typename state_t = RNGState<DefaultRNG>>
+struct HadamardMixingOp {


Comprehensive documentation needed.

rileyjmurray · 2025-02-16T12:57:12Z

RandBLAS/trig_skops.hh

+    // Populating `perm_idx`
+    next_state = repeated_fisher_yates<sint_t>(
+        hmo.d,
+        hmo.dim,
+        1,
+        hmo.perm_idx,
+        next_state
+    );


rileyjmurray · 2025-02-16T12:57:26Z

RandBLAS/trig_skops.hh

+if(!hmo.filled()) {
+    hmo.diag_scale = new sint_t[hmo.dim];
+
+    hmo.perm_idx = new sint_t[hmo.d];
+
+    auto [ctr, key] = hmo.seed_state;
+
+    // Populating `diag`
+    auto next_state = RandBLAS::trig::kernels::generate_rademacher_vector_r123(hmo.diag_scale, hmo.dim, hmo.seed_state);
+
+    // Populating `perm_idx`
+    next_state = repeated_fisher_yates<sint_t>(


Remove excessive blank lines. We use blank lines to separate blocks of code that are both nontrivial and conceptually distinct from one another.

rileyjmurray · 2025-02-16T13:00:17Z

RandBLAS/trig_skops.hh

+ * a very similar signature to the driver itself
+*/
+template <typename T, typename state_t = RNGState<DefaultRNG>, SignedInteger sint_t = int64_t>
+inline int64_t miget_workspace_sz(


Naming convention: we don't use "sz" as an abbreviation for "size."

rileyjmurray · 2025-02-16T13:06:09Z

RandBLAS/trig_skops.hh

+                  // populate this fully with zeros
+    //NOTE: We could (should?) explore the possibility of managing the buffer by ourselves
+    int64_t work_ld;
+    bool left;


I don't like the name left in this context. I suggest you define an enum (perhaps a static member of this type) that you can use to indicate if you're padding with extra rows or extra columns. As a matter of practice, you'll pad with rows when applying an RHT from the left and you'll pad with columns when applying an RHT from the right.

rileyjmurray · 2025-02-16T13:06:20Z

RandBLAS/trig_skops.hh

+template <typename T, SignedInteger sint_t = int64_t, typename state_t = RNGState<DefaultRNG>>
+state_t compute_next_state(state_t state, int64_t dim, int64_t d) {
+    state.counter.incr(d * 1 + dim);
+


Remove blank line.

rileyjmurray · 2025-02-16T13:07:57Z

RandBLAS/trig_skops.hh

+    bool filled() {
+        if(this -> diag_scale == nullptr \
+        && this -> perm_idx == nullptr)
+            return false;
+        else
+            return true;
+    }


This function can be removed and in-lined with equivalent code

bool filled = (hmo.diag_scale != nullptr) && (hmo.perm_idx != nullptr);

rileyjmurray · 2025-02-16T13:08:26Z

RandBLAS/trig_skops.hh

+template <typename T, typename state_t = RNGState<DefaultRNG>, SignedInteger sint_t = int64_t>
+void fill_hadamard(
+    HadamardMixingOp<T, sint_t> &hmo
+) {


Indent the contents of this function.

rileyjmurray · 2025-02-16T13:09:58Z

RandBLAS/trig_skops.hh

+
+    //Step 2: Apply the Hadamard transform (invH = H.T = H)
+    // This has to be a `double` because `blas::scal` really doesn't like being passed integer types
+    double scal_factor = std::pow(2, int(std::log2(hmo.dim)) + 1);


Use T scal_factor, not double scal_factor.

rileyjmurray

See individual comments.

rileyjmurray · 2025-02-16T13:11:24Z

@aryamanjeendgar I've reviewed and left more comments.

init-commit, initial TrigSkOp structure

2872d1f

Cleaning

7103a00

rileyjmurray reviewed Oct 2, 2024

View reviewed changes

aryamanjeendgar added 10 commits October 7, 2024 13:50

Commiting old changes for documenting

5c1572d

A <- (\Pi D) A, for ColMajor A

a604909

A <- (\Pi H D) A for A: (mxn) and ColMajor

1caaffe

{L|R}RowMajor, {R}RowMajor, implicit padding

30c32f3

All variations of $H_n$ implemented

e601309

Cleaned code with fht_dispatch

7b8b134

Generalizing applyDiagonalRademacher

5f72995

Generalized permuteRowsToTop

3bd2361

Added permuteColsToLeft && rmiget

eec1813

Removed old TrigSkOp code

10470bb

rileyjmurray requested changes Oct 11, 2024

View reviewed changes

aryamanjeendgar added 3 commits October 12, 2024 15:03

First review

b2aabf8

Some stylistic cleanups

bc33741

Templating numeric precision in FHT implementation

348bf05

rileyjmurray reviewed Oct 13, 2024

View reviewed changes

RandBLAS/trig_skops.hh Outdated Show resolved Hide resolved

aryamanjeendgar added 4 commits October 13, 2024 13:48

Corrected ommited templating + int->int64_t

f13086b

First version of tests, VERY UNSTABLE

884f96e

Hadamard && Diag tests working

018acd8

Final cleanups, all tests working

0cfdcf1

aryamanjeendgar added 2 commits October 17, 2024 06:21

Added in invSRHT tests

591522f

Cleaning up comments and stylistic changes

8b04df1

rileyjmurray requested changes Oct 21, 2024

View reviewed changes

Review

6b7101c