Can classifier-reborn work with Numo::NArray / Numo::GSL ? Is that a better choice than nmatrix? #192

Closed
0xdevalias opened this issue Jul 21, 2020 · 9 comments

Comments

@0xdevalias

0xdevalias commented Jul 21, 2020

I noticed that https://jekyll.github.io/classifier-reborn/#dependencies states we can use the rb-gsl gem to speed things up, and that in turn references using nmatrix or narray.

The linked narray GitHub page states that it is in maintenance mode, and links to Numo::NArray, which lists Numo::GSL under related projects.

Is it possible to use Numo::GSL with classifier-reborn?


Since rb-gsl can only use one of nmatrix or narray, I'm also wondering which would be the better (i.e., faster) choice for this functionality?

In order to use rb-gsl with NMatrix you must first set the NMATRIX environment variable and then install rb-gsl:

    gem install nmatrix
    export NMATRIX=1
    gem install rb-gsl

This will compile rb-gsl with NMatrix specific functions.

For using rb-gsl with NArray:

    gem install narray
    export NARRAY=1
    gem install rb-gsl

Note that setting both NMATRIX and NARRAY variables will lead to undefined behaviour. Only one can be used at a time.
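
To illustrate the "only one at a time" constraint in the note above, here is a minimal sketch of a pre-install check. The helper name `rb_gsl_backend` is hypothetical and is not part of rb-gsl or classifier-reborn. It assumes that a variable counts as "set" whenever it is present in the environment, which may not match exactly how the rb-gsl build script reads it.

```ruby
# Hypothetical pre-install check; not part of rb-gsl or classifier-reborn.
# Accepts any Hash-like environment so it can be exercised without touching ENV.
# Assumption: a variable counts as "set" if the key is present at all.
def rb_gsl_backend(env = ENV)
  nmatrix = env.key?('NMATRIX')
  narray  = env.key?('NARRAY')

  if nmatrix && narray
    raise ArgumentError, 'Set only one of NMATRIX or NARRAY before installing rb-gsl'
  end

  return :nmatrix if nmatrix
  return :narray if narray

  :none
end
```

Calling `rb_gsl_backend` with no arguments inspects the real environment; passing a plain Hash makes it easy to try out the different combinations.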

I also noticed that rb-gsl hasn't been updated since 2017, and so is most likely unmaintained:

@mkasberg
Contributor

mkasberg commented May 4, 2022

Bringing this issue back to life! I love Jekyll ❤️ and still use it for several personal projects. I also use this classifier-reborn gem, and it would be great to make sure it remains easy to use with more modern hardware & software. Here are some notes about why this matters:

  • This gem is currently tested in CI against Ruby 2.4 and 2.6, both of which are EOL.
    - 2.4
    - 2.6
  • Ruby 2.7 will be the last 2.x branch, and will be EOL in less than a year.
  • Ubuntu 22.04 no longer includes openssl 1.1, making it difficult to build/use 2.x branches of Ruby.
  • Likewise, my understanding is that using Ruby with openssl 1.1 via brew on macOS is possible, but is not the default and requires extra configuration to build.

All that is to say... it's increasingly important that this gem is easy to use with Ruby 3. However:

  • Using classifier-reborn on my ~50 post blog is unusable (hours of build time) without GSL.
  • rb-gsl is unmaintained - luckily, there's a commit on main that makes it compatible with Ruby 3, but nobody has released the gem so the only way to use rb-gsl with Ruby 3 right now is to specify the git hash in your Gemfile. See "Publish a new release to rubygems" (SciRuby/rb-gsl#67).
  • As pointed out above, rb-gsl is itself using a deprecated/unmaintained narray implementation.

I would like to write some PRs to try to:

  • Update CI to work with Ruby 2.7 and 3.x on GitHub Actions so we have good CI/tests with modern Ruby versions.
  • Add Numo::Linalg as an alternative to rb-gsl. This could work similarly to the way $GSL currently works, detecting the availability at runtime.
  • Ultimately, produce a released gem version that's compatible with Ruby 3.x in a way that does not require using unreleased versions of dependencies for decent performance.
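
As a rough sketch of the runtime detection mentioned in the list above, the following tries each optional library in order and falls back to pure Ruby when none can be loaded. The function name `detect_svd_backend`, the returned symbols, and the preference order are assumptions made for illustration. This is not how classifier-reborn actually sets `$GSL`.

```ruby
# Hypothetical helper: returns the name of the first backend whose library
# can be required, or :ruby if none of them is installed.
# The default candidates (and their order) are an assumption for illustration.
def detect_svd_backend(candidates = { numo: 'numo/linalg', gsl: 'gsl' })
  candidates.each do |name, lib|
    begin
      require lib
      return name
    rescue LoadError
      # Library not installed; try the next candidate.
      next
    end
  end

  :ruby
end
```

A caller could then branch on the result once at load time, for example choosing the Numo SVD path when the result is `:numo` and the slow pure-Ruby path when it is `:ruby`.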

I've experimented a little bit (essentially a prototype/spike) and I think I can get a working SVD using Numo::Linalg. For example, here's an alternative implementation of LSI#build_reduced_matrix I was experimenting with:

    require 'numo/narray'
    require 'numo/linalg'

    def numo_build_reduced_matrix(matrix, cutoff = 0.75)
      # OPTIMIZE ME: Consider other drivers/options like sdd.
      s, u, vt = Numo::Linalg.svd(matrix, driver: 'svd', job: 'S')

      # TODO: Better than 75% term, please. :\
      s_cutoff = s.sort.reverse[(s.size * cutoff).round - 1]
      s.size.times do |ord|
        s[ord] = 0.0 if s[ord] < s_cutoff
      end

      # Reconstruct the term document matrix, only with reduced rank
      u.dot(::Numo::DFloat.eye(s.size) * s).dot(vt)
    end
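
The cutoff step in the method above can be looked at in isolation. The sketch below applies the same index arithmetic to a plain Ruby Array instead of a Numo array, so it runs without Numo installed. The helper name `truncate_singular_values` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical stand-alone version of the cutoff step, using a plain Array.
# Keeps roughly the largest `cutoff` fraction of singular values and zeroes
# the rest, mirroring the threshold lookup in the method above.
def truncate_singular_values(values, cutoff = 0.75)
  # Threshold is the value at the cutoff position in descending order.
  threshold = values.sort.reverse[(values.size * cutoff).round - 1]
  values.map { |v| v < threshold ? 0.0 : v }
end

truncate_singular_values([5.0, 3.0, 1.0, 0.5])
# => [5.0, 3.0, 1.0, 0.0]
```

With four values and a cutoff of 0.75, the threshold index is `(4 * 0.75).round - 1 = 2`, so the third-largest value (1.0) is the threshold and only 0.5 is zeroed. Note that a very small cutoff makes the index `-1`, which selects the smallest value and therefore zeroes nothing.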

To make this work and get it merged, I need the support of a maintainer on this project. Is there anyone in here who could review & merge PRs for the above? I realize this is an open source side project for most people (me too!) and I'm not looking for immediate response/turnaround. At the same time, I'm hesitant to invest in this work if my PRs will sit for months with no response.

@mattr-
Member

mattr- commented May 4, 2022

If you can make the PRs, I can review and merge them.

mkasberg added a commit to mkasberg/classifier-reborn that referenced this issue May 6, 2022
This repo hasn't received much attention recently. As such, the TravisCI
yaml file was referencing
[EOL](https://www.ruby-lang.org/en/downloads/branches/) versions of
ruby. Moreover, TravisCI itself isn't generally used for open source
anymore. There are heavy restrictions on build minutes, as noted
[here](jekyll/jekyll#8492) in the core Jekyll
project.

This PR does the following:

 * Removes `.travis.yml`. We won't run jobs on TravisCI anymore.
 * Replaces it with `.github/workflows/ci.yml`. We'll start running CI
   on GitHub Actions.
 * Updates Ruby versions to those currently supported. This matches the
   ones tested in the [core jekyll project](https://github.com/jekyll/jekyll/blob/796ae15c31147d1980662744ef0f19a15a27cdee/.github/workflows/ci.yml#L20-L28).

This is making progress toward jekyll#192. I plan to work toward supporting
Ruby 3 and Numo in classifier-reborn, but the first step is getting CI
working with the existing code.
mkasberg added a commit to mkasberg/classifier-reborn that referenced this issue May 6, 2022
This repo hasn't received much attention recently. As such, the TravisCI
yaml file was referencing
[EOL](https://www.ruby-lang.org/en/downloads/branches/) versions of
ruby. Moreover, TravisCI itself isn't generally used for open source
anymore. There are heavy restrictions on build minutes, as noted
[here](jekyll/jekyll#8492) in the core Jekyll
project.

This PR does the following:

 * Removes `.travis.yml`. We won't run jobs on TravisCI anymore.
 * Replaces it with `.github/workflows/ci.yml`. We'll start running CI
   on GitHub Actions.
 * Updates tested Ruby versions to 2.7 and 3.0. This is a subset of the
   ruby versions currently supported/tested by [Jekyll
   core](https://github.com/jekyll/jekyll/blob/796ae15c31147d1980662744ef0f19a15a27cdee/.github/workflows/ci.yml#L20-L28).
   I plan to add support for Ruby 3.1 and jruby 9.3.4 in subsequent PRs,
   but doing so will require additional code changes and I wanted to
   start by getting the existing code under test in CI.

This is making progress toward jekyll#192. I plan to work toward supporting
Ruby 3 with fast svd support and Numo in classifier-reborn, but the
first step is getting CI working with the existing code.
mkasberg added a commit to mkasberg/classifier-reborn that referenced this issue May 6, 2022
This repo hasn't received much attention recently. As such, the TravisCI
yaml file was referencing
[EOL](https://www.ruby-lang.org/en/downloads/branches/) versions of
ruby. Moreover, TravisCI itself isn't generally used for open source
anymore. There are heavy restrictions on build minutes, as noted
[here](jekyll/jekyll#8492) in the core Jekyll
project.

This PR does the following:

 * Removes `.travis.yml`. We won't run jobs on TravisCI anymore.
 * Replaces it with `.github/workflows/ci.yml`. We'll start running CI
   on GitHub Actions.
 * Updates tested Ruby versions to 2.7, 3.0, and jruby 9.3.4. This is a
   subset of the ruby versions currently supported/tested by [Jekyll core](https://github.com/jekyll/jekyll/blob/796ae15c31147d1980662744ef0f19a15a27cdee/.github/workflows/ci.yml#L20-L28).
   I plan to add support for Ruby 3.1 in a subsequent commit, but doing so
   will require additional code changes and I wanted to start by getting
   the existing code under test in CI.

This makes progress toward jekyll#192. I plan to work toward supporting Ruby 3
with fast SVD support and Numo in classifier-reborn, but the first step
is getting CI working with the existing code.
mkasberg added a commit to mkasberg/classifier-reborn that referenced this issue May 8, 2022
@mkasberg
Contributor

@mattr- what's the process like for releasing a new gem version now that #198 is merged?

@mattr-
Member

mattr- commented Jun 13, 2022

@mkasberg will research and find out. It's likely similar to what we have in the maintainer instructions for the core jekyll gem.

@mattr-
Member

mattr- commented Jun 19, 2022

I've pushed a v2.3.0 tag and am waiting on permission changes to rubygems.org to push the new gem itself. There will be a new release soon™

@mattr-
Member

mattr- commented Jun 19, 2022

Since v2.3.0 will support Numo, I'm going to go ahead and close this.

@mattr- mattr- closed this as completed Jun 19, 2022
@mkasberg
Contributor

mkasberg commented Jul 9, 2022

@mattr- Any updates? I still don't see 2.3.0 at https://rubygems.org/gems/classifier-reborn

@mattr-
Member

mattr- commented Jul 12, 2022

Sorry, I was on vacation. I just pushed the gem up.

@mkasberg
Contributor

Awesome! Apologies for pinging you on your vacation, hope you had a great time!
