Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More GMP work #1075

Merged
merged 21 commits into from
Jan 23, 2017
Merged

More GMP work #1075

merged 21 commits into from
Jan 23, 2017

Conversation

fingolfin
Copy link
Member

More work on the GMP work. Not yet complete

@fingolfin fingolfin added the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Jan 13, 2017
@codecov-io
Copy link

codecov-io commented Jan 13, 2017

Current coverage is 54.49% (diff: 96.10%)

Merging #1075 into master will decrease coverage by 0.03%

@@             master      #1075   diff @@
==========================================
  Files           432        432          
  Lines        224765     224546   -219   
  Methods        3431       3442    +11   
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits         122572     122368   -204   
+ Misses       102193     102178    -15   
  Partials          0          0          
Diff Coverage File Path
••••• 50% lib/integer.gi
••••••••• 96% src/gmpints.c
•••••••••• 100% src/vecgf2.c
•••••••••• 100% lib/numtheor.gi
•••••••••• 100% src/code.c

Powered by Codecov. Last update 75245e7...cad52ab

@fingolfin fingolfin force-pushed the mh/gmp2 branch 3 times, most recently from 3549c0b to d59fc61 Compare January 16, 2017 18:17
@fingolfin
Copy link
Member Author

I think this is almost good for merging. What is left is to add (more) test cases for the new functions (in commits marked "HACK"), adding some thoughts into failure modes for them (i.e. make sure there are no crashes) and finally using them in the library methods for Jacobi(), PValuation(), PowerModInt(). Might also change ModRat in the kernel to use InverseModInt.

@fingolfin fingolfin force-pushed the mh/gmp2 branch 2 times, most recently from ace78ca to a629a1f Compare January 20, 2017 19:36
@fingolfin fingolfin removed the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Jan 20, 2017
CHECK_INT(mod);

if ( mod == INTOBJ_INT(0) ) {
ErrorMayQuit( "PowerModInt: <mod> must be nonzero", 0L, 0L );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, wrong func name

@@ -1195,7 +1195,11 @@ InstallGlobalFunction(PowerModInt,function ( r, e, m )
# return the power
return pow;
end);

if IsBound(POWERMODINT) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to keep POWERMODINT_GAP around? Is POWERMODINT ever not going to be defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar for other functions that follow a similar pattern. If they are only used for testing (and I do like testing by comparing two implementations), perhaps they could be moved to a file in tst?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original reasons I kept them are (a) testing, and (b) making it possible to quickly switch between branches without always having to recompile (as warnings would be shown about undeclared functions). Both were mostly useful for development of this PR. I left them in because that way it was clearly visible to everybody that I did indeed verify the new against the old implementation.

I have no problem either removing those, or else moving them to the .tst files, as you suggest.

** The calling code is responsible for any verification.
**
** WARNING: Since this may trigger a garbage collection, make sure to
** call this *before* any calls to FAKEMPZ_GMPorINTOBJ.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is 'this'? or should the function at the end be different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that is an obsolete comment I forget to remove (and somehow it indeed migrated to the "wrong" function -- I think used to be a comment for NEW_FAKEMPZ, or rather its predecessor). With this final implementation, the GC problems are mostly avoided by using the MPZ_FAKEMPZ macro.

/****************************************************************************
**
** Compute log2 of the absolute value of an Int, i.e. the index of the highest
** set bit. For 0 it returns -1, for negative values it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it..??? :)

local a,v;
if not IsPrimeInt(p) or not IsRat(n) then
local v;
if not IsPosInt(p) or not IsRat(n) then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I change IsPrimeInt to IsPosInt as that avoided a pointless and potentially costly primality test. However, this meant it used to accept negative values for p, too. So perhaps I should change IsPosInt(p) to IsInt(p) and p <> 0 ?
Might just as well insert a better error message too... And finally, is there any reason this very useful function is not documented???

Also rewrite GcdInt, ProdInt, SumOrDiffInt to use fake_mpz_t. The
rewritten functions are much simpler than before. This technique also
opens up the way to wrapping more mpz_ functions. Finally, this commit
speeds up GcdInt if one or both arguments are multiples of a large power
of 2. For example:

Old code:
gap> GcdInt(2^100001*13, 2^100002*17);; time;
135
gap> GcdInt(2^1000001*13, 2^1000002*17);; time;
14122
gap> GcdInt(2^10000001*13, 2^10000002*17);; time;
... did not finish in reasonable time ...

New code:
gap> GcdInt(2^100001*13, 2^100002*17);; time;
1
gap> GcdInt(2^1000001*13, 2^1000002*17);; time;
13
gap> GcdInt(2^10000001*13, 2^10000002*17);; time;
164
We previously used gmp_snprintf, which needs to parse a format string.
It is simpler and more efficient to call mpz_get_str directly.
Both call the same new function StringIntBase(), which we could
also make available to the library.
There was only once place using it left, and that was actually better
off directly using the INTOBJ (instead of first converting it into a GMP
integer).
Added test to verify it does not destroy the input string anymore.
This was used for the old integers code, were the two values differed.
At this point in the computation, it should be impossible for 'mod'
to be negative, so remove the code which adjusts the sign. And just
in case I am missing something, insert an assert(!IS_NEGATIVE(mod))
(only active when DEBUG_GMP is on)
After all my refactoring, there was only one caller left.
Also convert rest of gmpints.c to use them, and also IS_INTOBJ,
IS_INTPOS, IS_INTNEG. There are now far fewer places using TNUM_OBJ
directly.
Surely the user does not want to retry as QUO / QuoInt...

Note: Just changing QUO to RemInt and the `if` to `while` would not be
enough; we'd also have to verify that opR stays an integer. And then
adjust the error message. And so on...

Since I am doubtful regarding the usefulness of ErrorReturnObj to start
with, I chose to keep my life simple and use ErrorMayQuit instead.
Use GMP's mpz_kronecker to compute the Kronecker-Jacobi symbol J(n,m).
We retain the old GAP implementation for reference, and also to
facilitate testing (we want to make sure that old and new code return
identical results).

Note that the new implementation accepts some more inputs, namely it
allows negative values for m. This is because GMP allows this, and
it seems pointless to restrict this again.

The new implementation is considerably faster. Old code:

gap> for a in [1..1000] do for b in [1..1000] do Jacob(a,b);od;od;time;
1979

New code:

gap> for a in [1..1000] do for b in [1..1000] do Jacob(a,b);od;od;time;
85
@fingolfin
Copy link
Member Author

@ChrisJefferson I updated the PR to hopefully address all your remarks. Thanks for the review!

@fingolfin fingolfin force-pushed the mh/gmp2 branch 2 times, most recently from 35be8e2 to fbf656a Compare January 22, 2017 00:09
This adds a GMP based implementation of PowerModInt. The new kernel
function INVMODINT computes modular inverse, and is not currently
exposed to the library in a documented function.

As usual, we get a speedup. Old code:

gap> for b in [-100..100] do for e in [1..100] do for m in [1..100] do
>  PowerModInt(b,e,m);od;od;od; time;
2270
gap> for b in [-100..100] do for e in [1..100] do for m in [1..100] do
>  PowerModInt(b,2^e,m);od;od;od; time;
11354

New code:

gap> for b in [-100..100] do for e in [1..100] do for m in [1..100] do
>  PowerModInt(b,e,m);od;od;od; time;
625
gap> for b in [-100..100] do for e in [1..100] do for m in [1..100] do
>  PowerModInt(b,2^e,m);od;od;od; time;
1920
It uses mpz_remove to compute the p-valuation. In fact, it computes
arbitrary g-valuations, i.e. the second argument can be an arbitrary
nonzero integer.

We also adapt PValuation to use PVALUATION_INT, and the Valuation method
for integers. For small inputs, there is little difference, but for
larger ones we get a noticeable speedup.

To illustrate this, here are some examples, where
  ps:=[2,3,5,7,251,65537];
  lwi:=ListWithIdenticalEntries;;

Old code:

gap> x:=2^20;; ListX(lwi(10000,x),ps,PValuation);; time;
90
gap> x:=2^1000-1;; ListX(lwi(10000,x),ps,PValuation);; time;
79
gap> x:=2^1000;; ListX(lwi(10000,x),ps,PValuation);; time;
2608
gap> x:=2^1000+1;; ListX(lwi(10000,x),ps,PValuation);; time;
56
gap> x:=2^1000+2^100;; ListX(lwi(10000,x),ps,PValuation);; time;
379

New code:

gap> x:=2^20;; ListX(lwi(10000,x),ps,PValuation);; time;
40
gap> x:=2^1000-1;; ListX(lwi(10000,x),ps,PValuation);; time;
86
gap> x:=2^1000;; ListX(lwi(10000,x),ps,PValuation);; time;
64
gap> x:=2^1000+1;; ListX(lwi(10000,x),ps,PValuation);; time;
71
gap> x:=2^1000+2^100;; ListX(lwi(10000,x),ps,PValuation);; time;
67

Using PVALUATION_INT directly (returns 0 for n=0, and only
supports integer inputs):

gap> x:=2^20;; ListX(lwi(10000,x),ps,PVALUATION_INT);; time;
26
gap> x:=2^1000-1;; ListX(lwi(10000,x),ps,PVALUATION_INT);; time;
67
gap> x:=2^1000;; ListX(lwi(10000,x),ps,PVALUATION_INT);; time;
47
gap> x:=2^1000+1;; ListX(lwi(10000,x),ps,PVALUATION_INT);; time;
53
gap> x:=2^1000+2^100;; ListX(lwi(10000,x),ps,PVALUATION_INT);; time;
49
Copy link
Member

@markuspf markuspf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few small comments/questions.

This PR makes the GMP code less fragile and more readable, which is great.

A general question is whether these changes have any (notable) performance implications?

@@ -799,7 +799,7 @@ DeclareGlobalFunction( "NextPrimeInt" );
##
## <Description>
## returns <M><A>r</A>^{<A>e</A>} \pmod{<A>m</A>}</M> for integers <A>r</A>,
## <A>e</A> and <A>m</A> (<M><A>e</A> \geq 0</M>).
## <A>e</A> and <A>m</A>.
## <P/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it obvious enough that $e \neq 0$?

Negative values seem to work (now) if $r$ is invertible, so that should maybe be documented explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But e can (still) be 0. And negative values were supported before, just not documented.
Documenting this more explicitly is fine by me, but I'd prefer to not do this as part of this PR (which has been in the works for very long by now)...

local a,v;
if not IsPrimeInt(p) or not IsRat(n) then
local v;
if not IsInt(p) or not IsRat(n) or p = 0 then
Error("wrong parameters");
fi;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message could be more helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I hope somebody will make a PR improving it after this PR has been merged :-)

"%Nd", ADDR_INT(op),
(TypGMPSize)SIZE_INT(op));
Char buf[20000];
mpz_t v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that we're checking for overflows here anymore. Maybe I am missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean with "checking for overflows". You mean: check whether the buffer was exceeded (and we thus will crash momentarily, for smashing the stack) ? The old code had an assert for that, sure.

But given that LogInt( (2^64)^1000, 10) = 19265 < 20000, that simply cannot happen. Hmm, unless we run this on a 128 bit processor, I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the assert was not even doing anything by default.

Anyway, I'll add a comment and think about whether I can make this more clear / pseudo-robust.


return jac;
end );
InstallGlobalFunction( Jacobi, JACOBI_INT );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While its great to be able to do all these in GMP, I would quite like to keep GAP functions around as an alternative. The reason is that while implementing GAP in alternative runtimes, the more code one does not have to write in the runtime the better.

I don't know how to do this cleanly right now though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is exactly how I did it before @ChrisJefferson asked me to remove the GAP implementation sigh. It is still around in the .tst files, by the way.

I can add it back here again, but I don't want to rewrite the code a third time, so I'll wait until at least @ChrisJefferson signs off on it, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caveat: the new Jacobi accepts more inputs than the old (n negative).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the chat, I am happy with moving the function to tests, and I will attempt to make a structured attempt at detecting which functions are delegated to C code (and collect/provide GAP implementations where appropriate).

#else
#define CHECK_FAKEMPZ(fake) do { } while(0);
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Also, ensure it works correctly on 128bit systems ;-)
This is largely based on the old src/integer.c file, with some
edits to make it valid for non-32 bit systems.
@markuspf markuspf merged commit be7677f into gap-system:master Jan 23, 2017
@fingolfin fingolfin deleted the mh/gmp2 branch January 23, 2017 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants