Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalised version of improved fp2 #212

Closed
wants to merge 6 commits into from
Closed

Conversation

jon-chuang
Copy link
Contributor

@jon-chuang jon-chuang commented Feb 6, 2021

Related to #207

The main contribution of this PR is to get rid of multiple redundant functions, and to unify all mul by small const methods into default implementations.

Results vis a vis pre #207:
Screenshot from 2021-02-07 07-51-02

Often, we get speedups for free (for instance for bw6). I'm still investigating why this is.

mnt4_298's residue is G2 basefield non-residue is 17, so it's not worthwhile.


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (master)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

@jon-chuang
Copy link
Contributor Author

@ValarDragon @Pratyush

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 7, 2021

I think to really generalise this, we need to define some way to define multiplications with small integers for fields. Or, we should have some way to indicate that a field element is really small. This should be some compile-time data, to indicate that a particular field element is a small constant.

However, the threshold for how small would be considered small would depend on the number of limbs in the field.

Further, this only makes sense for constant data. So this is sad. However, we can use const generics to achieve this...

Nonetheless, we actually don't need so much generality, so I will investigate tweaking the concrete FpN impls to use more efficient autogenerated versions of mul by residue if we make them available.

@jon-chuang jon-chuang force-pushed the quadfield_autoconst branch 2 times, most recently from 123f692 to 4f711ab Compare February 7, 2021 00:49
@Pratyush
Copy link
Member

Pratyush commented Feb 7, 2021

What about adding a SmallValue associated type to Field?

pub trait Field: Mul<<Self as Field>::SmallValue, Output = Self> {
	...
	type SmallValue: Neg + Mul<Self, Output = Self> + Into<Self>;
}

We can use it as follows:

// Impl for prime fields FpN:
impl Field for FpN {
	type Smallvalue = i8;
}

// Impl for quadratic extension fields:
impl Field for QuadExtField {
	type Smallvalue = Pair<Self::BaseField>;
}

pub trait QuadExtParameters {
	const NONRESIDUE: Pair<Self::BaseField>; // No need for separate NONRESIDUE_SMALL
	// No need for mul_by_nonresidue, because it's subsumed by the trait bounds above
}

pub struct Pair<F: Field> {
	// The coordinates can be either small or full-sized.
	c0: Either<F, F::SmallValue>, 
	c1: Either<F, F::SmallValue>,
}

(Similarly for cubic ext field)

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 7, 2021

Hmm, that was something that crossed my mind, but I wanted something that could be dropped in so that one wouldn't have to rewrite any formulas. However, this seemed too complicated. Do you think your idea might have any advantages over what this PR does?

Hence, I just resorted in the end to defining a specific method, mul_by_i8, where the integer is meant to be constant, or literal, so that llvm can optimise out the branches and loops.

Do you know if there is any methods apart from nonresidue muls where mul_by_i8 could also be deployed? Meaning, places that could benefit from a generic mul_by_small_constant type function?

@jon-chuang
Copy link
Contributor Author

I should probably write some tests for mul_by_i8.

@Pratyush
Copy link
Member

Pratyush commented Feb 7, 2021

The benefit of the foregoing approach is primarily that it enables multiplication by arbitrary small field elements cheaply. For example, one could use it also to replace the mul_by_a methods in SWParameters, and similarly for TEParameters.

It would also reduce some complexity in writing new curves and fields. For example, right now, when implementing a new extension field, one could forget to specialize the mul_by_nonresidue method, resulting in slowdowns that are difficult to trace. By going this way, there's no way to forget. (similarly for mul_by_a)

We could also use it for #209

Overall it would allow us to remove many unseemly hacks from the codebase =)

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 7, 2021

Hmm, but do we have the same idea of what mul_by_small_element entails?

There are two possibilities essentially:

  • the double and add method used in this PR
  • an improvement over montgomery mul can multiply field elements by small numbers which are too large for method 1.

In my mind, it makes sense to define these as two separate functionalities, and one can already create these functions generically in the PrimeField trait or Fp struct, for instance.

@Pratyush
Copy link
Member

Pratyush commented Feb 7, 2021

There are two possibilities essentially:

  • the double and add method used in this PR
  • an improvement over montgomery mul can multiply field elements by small numbers which are too large for method 1.

Why not have a threshold? For small enough numbers, we use custom addition chains, and then after that we use the specialized montgomery mul (which we can add at a later date)

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 7, 2021

Well, determining that threshold is probably not that straightforward. I would rather let the downstream user decide which method they'd like to deploy, or if they'd like to set a threshold.

Btw, regarding mul_by_a, I noticed that

#[inline(always)]
pub fn sub_noborrow(x: &mut [u64; 6], other: &[u64; 6]) -> bool {
    let mut borrow = 0u8;

    unsafe {
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[0], other[0], &mut x[0]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[1], other[1], &mut x[1]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[2], other[2], &mut x[2]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[3], other[3], &mut x[3]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[4], other[4], &mut x[4]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[5], other[5], &mut x[5]);
    }

    borrow != 0
}

pub fn zero() -> [u64; 6] {[0u64; 6]}

pub fn f(x: &mut [u64; 6]) {
    sub_noborrow(x, &zero());
}

results in non-trivial assembly for f. This means that rust probably can't optimise the zero additions away, sadly.

Hence I will also submit a PR to rewrite some formulas in adding results of mul_by_a, when a is zero.

OK, short weierstrass already does this for the addition formula.

@Pratyush
Copy link
Member

Pratyush commented Feb 7, 2021

Well, determining that threshold is probably not that straightforward. I would rather let the downstream user decide which method they'd like to deploy, or if they'd like to set a threshold.

For the case of PrimeFields we can support that use case by also requiring Mul<i64, Output = Self>, where which uses the montgomery mul, while Mul<i8, Output = Self> uses addition chains, and maybe delegates to the i64 impl at some threshold.

EDIT:

Actually you don't even need separate cases, you just defined Fp::SmallValue = SmallInt, where SmallInt is defined as

pub enum SmallInt {
	UseAdditionChain(i8),
	UseSpecialMontMul(i64),
}

This way the choice is left to the user.

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 7, 2021

You are indeed right that having a faster mod mul for u64, let's say, would already be of great value. However, it unfortunately does not fit within the Montgomery representation framework, for there even small values in biginteger space are large values in montgomery rep space.

Hence, I do not have any straightforward idea on how to take advantage of small values that are not suitable for the double and add chain via mul_by_i8.

I have confirmed some positive results for converting some mul_by_a to use mul_by_i8:
Screenshot from 2021-02-07 14-34-34
Affected functions are double, mul, and deser (subgroup check).

@jon-chuang
Copy link
Contributor Author

Turns out I made a mistake in implementation, by default, I used naive mul_assign with Self::Residue rather than mul_base_field_by_residue. Now, it should be back to before for tower fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants