Refactor preprocessing #1777
Conversation
Please don't remove options that currently exist. Actually, one of the basic runcards does exactly that. I'm OK with the changes, but I would like to keep that option.
This will also remove the possibility of having a fixed functional form (the one we briefly talked about at some point) that @peterkrack is working on.
@Radonirinaunimi Why exactly? You mean only the changes in the separate prepro_join_weights branch?
Sorry if I was unclear, but my comment refers to the changes you propose here:
If I am not mistaken, having trainable weights on a flavor-by-flavor basis is exactly what is required for the fixed functional form. Therefore, if you restrict that option, this may no longer be possible.
OK, so two good reasons to keep this option. Keeping this option and having the weights as vectors will probably make things more complicated rather than simpler, so let's discard the prepro_join_weights branch then.
Thanks @APJansen, this looks good! As far as I'm concerned this can be merged after you've addressed a few minor points.
@RoyStegeman As you saw, I was having some issues with the test; thanks for fixing it. Did you understand what was going wrong? It was passing for me locally but not in the CI after incorporating your comment.
To be honest I don't know; for me this passes locally (of course, since I also generated the …)
Greetings from your nice fit 🤖!
Check the report carefully, and please buy me a ☕, or better, a GPU 😉!
Please check that I didn't make any typos in the suggestions before accepting any of them!
Re the differences: by changing to using np.testing we'll have extra information on the failures; it might have just been numerics.
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
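For illustration, a minimal example of the extra detail `np.testing` gives over a bare `assert` (hypothetical numbers, not the PR's actual regression values):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0000001])

# A bare `assert (a == b).all()` only reports pass/fail; on a failure,
# np.testing prints the mismatching elements and the max absolute and
# relative differences, which helps decide whether it was just numerics.
np.testing.assert_allclose(a, b, rtol=1e-5)
```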
@RoyStegeman It was passing locally for me before your change, and failing after. I think it's because of tensorflow; I had this issue before. The tensorflow version I have locally doesn't handle seeds properly, i.e. initializing a layer with, say, a given seed doesn't reproduce the same values. Now I changed the seed at Juan's suggestion, which I agree is better, but it will break the tests of course, and if I update the numbers it will probably again pass for me locally but fail in the CI, so could you change them again? Sorry about that.
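As an aside, a hypothetical sketch of the kind of reproducibility check at stake here, assuming `tf.keras` (on a tensorflow version that handles seeds properly, this passes):

```python
import numpy as np
import tensorflow as tf

# Two initializers built with the same seed should produce identical values;
# on an affected tensorflow version, a check like this would fail.
init_a = tf.keras.initializers.GlorotUniform(seed=42)
init_b = tf.keras.initializers.GlorotUniform(seed=42)
np.testing.assert_allclose(init_a(shape=(3, 3)), init_b(shape=(3, 3)))
```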
No problem!
This PR simplifies the preprocessing layer, improving readability without changing any results. I think these changes are harmless and uncontroversial and can easily be merged.
Additional changes proposed
However, my goal was a slightly bigger change that may cause issues, so I put the last commit in a different branch.
What this does is unite all the preprocessing factors into a single alpha vector and a single beta vector, rather than having many individual scalars. This layer is the only place in the model where flavors are treated individually. Each parameter has flavor-dependent min and max values, which go into both the initializer and a constraint, but this can be done with vectors as well (see the sketch below).
The reason this may be controversial is that it doesn't allow setting weights as trainable on a flavor-by-flavor basis, only all alphas and/or all betas. I don't see why this would be necessary, but I think I did see a runcard that does this.
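To make the proposal concrete, here is a minimal sketch of the vectorised idea, assuming TensorFlow/Keras; the class name, the bound values, and the `x**(1 - alpha) * (1 - x)**beta` prefactor layout are illustrative, not the actual n3fit code:

```python
import numpy as np
import tensorflow as tf

# Illustrative per-flavour (min, max) bounds, one row per flavour
# (hypothetical numbers, not the real runcard values).
alpha_limits = np.array([[0.0, 1.0], [0.5, 1.5], [0.0, 2.0]], dtype="float32")
beta_limits = np.array([[1.0, 4.0], [2.0, 5.0], [1.5, 6.0]], dtype="float32")


class ClipToRange(tf.keras.constraints.Constraint):
    """Keep each component of a weight vector within its own (min, max) range."""

    def __init__(self, minimum, maximum):
        self.minimum = tf.constant(minimum)
        self.maximum = tf.constant(maximum)

    def __call__(self, w):
        return tf.clip_by_value(w, self.minimum, self.maximum)


def uniform_in_range(minimum, maximum):
    """Initializer drawing each component uniformly within its own range."""

    def init(shape, dtype=None):
        u = tf.random.uniform(shape, dtype=dtype or tf.float32)
        return minimum + u * (maximum - minimum)

    return init


class VectorPreprocessing(tf.keras.layers.Layer):
    """All alphas as one trainable vector and all betas as another,
    instead of one scalar weight per flavour."""

    def __init__(self, alpha_lims, beta_lims, **kwargs):
        super().__init__(**kwargs)
        self.alpha_lims = alpha_lims
        self.beta_lims = beta_lims

    def build(self, input_shape):
        def vector_weight(name, lims):
            lo, hi = lims[:, 0], lims[:, 1]
            return self.add_weight(
                name=name,
                shape=(len(lims),),
                initializer=uniform_in_range(lo, hi),
                constraint=ClipToRange(lo, hi),
                trainable=True,
            )

        self.alpha = vector_weight("alpha", self.alpha_lims)
        self.beta = vector_weight("beta", self.beta_lims)

    def call(self, x):
        # x: (batch, 1) -> broadcasts against the flavour vectors to (batch, n_flavours)
        return x ** (1 - self.alpha) * (1 - x) ** self.beta
```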
Timing
I did some timing tests as well, creating a model with only the preprocessing layer and training it on random targets for 10_000 epochs.
The timing script is something like:
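The actual script isn't reproduced in the thread; below is a hypothetical sketch of such a timing test, reusing the `VectorPreprocessing` example from above (all names and numbers illustrative):

```python
import time

import numpy as np
import tensorflow as tf

# Random inputs and targets; only the preprocessing layer is trained.
n_points = 100
x = np.random.uniform(1e-5, 1.0, size=(n_points, 1)).astype("float32")
y = np.random.uniform(size=(n_points, len(alpha_limits))).astype("float32")

model = tf.keras.Sequential([VectorPreprocessing(alpha_limits, beta_limits)])
model.compile(optimizer="adam", loss="mse")

start = time.perf_counter()
model.fit(x, y, epochs=10_000, verbose=0)
print(f"10_000 epochs took {time.perf_counter() - start:.1f}s")
```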