Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP007] Decay and merge isotopic abundance dataframe #757

Merged
merged 17 commits into from
Aug 1, 2017

Conversation

vg3095
Copy link
Contributor

@vg3095 vg3095 commented Jul 4, 2017

I have added a decay method under Model class , which will take time_explosion as parameter , and then decay isotopic abundance frame , and returns merged dataframe .
See this notebook for detailed output
Jupyter Notebook

@vg3095 vg3095 changed the title [TEP007] Decay and merge isotopic abundance dataframe [TEP007] [WIP] Decay and merge isotopic abundance dataframe Jul 4, 2017
@vg3095 vg3095 changed the base branch from master to decay July 4, 2017 07:40
@@ -189,6 +191,33 @@ def abundance(self):
abundance.columns = range(len(abundance.columns))
return abundance

def decay(self, day, normalize=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should happen during the initialization of the model.

The behavior should be:
If there is isotope_abundance as parameter, decay these with time_explosion and merge them with the normal abundances. If we do this, save the normal abundances and the undecayed abundances somewhere on the model (example raw_abundance, raw_isotope_abundance)

@vg3095 vg3095 force-pushed the model_decay branch 2 times, most recently from 8d1b968 to 9073eff Compare July 4, 2017 10:20
isotope_abundance = self.raw_isotope_abundance.decay(time_explosion)

#Set atomic_number as index in isotopic_abundance dataframe
isotope_abundance = isotope_abundance.reset_index().drop(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the many reset_index - not sure that you need those.


#Merge abundance dataframes
modified_df = isotope_abundance.append(self.abundance)
modified_df = modified_df.reset_index().set_index('atomic_number')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. you do not need to make something an index to have groupby work in it.

'mass_number', axis=1).set_index(['atomic_number'])

#Merge abundance dataframes
modified_df = isotope_abundance.append(self.abundance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you definitely do not want to append - but add the decayed dataframe to the abundances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf I have now used pd.concat() in the latest commit .

  • If you mean pd.add() , then according to what I could find , index values for both Dataframe must be equal, then only result is correct, otherwise, values not present in one dataframe , are filled with NaN values
  • pd.merge() and pd.join() operation results in addition along columns.
    So, I think , it leaves me with pd.concat() , and I have also added a test to ensure if it works correctly.

Copy link
Contributor

@yeganer yeganer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes into the right direction, however I think we should extend the current abundances property to be more flexible so that we don't require to call a decay method.

@@ -189,6 +196,33 @@ def abundance(self):
abundance.columns = range(len(abundance.columns))
return abundance

def decay(self, time_explosion, normalize=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a decay function at all. Maybe we need a helper function at some point but in general we don't need this.
We don't need functionality to 'redecay' a model because we will always use time_explosion as the time of the decay.

@@ -72,7 +72,14 @@ def __init__(self, velocity, homologous_density, abundance, time_explosion,
self.v_boundary_outer = v_boundary_outer
self.homologous_density = homologous_density
self._abundance = abundance
self.decayed = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed.

self.time_explosion = time_explosion

self.raw_abundance = self._abundance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss and define at some point the names of all abundance variables involved.
There is input: abundance and isotope_abundance (should be attached to model for diagnosis and debugging purposes)
There is output: Model.abundance which should only contain the region defined by the velocity boundaries (as it currently does) AND which I think should contain the merged result if isotope_abundance is passed as an optional argument.
Then we have self._abundance which looks like it's currently the reference to the input (for diagnosis)

So what we basically should decide is, how do we cache the result of the decay (do we even cache it?) and how do we attach the input to the model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vg3095 I think the decay is extremely quick - maybe just a property would be fine. Actually can you profile how quick?

Copy link
Contributor Author

@vg3095 vg3095 Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf If by profile , meaning time taken by decay function here
I made a dummy dataframe, then used decay on it. (with 2 shells/columns)
It takes around 0.005 seconds for 10 elements and around 0.01 seconds for 100 elements

Copy link
Contributor Author

@vg3095 vg3095 Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf I have now used cProfile module, If you want to know , no. of function calls and time taken for decay method.

(Updated) No of shells/columns - 20

  • For 10 elements
    33189 function calls (32937 primitive calls) in 0.030 seconds

  • For 20 elements
    40899 function calls (40647 primitive calls) in 0.034 seconds

  • For 50 elements
    64029 function calls (63777 primitive calls) in 0.050 seconds

  • For 100 elements
    102579 function calls (102327 primitive calls) in 0.067 seconds

@yeganer
Copy link
Contributor

yeganer commented Jul 4, 2017

I'd suggest to add a as_atomic_numbers or similar to the IsotopeAbundance class which returns the dataframe converted to the format Model.abundances currently is in.

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 4, 2017

As of now, these are the variables associated with abundance

  • raw_abundance : Normal abundance as passed during initialization
  • raw_isotope_abundance : Isotope abundance as passed during initialization (Undecayed)
  • isotope_abundance : Decayed Isotope abundance (using time_explosion)
  • _abundance : If raw_isotope_abundance is present , then it has combined dataframe , else
    normal abundance dataframe.
  • abundance : Contains the region defined by the velocity boundaries. As _abundance is updated(when isotope_abundance is present), changes will reflect here also, when dataframes are merged.

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 4, 2017

Travis fail is due to time out in Mac build .

@yeganer
Copy link
Contributor

yeganer commented Jul 4, 2017

I don't think we need isotope_abundance, right? Because it can be created from raw_isotope_abundance at all times

@@ -189,6 +199,25 @@ def abundance(self):
abundance.columns = range(len(abundance.columns))
return abundance

def as_atomic_numbers(self, normalize=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be part of IsotopeAbundance


#Merge abundance dataframes
modified_df = pd.concat([isotope_abundance, abundance])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want you to add them here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf For pd.add() , index values for both Dataframe must be equal, then only result is correct, otherwise, index values not present in one dataframe , are filled with NaN values

@@ -63,7 +63,7 @@ class Radial1DModel(HDFWriterMixin):
def __init__(self, velocity, homologous_density, abundance, time_explosion,
t_inner, luminosity_requested=None, t_radiative=None,
dilution_factor=None, v_boundary_inner=None,
v_boundary_outer=None):
v_boundary_outer=None, isotope_abundance=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this up

@@ -73,6 +73,13 @@ def __init__(self, velocity, homologous_density, abundance, time_explosion,
self.homologous_density = homologous_density
self._abundance = abundance
self.time_explosion = time_explosion

self.raw_abundance = self._abundance
self.raw_isotope_abundance = isotope_abundance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is None - you should make an empty frame

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 5, 2017

@wkerzendorf I have changed it . Travis fail is due to timeout in Mac build. I think you will have to re-trigger Mac build, after some time.

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 6, 2017

@wkerzendorf
Is there anything remaining to change in this PR ?

return df

def as_atomic_numbers(self, abundance, t, normalize=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thi should be called - merge_isotopes and should only perform the groupby operation.

"""
#Drop mass_number coloumn in isotopic_abundance dataframe
isotope_abundance = self.decay(t).reset_index(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use groupby to sum up over all mass numbers.

isotope_abundance = self.decay(t).reset_index(
level='mass_number').drop('mass_number', axis=1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then return here. The rest should happen somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf the rest means normalizing and returning as model abundance dataframe format should be moved to other separate function (as_atomic_numbers) and only groupby here(merge_isotopes)

@@ -73,6 +73,15 @@ def __init__(self, velocity, homologous_density, abundance, time_explosion,
self.homologous_density = homologous_density
self._abundance = abundance
self.time_explosion = time_explosion

self.raw_abundance = self._abundance
if isotope_abundance is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is None make an empty one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkerzendorf I have made one , see line 82-84

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use self.raw_isotope_abundance = isotope_abundance or pd.DataFrame()



if hasattr(config.model, 'isotope_abundance'):
isotope_abundance = config.model.isotope_abundance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no that's wrong - we have not decided on a config option and that should also come later!

self.raw_abundance = self._abundance
if isotope_abundance is not None:
self.raw_isotope_abundance = isotope_abundance
self._abundance = self.raw_isotope_abundance.as_atomic_numbers(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this line to the abundance property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yeganer If I put this line in under abundance property, then this line would be executed every time model.abundance is called , and I think it should only be called 1 time during initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As your profiling showed, executing this for 100 elements takes less than 0.02 seconds. In my opinion it's okay to call it every time. especially as it should always reflect the current state of the model.

Currently tardis objects are not designed to be changed, nevertheless if one were to change time_explosion on the model, the isotopic abundances have to be recalculated. The easiest way to achieve this is by doing this (very fast) computation every time. I think you can count the number of accesses to model.abundance during a typical tardis run with one hand.

Copy link
Contributor

@yeganer yeganer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think after some small changes this PR is done.
Please add one commit dedicated to PEP8 issues currently in this file. There are tools for commandline and some editors also have plugins for this job.
I know you didn't write a lot of the code that contains style errors but I always try to fix all PEP8 issues whenever I edit a file :)

return self.groupby('atomic_number').sum()

def as_atomic_numbers(self, abundance, normalize=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this method something like merge_with or only merge as that reflects what this function does, merge Isotopes with a normal DataFrame.
Maybe we can call the other argument other so we can easily distinguish between self and other.

return df

def merge_isotopes(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy with the name of this function, maybe we should call it as_atoms? That's short and reflects that the output is only atoms instead of isotopes.

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 7, 2017

@wkerzendorf I think , in last meeting, I could not convey my message properly.

  • As of now as_atoms (Renamed from merge_isotopes , as per suggestion by @yeganer )-> Does only groupby to sum up over all mass numbers.
  • merge (Renamed from as_atomic_numbers)-> takes 2 data-frame , then adds and normalizes it .

I think, you confused with the names , as almost the names are reversed , and thought merge to be merge_isotopes.
If you want me to revert the names back,
(Make as_atoms to merge_isotopes and merge to as_atomic_numbers) , then a confirmation would be nice.

@wkerzendorf
Copy link
Member

@vg3095 this is outdated right?

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 18, 2017

@wkerzendorf Yes, this is outdated. I will do a rebase

@wkerzendorf
Copy link
Member

@vg3095 are you sure that is not in the codebase - as we merged the uniform case?

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 19, 2017

@wkerzendorf I have rebased it now. You can review it.

@unoebauer
Copy link
Contributor

@wkerzendorf, @vg3095 - so, do we still need to merge this or not? I am confused since I thought that all the decay stuff has already been merged.

@vg3095
Copy link
Contributor Author

vg3095 commented Jul 21, 2017

@wkerzendorf @unoebauer Yes, it needs to be merged. This is the only PR related to decay stuff. Rest all were related to config options for isotope.

@vg3095 vg3095 changed the title [TEP007] [WIP] Decay and merge isotopic abundance dataframe [TEP007] Decay and merge isotopic abundance dataframe Jul 28, 2017
@wkerzendorf wkerzendorf merged commit edc4ffa into tardis-sn:decay Aug 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants