Update mc readers #693

Merged
merged 87 commits into from
May 25, 2020
99114a1
Remove old MC to dictionary readers
andLaing May 5, 2020
2249a2e
Add function to differentiate old and new file formats
andLaing May 5, 2020
7fc7f04
Modify load_mchits_df to be able to read all formats
andLaing May 5, 2020
db932d3
Add function to cast hits df to dictionary
andLaing May 5, 2020
e3a8cfb
Modify load_mcparticles_df to read all file formats
andLaing May 5, 2020
c0efc57
Add function to get correct sample binning from file
andLaing May 5, 2020
0b673c2
Modify load_mcsensor_response_df to work with both file formats
andLaing May 5, 2020
6d7e37e
Add function to check sensor types present
andLaing May 5, 2020
b51d806
Add Enum class for MC table types
andLaing May 5, 2020
d95c480
Add function to get the MC tables in file
andLaing May 5, 2020
49c1dbf
Add function to get list of events in file
andLaing May 5, 2020
4c9f193
Update imports for mcinfo_io
andLaing May 5, 2020
ad2f821
Add function to load the nexus configuration
andLaing May 5, 2020
28dc2b6
Add function to load the sensor positions table
andLaing May 5, 2020
ad1b2f8
Add function to load the MC generators table
andLaing May 5, 2020
c93ac1d
Add function to load the event mapping table
andLaing May 5, 2020
14d12da
Add function to read all MC tables into a dictionary
andLaing May 5, 2020
78483b3
Add a writer for the MC info to be copied to the output file
andLaing May 5, 2020
43c1a74
Modify mcinfo_io.copy_mc_info to use the new readers/writers
andLaing May 5, 2020
d2dba9b
mcinfo_io_test with tests for new MC readers
andLaing May 5, 2020
dae3c05
New test files for mcinfo_test
andLaing May 13, 2020
9184a3f
Adapt components.copy_mc_info for the new writer
andLaing May 5, 2020
f411298
Change fixture KrMC_true_hits to use DF based readers
andLaing Feb 28, 2020
2b68010
Patch test_voxels_with_no_hits for load_mchits_df
andLaing Feb 28, 2020
5629327
improve test_penthesilea_true_hits_are_correct
andLaing Mar 17, 2020
012b8c7
Change setup for test_voxels_with_no_hits to use cast function
andLaing Mar 18, 2020
74ce3b3
Add database relevant arguments for copy_mc_info to remaining cities
andLaing Mar 26, 2020
aac753f
Adapt diomira_test for new MC info copy
andLaing Mar 27, 2020
21656fb
Adapt irene_test to use new mc readers
andLaing Mar 27, 2020
42ab55a
Adapt penthesilea_test for new MC
andLaing Mar 27, 2020
f3e40c5
Show test_isidora_exact_result only fails because of MC output
andLaing Mar 28, 2020
a6f53f8
Adapt esmeralda_test for new MC
andLaing Mar 28, 2020
8e9ecf4
Adapt beersheba_test to use the modern copy_mc_info code
andLaing Apr 9, 2020
7a051b3
Remove obsolete mcinfo_io.mc_info_writer
andLaing May 5, 2020
7912f3e
Remove mcinfo_io.mc_info_writer specific tests
andLaing May 5, 2020
a0e3454
Remove mc_info_writer test from indexation_test.py
andLaing May 5, 2020
ab4c22e
Improve test for cast_todict comparison to old method
andLaing May 5, 2020
74dfb09
Fix clarifying comments for pandas gymnastics
andLaing May 6, 2020
d253632
Add exception for old format file in get_sensor_types
andLaing May 6, 2020
8c3c5ee
Improve load_mcconfiguration for all file types
andLaing May 8, 2020
7a22f34
Alter get_sensor_binning to use load_mcconfiguration
andLaing May 8, 2020
7761d8e
Added warning to load_mcsensor_positions for oldformat
andLaing May 8, 2020
f4bb031
Remove unnecessary index check and raise in read_mc_tables
andLaing May 12, 2020
f4ffac6
Alter test_copy_mc_info_which_events_out_of_range to new conditions
andLaing May 12, 2020
cd3449c
Remove unnecessary try except from mcinfo_io.copy_mc_info
andLaing May 12, 2020
389b68b
Improve error message in _get_list_of_events_new
andLaing May 12, 2020
6368f81
Add warning and empty df return to get_sensor_binning
andLaing May 12, 2020
ac8640f
Add warning to components.copy_mc_info in case of no MC
andLaing May 12, 2020
2babf44
Check warning in test_copy_mc_info_noMC
andLaing May 12, 2020
9cf35fa
Add old/new format files for 3 types of simulation
andLaing May 12, 2020
b1c5acc
Add test comparing old and new format for same simulation
andLaing May 12, 2020
6d77a01
Add missing column to old format mcparticles
andLaing May 12, 2020
776a148
Change Pmt binning name for old mcconfiguration
andLaing May 12, 2020
e9bbb54
Added protection so empty mc sensor positions columns same
andLaing May 13, 2020
4d2e0c0
Test comparing old and new format for same simulations
andLaing May 13, 2020
37b3263
Add function to check for presence of MC group
andLaing May 13, 2020
031bdc6
Check MC group present and give warning if not for copy_mc_info
andLaing May 13, 2020
657d600
Change esmeralda_test to use get_event_numbers_in_file
andLaing May 14, 2020
afffd8f
Replace uses of pandas.testing.assert_frame_equal with IC function
andLaing May 14, 2020
8b50df5
Remove unnecessary import from components.py
andLaing May 14, 2020
3e94e98
Remove unused imports from mcinfo_io
andLaing May 14, 2020
63a53fc
Add MCEventNotFound exception
andLaing May 14, 2020
d60f774
Use MCEventNotFound exception in components.copy_mc_info
andLaing May 14, 2020
34fefac
Change run number for test_empty_events_issue_81
andLaing May 19, 2020
51d9edc
Add str_col_length, columns_to_index keyword arguments in mc_writer
andLaing May 19, 2020
41ef169
Updated mcinfo_io_test names
andLaing May 19, 2020
b636c82
Reset index in load_mcsensor_response_dfold
andLaing May 19, 2020
a495685
Remove unnecessary copies from get_Sensor_binning
andLaing May 19, 2020
3114e8e
Add new file with fake MC group for exception tests
andLaing May 19, 2020
ec0387f
Add test to check KeyError raised if unrecognised MC table
andLaing May 19, 2020
5a7063c
Add tests for get_event_numbers_in_file
andLaing May 19, 2020
1437d3f
Add test for is_oldformat_file
andLaing May 19, 2020
ab03d3c
Add test warning raised without database in load_mcsensor_positions
andLaing May 19, 2020
07ef509
Add test for load_mcsensor_response_df return_raw=True
andLaing May 19, 2020
906c9f6
Add database arguments to copy_mc_info call in hypathia
andLaing May 19, 2020
5ea7df3
Remove MC table comparison from hypathia_test
andLaing May 19, 2020
22391fa
Remove unused imports from mcinfo_io_test
andLaing May 20, 2020
2b64ef6
Remove unused MCParticle class
andLaing May 20, 2020
17fc750
Remove unused MCParticleInfo class
andLaing May 20, 2020
a39abfc
Remove unused get_mc_info_safe
andLaing May 20, 2020
90009f9
Changed get_sensor_binning tests to use DataFrame comparison
andLaing May 21, 2020
d1aec0e
Move mc_particle_and_hits_nexus_data fixture to mcinfo_io_test
andLaing May 21, 2020
57aa97a
Move mc_sensors_nexus_data fixture to mcinfo_io_test
andLaing May 21, 2020
7997926
Change drop in get_sensor_binning to not use inplace
andLaing May 21, 2020
6b2241d
Add new test files for city exact results
andLaing May 22, 2020
bf37968
Use NEWMC files in cities exact result tests. All tables compared
andLaing May 22, 2020
0daeb46
Protect get_sensor_binning from merged files
andLaing May 22, 2020
52 changes: 30 additions & 22 deletions invisible_cities/cities/beersheba.py
@@ -26,6 +26,8 @@
from enum import auto

from . components import city
from . components import collect
from . components import copy_mc_info
from . components import print_every
from . components import cdst_from_files

@@ -44,7 +46,6 @@
from .. reco.deconv_functions import richardson_lucy
from .. reco.deconv_functions import InterpolationMethod

from .. io. mcinfo_io import mc_info_writer
from .. io.run_and_event_io import run_and_event_writer
from .. io. dst_io import df_writer
from .. io. dst_io import load_dst
@@ -305,7 +306,7 @@ def write_deconv(df):


@city
def beersheba(files_in, file_out, compression, event_range, print_mod, run_number,
def beersheba(files_in, file_out, compression, event_range, print_mod, detector_db, run_number,
deconv_params = dict()):
"""
The city corrects Penthesilea hits energy and extracts topology information.
@@ -404,28 +405,35 @@ def beersheba(files_in, file_out, compression, event_range, print_mod, run_numbe
event_count_out = fl.spy_count()
events_passed_no_hits = fl.count_filter(bool, args = "cdst_passed_no_hits")

evtnum_collect = collect()

with tb.open_file(file_out, "w", filters = tbl.filters(compression)) as h5out:
# Define writers
write_event_info = fl.sink(run_and_event_writer(h5out), args=("run_number", "event_number", "timestamp"))
write_mc_ = mc_info_writer(h5out) if run_number <= 0 else (lambda *_: None)

write_mc = fl.sink( write_mc_, args = ("mc", "event_number"))
write_deconv = fl.sink( deconv_writer(h5out=h5out), args = "deconv_dst" )
write_summary = fl.sink( summary_writer(h5out=h5out), args = "summary" )
return push(source = cdst_from_files(files_in),
pipe = pipe(fl.slice(*event_range, close_all=True) ,
print_every(print_mod) ,
event_count_in.spy ,
cut_sensors ,
drop_sensors ,
filter_events_no_hits ,
events_passed_no_hits .filter ,
deconvolve_events ,
event_count_out.spy ,
fl.fork(write_mc ,
write_deconv ,
write_summary ,
write_event_info)) ,
result = dict(events_in = event_count_in .future,
events_out = event_count_out .future,
events_pass = events_passed_no_hits.future))
result = push(source = cdst_from_files(files_in),
pipe = pipe(fl.slice(*event_range, close_all=True) ,
print_every(print_mod) ,
event_count_in.spy ,
cut_sensors ,
drop_sensors ,
filter_events_no_hits ,
events_passed_no_hits .filter ,
deconvolve_events ,
event_count_out.spy ,
fl.branch("event_number" ,
evtnum_collect.sink) ,
fl.fork(write_deconv ,
write_summary ,
write_event_info)) ,
result = dict(events_in = event_count_in .future,
events_out = event_count_out .future,
evtnum_list = evtnum_collect .future,
events_pass = events_passed_no_hits.future))

if run_number <= 0:
copy_mc_info(files_in, h5out, result.evtnum_list,
detector_db, run_number)

return result
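
The change above moves the MC copy out of the per-event sinks: event numbers are collected while the pipeline runs, and the MC information is copied once afterwards, only for simulation (`run_number <= 0`). A minimal sketch of that pattern in plain Python follows. It does not use the `dataflow` module, and `run_pipeline`, `toy_city` and the `copy_mc` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_pipeline(events: Iterable[Dict],
                 write_event: Callable[[Dict], None]) -> List[int]:
    """Write every event that has hits and return the written event numbers."""
    written = []
    for event in events:
        if not event["hits"]:                  # stand-in for the no-hits filter
            continue
        write_event(event)                     # stand-in for the forked writers
        written.append(event["event_number"])  # stand-in for the collect sink
    return written


def toy_city(events: Iterable[Dict],
             run_number: int,
             copy_mc: Callable[[List[int]], None]) -> Tuple[List[Dict], List[int]]:
    """Hypothetical city: run the pipeline, then copy MC info in one step."""
    output: List[Dict] = []
    evtnum_list = run_pipeline(events, output.append)
    # MC info is copied once, after the loop, and only for simulated runs.
    if run_number <= 0:
        copy_mc(evtnum_list)
    return output, evtnum_list
```

With this layout the per-event stream no longer has to carry the MC tables; only the list of surviving event numbers is needed at the end.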
25 changes: 14 additions & 11 deletions invisible_cities/cities/beersheba_test.py
@@ -59,7 +59,6 @@ def test_beersheba_contains_all_tables(deconvolution_config):
beersheba(**conf)
with tb.open_file(PATH_OUT) as h5out:
assert "MC" in h5out.root
assert "MC/extents" in h5out.root
assert "MC/hits" in h5out.root
assert "MC/particles" in h5out.root
assert "DECO/Events" in h5out.root
@@ -70,39 +69,43 @@ def test_beersheba_contains_all_tables(deconvolution_config):


def test_beersheba_exact_result_joint(ICDATADIR, deconvolution_config):
true_out = os.path.join(ICDATADIR, "test_Xe2nu_NEW_exact_deconvolution_joint.h5")
true_out = os.path.join(ICDATADIR, "test_Xe2nu_NEW_exact_deconvolution_joint.NEWMC.h5")
conf, PATH_OUT = deconvolution_config
beersheba(**conf)

tables = ( "MC/extents" , "MC/hits" , "MC/particles" , "MC/generators",
"DECO/Events" ,
"Summary/Events",
"Run/events" , "Run/runInfo" )
tables = ("DECO/Events" ,
"Summary/Events" ,
"Run/events" , "Run/runInfo" ,
"MC/event_mapping", "MC/generators",
"MC/hits" , "MC/particles")

with tb.open_file(true_out) as true_output_file:
with tb.open_file(PATH_OUT) as output_file:
for table in tables:
assert hasattr(output_file.root, table)
got = getattr( output_file.root, table)
expected = getattr(true_output_file.root, table)
assert_tables_equality(got, expected)


def test_beersheba_exact_result_separate(ICDATADIR, deconvolution_config):
true_out = os.path.join(ICDATADIR, "test_Xe2nu_NEW_exact_deconvolution_separate.h5")
true_out = os.path.join(ICDATADIR, "test_Xe2nu_NEW_exact_deconvolution_separate.NEWMC.h5")
conf, PATH_OUT = deconvolution_config
conf['deconv_params']['deconv_mode' ] = 'separate'
conf['deconv_params']['n_iterations' ] = 50
conf['deconv_params']['n_iterations_g'] = 50
beersheba(**conf)

tables = ( "MC/extents" , "MC/hits" , "MC/particles" , "MC/generators",
"DECO/Events" ,
"Summary/Events",
"Run/events" , "Run/runInfo" )
tables = ("DECO/Events" ,
"Summary/Events" ,
"Run/events" , "Run/runInfo" ,
"MC/event_mapping", "MC/generators",
"MC/hits" , "MC/particles")

with tb.open_file(true_out) as true_output_file:
with tb.open_file(PATH_OUT) as output_file:
for table in tables:
assert hasattr(output_file.root, table)
got = getattr( output_file.root, table)
expected = getattr(true_output_file.root, table)
print(got)
48 changes: 27 additions & 21 deletions invisible_cities/cities/components.py
@@ -16,6 +16,7 @@
import numpy as np
import pandas as pd
import inspect
import warnings

from .. dataflow import dataflow as fl
from .. dataflow.dataflow import sink
@@ -29,6 +30,7 @@
from .. evm .pmaps import SiPMCharge
from .. core import system_of_units as units
from .. core .exceptions import XYRecoFail
from .. core .exceptions import MCEventNotFound
from .. core .exceptions import NoInputFiles
from .. core .exceptions import NoOutputFile
from .. core .exceptions import InvalidInputFileStructure
@@ -165,8 +167,13 @@ def append(l,e):
return fl.reduce(append, initial=[])()


def copy_mc_info(files_in : List[str], h5out: tb.File, event_numbers: List[int]) -> None:
"""Copy to an output file the MC info of a list of selected events.
def copy_mc_info(files_in : List[str],
h5out : tb.File ,
event_numbers: List[int],
db_file : str ,
run_number : int ) -> None:
"""
Copy to an output file the MC info of a list of selected events.

Parameters
----------
@@ -175,20 +182,27 @@ def copy_mc_info(files_in : List[str], h5out: tb.File, event_numbers: List[int])
file_out : tables.File
The output h5 file.
event_numbers : List[int]
List of event numbers for which the MC info is copied to the output file.
List of event numbers for which the MC info is copied
to the output file.
"""

writer = mcinfo_io.mc_info_writer(h5out)
writer = mcinfo_io.mc_writer(h5out)

copied_events = []
for f in files_in:
with tb.open_file(f, "r") as h5in:
try:
event_numbers_in_file = h5in.root.MC.extents.cols.evt_number[:]
event_numbers_to_copy = list(evt for evt in event_numbers \
if evt in event_numbers_in_file)
mcinfo_io.copy_mc_info(h5in, writer, event_numbers_to_copy)
except tb.exceptions.NoSuchNodeError:
continue
if mcinfo_io.check_mc_present(f):
event_numbers_in_file = mcinfo_io.get_event_numbers_in_file(f)
event_numbers_to_copy = np.intersect1d(event_numbers ,
event_numbers_in_file)
mcinfo_io.copy_mc_info(f, writer, event_numbers_to_copy,
db_file, run_number)
copied_events.extend(event_numbers_to_copy)
else:
warnings.warn(f' File does not contain MC tables.\
Use positive run numbers for data', UserWarning)
continue
Review comment (Collaborator):

I think it would be good, at the end of the file loop, to check that all the events are actually found in the files: collect all event_numbers_to_copy and make sure the result matches the event_numbers list.

if len(np.setdiff1d(event_numbers, copied_events)) != 0:
raise MCEventNotFound(f' Some events not found in MC tables')
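
The check requested in the review comment can be sketched without any I/O. The version below uses plain Python sets where the diff uses `np.intersect1d` and `np.setdiff1d`. `select_events_to_copy` and the `events_per_file` mapping are hypothetical, and the exception class is a local stand-in for the one the PR adds to `core.exceptions`.

```python
from typing import Dict, Iterable, List


class MCEventNotFound(Exception):
    """Local stand-in for the exception added to core.exceptions in this PR."""


def select_events_to_copy(requested: Iterable[int],
                          events_per_file: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Decide which requested events to copy from each file.

    Raises MCEventNotFound if a requested event appears in none of the files.
    """
    requested = list(requested)
    copied: List[int] = []
    plan: Dict[str, List[int]] = {}
    for name, in_file in events_per_file.items():
        # Set intersection plays the role of np.intersect1d in the diff.
        to_copy = sorted(set(requested) & set(in_file))
        plan[name] = to_copy
        copied.extend(to_copy)
    # Set difference plays the role of np.setdiff1d in the diff.
    missing = sorted(set(requested) - set(copied))
    if missing:
        raise MCEventNotFound(f"Events not found in MC tables: {missing}")
    return plan
```

Unlike the message in the diff, this sketch lists the missing event numbers in the exception text, which is a design choice made here for illustration, not something the PR does.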


# TODO: consider caching database
@@ -230,13 +244,6 @@ def get_sipm_wfs(h5in, wf_type):
else : raise TypeError(f"Invalid WfType: {type(wf_type)}")


def get_mc_info_safe(h5in, run_number):
if run_number <= 0:
try : return mcinfo_io.get_mc_info(h5in)
except tb.exceptions.NoSuchNodeError: pass
return


def get_trigger_info(h5in):
group = h5in.root.Trigger if "Trigger" in h5in.root else ()
trigger_type = group.trigger if "trigger" in group else repeat(None)
@@ -345,15 +352,14 @@ def cdst_from_files(paths: List[str]) -> Iterator[Dict[str,Union[pd.DataFrame, M
evts, _ = zip(*event_info[:])
bool_mask = np.in1d(evts, cdst_df.event.unique())
event_info = event_info[bool_mask]
mc_info = get_mc_info_safe(h5in, run_number)
except (tb.exceptions.NoSuchNodeError, IndexError):
continue
check_lengths(event_info, cdst_df.event.unique())
for evtinfo in event_info:
event_number, timestamp = evtinfo
yield dict(cdst = cdst_df .loc[cdst_df .event==event_number],
summary = summary_df.loc[summary_df.event==event_number],
mc=mc_info, run_number=run_number,
run_number=run_number,
event_number=event_number, timestamp=timestamp)
# NB, the monte_carlo writer is different from the others:
# it needs to be given the WHOLE TABLE (rather than a
5 changes: 4 additions & 1 deletion invisible_cities/cities/components_test.py
@@ -9,6 +9,7 @@

from pytest import mark
from pytest import raises
from pytest import warns

from .. core.configure import EventRange as ER
from .. core.exceptions import InvalidInputFileStructure
@@ -173,7 +174,9 @@ def test_copy_mc_info_noMC(ICDATADIR, config_tmpdir):
file_in = os.path.join(ICDATADIR, 'run_2983.h5')
file_out = os.path.join(config_tmpdir, 'dummy_out.h5')
with tb.open_file(file_out, "w") as h5out:
copy_mc_info([file_in], h5out, [])
with warns(UserWarning):
copy_mc_info([file_in], h5out, [], 'new', -6400)
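
The updated test relies on `pytest.warns` to confirm that a file without MC tables triggers a `UserWarning` instead of being skipped silently. The same behaviour can be sketched with only the standard library. `copy_if_mc_present` and its `has_mc` predicate are hypothetical stand-ins for `copy_mc_info` and `mcinfo_io.check_mc_present`, and no HDF5 file is opened.

```python
import warnings
from typing import Callable, Iterable, List


def copy_if_mc_present(file_names: Iterable[str],
                       has_mc: Callable[[str], bool]) -> List[str]:
    """Return the files that would be copied; warn about files without MC."""
    copied = []
    for name in file_names:
        if has_mc(name):
            copied.append(name)
        else:
            warnings.warn(f"{name} does not contain MC tables.", UserWarning)
    return copied


# Record warnings instead of printing them, as pytest.warns does internally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = copy_if_mc_present(["run_2983.h5"], has_mc=lambda name: False)

n_user_warnings = sum(issubclass(w.category, UserWarning) for w in caught)
```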


@mark.xfail
def test_copy_mc_info_repeated_event_numbers(ICDATADIR, config_tmpdir):