a basic datacollection simulator #163

Closed
wants to merge 19 commits into from
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -5,6 +5,7 @@ History
Unreleased / master
-------------------
* ``ispyb.job`` is now less facility-specific and can handle recipe paths via a Zocalo configuration file (`#162 <https://github.com/DiamondLightSource/ispyb-api/pull/162>`_)
* Add a basic data collection simulator ``ispyb.simulate``

6.9.0 (2021-09-16)
------------------
88 changes: 88 additions & 0 deletions conf/simulate_example.yml
@@ -0,0 +1,88 @@
# Whether to link or copy data
copy_method: copy

# Map each beamline to a session
sessions:
  bl: blc00001-1

# Where to copy raw data from
raw_data: /data/ispyb-test

# Where to write simulated data to, can use {beamline} placeholder
data_dir: /data/tests/{beamline}/simulation

ispyb_url: https://ispyb.diamond.ac.uk

# Define Components (Proteins)
components:
  # an internal reference for the component
  comp1:
    # columns to populate for this component
    acronym: Component1
    sequence: SiSP

  comp2:
    acronym: Component2

# Define BLSamples
samples:
  # an internal reference for this sample
  samp1:
    # columns to populate for this sample
    name: Sample1
    # which component this sample is an instance of (one of the keys in components above)
    component: comp1

  samp2:
    name: Sample2
    component: comp2

# Define Experiments (DataCollections)
experiments:
  # a shortname for this experiment (available via cli)
  energy_scan1:
    # the experimentType, must map to a valid type in DataCollectionGroup.experimentType
    experimentType: Energy scan
    # data will be split into its respective imageDirectory and fileTemplate columns
    data: energy_scan/energyscan1.h5
    # which sample to link this data collection to (one of the keys in samples above)
    sample: samp1

    # columns to populate
    # thumbnails should have a trailing t, i.e. energy_scan/snapshott.png
    xtalSnapshotFullPath1: energy_scan/snapshot.png
    numberOfImages: 4001
    exposureTime: 1
    #energy: 8.8143
    wavelength: 1.4065
    imageContainerSubPath: 1.1/measurement

  xrf_map1:
    experimentType: XRF map
    data: xrf_map/xrfmap1.h5
    sample: samp1

    xtalSnapshotFullPath1: xrf_map/snapshot.png
    numberOfImages: 1600
    exposureTime: 0.03
    #energy: 2.4817
    wavelength: 4.9959

    # additionally populate GridInfo
    grid:
      steps_x: 40
      steps_y: 40
      dx_mm: 0.001
      dy_mm: 0.001
      pixelsPerMicronX: -0.44994
      pixelsPerMicronY: -0.46537
      snapshot_offsetXPixel: 682.16
      snapshot_offsetYPixel: 554

    # additionally populate BLSubSample
    subsample:
      x: 9038007
      y: 24467003
      x2: 9078007
      y2: 24507003
      type: roi
1 change: 1 addition & 0 deletions docs/index.rst
@@ -7,6 +7,7 @@ Welcome to the ISPyB API documentation!

installation
usage
simulate
api
contributing
authors
85 changes: 85 additions & 0 deletions docs/simulate.rst
@@ -0,0 +1,85 @@
==============
ispyb.simulate
==============

`ispyb.simulate` creates a new DataCollection row in the ISPyB database from a simple YAML definition. It creates the data collection, related sample information, and associated shipping entities, then copies some raw data and associated snapshots (and thumbnails).

Simulate a data collection::

    ispyb.simulate <beamline> <experiment>
    ispyb.simulate bm23 energy_scan1


The simulator creates, in hierarchical order, a component (`Protein`), a related `BLSample` (with an intermediate `Crystal`), and optionally a `BLSubSample`. These are contained within a `Container`, `Dewar`, and `Shipment` belonging to the specified `Proposal`. Each of these entities is created only if one with the defined name does not already exist. The simulator then creates a `DataCollection` and `DataCollectionGroup`, linked to the relevant `BLSample` and `BLSession`. If grid information is specified, it also creates an entry in `GridInfo`.
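
The "create only if it does not already exist" behaviour described above can be sketched as a get-or-create helper. This is only an illustration using an in-memory SQLite table, not the simulator's actual SQLAlchemy code; the simplified table layout and the `get_or_create_protein` name are assumptions made for the example.

```python
import sqlite3


def get_or_create_protein(conn, acronym):
    """Return the id of the row with this acronym, inserting it if missing."""
    row = conn.execute(
        "SELECT proteinId FROM Protein WHERE acronym = ?", (acronym,)
    ).fetchone()
    if row is not None:
        # A component with this name already exists, so reuse it
        return row[0]
    cursor = conn.execute("INSERT INTO Protein (acronym) VALUES (?)", (acronym,))
    conn.commit()
    return cursor.lastrowid


# Hypothetical, heavily simplified stand-in for the ISPyB Protein table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Protein (proteinId INTEGER PRIMARY KEY, acronym TEXT UNIQUE)"
)

first_id = get_or_create_protein(conn, "Component1")
second_id = get_or_create_protein(conn, "Component1")  # reuses the first row
row_count = conn.execute("SELECT COUNT(*) FROM Protein").fetchone()[0]
```

Running the simulator twice against the same configuration would, under this pattern, reuse the existing component rather than insert a duplicate.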

***************
Configuration
***************

The configuration file location is defined via the `ISPYB_SIMULATE_CONFIG` environment variable. An example configuration is available in `conf/simulate_example.yml`_. The structure and requirements of this file are documented in the example.

Each entry in `experiments` represents a different data collection. The `experimentType` column relates to a `DataCollectionGroup.experimentType` entry, so it must match one of the available types in the database. See `experimentTypes`_ for a full list.

.. _conf/simulate_example.yml: https://github.com/DiamondLightSource/ispyb-api/blob/master/conf/simulate_example.yml
.. _experimentTypes: https://github.com/DiamondLightSource/ispyb-database/blob/master/schemas/ispyb/tables.sql#L1930
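
As a rough illustration of how the configuration values fit together, the sketch below expands the `{beamline}` placeholder in `data_dir` and splits an experiment's `data` entry into a directory and a file name. The `resolve_paths` helper is hypothetical, and the way the two parts are joined is an assumption; the simulator's real logic may differ.

```python
import posixpath

# A trimmed-down configuration mirroring the example file
config = {
    "sessions": {"bl": "blc00001-1"},
    "data_dir": "/data/tests/{beamline}/simulation",
    "experiments": {
        "energy_scan1": {
            "experimentType": "Energy scan",
            "data": "energy_scan/energyscan1.h5",
        }
    },
}


def resolve_paths(config, beamline, experiment):
    """Hypothetical helper: derive the target directory and file name."""
    if beamline not in config["sessions"]:
        raise KeyError(f"No session configured for beamline {beamline}")
    data = config["experiments"][experiment]["data"]
    # Split "energy_scan/energyscan1.h5" into directory and file parts
    sub_dir, file_template = posixpath.split(data)
    # Fill in the {beamline} placeholder from the data_dir setting
    data_dir = config["data_dir"].format(beamline=beamline)
    return posixpath.join(data_dir, sub_dir), file_template


image_directory, file_template = resolve_paths(config, "bl", "energy_scan1")
```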

***************************
Available columns per table
***************************

The ISPyB tables are large, so this simulator exposes only a subset of their columns: those most pertinent to creating usable data collections and associated entries. The available columns are listed below for each table.

Component (Protein)
-------------------

* acronym
* name
* sequence
* density
* molecularMass
* description

BLSample
-------------

* name

BLSubSample
-------------

* x
* y
* x2
* y2
* type

DataCollection
--------------

* imageContainerSubPath
* numberOfImages
* wavelength
* exposureTime
* xtalSnapshotFullPath1-4

GridInfo
-------------

* steps_x
* steps_y
* snapshot_offsetXPixel
* snapshot_offsetYPixel
* dx_mm
* dy_mm
* pixelsPerMicronX
* pixelsPerMicronY

***************
Plugins
***************

The simulator can trigger events before and after the data is copied using the `ispyb.simulator.before_datacollection` and `ispyb.simulator.after_datacollection` entry points. Each plugin is passed only the new `DataCollection.dataCollectionId`.
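
A plugin is therefore just a callable that accepts a single data collection id. The sketch below shows what such a pair of callables might look like. The module name `my_simulator_plugin`, the entry point key `my_plugin`, and the `seen` list are all hypothetical and exist only to make the example observable.

```python
# Contents of a hypothetical module, my_simulator_plugin.py
#
# It could be registered in the plugin package's setup.cfg like so:
#
#   [options.entry_points]
#   ispyb.simulator.before_datacollection =
#       my_plugin = my_simulator_plugin:before_datacollection
#   ispyb.simulator.after_datacollection =
#       my_plugin = my_simulator_plugin:after_datacollection

seen = []  # records calls so the effect of the plugin can be inspected


def before_datacollection(dcid):
    """Called with the new dataCollectionId before the data is copied."""
    seen.append(("before", dcid))


def after_datacollection(dcid):
    """Called with the same dataCollectionId once the copy has finished."""
    seen.append(("after", dcid))


# Stand-in for what the simulator does around the data copy
before_datacollection(1234)
after_datacollection(1234)
```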

Zocalo
-------------
If zocalo is installed, the simulator also sends a message to zocalo before the data is copied and another after the copy has finished, by default triggering the `mimas` recipe.
1 change: 1 addition & 0 deletions setup.cfg
@@ -47,6 +47,7 @@ scripts =
console_scripts =
    ispyb.job = ispyb.cli.job:main
    ispyb.last_data_collections_on = ispyb.cli.last_data_collections_on:main
    ispyb.simulate = ispyb.cli.simulate:run
libtbx.dispatcher.script =
    ispyb.job = ispyb.job
    ispyb.last_data_collections_on = ispyb.last_data_collections_on
68 changes: 68 additions & 0 deletions src/ispyb/cli/simulate.py
@@ -0,0 +1,68 @@
import argparse
import logging
import os

from ispyb.simulation.datacollection import SimulateDataCollection

try:
    import zocalo
    import zocalo.configuration
except ModuleNotFoundError:
    zocalo = None

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run():
    config_yml = os.getenv("ISPYB_SIMULATE_CONFIG")
    if not config_yml:
        raise RuntimeError(
            "`ISPYB_SIMULATE_CONFIG` environment variable is not defined"
        )

    try:
        sdc = SimulateDataCollection(config_yml)
    except AttributeError as e:
        exit(f"Simulation Error: {e}")

    parser = argparse.ArgumentParser(description="ISPyB simulation tool")
    parser.add_argument(
        "beamline", help="Beamline to run simulation against", choices=sdc.beamlines
    )

    parser.add_argument(
        "experiment", help="Experiment to simulate", choices=sdc.experiments
    )

    parser.add_argument(
        "--delay",
        default=5,
        type=int,
        dest="delay",
        help="Delay between plugin start and end events",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Enable debug output",
    )

    if zocalo:
        zc = zocalo.configuration.from_file()
        zc.activate()
        zc.add_command_line_options(parser)

    args = parser.parse_args()

    root = logging.getLogger()
    root.setLevel(level=logging.DEBUG if args.debug else logging.INFO)

    try:
        sdc.do_run(args.beamline, args.experiment, delay=args.delay)
    except Exception as e:
        if args.debug:
            logger.exception("Simulation Error")
            print(e)
        else:
            print(f"Simulation Error: {e}")
Empty file.
66 changes: 66 additions & 0 deletions src/ispyb/simulation/base.py
@@ -0,0 +1,66 @@
import configparser
import os
from abc import ABC, abstractmethod
import logging
import pkg_resources

import sqlalchemy
from sqlalchemy.orm import sessionmaker
import ispyb.sqlalchemy as isa
import yaml


logger = logging.getLogger(__name__)


def load_config(config_yml):
    if not os.path.exists(config_yml):
        raise RuntimeError(f"Cannot find config file: {config_yml}")

    with open(config_yml, "r") as stream:
        return yaml.safe_load(stream)


class Simulation(ABC):
    def __init__(self, config_yml):
        self._config = load_config(config_yml)

    @property
    def config(self):
        return self._config

    @property
    def session(self):
        config = configparser.RawConfigParser(allow_no_value=True)
        config.read(os.environ["ISPYB_CREDENTIALS"])
        url = isa.url(credentials=dict(config.items("ispyb_sqlalchemy")))
        return sessionmaker(
            bind=sqlalchemy.create_engine(url, connect_args={"use_pure": True})
        )

    @property
    def beamlines(self):
        return list(self.config["sessions"].keys())

    def before_start(self, dcid):
        for entry in pkg_resources.iter_entry_points(
            "ispyb.simulator.before_datacollection"
        ):
            fn = entry.load()
            logger.info(f"Executing before start plugin `{entry.name}`")
            fn(dcid)

    def after_end(self, dcid):
        for entry in pkg_resources.iter_entry_points(
            "ispyb.simulator.after_datacollection"
        ):
            fn = entry.load()
            logger.info(f"Executing after end plugin `{entry.name}`")
            fn(dcid)

    def do_run(self, *args, **kwargs):
        self.run(*args, **kwargs)

    @abstractmethod
    def run(self, *args, **kwargs):
        pass
Comment on lines +64 to +66
Contributor

Do we really need an abstract class for something that only has a single implementation?

Contributor Author

The idea is that in future you might want to simulate something other than a data collection. There are plenty of things populated by the acquisition client that are not handled by ispyb-api. For example, a RobotAction is not a DataCollection, nor is an XFEFluorescenceSpectrum. It would need a bit of tinkering on the arg parser to select the relevant class, but should be flexible enough.
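
A rough sketch of what that flexibility could look like, assuming a second subclass were added later. The `SimulateRobotAction` class, the `SIMULATIONS` registry, and the `simulate` helper are hypothetical and not part of this PR. The base class here is a stripped-down stand-in that skips config loading and returns the result of `run` purely so the example has something to show.

```python
from abc import ABC, abstractmethod


class Simulation(ABC):
    """Stripped-down stand-in for ispyb.simulation.base.Simulation."""

    def __init__(self, config):
        self._config = config

    def do_run(self, *args, **kwargs):
        # Returns the result only for demonstration purposes
        return self.run(*args, **kwargs)

    @abstractmethod
    def run(self, *args, **kwargs):
        pass


class SimulateDataCollection(Simulation):
    def run(self, beamline, experiment):
        return f"data collection {experiment} on {beamline}"


class SimulateRobotAction(Simulation):  # hypothetical future simulation
    def run(self, beamline, action):
        return f"robot action {action} on {beamline}"


# Hypothetical registry the arg parser could use to select the class
SIMULATIONS = {
    "datacollection": SimulateDataCollection,
    "robotaction": SimulateRobotAction,
}


def simulate(kind, beamline, target, config=None):
    """Look up the simulation class by name and run it."""
    return SIMULATIONS[kind](config or {}).do_run(beamline, target)


result = simulate("robotaction", "bl", "LOAD")
```

With something like this, the CLI would only need one extra positional argument whose `choices` are the registry keys.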
