A Python wrapper for the EMS API. There is also an R wrapper for the EMS API; if you are interested in the R version, please visit https://github.com/ge-flight-analytics/Rems. The goal of this project is to provide a way to bring EMS data into a Python environment via the EMS RESTful API.
- New work is completed in the master branch of this repository and may contain breaking changes. Pull requests should be made to this branch.
- Old stable functionality exists in the v0.2.1 branch and should never change except for minor fixes.
- Releases are publicly available on PyPI using semantic versioning.
Install from the package index:
pip install emspy
Alternatively, the package can be installed from the git repository or a zip package:
- Download or clone emsPy. If downloaded, unzip the compressed file.
- Go to the folder that you unzipped or cloned, which contains the setup.py file.
- In that folder, open a command prompt window and run pip install .
For development mode, run pip install -e .
The optional proxy setting can be passed to the EMS connection object with the following format:
proxies = {
'http': 'http://{proxy_server_address}:{port}',
'https': 'https://{proxy_server_address}:{port}'
}
from emspy import Connection
c = Connection("efoqa_usrname", "efoqa_password", proxies = proxies, server = "prod")
With the optional server argument, you can select one of the currently available EMS API servers:
- "prod" (default)
- "cluster" (clustered production version)
- "stable" (stable test version)
- "beta"
- "nightly"
For servers hosted locally or in Azure, the server_url argument should be used instead of the server argument. This argument should be the address of the server hosting the API followed by "/api". For example, if the server hosting the API is https://abc-api.us.efoqa.com, then the connection object would look like this:
from emspy import Connection
c = Connection("usrname", "password", proxies=proxies, server_url="https://abc-api.us.efoqa.com/api")
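A small helper can reduce mistakes when assembling the server_url value. The sketch below is not part of emsPy: build_server_url is a hypothetical name, and it only assumes the pattern shown above, where the address of the API host is followed by /api.

```python
def build_server_url(base_address):
    """Return base_address ending in exactly one '/api' segment.

    Hypothetical helper for illustration; it is not part of emsPy.
    """
    base = base_address.rstrip("/")
    if base.endswith("/api"):
        return base
    return base + "/api"


server_url = build_server_url("https://abc-api.us.efoqa.com/")
print(server_url)  # https://abc-api.us.efoqa.com/api
```

The result could then be passed as the server_url argument when creating the Connection object.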
The following example instantiates a flight-specific query object that will send queries to the EMS 9 system.
from emspy.query import FltQuery
query = FltQuery(c, 'ems9', data_file = 'metadata.db')
where the optional data_file input specifies the SQLite file used to read and write the metadata on the local machine. If no file with the specified name exists, a new db file is created. If no file name is passed, a db file is generated in the default location (emspy/data). If None is specified, no db file is created.
The FDW Flights database is one of the frequently used databases. In order to select it as your database for querying, you can simply run the following line.
query.set_database("fdw flights")
In the EMS system, all databases and data fields are organized in hierarchical tree structures. To use a database other than FDW Flights, you need to tell the query object where in the EMS DB tree your database is located. You can explore the EMS database tree from EMS Online. The following example specifies the location of one of the Event databases in the DB tree and then sets the Event database that you want to use:
query.update_dbtree("fdw", "events", "standard", "p0")
query.set_database("p0: library flight safety events")
These code lines first send queries to find the database-groups path, FDW → APM Events → Standard Library Profiles → P0: Library Flight Safety Events, and then select the "P0: Library Flight Safety Events" database that is located at the specified path. For a complete example, please check on this Gist.
Similar to the databases, the EMS data fields are organized in a tree structure, so the steps are almost identical, except that you use the update_fieldtree(...) method to march through the tree branches.
Before calling update_fieldtree(...), you can call the generate_preset_fieldtree() method to load a basic tree with fields belonging to the following field groups:
- Flight Information
- Aircraft Information
- Navigation Information
Let's say you have selected the FDW Flights database. The following code lines query for the metadata of the basic data fields, and then for some of the data fields in Profile 301 in EMS9.
# Let the query object load preset data fields that are frequently used
query.generate_preset_fieldtree()
# Load other data fields that you want to use
query.update_fieldtree("profiles", "standard", "block-cost", "p301",
"measured", "ground operations (before takeoff)")
The update_fieldtree(...) call above queries the metadata of all measurements located at the path Profiles → Standard Library Profiles → Block-Cost Model → P301: Block-Cost Model Planned Fuel Setup and Tests → Measured Items → Ground Operations (before takeoff) in EMS Explorer.
Caution: the process of adding a subtree usually requires a very large number of recursive RESTful API calls which take quite a long time. Please try to specify the subtree to as low level as possible to avoid a long processing time.
By default, the update_fieldtree(...) method loads ALL subfolders of the last item in the path. If there are many subfolders, this can take a long time and is likely unnecessary. The keyword arguments exclude_subtrees and exclude_tree can be used to prevent all or some of the subtrees from being loaded. To exclude all subtrees, set exclude_subtrees=True:
query.update_fieldtree("profiles", "standard", "block-cost", "p301",
"measured", "ground operations (before takeoff)", exclude_subtrees=True)
To only exclude a list of subtrees use exclude_tree=[...]
query.update_fieldtree('Flight Information', exclude_tree=['Processing', 'FlightPulse'])
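To see why limiting the subtree matters, the toy model below counts how many folders a recursive load would visit, with each visit standing in for one or more API calls. This is only an illustrative sketch over a nested dictionary: count_folder_visits and the sample folder names under each group are made up, and the code does not reproduce how emsPy walks the tree. In this model the exclusion list applies only to the direct children of the starting folder.

```python
def count_folder_visits(tree, exclude_tree=(), exclude_subtrees=False):
    """Count the folders visited when loading `tree` recursively.

    `tree` maps a subfolder name to a dict of its own subfolders.
    Illustrative model only; this is not emsPy's implementation.
    """
    visits = 1  # the folder at the end of the path is always loaded
    if exclude_subtrees:
        return visits
    excluded = set(exclude_tree)
    for name, subtree in tree.items():
        if name in excluded:
            continue  # skip this subfolder and everything below it
        visits += count_folder_visits(subtree)
    return visits


# Made-up folder layout for demonstration purposes
flight_information = {
    "Processing": {"Details": {}, "History": {}},
    "FlightPulse": {"Scores": {}},
    "Identification": {},
}

print(count_folder_visits(flight_information))  # 7
print(count_folder_visits(flight_information,
                          exclude_tree=["Processing", "FlightPulse"]))  # 2
print(count_folder_visits(flight_information, exclude_subtrees=True))  # 1
```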
As you may have noticed in the example code, you can specify a data entity by a fragment of its full name. The keywords for an entity name follow these rules:
- Case insensitive
- Keyword can be a single word or multiple consecutive words that are found in the full name string
- Keyword should uniquely specify a single data entity among all children under their parent database group
- Regular expression is not supported
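The rules above can be pictured with a small matching function. This is a sketch of the stated rules only, not emsPy's own code; resolve_keyword is a hypothetical name, and the sibling names are taken from the database path mentioned earlier.

```python
def resolve_keyword(keyword, sibling_names):
    """Return the single sibling whose name contains `keyword`, ignoring case.

    Plain substring matching is used, so regular expressions are treated
    as literal text. Hypothetical helper, not part of emsPy.
    """
    needle = keyword.lower()
    matches = [name for name in sibling_names if needle in name.lower()]
    if len(matches) != 1:
        raise ValueError(
            "keyword %r matched %d entries: %r" % (keyword, len(matches), matches)
        )
    return matches[0]


groups = ["APM Events", "FDW Flights", "Standard Library Profiles"]
print(resolve_keyword("standard", groups))  # Standard Library Profiles
print(resolve_keyword("fdw", groups))       # FDW Flights
```

A keyword such as "s" would raise a ValueError here, because it appears in more than one sibling name and therefore does not identify a single entity.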
Finally, you can save the metadata of the database and data trees for later use. Once you have saved it, you can call set_database(...) directly, without querying the same metadata again in later executions. However, you will have to update the trees again if any of the data entities are modified on the EMS system side.
# This saves the metadata, in SQLite format, to the file given as data_file
query.save_metadata()
As a next step, you can start making an actual query. The select(...) method is used to select the columns of the data returned by your query. The following is an example:
query.select("flight date",
"customer id",
"takeoff valid",
"takeoff airport iata code")
The passed data fields must be part of the data fields in your data tree.
To avoid name collisions, it is also possible to pass a tuple to the select() method with the full path to the field of interest. All but the last element represent folders in the path, and the last element is the field itself. For example, if you want to select the "Takeoff Airport Code" field in the "Navigation Information\Takeoff\Airport" folder, your select statement would look like this:
query.select( ('Navigation Information', 'Takeoff', 'Airport', 'Takeoff Airport Code') )
You need to make a separate select call if you want to add a field with aggregation applied.
query.select("P301: duration from first indication of engines running to start",
aggregate="avg")
Supported aggregation functions are:
- avg
- count
- max
- min
- stdev
- sum
- var
You may want to define grouping, which is described in the next section, when you want to apply an aggregation function.
The select(...) method also accepts keywords, and even a combination of keywords that specify the parent directories of the fields in the data tree. For example, the following keywords are all valid ways to select "Flight Date (Exact)" for a query:
- Search by a consecutive substring. The method returns the match with the shortest field name if there are multiple matches.
- Ex) "flight date"
- Search by exact name.
- Ex) "flight date (exact)"
- Field name keyword along with multiple keywords for the names of upstream field groups (i.e., directories).
- Ex) ("flight info", "date (exact)")
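The shortest-name tie-break described for substring searches can be sketched as follows. The function name shortest_match and the longer field name in the sample list are made up for illustration; this shows the stated rule, not emsPy's actual lookup code.

```python
def shortest_match(keyword, field_names):
    """Return the shortest field name containing `keyword`, ignoring case.

    Returns None when nothing matches. Hypothetical helper, not part of emsPy.
    """
    needle = keyword.lower()
    matches = [name for name in field_names if needle in name.lower()]
    if not matches:
        return None
    return min(matches, key=len)


# Sample names; the second one is invented to create a second match
fields = [
    "Flight Date (Exact)",
    "Flight Date (Exact) With Extra Detail",
    "Flight Record",
]
print(shortest_match("flight date", fields))          # Flight Date (Exact)
print(shortest_match("flight date (exact)", fields))  # Flight Date (Exact)
```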
Similarly, you can pass the grouping and ordering condition:
query.group_by("flight date",
"customer id",
"takeoff valid",
"takeoff airport iata code")
query.order_by("flight date")
# the ascending order is default. You can pass a descending order by optional input:
# query.order_by("flight date", order="desc")
Currently the following conditional operators are supported with respect to the data field types:
- Number: "==", "!=", "<", "<=", ">", ">="
- Discrete: "==", "!=", "in", "not in" (Filtering condition made with value, not discrete integer key)
- Boolean: "==", "!="
- String: "==", "!=", "in", "not in"
- Datetime: ">=", "<"
The following is an example:
query.filter("'flight date' >= '2016-1-1'")
query.filter("'takeoff valid' == True")
# Discrete field filtering is pretty much the same as string filtering.
query.filter("'customer id' in ['CQH','EVA']")
query.filter("'takeoff airport iata code' == 'KUL'")
The current filter method has the following limitations:
- Single filtering condition for each filter method call
- Filtering conditions are combined only by "AND" relationship
- The field keyword must be at left-hand side of a conditional expression
- No support of NULL value filtering, which is being worked on now
- The datetime condition should be only with the ISO8601 format
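Because each filter call takes one condition and all conditions are combined with AND, it can help to generate the condition strings programmatically. The sketch below builds such strings and checks the operator against the table of supported operators listed earlier. The names SUPPORTED_OPERATORS and filter_expression are hypothetical and not part of emsPy, and the exact spacing that the filter parser accepts is an assumption.

```python
from datetime import date

# Operators per field type, copied from the list of supported operators
SUPPORTED_OPERATORS = {
    "number": {"==", "!=", "<", "<=", ">", ">="},
    "discrete": {"==", "!=", "in", "not in"},
    "boolean": {"==", "!="},
    "string": {"==", "!=", "in", "not in"},
    "datetime": {">=", "<"},
}


def filter_expression(field, operator, value, field_type):
    """Build one condition string with the field keyword on the left-hand side."""
    if operator not in SUPPORTED_OPERATORS[field_type]:
        raise ValueError("%r is not supported for %s fields" % (operator, field_type))
    if isinstance(value, date):
        value = value.isoformat()  # ISO 8601 text such as '2016-01-01'
    return "'%s' %s %r" % (field, operator, value)


conditions = [
    ("flight date", ">=", date(2016, 1, 1), "datetime"),
    ("takeoff valid", "==", True, "boolean"),
    ("customer id", "in", ["CQH", "EVA"], "discrete"),
]
expressions = [filter_expression(*condition) for condition in conditions]
for expression in expressions:
    print(expression)
```

Each resulting string would be passed to its own query.filter(...) call, since one call accepts a single condition.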
You can pass additional attributes supported by EMS query:
# Returns only the distinct rows. Turned on as default
query.distinct(True)
# To get only the top N rows of the output data in response to the query:
query.get_top(5000)
# This is optional. If you don't set this value, all output data will be returned.
You can check the resulting JSON string of the translated query using the following method calls.
# Returns JSON string
# print(query.in_json())
# View in Python's native Dictionary form
from pprint import pprint # This gives you a prettier print
print("\n")
pprint(query.in_dict())
{'distinct': True,
'filter': {'args': [{'type': 'filter',
'value': {'args': [{'type': 'field',
'value': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exact-date]]]'},
{'type': 'constant',
'value': '2016-1-1'},
{'type': 'constant',
'value': 'Utc'}],
'operator': 'dateTimeOnAfter'}},
{'type': 'filter',
'value': {'args': [{'type': 'field',
'value': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exist-takeoff]]]'}],
'operator': 'isTrue'}},
{'type': 'filter',
'value': {'args': [{'type': 'field',
'value': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-fcs][base-field][fdw-flight-extra.customer]]]'},
{'type': 'constant',
'value': 18},
{'type': 'constant',
'value': 11}],
'operator': 'in'}},
{'type': 'filter',
'value': {'args': [{'type': 'field',
'value': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[[nav][type-link][airport-takeoff * foqa-flights]]][[nav][base-field][nav-airport.iata-code]]]'},
{'type': 'constant',
'value': 'KUL'}],
'operator': 'equal'}}],
'operator': 'and'},
'format': 'display',
'groupBy': [{'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exact-date]]]'},
{'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-fcs][base-field][fdw-flight-extra.customer]]]'},
{'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exist-takeoff]]]'},
{'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[[nav][type-link][airport-takeoff * foqa-flights]]][[nav][base-field][nav-airport.iata-code]]]'}],
'orderBy': [{'aggregate': 'none',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exact-date]]]',
'order': 'asc'}],
'select': [{'aggregate': 'none',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exact-date]]]'},
{'aggregate': 'none',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-fcs][base-field][fdw-flight-extra.customer]]]'},
{'aggregate': 'none',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-core][base-field][flight.exist-takeoff]]]'},
{'aggregate': 'none',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[[nav][type-link][airport-takeoff * foqa-flights]]][[nav][base-field][nav-airport.iata-code]]]'},
{'aggregate': 'avg',
'fieldId': u'[-hub-][field][[[ems-core][entity-type][foqa-flights]][[ems-apm][flight-field][msmt:profile-cbaa5341ca674914a6ceccd6f498bffc:msmt-0d7fe63d6863451a9c663a09fd780985]]]'}],
'top': 5000}
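When the dictionary is long, a short helper can summarize it. The sketch below extracts the combining operator and the operator of each individual filter from a dictionary shaped like the output above. filter_operators is a hypothetical name, and the sample dictionary is a hand-trimmed copy of that structure with the field arguments left out.

```python
def filter_operators(query_dict):
    """Return the top-level combiner and the operator of each filter."""
    top = query_dict["filter"]
    return top["operator"], [arg["value"]["operator"] for arg in top["args"]]


# Trimmed-down sample with the same nesting as the printed query
sample = {
    "filter": {
        "operator": "and",
        "args": [
            {"type": "filter", "value": {"operator": "dateTimeOnAfter", "args": []}},
            {"type": "filter", "value": {"operator": "isTrue", "args": []}},
            {"type": "filter", "value": {"operator": "in", "args": []}},
            {"type": "filter", "value": {"operator": "equal", "args": []}},
        ],
    },
    "top": 5000,
}

print(filter_operators(sample))
# ('and', ['dateTimeOnAfter', 'isTrue', 'in', 'equal'])
```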
Finally, you can send the query to the EMS system and get the data. The output data is returned as a pandas DataFrame object.
df = query.run()
# This will return your data in Pandas dataframe format
The EMS API supports two kinds of query execution: regular and async. A regular query limits the output data to 25,000 rows. An async query, on the other hand, can handle large output data by letting you send repeated requests for mini-batches of the output.
The run() method takes care of the repeated async requests for a query whose returned data is expected to be large.
The batch size for async requests defaults to 25,000 rows (which is the maximum). To change this size, pass n_row:
# Set the batch size as 20,000 rows per request
df = query.run(n_row = 20000)
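The effect of the batch size can be illustrated by computing which row ranges each request would cover. This is only arithmetic on row counts under the batching description above; batch_ranges is a hypothetical name, and the actual request handling inside run() is not shown here.

```python
def batch_ranges(total_rows, n_row=25000):
    """Yield inclusive (first_row, last_row) index pairs, one per batch."""
    if n_row <= 0:
        raise ValueError("n_row must be positive")
    for first in range(0, total_rows, n_row):
        yield first, min(first + n_row, total_rows) - 1


print(list(batch_ranges(45000, n_row=20000)))
# [(0, 19999), (20000, 39999), (40000, 44999)]
```

A smaller n_row means more requests, each carrying less data.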
You can query data of time-series parameters with respect to individual flight records. Below is a simple example code that sends a flight query first in order to retrieve a set of flights and then sends queries to get some of the time-series parameters for each of these flights.
# Flight query with an APM profile. It will return data for 10 flights
fq = FltQuery(c, "ems9", data_file = "demo.db")
fq.set_database("fdw flights")
# If you reuse the meta-data, you don't need to update db/field trees.
fq.select(
"customer id", "flight record", "airframe", "flight date (exact)",
"takeoff airport code", "takeoff airport icao code", "takeoff runway id",
"takeoff airport longitude", "takeoff airport latitude",
"p185: processed date", "p185: oooi pushback hour gmt",
"p185: oooi pushback hour solar local",
"p185: total fuel burned from first indication of engines running to start of takeoff (kg)")
fq.order_by("flight record", order='desc')
fq.get_top(10)
fq.filter("'p185: processing state' == 'Succeeded'")
flt = fq.run()
# === Run time series query for flights ===
from emspy.query import TSeriesQuery  # assumed to be importable from emspy.query, like FltQuery
tsq = TSeriesQuery(c, "ems9", data_file = "demo.db")
tsq.select(
"baro-corrected altitude",
"airspeed (calibrated; 1 or only)",
"ground speed (best avail)",
"egt (left inbd eng)",
"egt (right inbd eng)",
"N1 (left inbd eng) (%)",
"N1 (right inbd eng) (%)")
# Run querying multiple flights at once. Start time = 0, end time = 15 mins (900 secs) for all flights.
# A better use case is that those start/end times are fed by timepoint measurements of your APM profile.
res_dat = tsq.multi_run(flt, start = [0]*flt.shape[0], end = [15*60]*flt.shape[0])
The inputs to the multi_run(...) function are:
- flt: a vector of flight records, or flight data in pandas DataFrame format. The DataFrame must have a column of flight records named "Flight Record".
- start: a list-like object defining the start times (in seconds) for the individual flights. Its length must equal the number of flight records.
- end: a list-like object defining the end times (in seconds) for the individual flights. Its length must equal the number of flight records.
- timestep: a list-like object defining the timestep sizes (in seconds) for the individual flights. The default is 1 second. If you pass None, the parameters' own default timesteps are used. Its length must equal the number of flight records.
The output is a Python dictionary object that contains the following data:
- flt_data: Dictionary. A copy of the flight data for each flight
- ts_data: Pandas DataFrame. The time-series data for each flight
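Since start, end and timestep must each contain one entry per flight record, a helper that builds all three lists together keeps their lengths consistent. The sketch below is an illustration only; time_windows is a hypothetical name and is not part of emsPy.

```python
def time_windows(n_flights, start_sec=0, end_sec=900, timestep_sec=1):
    """Return equal-length start, end and timestep lists for n_flights flights."""
    if end_sec <= start_sec:
        raise ValueError("end_sec must be greater than start_sec")
    return {
        "start": [start_sec] * n_flights,
        "end": [end_sec] * n_flights,
        "timestep": [timestep_sec] * n_flights,
    }


windows = time_windows(3)
print(windows["start"])  # [0, 0, 0]
print(windows["end"])    # [900, 900, 900]
```

With a flight DataFrame flt, the three lists could be built with time_windows(flt.shape[0]) and then passed as the start, end and timestep inputs.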
If you just want to query a single flight, the run(...) function is better suited. Below is an example of a time-series query for a single flight.
res_dat = tsq.run(1901112, start=0, end=900)
This function returns a pandas DataFrame that contains timepoints from 0 to 900 seconds and the corresponding values for the selected parameters. You can also pass a timestep as an optional argument; the default timestep is 1.0 second.
When running a time-series query, emspy creates a local cache of found parameters to be reused on later calls. Sometimes this can cause the wrong parameter to be selected if a parameter with a name similar to the one you want is already in the cache. To force emspy to search EMS for parameters every time and ignore the cache, add the force_search=True option to the select() method, like this:
tsq.select(
"baro-corrected altitude",
"airspeed (calibrated; 1 or only)",
"ground speed (best avail)",
force_search=True)
If you want to load all analytics from a parameter set, you can use the select_from_pset() method. This loads ALL parameters in the set, in the same order as they appear in the EMS parameter set. The method requires one input: the path to the parameter set, including the name of the set itself. Folders in the path must be separated using a backslash ("\"). To find the path to the parameter set, look at the "Load" menu in FDV+.
# Use a raw string so that the backslashes are not read as escape sequences
tsq.select_from_pset(r'folder1\folder2\set name')
Note: this method acts similarly to the select() method, so it adds to the existing list of selected parameters.
Note: this method does not search the EMS system for parameters, so it does not add them to the local cache for later searching.
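Backslash-separated paths need care in Python string literals, because a backslash followed by certain letters is an escape sequence. The short demonstration below uses only built-in string behavior and shows why a raw string (or doubled backslashes) is the safer way to write such a path.

```python
plain = 'folder1\folder2'     # '\f' is read as a single form-feed character
raw = r'folder1\folder2'      # raw string: the backslash is kept as written
escaped = 'folder1\\folder2'  # doubled backslash: same text as the raw string

print(len(plain), len(raw))   # 14 15
print(raw == escaped)         # True
print(raw.split('\\'))        # ['folder1', 'folder2']
```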
You can retrieve a list of physical parameters for a flight by using methods in the Analytic class.
First, instantiate an analytic_query object with your connection (c) and a system id (1):
analytic_query = Analytic(c, 1)
Then, call analytic_query.get_physical_parameter_list(...) with a valid flight record:
physicals = analytic_query.get_physical_parameter_list(flight_id = flight_id)
physicals.sample(3) looks like:
| | id | name | description | units |
|---|---|---|---|---|
| 277 | foobar123 | PARAMETER 1 | Uid: P1\nName: PARAMETER 1. | DEG |
| 696 | foobar124 | PARAMETER 2 | Uid: P2\nName: PARAMETER 2. | YR |
| 48 | foobar125 | PARAMETER 3 | Uid: P3\nName: PARAMETER 3. | DEG |
You can also retrieve analytic metadata for a Flight (including for physical parameters):
analytic_id = physicals['id'].iloc[0]
metadata = analytic_query.get_flight_analytic_metadata(analytic_id=analytic_id, flight_id=flight_id)
metadata looks like:
{
'Display\\Leading Zero': True,
'Parameter\\Name': 'Foo'
}
You can retrieve information for parameter sets (called analytic sets in the API routes) using the AnalyticSet class.
Make sure you import the AnalyticSet class like this:
from emspy.query import AnalyticSet
Next, instantiate an analytic_set object using a connection (c) and a system id (1):
from emspy.query import AnalyticSet
analytic_set = AnalyticSet(c,1)
Finally, use the get_analytic_set() method to get information for the set of interest. This method requires one input: the path to the parameter set, including the name of the set itself. Folders in the path must be separated using a backslash ("\"). To find the path to the parameter set, look at the "Load" menu in FDV+.
# Use a raw string so that the backslashes are not read as escape sequences
analytic_set_df = analytic_set.get_analytic_set(r'folder1\folder2\set name')
If you want to know which folders and sets are inside a folder, you can use the get_group_content() method. Its only input is the path to the folder you are interested in.
To search the root folder, pass no input or an empty string ("").
analytic_set.get_group_content(r"folder1\folder2")
This method returns a dictionary with two keys: 'groups' and 'sets'. The 'groups' entry holds information about the folders, while the 'sets' entry holds information about the parameter sets.