From 2b712d4147303fc2f6448e96c5bda335e7b42e00 Mon Sep 17 00:00:00 2001 From: Henning Janssen Date: Thu, 11 Feb 2021 10:09:06 +0000 Subject: [PATCH] Adapt fleur_parser_tutorial to new schema_dict interface --- .../3_working_with_schema_dicts.ipynb | 114 +++++++----------- ...g_schema_dict_util_parsing_functions.ipynb | 45 +++---- 2 files changed, 65 insertions(+), 94 deletions(-) diff --git a/tutorials/fleur_parser_tutorial_01_2020/3_working_with_schema_dicts.ipynb b/tutorials/fleur_parser_tutorial_01_2020/3_working_with_schema_dicts.ipynb index 3ba2d31..5d29e63 100644 --- a/tutorials/fleur_parser_tutorial_01_2020/3_working_with_schema_dicts.ipynb +++ b/tutorials/fleur_parser_tutorial_01_2020/3_working_with_schema_dicts.ipynb @@ -2,70 +2,53 @@ "cells": [ { "cell_type": "markdown", - "id": "northern-builder", + "id": "dried-attribute", "metadata": {}, "source": [ "## Schema dictionaries\n", "\n", "The basis of the input/output parsers is a set of functions, which extract type, order and various other information from the `FleurInputSchema.xsd` and `FleurOutputSchema.xsd` files for different versions. The obtained information is stored in large dictionaries next to the schema files.\n", "\n", - "To load the information we use the `load_inpschema` and `load_outschema` functions by providing the desired version string. They both work in the same way but are needed, since the outputschema implicitly includes information from the inputschema" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "agricultural-approach", - "metadata": {}, - "outputs": [], - "source": [ - "from masci_tools.io.parsers.fleur.fleur_schema import load_inpschema\n", - "load_inpschema?\n", + "To load the information we use the `InputSchemaDict` and `OutputSchemaDict` classes by providing the desired version string to the `fromVersion` classmethod. They both work in the same way but are needed, since the outputschema implicitly includes information from the inputschema.\n", "\n", - "schema_dict = load_inpschema('0.33')\n", - "print(schema_dict.keys())" - ] - }, - { - "cell_type": "markdown", - "id": "separated-persian", - "metadata": {}, - "source": [ - "If we also want a python object to validate files against this schema we provide the `schema_return` argument" + "These classes not only contain small helper functions but are also immutable, i.e. trying to modify the parsed information will raise an error." ] }, { "cell_type": "code", "execution_count": null, - "id": "qualified-butler", + "id": "occupational-graduation", "metadata": {}, "outputs": [], "source": [ - "schema_dict, xmlschema = load_inpschema('0.33', schema_return=True)\n", - "print(type(xmlschema))" + "from masci_tools.io.parsers.fleur.fleur_schema import InputSchemaDict\n", + "from pprint import pprint\n", + "help(InputSchemaDict)\n", + "\n", + "schema_dict = InputSchemaDict.fromVersion('0.33')" ] }, { "cell_type": "markdown", - "id": "quality-memorabilia", + "id": "alive-musical", "metadata": {}, "source": [ - "To get an explanation of the keys in the schema dictionary we can pass the `show_help` argument" + "The returned python object also contains a xmlschema object for validating files" ] }, { "cell_type": "code", "execution_count": null, - "id": "romance-north", + "id": "hidden-department", "metadata": {}, "outputs": [], "source": [ - "schema_dict = load_inpschema('0.33', show_help=True)" + "print(type(schema_dict.xmlschema))" ] }, { "cell_type": "markdown", - "id": "processed-registrar", + "id": "reasonable-austria", "metadata": {}, "source": [ "Let's for example take a look at `attrib_types`. Here all attributes are classified for the conversion from the strings we get from the xml file. If there are multiple possible types the conversion function will start at the first type and stop when a conversion was successful (`string` is put in last place at all times)" @@ -74,17 +57,16 @@ { "cell_type": "code", "execution_count": null, - "id": "willing-liberal", + "id": "intended-camera", "metadata": {}, "outputs": [], "source": [ - "from pprint import pprint\n", "pprint(schema_dict['attrib_types'])" ] }, { "cell_type": "markdown", - "id": "intended-webcam", + "id": "alpha-nomination", "metadata": {}, "source": [ "In `tag_paths` all possible names of tags are mapped to possible simple xpaths through the input file" @@ -93,7 +75,7 @@ { "cell_type": "code", "execution_count": null, - "id": "modern-ukraine", + "id": "vanilla-motivation", "metadata": {}, "outputs": [], "source": [ @@ -102,7 +84,7 @@ }, { "cell_type": "markdown", - "id": "driving-bangkok", + "id": "equipped-location", "metadata": {}, "source": [ "There are multiple keys for attributes and text tags (`unique_attribs`, `unique_path_attribs` and `other_attribs`), which classify the attributes in terms of three categories:\n", @@ -115,7 +97,7 @@ { "cell_type": "code", "execution_count": null, - "id": "aging-palmer", + "id": "quick-minimum", "metadata": {}, "outputs": [], "source": [ @@ -125,7 +107,7 @@ { "cell_type": "code", "execution_count": null, - "id": "falling-olive", + "id": "thorough-mouse", "metadata": {}, "outputs": [], "source": [ @@ -134,7 +116,7 @@ }, { "cell_type": "markdown", - "id": "approved-giving", + "id": "respiratory-framing", "metadata": {}, "source": [ "This is useful but it does not provide a utility, to get a path and guarantee that you end up with a unique path. For this there are the functions `get_tag_xpath` and `get_attrib_xpath`. They are used by providing the name of the tag/attribute in question and other criteria to select the right path" @@ -143,27 +125,16 @@ { "cell_type": "code", "execution_count": null, - "id": "personal-suspension", - "metadata": {}, - "outputs": [], - "source": [ - "from masci_tools.util.schema_dict_util import get_tag_xpath\n", - "get_tag_xpath?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "naval-defense", + "id": "guided-dependence", "metadata": {}, "outputs": [], "source": [ - "print(get_tag_xpath(schema_dict, 'bzIntegration'))" + "print(schema_dict.get_tag_xpath('bzIntegration'))" ] }, { "cell_type": "markdown", - "id": "organizational-rainbow", + "id": "supposed-intro", "metadata": {}, "source": [ "If the path is not unique an error is raised and we have to be more specific with the selection" @@ -172,36 +143,36 @@ { "cell_type": "code", "execution_count": null, - "id": "intimate-elephant", + "id": "funny-increase", "metadata": {}, "outputs": [], "source": [ - "print(get_tag_xpath(schema_dict, 'ldaU'))" + "print(schema_dict.get_tag_xpath('ldaU'))" ] }, { "cell_type": "code", "execution_count": null, - "id": "parliamentary-sarah", + "id": "united-methodology", "metadata": {}, "outputs": [], "source": [ - "print(get_tag_xpath(schema_dict, 'ldaU', contains='species'))" + "print(schema_dict.get_tag_xpath('ldaU', contains='species'))" ] }, { "cell_type": "code", "execution_count": null, - "id": "muslim-butterfly", + "id": "oriental-placement", "metadata": {}, "outputs": [], "source": [ - "print(get_tag_xpath(schema_dict, 'ldaU', not_contains='atom'))" + "print(schema_dict.get_tag_xpath('ldaU', not_contains='atom'))" ] }, { "cell_type": "markdown", - "id": "broke-dodge", + "id": "suburban-tampa", "metadata": {}, "source": [ "If there is no possible path to fullfill the criteria the function also raises an error" @@ -210,16 +181,16 @@ { "cell_type": "code", "execution_count": null, - "id": "listed-navigation", + "id": "homeless-shame", "metadata": {}, "outputs": [], "source": [ - "print(get_tag_xpath(schema_dict, 'ldaU', contains='species', not_contains='atom'))" + "print(schema_dict.get_tag_xpath('ldaU', contains='species', not_contains='atom'))" ] }, { "cell_type": "markdown", - "id": "attempted-count", + "id": "adolescent-proxy", "metadata": {}, "source": [ "These functions allow for easy version support between different file versions if the tag names themselves do not change" @@ -228,19 +199,18 @@ { "cell_type": "code", "execution_count": null, - "id": "established-hometown", + "id": "comfortable-surveillance", "metadata": {}, "outputs": [], "source": [ - "from masci_tools.util.schema_dict_util import get_attrib_xpath\n", - "schema_dict_max4 = load_inpschema('0.31')\n", - "print(get_attrib_xpath(schema_dict, 'valenceElectrons'))\n", - "print(get_attrib_xpath(schema_dict_max4, 'valenceElectrons'))" + "schema_dict_max4 = InputSchemaDict.fromVersion('0.31')\n", + "print(schema_dict.get_attrib_xpath('valenceElectrons'))\n", + "print(schema_dict_max4.get_attrib_xpath('valenceElectrons'))" ] }, { "cell_type": "markdown", - "id": "funded-parliament", + "id": "sixth-decimal", "metadata": {}, "source": [ "More detailed information about the attributes and tags, which can be on a given tag can be found in the `tag_info` key. This part of the schema dictionary is indexed by the simple xpaths to avoid name clashes" @@ -249,7 +219,7 @@ { "cell_type": "code", "execution_count": null, - "id": "promotional-microwave", + "id": "pacific-behavior", "metadata": {}, "outputs": [], "source": [ @@ -259,7 +229,7 @@ { "cell_type": "code", "execution_count": null, - "id": "published-haven", + "id": "minus-sharp", "metadata": {}, "outputs": [], "source": [ @@ -269,7 +239,7 @@ { "cell_type": "code", "execution_count": null, - "id": "collectible-burlington", + "id": "grand-disclaimer", "metadata": {}, "outputs": [], "source": [] diff --git a/tutorials/fleur_parser_tutorial_01_2020/4_using_schema_dict_util_parsing_functions.ipynb b/tutorials/fleur_parser_tutorial_01_2020/4_using_schema_dict_util_parsing_functions.ipynb index bdf48a2..4cc2568 100644 --- a/tutorials/fleur_parser_tutorial_01_2020/4_using_schema_dict_util_parsing_functions.ipynb +++ b/tutorials/fleur_parser_tutorial_01_2020/4_using_schema_dict_util_parsing_functions.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "enclosed-shakespeare", + "id": "upset-steel", "metadata": {}, "source": [ "## Using the parsing functions in the `schema_dict_util` module\n", @@ -13,7 +13,7 @@ { "cell_type": "code", "execution_count": null, - "id": "fantastic-essex", + "id": "applicable-machine", "metadata": {}, "outputs": [], "source": [ @@ -22,7 +22,7 @@ }, { "cell_type": "markdown", - "id": "exempt-reliance", + "id": "afraid-check", "metadata": {}, "source": [ "This module contains a lot of different small functions, which make use of the schema dictionary. For example the `get_tag_xpath` and `get_attrib_xpath` functions from tutorial 3 are defined here. Then there is `read_constants`, which determines the mathematical constants which can be used in the input file. The rest of the functions make use of `get_tag_xpath` or `get_attrib_xpath` to gain easy access to small parts of the xml files" @@ -31,7 +31,7 @@ { "cell_type": "code", "execution_count": null, - "id": "married-journalism", + "id": "substantial-allowance", "metadata": {}, "outputs": [], "source": [ @@ -40,7 +40,7 @@ }, { "cell_type": "markdown", - "id": "considerable-uncle", + "id": "comfortable-springer", "metadata": {}, "source": [ "To use these functions, we need to get the root of the xml file and the right schema_dict. For functions actually converting attribute values we also need to read in the defined constants (In addition to defined constants in the `inp.xml` there is also a default set)" @@ -49,22 +49,23 @@ { "cell_type": "code", "execution_count": null, - "id": "immediate-visitor", + "id": "provincial-airport", "metadata": {}, "outputs": [], "source": [ "from lxml import etree\n", - "from masci_tools.io.parsers.fleur.fleur_schema import load_inpschema\n", + "from masci_tools.io.parsers.fleur.fleur_schema import InputSchemaDict\n", "from masci_tools.util.schema_dict_util import read_constants\n", "\n", "root = etree.parse('./files/Fe_Example_input.xml').getroot()\n", - "schema_dict = load_inpschema(root.xpath('//@fleurInputVersion')[0])\n", - "constants = read_constants(root, schema_dict)" + "schema_dict = InputSchemaDict.fromVersion(root.xpath('//@fleurInputVersion')[0])\n", + "constants = read_constants(root, schema_dict)\n", + "print(constants)" ] }, { "cell_type": "markdown", - "id": "lovely-photography", + "id": "functional-motion", "metadata": {}, "source": [ "All parsing functions have the same interface. Let's start with `tag_exists`. This will tell us if a certain tag is present in the file" @@ -73,7 +74,7 @@ { "cell_type": "code", "execution_count": null, - "id": "accompanied-myanmar", + "id": "chemical-latitude", "metadata": {}, "outputs": [], "source": [ @@ -84,7 +85,7 @@ }, { "cell_type": "markdown", - "id": "hindu-engineering", + "id": "living-mailing", "metadata": {}, "source": [ "These functions also take the same arguments as `get_tag_xpath` for specifying the concrete path" @@ -93,7 +94,7 @@ { "cell_type": "code", "execution_count": null, - "id": "strange-refrigerator", + "id": "median-insertion", "metadata": {}, "outputs": [], "source": [ @@ -104,16 +105,16 @@ { "cell_type": "code", "execution_count": null, - "id": "third-pendant", + "id": "devoted-planning", "metadata": {}, "outputs": [], "source": [ - "print(get_number_of_nodes(root, schema_dict, 'ldaU', contains='species'))" + "print(get_number_of_nodes(root, schema_dict, 'lo', contains='species'))" ] }, { "cell_type": "markdown", - "id": "super-deadline", + "id": "incoming-survey", "metadata": {}, "source": [ "The function `evaluate_attribute` also allows to directly specify the tag, where the attribute should be parsed. This makes specifying attributes with common names a lot easier (like `units` in the output file for example)" @@ -122,7 +123,7 @@ { "cell_type": "code", "execution_count": null, - "id": "based-little", + "id": "prospective-tolerance", "metadata": {}, "outputs": [], "source": [ @@ -132,7 +133,7 @@ }, { "cell_type": "markdown", - "id": "southern-complexity", + "id": "sitting-error", "metadata": {}, "source": [ "Another option for specifying, which path is supposed to be parsed is using a Element of the xml tree, that is not the root. For example if we would want to parse the `mtSphere` tag for the atom species the naive approch would throw an error since all tags in the atoms section can occur in `species` or `atomGroup`." @@ -141,7 +142,7 @@ { "cell_type": "code", "execution_count": null, - "id": "hydraulic-reducing", + "id": "compact-destination", "metadata": {}, "outputs": [], "source": [ @@ -151,7 +152,7 @@ }, { "cell_type": "markdown", - "id": "acceptable-intake", + "id": "liked-berkeley", "metadata": {}, "source": [ "One easy way to circumvent this, is to first get the `species` element via the `eval_simple_xpath` function and then performing the same call as before on the `species` element.\n", @@ -163,7 +164,7 @@ { "cell_type": "code", "execution_count": null, - "id": "postal-partition", + "id": "shaped-jimmy", "metadata": {}, "outputs": [], "source": [ @@ -174,7 +175,7 @@ }, { "cell_type": "markdown", - "id": "acoustic-harvey", + "id": "valid-adolescent", "metadata": {}, "source": [] }