
Importing ecospold1 processes exported from openLCA #126

Open

sc-gcoste opened this issue May 3, 2022 · 11 comments

@sc-gcoste
Contributor

I have an ecospold1 dataset exported from openLCA and I would like to import it into Brightway2. The SingleOutputEcospold1Importer should be able to read the ecospold files, but apparently something is wrong with the file schema.

Code:

```python
import brightway2 as bw

bw.projects.set_current('importing_ecospold1')
bw.bw2setup()

fp = "path/to/EcoSpold01"
importer = bw.SingleOutputEcospold1Importer(fp, 'database_name', use_mp=False)
```

Output:

```
Biosphere database already present!!! No setup is needed
Extracting ecospold1 files:
Traceback (most recent call last):
  File "C:\Users\GustaveCoste\AppData\Roaming\JetBrains\PyCharmCE2022.1\scratches\scratch.py", line 7, in <module>
    importer = bw.SingleOutputEcospold1Importer(fp, 'database_name', use_mp=False)
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\importers\ecospold1.py", line 73, in __init__
    self.data = extractor.extract(filepath, db_name, use_mp=use_mp)
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 60, in extract
    for x in cls.process_file(filepath, db_name):
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 96, in process_file
    data.append(cls.process_dataset(dataset, filepath, db_name))
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 132, in process_dataset
    dataset.metaInformation.modellingAndValidation, "representativeness"
  File "src/lxml/objectify.pyx", line 234, in lxml.objectify.ObjectifiedElement.__getattr__
  File "src/lxml/objectify.pyx", line 453, in lxml.objectify._lookupChildOrRaise
AttributeError: no such child: {http://www.EcoInvent.org/EcoSpold01}modellingAndValidation
```

A process from Agribalyse 3 exported to EcoSpold 1 with openLCA (unzip it and place it in the directory read by the importer):
process_000f29c8-0b4b-32f7-96f7-e0f29530d2fb.zip

NB: when using use_mp=True I instead get multiple MultiprocessingError messages suggesting to rerun with use_mp=False.
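
For what it's worth, here is a minimal diagnostic sketch (not part of bw2io; the directory path and the namespace are taken from the code and traceback above) that lists which datasets in the export lack the modellingAndValidation element the extractor expects:

```python
# Hypothetical helper, not bw2io code: report datasets in an openLCA EcoSpold 1
# export that are missing <modellingAndValidation> under the EcoSpold01 namespace.
from pathlib import Path

from lxml import etree

NS = "{http://www.EcoInvent.org/EcoSpold01}"

def find_incomplete_datasets(directory):
    for xml_file in sorted(Path(directory).glob("*.xml")):
        root = etree.parse(str(xml_file)).getroot()
        for dataset in root.iter(NS + "dataset"):
            meta = dataset.find(NS + "metaInformation")
            if meta is None or meta.find(NS + "modellingAndValidation") is None:
                yield xml_file.name, dataset.get("number")

for name, number in find_incomplete_datasets("path/to/EcoSpold01"):
    print(f"{name}: dataset {number} has no <modellingAndValidation> element")
```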

@renaud

renaud commented Sep 6, 2022

Hi @sc-gcoste, did you find a way to import Agribalyse? I am facing the same issue... thanks!

@sc-gcoste
Contributor Author

Hi @renaud, unfortunately no...

renaud pushed a commit to renaud/brightway2-io that referenced this issue Sep 6, 2022
cmutel added a commit that referenced this issue Aug 12, 2023
@c-foschi

same problem here

1 similar comment
@rrosnik

rrosnik commented Feb 16, 2024

same problem here

@sdlfjal

sdlfjal commented Mar 19, 2024

same here

@DemolaASC5

DemolaASC5 commented May 7, 2024

I experienced a similar issue, but not with Agribalyse. I would recommend looking through your XML files for any apparent issues from openLCA. In my case some tags were not linked and there were a few empty XML files; tweaking the is_valid_ecospold1 and process_dataset functions in extractors/ecospold1.py helped. Here are my updated functions for reference:

```python
    @classmethod
    def is_valid_ecospold1(cls, dataset):
        try:
            ref_func = dataset.metaInformation.processInformation.referenceFunction
            name = ref_func.get("name").strip()
            unit = ref_func.get("unit")
            categories = [ref_func.get("category"), ref_func.get("subCategory")]
            code = int(dataset.get("number"))
            location = dataset.metaInformation.processInformation.geography.get("location")
            technology = dataset.metaInformation.processInformation.technology.get("text")
            # time_period = getattr2(dataset.metaInformation.processInformation, "timePeriod").get("text")
            production_volume = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("productionVolume")
            # sampling = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("samplingProcedure"),
            # extrapolations = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("extrapolations")
            # uncertainty = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("uncertaintyAdjustments")
            # Checking exchanges 
            for exc in dataset.flowData.iterchildren():
                if exc.tag == "comment":
                    continue
                if exc.tag in ("{http://www.EcoInvent.org/EcoSpold01}exchange", "exchange"):
                    if hasattr(exc, "outputGroup"):
                        if exc.outputGroup.text in {"0", "2", "3"}:
                            pass
                        elif exc.outputGroup.text == "1":
                            pass
                        elif exc.outputGroup.text == "4":
                            pass
                        else:
                            raise ValueError(
                                "Can't understand output group {}".format(exc.outputGroup.text)
                            )
                    else:
                        if exc.inputGroup.text in {"1", "2", "3", "5"}:
                            kind = "technosphere"
                        elif exc.inputGroup.text == "4":
                            kind = "biosphere"  # Resources
                        else:
                            raise ValueError(
                                "Can't understand input group {}".format(exc.inputGroup.text)
                            )
                elif exc.tag in (
                    "{http://www.EcoInvent.org/EcoSpold01}allocation",
                    "allocation",
                ):
                    reference = int(exc.get("referenceToCoProduct"))
                    fraction = float(exc.get("fraction"))
                    exchanges = [int(c.text) for c in exc.iterchildren() if c.tag != "comment"]
                else:
                    raise ValueError("Flow data type %s not understood" % exc.tag)                       
            return True
        except Exception as e: 
            print(f"Error message: {e}")
            return False
        # except AttributeError:
        #     return False

    @classmethod
    def process_dataset(cls, dataset, filename, db_name):
        ref_func = dataset.metaInformation.processInformation.referenceFunction
        def get_comment():
            try: 
                comments = [
                    ref_func.get("generalComment"),
                    ref_func.get("includedProcesses"),
                    (
                        "Location: ",
                        dataset.metaInformation.processInformation.geography.get("text"),
                    ),
                    (
                        "Technology: ",
                        dataset.metaInformation.processInformation.technology.get("text"),
                    ),
                    (
                        "Time period: ",
                        getattr2(dataset.metaInformation.processInformation, "timePeriod").get(
                            "text"
                        ),
                    ),
                    (
                        "Production volume: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("productionVolume"),
                    ),
                    (
                        "Sampling: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("samplingProcedure"),
                    ),
                    (
                        "Extrapolations: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("extrapolations"),
                    ),
                    (
                        "Uncertainty: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("uncertaintyAdjustments"),
                    ),
                ]
                comment = "\n".join(
                    [
                        (" ".join(x) if isinstance(x, tuple) else x)
                        for x in comments
                        if (x[1] if isinstance(x, tuple) else x)
                    ]
                )
                return comment
            except:
                return ""

        def get_authors():
            try: 
                ai = dataset.metaInformation.administrativeInformation
                data_entry = []
                for elem in ai.iterchildren():
                    if "dataEntryBy" in elem.tag:
                        data_entry.append(elem.get("person"))

                fields = [
                    ("address", "address"),
                    ("company", "companyCode"),
                    ("country", "countryCode"),
                    ("email", "email"),
                    ("name", "name"),
                ]

                authors = []
                for elem in ai.iterchildren():
                    if "person" in elem.tag and elem.get("number") in data_entry:
                        authors.append({label: elem.get(code) for label, code in fields})
                return authors
            except: 
                return []

        data = {
            "categories": [ref_func.get("category"), ref_func.get("subCategory")],
            "code": int(dataset.get("number")),
            "comment": get_comment(),
            "authors": get_authors(),
            "database": db_name,
            "exchanges": cls.process_exchanges(dataset),
            "filename": filename,
            "location": dataset.metaInformation.processInformation.geography.get(
                "location"
            ),
            "name": ref_func.get("name").strip(),
            "type": "process",
            "unit": ref_func.get("unit"),
        }
        try: 
            allocation_exchanges = [
                exc for exc in data["exchanges"] if exc.get("reference")
            ]
        except: 
            allocation_exchanges = []

        if allocation_exchanges != []:
            data["allocations"] = allocation_exchanges
            data["exchanges"] = [exc for exc in data["exchanges"] if exc.get("type")]

        return data
```

Hope this helps!

@cmutel
Member

cmutel commented May 7, 2024

Dear everyone, apologies for not seeing this or responding earlier. We just merged new ecospold1 handling which does a complete import of all ecospold 1 attributes, including all the annoying paperwork ones. This uses pyecospold, which requires validation against the XSD schema files. Unfortunately, the file that @sc-gcoste uploaded is not a valid ecospold1 file. You can check this yourself in a venv with pyecospold and xmlschema installed. Running the following:

```python
import xmlschema
import pyecospold
from pathlib import Path

xsd = Path(pyecospold.__file__).parent / "schemas" / "v1" / "EcoSpold01Dataset.xsd"


def get_validation_errors(xml_file: Path, xsd_file: Path):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(open(xml_file).read())
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        print(f'[{idx}]\n\tpath: {validation_error.path}\n\treason: {validation_error.reason}')

get_validation_errors(
    "process_000f29c8-0b4b-32f7-96f7-e0f29530d2fb.xml",
    xsd
)
```

Gives the following errors:

My inclination is to not support files which are very invalid - it would mean writing much more complicated code and would also make testing quite difficult. Note that openLCA is not the only one publishing invalid ecospold 1/2 files - even the big boys do it sometimes. However, we can make adjustments to the schema if there is a good reason. You can find the ecospold1 schema here, and the changes we have made to that schema here.

@tngTUDOR @jsvgoncalves FYI and feel free to express your opinion.
@msrocka FYI

@msrocka

msrocka commented May 8, 2024

It is not very visible in the user interface, but the EcoSpold 1 export wizard in openLCA has a second page when you click Next, where you can set the option Create default values for missing fields:

[Screenshot: openLCA EcoSpold 1 export wizard with the "Create default values for missing fields" option]

When I import the attached example dataset above and export it again with this option, it will generate a default start and end date:

```xml
<timePeriod dataValidForEntirePeriod="true" text="Unspecified">
  <startDate>9999-01-01+01:00</startDate>
  <endDate>9999-12-31+01:00</endDate>
</timePeriod>
```

and also a default person which is then linked as data generator etc.:

```xml
<administrativeInformation>
  <dataEntryBy person="1"/>
  <dataGeneratorAndPublication
    person="1"
    dataPublishedIn="0"
    copyright="true"
    accessRestrictedTo="0"/>
  <person
    number="1"
    name="default"
    address="Created for EcoSpold 1 compatibility"
    telephone="000"
    companyCode="default"
    countryCode="CH"/>
</administrativeInformation>
```

edit: I think the dataset is then valid against the updated schema of pyecospold. However, it might also make sense to make other elements in the schema optional.

@cmutel
Member

cmutel commented May 8, 2024

Thanks a lot @msrocka! It might make sense to have that option checked by default - I am not sure what the specific business cases are for emitting data which doesn't validate against the schema, but the default should probably be a valid file, even if some of the data is not usable.

@jsvgoncalves
Member

> My inclination is to not support files which are very invalid - it would mean writing much more complicated code and would also make testing quite difficult. Note that openLCA is not the only one publishing invalid ecospold 1/2 files - even the big boys do it sometimes.

Agreed, would not try to fix very invalid files. But maybe we could try improving the error/exception information to make it a bit more obvious that the file is very invalid.
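
For instance (just a sketch of the idea, not actual bw2io code; the exception name and message wording are illustrative), the importer could run the schema check from the snippet above before extraction and raise a dedicated error that lists the first few violations:

```python
# Illustrative only: validate a file against the pyecospold XSD before extraction
# and surface the schema violations in the exception message.
from pathlib import Path

import pyecospold
import xmlschema

XSD = Path(pyecospold.__file__).parent / "schemas" / "v1" / "EcoSpold01Dataset.xsd"

class InvalidEcospold1File(Exception):
    """Hypothetical exception for files that fail XSD validation."""

def check_ecospold1(xml_file):
    schema = xmlschema.XMLSchema(str(XSD))
    errors = list(schema.iter_errors(Path(xml_file).read_text()))
    if errors:
        details = "\n".join(f"- {e.path}: {e.reason}" for e in errors[:5])
        raise InvalidEcospold1File(
            f"{xml_file} is not valid EcoSpold 1 ({len(errors)} schema violation(s)); "
            f"first errors:\n{details}"
        )
```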

@cmutel
Member

cmutel commented May 8, 2024
