
Importing ecospold1 processes exported from openLCA #126

Open

sc-gcoste opened this issue May 3, 2022 · 11 comments

@sc-gcoste
Contributor

I have an ecospold1 dataset exported from openLCA and I would like to import it into Brightway2. The SingleOutputEcospold1Importer should be able to read the ecospold files, but apparently something is wrong with the file schema.

Code:

```python
import brightway2 as bw

bw.projects.set_current('importing_ecospold1')
bw.bw2setup()

fp = "path/to/EcoSpold01"
importer = bw.SingleOutputEcospold1Importer(fp, 'database_name', use_mp=False)
```

Output:

```
Biosphere database already present!!! No setup is needed
Extracting ecospold1 files:
Traceback (most recent call last):
  File "C:\Users\GustaveCoste\AppData\Roaming\JetBrains\PyCharmCE2022.1\scratches\scratch.py", line 7, in <module>
    importer = bw.SingleOutputEcospold1Importer(fp, 'database_name', use_mp=False)
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\importers\ecospold1.py", line 73, in __init__
    self.data = extractor.extract(filepath, db_name, use_mp=use_mp)
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 60, in extract
    for x in cls.process_file(filepath, db_name):
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 96, in process_file
    data.append(cls.process_dataset(dataset, filepath, db_name))
  File "C:\Users\GustaveCoste\miniconda3\envs\playing_with_brightway\lib\site-packages\bw2io\extractors\ecospold1.py", line 132, in process_dataset
    dataset.metaInformation.modellingAndValidation, "representativeness"
  File "src/lxml/objectify.pyx", line 234, in lxml.objectify.ObjectifiedElement.__getattr__
  File "src/lxml/objectify.pyx", line 453, in lxml.objectify._lookupChildOrRaise
AttributeError: no such child: {http://www.EcoInvent.org/EcoSpold01}modellingAndValidation
```

A process from Agribalyse 3 exported to EcoSpold 1 with openLCA (unzip it and place it in the directory read by the importer):
process_000f29c8-0b4b-32f7-96f7-e0f29530d2fb.zip

NB: when using use_mp=True I instead get multiple MultiprocessingError messages suggesting to rerun with use_mp=False.
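
For what it's worth, here is a minimal diagnostic sketch (not part of bw2io; the directory path and the namespace are taken from the code and traceback above) that lists which datasets in the export lack the modellingAndValidation element the extractor expects:

```python
# Hypothetical helper, not bw2io code: report datasets in an openLCA EcoSpold 1
# export that are missing <modellingAndValidation> under the EcoSpold01 namespace.
from pathlib import Path

from lxml import etree

NS = "{http://www.EcoInvent.org/EcoSpold01}"

def find_incomplete_datasets(directory):
    for xml_file in sorted(Path(directory).glob("*.xml")):
        root = etree.parse(str(xml_file)).getroot()
        for dataset in root.iter(NS + "dataset"):
            meta = dataset.find(NS + "metaInformation")
            if meta is None or meta.find(NS + "modellingAndValidation") is None:
                yield xml_file.name, dataset.get("number")

for name, number in find_incomplete_datasets("path/to/EcoSpold01"):
    print(f"{name}: dataset {number} has no <modellingAndValidation> element")
```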

@renaud

renaud commented Sep 6, 2022

Hi @sc-gcoste, did you find a way to import Agribalyse? I am facing the same issue... thanks!

@sc-gcoste
Contributor Author

Hi @renaud, unfortunately no...

renaud pushed a commit to renaud/brightway2-io that referenced this issue Sep 6, 2022
cmutel added a commit that referenced this issue Aug 12, 2023
@c-foschi

same problem here

1 similar comment
@rrosnik

rrosnik commented Feb 16, 2024

same problem here

@sdlfjal

sdlfjal commented Mar 19, 2024

same here

@DemolaASC5

DemolaASC5 commented May 7, 2024

I experienced a similar issue, but not with Agribalyse. I would recommend looking through your XML files for any apparent issues from openLCA. In my case some tags were not linked and there were a few empty XML files; tweaking the is_valid_ecospold1 and process_dataset functions in extractors/ecospold1.py helped. Here are my updated functions for reference:

```python
    @classmethod
    def is_valid_ecospold1(cls, dataset):
        try:
            ref_func = dataset.metaInformation.processInformation.referenceFunction
            name = ref_func.get("name").strip()
            unit = ref_func.get("unit")
            categories = [ref_func.get("category"), ref_func.get("subCategory")]
            code = int(dataset.get("number"))
            location = dataset.metaInformation.processInformation.geography.get("location")
            technology = dataset.metaInformation.processInformation.technology.get("text")
            # time_period = getattr2(dataset.metaInformation.processInformation, "timePeriod").get("text")
            production_volume = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("productionVolume")
            # sampling = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("samplingProcedure"),
            # extrapolations = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("extrapolations")
            # uncertainty = getattr2(dataset.metaInformation.modellingAndValidation, "representativeness").get("uncertaintyAdjustments")
            # Checking exchanges 
            for exc in dataset.flowData.iterchildren():
                if exc.tag == "comment":
                    continue
                if exc.tag in ("{http://www.EcoInvent.org/EcoSpold01}exchange", "exchange"):
                    if hasattr(exc, "outputGroup"):
                        if exc.outputGroup.text in {"0", "2", "3"}:
                            pass
                        elif exc.outputGroup.text == "1":
                            pass
                        elif exc.outputGroup.text == "4":
                            pass
                        else:
                            raise ValueError(
                                "Can't understand output group {}".format(exc.outputGroup.text)
                            )
                    else:
                        if exc.inputGroup.text in {"1", "2", "3", "5"}:
                            kind = "technosphere"
                        elif exc.inputGroup.text == "4":
                            kind = "biosphere"  # Resources
                        else:
                            raise ValueError(
                                "Can't understand input group {}".format(exc.inputGroup.text)
                            )
                elif exc.tag in (
                    "{http://www.EcoInvent.org/EcoSpold01}allocation",
                    "allocation",
                ):
                    reference = int(exc.get("referenceToCoProduct"))
                    fraction = float(exc.get("fraction"))
                    exchanges = [int(c.text) for c in exc.iterchildren() if c.tag != "comment"]
                else:
                    raise ValueError("Flow data type %s not understood" % exc.tag)                       
            return True
        except Exception as e: 
            print(f"Error message: {e}")
            return False
        # except AttributeError:
        #     return False

    @classmethod
    def process_dataset(cls, dataset, filename, db_name):
        ref_func = dataset.metaInformation.processInformation.referenceFunction
        def get_comment():
            try: 
                comments = [
                    ref_func.get("generalComment"),
                    ref_func.get("includedProcesses"),
                    (
                        "Location: ",
                        dataset.metaInformation.processInformation.geography.get("text"),
                    ),
                    (
                        "Technology: ",
                        dataset.metaInformation.processInformation.technology.get("text"),
                    ),
                    (
                        "Time period: ",
                        getattr2(dataset.metaInformation.processInformation, "timePeriod").get(
                            "text"
                        ),
                    ),
                    (
                        "Production volume: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("productionVolume"),
                    ),
                    (
                        "Sampling: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("samplingProcedure"),
                    ),
                    (
                        "Extrapolations: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("extrapolations"),
                    ),
                    (
                        "Uncertainty: ",
                        getattr2(
                            dataset.metaInformation.modellingAndValidation, "representativeness"
                        ).get("uncertaintyAdjustments"),
                    ),
                ]
                comment = "\n".join(
                    [
                        (" ".join(x) if isinstance(x, tuple) else x)
                        for x in comments
                        if (x[1] if isinstance(x, tuple) else x)
                    ]
                )
                return comment
            except:
                return ""

        def get_authors():
            try: 
                ai = dataset.metaInformation.administrativeInformation
                data_entry = []
                for elem in ai.iterchildren():
                    if "dataEntryBy" in elem.tag:
                        data_entry.append(elem.get("person"))

                fields = [
                    ("address", "address"),
                    ("company", "companyCode"),
                    ("country", "countryCode"),
                    ("email", "email"),
                    ("name", "name"),
                ]

                authors = []
                for elem in ai.iterchildren():
                    if "person" in elem.tag and elem.get("number") in data_entry:
                        authors.append({label: elem.get(code) for label, code in fields})
                return authors
            except: 
                return []

        data = {
            "categories": [ref_func.get("category"), ref_func.get("subCategory")],
            "code": int(dataset.get("number")),
            "comment": get_comment(),
            "authors": get_authors(),
            "database": db_name,
            "exchanges": cls.process_exchanges(dataset),
            "filename": filename,
            "location": dataset.metaInformation.processInformation.geography.get(
                "location"
            ),
            "name": ref_func.get("name").strip(),
            "type": "process",
            "unit": ref_func.get("unit"),
        }
        try: 
            allocation_exchanges = [
                exc for exc in data["exchanges"] if exc.get("reference")
            ]
        except: 
            allocation_exchanges = []

        if allocation_exchanges != []:
            data["allocations"] = allocation_exchanges
            data["exchanges"] = [exc for exc in data["exchanges"] if exc.get("type")]

        return data
```

Hope this helps!

@cmutel
Member

cmutel commented May 7, 2024

Dear everyone, apologies for not seeing this or responding earlier. We just merged new ecospold1 handling which does a complete import of all ecospold 1 attributes, including all the annoying paperwork ones. This uses pyecospold, which requires validation against the XSD schema files. Unfortunately, the file that @sc-gcoste uploaded is not a valid ecospold1 file. You can check this yourself in a venv with pyecospold and xmlschema installed. Running the following:

```python
import xmlschema
import pyecospold
from pathlib import Path

xsd = Path(pyecospold.__file__).parent / "schemas" / "v1" / "EcoSpold01Dataset.xsd"


def get_validation_errors(xml_file: Path, xsd_file: Path):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(open(xml_file).read())
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        print(f'[{idx}]\n\tpath: {validation_error.path}\n\treason: {validation_error.reason}')

get_validation_errors(
    "process_000f29c8-0b4b-32f7-96f7-e0f29530d2fb.xml",
    xsd
)
```

Gives the following errors:

My inclination is to not support files which are very invalid - it would mean writing much more complicated code and would also make testing quite difficult. Note that openLCA is not the only one publishing invalid ecospold 1/2 files - even the big boys do it sometimes. However, we can make adjustments to the schema if there is a good reason. You can find the ecospold1 schema here, and the changes we have made to that schema here.

@tngTUDOR @jsvgoncalves FYI and feel free to express your opinion.
@msrocka FYI

@msrocka

msrocka commented May 8, 2024

It is not very visible in the user interface, but the EcoSpold 1 export wizard in openLCA has a second page when you click Next, where you can set the option Create default values for missing fields:

[Screenshot: openLCA EcoSpold 1 export wizard with the "Create default values for missing fields" option]

When I import the attached example dataset above and export it again with this option, it will generate a default start and end date:

```xml
<timePeriod dataValidForEntirePeriod="true" text="Unspecified">
  <startDate>9999-01-01+01:00</startDate>
  <endDate>9999-12-31+01:00</endDate>
</timePeriod>
```

and also a default person which is then linked as data generator etc.:

```xml
<administrativeInformation>
  <dataEntryBy person="1"/>
  <dataGeneratorAndPublication
    person="1"
    dataPublishedIn="0"
    copyright="true"
    accessRestrictedTo="0"/>
  <person
    number="1"
    name="default"
    address="Created for EcoSpold 1 compatibility"
    telephone="000"
    companyCode="default"
    countryCode="CH"/>
</administrativeInformation>
```

edit: I think the dataset is then valid against the updated schema of pyecospold. However, it might also make sense to make other elements in the schema optional.

@cmutel
Member

cmutel commented May 8, 2024

Thanks a lot @msrocka! It might make sense to have that option checked by default - I am not sure what the specific business cases are for emitting data which doesn't validate against the schema, but the default should probably be a valid file, even if some of the data is not usable.

@jsvgoncalves
Member

> My inclination is to not support files which are very invalid - it would mean writing much more complicated code and would also make testing quite difficult. Note that openLCA is not the only one publishing invalid ecospold 1/2 files - even the big boys do it sometimes.

Agreed, would not try to fix very invalid files. But maybe we could try improving the error/exception information to make it a bit more obvious that the file is very invalid.
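
For instance (just a sketch of the idea, not actual bw2io code; the exception name and message wording are illustrative), the importer could run the schema check from the snippet above before extraction and raise a dedicated error that lists the first few violations:

```python
# Illustrative only: validate a file against the pyecospold XSD before extraction
# and surface the schema violations in the exception message.
from pathlib import Path

import pyecospold
import xmlschema

XSD = Path(pyecospold.__file__).parent / "schemas" / "v1" / "EcoSpold01Dataset.xsd"

class InvalidEcospold1File(Exception):
    """Hypothetical exception for files that fail XSD validation."""

def check_ecospold1(xml_file):
    schema = xmlschema.XMLSchema(str(XSD))
    errors = list(schema.iter_errors(Path(xml_file).read_text()))
    if errors:
        details = "\n".join(f"- {e.path}: {e.reason}" for e in errors[:5])
        raise InvalidEcospold1File(
            f"{xml_file} is not valid EcoSpold 1 ({len(errors)} schema violation(s)); "
            f"first errors:\n{details}"
        )
```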

@cmutel
Member

cmutel commented May 8, 2024
