two inputs possible #59

mpahl · 2019-11-08T14:04:50Z

No description provided.

hansendx · 2019-11-12T12:38:17Z

Please be a little more elaborate with your pull request description and commit messages.


hansendx · 2019-11-12T12:54:12Z

collect_stata/__main__.py

+        if input_path is None:
+            for file_de in input_german_path.glob("*.dta"):
+                file = None
+                process = Process(target=_run, args=(file, file_de, output_path, study_name))
+                processes.append(process)
+                process.start()
+        if input_german_path is None:
+            for file in input_path.glob("*.dta"):
+                file_de = None
+                process = Process(target=_run, args=(file, file_de, output_path, study_name))
+                processes.append(process)
+                process.start()
+        if input_path is not None and input_german_path is not None:
+            for file in input_path.glob("*.dta"):
+                file_de = pathlib.Path(str(input_german_path) + "/" + os.path.basename(str(file)))
+                process = Process(target=_run, args=(file, file_de, output_path, study_name))
+                processes.append(process)
+                process.start()

        # complete the processes
        for process in processes:
            process.join()
    else:
-        for file in input_path.glob("*.dta"):
-            _run(file=file, output_path=output_path, study_name=study_name)
+        if input_path is None:
+            for file_de in input_german_path.glob("*.dta"):
+                file = None
+                _run(file=file, file_de=file_de, output_path=output_path, study_name=study_name)
+        if input_german_path is None:
+            for file in input_path.glob("*.dta"):
+                file_de = None
+                _run(file=file, file_de=file_de, output_path=output_path, study_name=study_name)
+        if input_path is not None and input_german_path is not None:
+            for file in input_path.glob("*.dta"):
+                file_de = pathlib.Path(str(input_german_path) + "/" + os.path.basename(str(file)))
+                _run(file=file, file_de=file_de, output_path=output_path, study_name=study_name)

    duration = time.time() - start_time
    logging.info("Duration {:.5f} seconds".format(duration))


-def _run(file: pathlib.Path, output_path: pathlib.Path, study_name: str) -> None:
+def _run(file: pathlib.Path, file_de: pathlib.Path, output_path: pathlib.Path, study_name: str) -> None:
    """Encapsulate data processing run with multiprocessing."""
-    file_path = output_path.joinpath(file.stem).with_suffix(".json")
+    if file is None:
+        file_path = output_path.joinpath(file_de.stem).with_suffix(".json")
+        stata_data_de = StataDataExtractor(file_de)
+        stata_data_de.parse_file()
+
+        write_json(stata_data_de.data, None, stata_data_de.metadata, file_path, study=study_name)
+
+    elif file_de is None:
+        file_path = output_path.joinpath(file.stem).with_suffix(".json")
+        stata_data = StataDataExtractor(file)
+        stata_data.parse_file()
+
+        write_json(stata_data.data, stata_data.metadata, None, file_path, study=study_name)
+
+    elif file is not None and file_de is not None:
+        file_path = output_path.joinpath(file.stem).with_suffix(".json")
+        stata_data = StataDataExtractor(file)
+        stata_data.parse_file()
+        stata_data_de = StataDataExtractor(file_de)
+        stata_data_de.get_variable_metadata()


This looks like a lot of redundancy that you could remove, which would probably also reduce complexity.
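One way to remove the redundancy is to build the (English file, German file) pairs once and feed them to a single loop, shared by the sequential and the multiprocessing case. A minimal sketch, assuming that a German file always has the same file name as its English counterpart (as in the diff above) and that the worker takes (file, file_de, output_path, study_name) like _run; the helper names pair_input_files and run_all are hypothetical:

```python
import pathlib
from multiprocessing import Process
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# (english_file, german_file); either side may be None, but not both.
FilePair = Tuple[Optional[pathlib.Path], Optional[pathlib.Path]]


def pair_input_files(
    input_path: Optional[pathlib.Path],
    input_german_path: Optional[pathlib.Path],
) -> Iterator[FilePair]:
    """Yield (english_file, german_file) pairs; a missing language is None."""
    if input_path is None and input_german_path is None:
        raise ValueError("at least one input directory is required")
    if input_path is None:
        # German-only run: there is no English file to pair with.
        for file_de in sorted(input_german_path.glob("*.dta")):
            yield None, file_de
        return
    for file in sorted(input_path.glob("*.dta")):
        # Assumption: the German file carries the same name as the English one.
        file_de = input_german_path / file.name if input_german_path is not None else None
        yield file, file_de


def run_all(
    pairs: Iterable[FilePair],
    worker: Callable[..., None],
    parallel: bool,
    *args: object,
) -> None:
    """Call worker(file, file_de, *args) for every pair."""
    if not parallel:
        for file, file_de in pairs:
            worker(file, file_de, *args)
        return
    processes: List[Process] = []
    for file, file_de in pairs:
        process = Process(target=worker, args=(file, file_de, *args))
        processes.append(process)
        process.start()  # start every process inside the loop ...
    for process in processes:  # ... and join only after all have been started
        process.join()
```

The whole branching block could then shrink to something like run_all(pair_input_files(input_path, input_german_path), _run, parallel, output_path, study_name), where parallel stands for whatever flag currently selects the multiprocessing branch. Starting each process inside the loop and joining only after it is what keeps the processes running concurrently; with the "spawn" start method the worker also has to be a module-level function such as _run.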

hansendx · 2019-11-12T13:04:25Z

collect_stata/write_json.py

+    metadata_en: List[Dict[str, Union[str, Dict[str, List[Union[int, str, bool]]]]]],
+    metadata_de: List[Dict[str, Union[str, Dict[str, List[Union[int, str, bool]]]]]],


Why do you pass this to write_json and not directly into update_metadata?

hansendx · 2019-11-12T13:06:11Z

collect_stata/write_json.py

+    metadata: List[Dict[str, Union[str, Dict[str, List[Union[int, str, bool]]]]]],
+    metadata_de: List[Dict[str, Union[str, Dict[str, List[Union[int, str, bool]]]]]],


We updated to Python 3.8 on the develop branch. You can use TypedDicts to make this more readable and more precise. https://mypy.readthedocs.io/en/latest/more_types.html#typeddict
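As an illustration, the nested List[Dict[str, Union[...]]] annotation could be split into named TypedDicts. The field names below are taken from the JSON output quoted later in this thread; marking them total=False (German fields may be absent) and narrowing Statistics to the two keys visible in the sample are assumptions, and the class names are hypothetical:

```python
from typing import List, TypedDict, Union


class Categories(TypedDict, total=False):
    """Value-label block of one variable, as written to the JSON output."""

    values: List[Union[int, str]]
    labels: List[str]
    labels_de: List[str]
    missings: List[bool]
    frequencies: List[int]


class Statistics(TypedDict, total=False):
    """Only the keys seen for categorical variables; other scales may differ."""

    valid: int
    invalid: int


class VariableMetadata(TypedDict, total=False):
    name: str
    dataset: str
    label: str
    label_de: str
    scale: str
    study: str
    categories: Categories
    statistics: Statistics


# Replaces List[Dict[str, Union[str, Dict[str, List[Union[int, str, bool]]]]]]
Metadata = List[VariableMetadata]
```

A signature would then read metadata: Metadata, metadata_de: Metadata, and mypy can check individual keys such as variable["categories"]["missings"].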

hansendx · 2019-11-12T13:08:40Z

collect_stata/write_json.py

+    Input:
+    metadata: Metadata of the english imported data.
+    metadata_de: Metadata of the german imported data.
+
+    Output:
+    metadata: Metadata variable with german and english labels if given.


Please use Args: and Returns: sections, and indent the block that follows each colon.
https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
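For illustration, this is how the docstring from the diff might look in Google style. The function body is a hypothetical reconstruction based only on that docstring (matching German to English variables by name is an assumption), and it takes both metadata lists directly as arguments:

```python
from typing import Any, Dict, List, Optional

Metadata = List[Dict[str, Any]]


def update_metadata(metadata: Metadata, metadata_de: Optional[Metadata]) -> Metadata:
    """Add German labels to the English metadata.

    Args:
        metadata: Metadata of the English imported data.
        metadata_de: Metadata of the German imported data, or None if no
            German file was given.

    Returns:
        The English metadata, with label_de and categories["labels_de"]
        added for every variable that also appears in metadata_de.
    """
    if metadata_de is None:
        return metadata
    # Index the German variables by name so each lookup is O(1).
    german_by_name = {variable["name"]: variable for variable in metadata_de}
    for variable in metadata:
        variable_de = german_by_name.get(variable["name"])
        if variable_de is None:
            continue
        variable["label_de"] = variable_de.get("label", "")
        if "categories" in variable and "categories" in variable_de:
            variable["categories"]["labels_de"] = variable_de["categories"].get("labels", [])
    return metadata
```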

hansendx · 2019-11-12T13:38:39Z

collect_stata/write_json.py

@@ -202,10 +202,44 @@ def generate_statistics(

    return metadata

+def update_metadata(


This function would fit better into one of the other modules.
https://en.wikipedia.org/wiki/Cohesion_(computer_science)

hansendx · 2019-12-06T09:51:09Z

Please integrate the changes from develop.
The function stata_to_json was replaced with the class StataToJson.

hansendx · 2019-12-12T13:28:13Z

collect_stata/__main__.py

@@ -4,7 +4,7 @@
 import argparse
 import logging
 import os
-import pathlib
+import pathlib import Path


Syntax error.
Please use your local linter and/or the provided pre-commit hooks to prevent this.
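The line in question was presumably meant to be from pathlib import Path. A linter or pre-commit hook reports this before the push; at the very least, the standard library's ast module can tell whether a line parses. A minimal sketch (has_syntax_error is a hypothetical helper, not one of the project's hooks):

```python
import ast


def has_syntax_error(source: str) -> bool:
    """Return True if the source string cannot be parsed as Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


print(has_syntax_error("import pathlib import Path"))  # True
print(has_syntax_error("from pathlib import Path"))    # False
```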

lgtm-com · 2019-12-12T13:30:26Z

This pull request introduces 2 alerts when merging 0dc5f9f into 8f9220b - view on LGTM.com

new alerts:

1 for Unused import
1 for Syntax error


lgtm-com · 2020-01-31T11:59:14Z

This pull request introduces 2 alerts when merging 0df40e0 into 8f9220b - view on LGTM.com

new alerts:

2 for Variable defined multiple times


* Value labels are not switched when the language of the Stata file is switched. Variable labels are, however, which led to false expectations and a faulty implementation.
* Value labels are bound to a key, which in turn is associated with a language and a variable. This key seems to be arbitrarily assignable.
* It seems that value labels can only be associated with their variables through the position of the key inside the lbllist attribute of the Stata reader.
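The position-based association described above can be sketched with plain lists. This assumes, as the commit message suggests, that the list of value-label keys (lbllist) runs parallel to the list of variable names, with an empty string for variables without value labels. The function name and sample keys are hypothetical, and since the corresponding attribute names on the pandas reader have changed between pandas versions, they are passed in as parameters here:

```python
from typing import Dict, List


def labels_by_variable(
    varlist: List[str],
    lbllist: List[str],
    value_label_dict: Dict[str, Dict[int, str]],
) -> Dict[str, Dict[int, str]]:
    """Map each variable to its value labels via its position in lbllist.

    varlist[i] and lbllist[i] are assumed to describe the same variable;
    lbllist[i] is the (arbitrary) key of its value-label set, or "" if none.
    """
    result: Dict[str, Dict[int, str]] = {}
    for variable, key in zip(varlist, lbllist):
        if key and key in value_label_dict:
            result[variable] = value_label_dict[key]
    return result
```

Several variables may share one key, which is why the mapping has to go through the position rather than through the variable name.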


lgtm-com · 2020-02-26T08:29:19Z

This pull request introduces 1 alert when merging 029afbd into 8f9220b - view on LGTM.com

new alerts:

1 for Unused local variable

hansendx · 2020-02-27T14:09:04Z

The labels_de list is too short for some variables.
All other lists under the "categories" key seem to be filled in so that they are comparable with the value label lists of related variables.

Here is a snippet with labels and labels_de:

      "labels": [
        "[-6] Version of questionnaire with modified filtering",
        "[-5] Not included in this version of the questionnaire",
        "[-4] Inadmissible multiple response",
        "[-3] Answer improbable",
        "[-2] Does not apply",
        "[-1] No Answer",
        "[0] 0 Satisfied: On Scale 0-Low to 10-High",
        "[10] 10 Satisfied: On Scale 0-Low to 10-High",
        "8",
        "7",
        "5",
        "9",
        "6",
        "4",
        "3",
        "2",
        "1"
      ],
      "labels_de": [
        "[-6] Fragebogenversion mit geaenderter Filterfuehrung",
        "[-5] In Fragebogenversion nicht enthalten",
        "[-4] Unzulaessige Mehrfachantwort",
        "[-3] nicht valide",
        "[-2] trifft nicht zu",
        "[-1] keine Angabe",
        "[0] Ganz unzufrieden",
        "[10] Ganz zufrieden"
      ],

Here are the full variables:

  {
    "name": "qp14301",
    "dataset": "qp",
    "label": "Satisfaction With Life At Today",
    "categories": {
      "values": [
        -6,
        -5,
        -4,
        -3,
        -2,
        -1,
        0,
        10,
        8,
        7,
        5,
        9,
        6,
        4,
        3,
        2,
        1
      ],
      "labels": [
        "[-6] Version of questionnaire with modified filtering",
        "[-5] Not included in this version of the questionnaire",
        "[-4] Inadmissible multiple response",
        "[-3] Answer improbable",
        "[-2] Does not apply",
        "[-1] No Answer",
        "[0] 0 Satisfied: On Scale 0-Low to 10-High",
        "[10] 10 Satisfied: On Scale 0-Low to 10-High",
        "8",
        "7",
        "5",
        "9",
        "6",
        "4",
        "3",
        "2",
        "1"
      ],
      "labels_de": [
        "[-6] Fragebogenversion mit geaenderter Filterfuehrung",
        "[-5] In Fragebogenversion nicht enthalten",
        "[-4] Unzulaessige Mehrfachantwort",
        "[-3] nicht valide",
        "[-2] trifft nicht zu",
        "[-1] keine Angabe",
        "[0] Ganz unzufrieden",
        "[10] Ganz zufrieden"
      ],
      "missings": [
        true,
        true,
        true,
        true,
        true,
        true,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false
      ],
      "frequencies": [
        0,
        0,
        0,
        0,
        0,
        66,
        111,
        1554,
        7484,
        5242,
        2993,
        2828,
        2674,
        753,
        512,
        269,
        90
      ]
    },
    "scale": "cat",
    "label_de": "Lebenszufriedenh. gegenwaertig",
    "study": "soep-core",
    "statistics": {
      "valid": 24510,
      "invalid": 66
    }
  },
  {
    "name": "qp14302",
    "dataset": "qp",
    "label": "Satisfaction With Life In Five Years",
    "categories": {
      "values": [
        -6,
        -5,
        -4,
        -3,
        -2,
        -1,
        0,
        10,
        8,
        7,
        9,
        5,
        6,
        4,
        3,
        2,
        1
      ],
      "labels": [
        "[-6] Version of questionnaire with modified filtering",
        "[-5] Not included in this version of the questionnaire",
        "[-4] Inadmissible multiple response",
        "[-3] Answer improbable",
        "[-2] Does not apply",
        "[-1] No Answer",
        "[0] 0 Satisfied: On Scale 0-Low to 10-High",
        "[10] 10 Satisfied: On Scale 0-Low to 10-High",
        "8",
        "7",
        "9",
        "5",
        "6",
        "4",
        "3",
        "2",
        "1"
      ],
      "labels_de": [
        "[-6] Fragebogenversion mit geaenderter Filterfuehrung",
        "[-5] In Fragebogenversion nicht enthalten",
        "[-4] Unzulaessige Mehrfachantwort",
        "[-3] nicht valide",
        "[-2] trifft nicht zu",
        "[-1] keine Angabe",
        "[0] Ganz unzufrieden",
        "[10] Ganz zufrieden"
      ],
      "missings": [
        true,
        true,
        true,
        true,
        true,
        true,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false
      ],
      "frequencies": [
        0,
        0,
        0,
        0,
        0,
        560,
        141,
        1836,
        6847,
        4347,
        3732,
        2799,
        2359,
        924,
        580,
        328,
        123
      ]
    },
    "scale": "cat",
    "label_de": "Lebenszufriedenh. in 5 Jahren",
    "study": "soep-core",
    "statistics": {
      "valid": 24016,
      "invalid": 560
    }
  }
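In the English list, values without a value label still get an entry (the bare value, such as "8"), while labels_de stops after the last labelled value. Building every label list from the full values list keeps all lists the same length. A minimal sketch that reproduces the "[value] label" format seen above; the function name build_labels is hypothetical:

```python
from typing import Dict, List


def build_labels(values: List[int], value_labels: Dict[int, str]) -> List[str]:
    """Return exactly one label per value, falling back to the bare value."""
    return [
        f"[{value}] {value_labels[value]}" if value in value_labels else str(value)
        for value in values
    ]


values = [-1, 0, 10, 8, 7]
labels_de = build_labels(values, {-1: "keine Angabe", 0: "Ganz unzufrieden", 10: "Ganz zufrieden"})
# ['[-1] keine Angabe', '[0] Ganz unzufrieden', '[10] Ganz zufrieden', '8', '7']
```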

lgtm-com · 2022-11-21T08:28:07Z

This pull request introduces 2 alerts when merging ae65259 into 8f9220b - view on LGTM.com

new alerts:

1 for Unused import
1 for Variable defined multiple times

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

two inputs possible

35e1567

mpahl requested a review from hansendx November 8, 2019 14:04

hansendx requested changes Nov 12, 2019


hansendx reviewed Nov 12, 2019


mpahl added 2 commits November 12, 2019 15:51

fix typing errors

ee560d8

update _run function

c3a67b2

Merge branch 'develop' into rewrite_for_bilingual_datasets

0dc5f9f

hansendx requested changes Dec 12, 2019


hansendx added 4 commits January 31, 2020 10:13

Fix syntax error

9a2fb03

Fix KeyError

c7b1259

Sometimes elem["categories"]["missings"] was not initialized before elem["categories"]["missings"].append() was called.
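dict.setdefault avoids this kind of KeyError by creating the nested container on first access. A minimal sketch with a hypothetical helper name:

```python
from typing import Any, Dict


def append_missing_flag(elem: Dict[str, Any], is_missing: bool) -> None:
    """Append to elem["categories"]["missings"], creating both levels if needed."""
    categories = elem.setdefault("categories", {})
    categories.setdefault("missings", []).append(is_missing)
```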

Fix TypeError when writing json

b828278

variable_meta["categories"]["values"] sometimes contained numpy integer types. These caused json.dumps() to fail, because it cannot serialize numpy integer objects.
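Besides casting at the source (as a later commit does for the statistics), json.dumps accepts a default callback that is called for every object it cannot serialize on its own. A minimal sketch, assuming numpy is installed; to_builtin is a hypothetical name:

```python
import json
from typing import Any

import numpy as np


def to_builtin(obj: Any) -> Any:
    """json.dumps fallback: convert numpy scalars to built-in Python scalars."""
    if isinstance(obj, np.generic):
        # .item() returns the equivalent built-in int, float or bool.
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


values = [np.int64(-1), np.int8(0), 10]
# json.dumps(values) alone raises TypeError for the numpy integers.
print(json.dumps(values, default=to_builtin))  # [-1, 0, 10]
```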

Fix wrong indentation

0df40e0

Wrong indentation in multithreading block caused only one process to run.

hansendx added 2 commits February 3, 2020 10:01

Update dependencies

5b9486d

Fix multiprocessing with only one language

43073eb

Process gathering and starting was only done inside the block for processing two languages.
* Remove lint
* Fix type hints
* Refactor bloated code

hansendx added the blocked label Feb 7, 2020

hansendx added 4 commits February 26, 2020 08:50

Simplify identification of missing values

ddcfe6e

Refactor __main__ and write_json

0846f3d

* Homogenize naming
* Fix formatting

Fix datatype issues

029afbd

* Cast statistics values from numpy internal type to python numeric values.

Update with orphaned changes

ae65259

hansendx and others added 3 commits August 1, 2023 14:32

Fix attribute name

a919fa9

Update for newer pandas version

6074e21

Fix issue where german labels are dropped

bed30a0
