
CNES Download .DBF error #167

Closed
Twetler opened this issue Oct 19, 2023 · 4 comments · Fixed by #168

Twetler commented Oct 19, 2023

Hey guys!!

I was following @esloch's guidance on the "unpack requires a buffer of 32 bytes" issue.

After also following the CNES notebook, I hit errors when `ftp.databases.cnes.CNES` tries to convert .DBF files to Parquet.

I'm using the latest version git+https://github.com/AlertaDengue/PySUS@0.10.2

STEPS:

```python
from pysus.ftp.databases.cnes import CNES

cnes = CNES()
cnes.load()

files = cnes.get_files('ST', year=2020, uf='SP', month=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
```

Output:

```
files = [STSP2001.dbc,
 STSP2002.dbc,
 STSP2003.dbc,
 STSP2004.dbc,
 STSP2005.dbc,
 STSP2006.dbc,
 STSP2007.DBF,
 STSP2007.dbc,
 STSP2008.DBF,
 STSP2008.dbc,
 STSP2009.dbc,
 STSP2010.dbc,
 STSP2011.dbc,
 STSP2012.dbc]
```
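In the listing above, the months that break are exactly the ones listed twice, once as `.DBF` and once as `.dbc`. A quick way to spot those is to group the names by stem. This is a minimal sketch, not part of PySUS: `duplicated_stems` is a hypothetical helper that works on plain filename strings; with PySUS `File` objects you would pass something like `str(f)`, assuming that yields the displayed name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def duplicated_stems(names):
    """Return {stem: [names...]} for every stem listed with more than one extension."""
    by_stem = defaultdict(list)
    for name in names:
        # PurePosixPath("STSP2007.DBF").stem == "STSP2007"
        by_stem[PurePosixPath(name).stem].append(name)
    return {stem: group for stem, group in by_stem.items() if len(group) > 1}


listing = ["STSP2006.dbc", "STSP2007.DBF", "STSP2007.dbc",
           "STSP2008.DBF", "STSP2008.dbc", "STSP2009.dbc"]
print(duplicated_stems(listing))
```

For this listing it reports `STSP2007` and `STSP2008`, each with its `.DBF` and `.dbc` name.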

It downloads the .DBF files correctly, but when it tries to convert them to .parquet it raises the following error when I execute:

```python
parquets = cnes.download(files)
```

```
ValueError                                Traceback (most recent call last)
File ~/projects/datasus/DataSUS/env/lib/python3.11/site-packages/pysus/data/__init__.py:104, in dbf_to_parquet(dbf, _pbar)
    102 chunk_size = 30_000
    103 for chunk in stream_dbf(
--> 104     DBF(path, encoding="iso-8859-1", raw=True), chunk_size
    105 ):
    106     if _pbar:
```

It seems to be a ParquetSet error, but I couldn't dig into it any further. Have you run into anything like this?

luabida self-assigned this Oct 19, 2023
Collaborator

luabida commented Oct 19, 2023

Hello @Twetler, I'm going to fix this issue, and the fix will be available in the next release soon. In the meantime, you can remove the .DBF files from the list (also delete the STSP2007 and STSP2008 parquets and DBFs from ~/pysus) and try downloading again:

```python
from pysus.ftp.databases.cnes import CNES

cnes = CNES()
cnes.load()

files = cnes.get_files('ST', year=2020, uf='SP', month=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Keep only the .dbc files; the .DBF duplicates are what break the conversion
dbc_files = [f for f in files if f.extension != ".DBF"]

dbc_files
```

```
[STSP2001.dbc,
 STSP2002.dbc,
 STSP2003.dbc,
 STSP2004.dbc,
 STSP2005.dbc,
 STSP2006.dbc,
 STSP2007.dbc,
 STSP2008.dbc,
 STSP2009.dbc,
 STSP2010.dbc,
 STSP2011.dbc,
 STSP2012.dbc]
```
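The cleanup step mentioned above (removing the STSP2007 and STSP2008 parquets and DBFs from ~/pysus) can be scripted. This is a minimal sketch, not part of PySUS: `remove_cached` is a hypothetical helper, and it assumes the cache lives directly under `~/pysus` with entries named like `STSP2007.DBF` or `STSP2007.parquet`, where the parquet output may be a directory of chunks.

```python
import shutil
from pathlib import Path


def remove_cached(stems, cache_dir=Path.home() / "pysus", suffixes=(".dbf", ".parquet")):
    """Delete cached entries whose stem is in `stems` and whose suffix is in `suffixes`.

    Assumption: PySUS stores files directly under `cache_dir` (e.g. STSP2007.DBF,
    STSP2007.parquet). The .dbc downloads are left alone.
    """
    cache_dir = Path(cache_dir)
    removed = []
    if not cache_dir.is_dir():
        return removed
    for path in cache_dir.iterdir():
        if path.stem not in stems or path.suffix.lower() not in suffixes:
            continue
        if path.is_dir():
            shutil.rmtree(path)  # parquet output written as a directory
        else:
            path.unlink()
        removed.append(path.name)
    return sorted(removed)


# Hypothetical usage for the two months affected in this issue:
# remove_cached({"STSP2007", "STSP2008"})
```

After the cleanup, re-running `cnes.download(dbc_files)` should convert only the .dbc files.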

@github-actions

🎉 This issue has been resolved in version 0.10.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

Collaborator

luabida commented Oct 19, 2023

@Twetler please update PySUS to version 0.10.3, and don't forget to delete the previously corrupted files from your local directory ($HOME/pysus).
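To confirm the upgrade took effect, you can compare the installed version against 0.10.3. This is a minimal sketch using only the standard library; it assumes the package is installed under the distribution name `pysus` and that its version is a plain numeric release string such as `0.10.3` (pre-release tags are not handled and are reported as unparseable).

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "0.10.3"  # release that resolved this issue


def as_tuple(v):
    """Turn a plain numeric release string like '0.10.3' into (0, 10, 3)."""
    return tuple(int(part) for part in v.split("."))


def has_fix(installed, required=REQUIRED):
    # Compare numerically so that '0.9.9' < '0.10.3' (plain string comparison would not)
    return as_tuple(installed) >= as_tuple(required)


if __name__ == "__main__":
    try:
        installed = version("pysus")
    except PackageNotFoundError:
        print("pysus is not installed")
    else:
        try:
            ok = has_fix(installed)
        except ValueError:
            print(f"cannot parse version {installed!r}; check it by hand")
        else:
            print(installed, "ok" if ok else f"upgrade to >= {REQUIRED}")
```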

Author

Twetler commented Oct 19, 2023

Thanks for the help and for the fix, guys.

My pipeline is working awesomely thanks to the lib. Cheers for the great work!!


3 participants