This package collects a set of utilities for handling with public databases published by Brazil's DATASUS The documentation of how to use PySUS can be found here
- Decode encoded patient age to any time unit (years, months, etc)
- Convert
.dbc
files to DBF databases or read them into pandas dataframes. DBC files are basically DBFs compressed by a proprietary algorithm. - Read SINAN dbf files returning DataFrames with properly typed columns
- Download SINASC data
- Download SIH data
- Download SIA data
- Download SIM data
- Download CIHA data
There are some dependencies which can't be installed through pip, namely libffi
. Therefore on an ubuntu system:
sudo apt install libffi-dev
Then you can proceed to
sudo pip install PySUS
Reading SINAN files:
>>> from pysus.preprocessing.sinan import read_sinan_dbf
>>> df = read_sinan_dbf('mytest.dbf', encoding='latin-1')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65535 entries, 0 to 65534
Data columns (total 10 columns):
DT_DIGITA 65469 non-null object
DT_NOTIFIC 65535 non-null object
DT_SIN_PRI 65535 non-null object
ID_AGRAVO 65535 non-null object
ID_BAIRRO 50675 non-null float64
ID_MUNICIP 65535 non-null int64
NM_BAIRRO 60599 non-null object
NU_ANO 65535 non-null int64
SEM_NOT 65535 non-null int64
SEM_PRI 65535 non-null int64
dtypes: float64(1), int64(4), object(5)
memory usage: 5.0+ MB
>>> df.DT_DIGITA[0]
datetime.date(2016, 4, 1)
Reading .dbc
file:
>>> from pysus.utilities.readdbc import read_dbc
>>> df = read_dbc(filename, encoding='iso-8859-1')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1239 entries, 0 to 1238
Data columns (total 58 columns):
AP_MVM 1239 non-null object
AP_CONDIC 1239 non-null object
AP_GESTAO 1239 non-null object
AP_CODUNI 1239 non-null object
AP_AUTORIZ 1239 non-null object
AP_CMP 1239 non-null object
AP_PRIPAL 1239 non-null object
AP_VL_AP 1239 non-null float64
...
Downloading and reading SINASC data:
In[1]: from pysus.online_data.sinasc import download
In[2]: df = download('SE', 2015)
In[3]: df.head()
Out[3]:
NUMERODN CODINST ORIGEM ... TPROBSON PARIDADE KOTELCHUCK
0 19533794 MSE2805100001 1 ... 11 1 9
1 52927108 MSE2802700001 1 ... 11 1 9
2 54673238 MSE2804400001 1 ... 11 1 5
3 54673239 MSE2804400001 1 ... 10 1 3
4 54695292 MBA2916500001 1 ... 03 1 2
[5 rows x 64 columns]
Dowloading and reading SIM data:
In[1]: from pysus.online_data.SIM import download
In[2]: df = download('ba', 2007)
In[3]: df.head()
Out[3]:
NUMERODO TIPOBITO DTOBITO ... UFINFORM CODINST CB_PRE
0 01499664 2 30072007 ... 29 RBA2914800001 C229
1 09798190 2 04072007 ... 29 RBA2914800001 R98
2 01499665 2 25082007 ... 29 RBA2914800001 I10
3 10595623 2 11092007 ... 29 RBA2914800001 G839
4 10599666 2 09082007 ... 29 EBA2927400001 I499
[5 rows x 56 columns]
Dowloading and reading CIHA data:
In[1]: from pysus.online_data.CIHA import download
In[2]: df = download('mg', 2009, 7)
In[3]: df.head()
Out[3]:
ANO_CMPT MES_CMPT ESPEC CGC_HOSP ... CAR_INT HOMONIMO CNES FONTE
0 2009 07 16505851000126 ... 2126796 1
1 2009 07 16505851000126 ... 2126796 2
2 2009 07 16505851000126 ... 2126796 6
3 2009 07 16505851000126 ... 2126796 6
4 2009 07 16505851000126 ... 2126796 1
[5 rows x 27 columns]