UnicodeDecodeError on scraping data from multiple sources #424
Comments
I am experiencing the same error here. I think it happens with the Google source.
I am also having the same problem:
UnicodeDecodeError                        Traceback (most recent call last)
~/envs/3.5/lib/python3.5/site-packages/pandas_datareader/data.py in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
~/envs/3.5/lib/python3.5/site-packages/pandas_datareader/base.py in read(self)
~/envs/3.5/lib/python3.5/site-packages/pandas_datareader/base.py in _read_one_data(self, url, params)
~/envs/3.5/lib/python3.5/site-packages/pandas_datareader/base.py in _read_url_as_StringIO(self, url, params)
~/envs/3.5/lib/python3.5/site-packages/pandas/compat/__init__.py in bytes_to_str(b, encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 352: invalid continuation byte
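The last frame is the telling one: bytes_to_str decodes the downloaded body as UTF-8 and hits a byte (0xec) that is not valid UTF-8 at that point, which suggests the server's response was not UTF-8 encoded. A minimal, self-contained sketch reproducing the same kind of failure; the byte string and the helper name find_utf8_error are made up for illustration and are not the actual Google response or a pandas function:

```python
from typing import Optional


def find_utf8_error(data: bytes) -> Optional[UnicodeDecodeError]:
    """Return the error a strict UTF-8 decode raises, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Made-up body: 0xEC opens a 3-byte UTF-8 sequence, but the byte after it
# (a space, 0x20) is not a continuation byte.
raw = b"Date,Open\n\xec 2017-11-30,263.76\n"

err = find_utf8_error(raw)
print(err)  # 'utf-8' codec can't decode byte 0xec in position 10: invalid continuation byte

# errors="replace" never raises; the bad byte becomes U+FFFD instead.
print(raw.decode("utf-8", errors="replace"))
```

The byte value and position in the real traceback depend on what the server sent, which is consistent with the original report that they vary by stock and source.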
I get the same problem; this all worked fine just last week:
import datetime
import pandas_datareader.data as wb
start = datetime.datetime(2017, 1, 1)
stocks = ['LON:KGF', 'LON:ADM']
A quick fix is below. I ported it from the source, pared it down, and made a few slight tweaks. I believe the issue is with the body returned by … I really haven't tested this rigorously, but the Google API does appear to be working okay. For instance, the export link generated by …
import datetime
import requests
from io import StringIO
# urlencode comes from the standard library; older pandas versions
# re-exported it from pandas.io.common, but importing it directly
# avoids depending on that internal module
from urllib.parse import urlencode
import pandas as pd
BASE = 'http://finance.google.com/finance/historical'
# There seems to be confusion over whether the date api has changed.
# https://github.com/pydata/pandas-datareader/pull/425
# Both formats seem to work, but I'll use the "newer" one here to be safe
def get_params(symbol, start, end):
    # Query-string parameters for the CSV export endpoint
    params = {
        'q': symbol,
        'startdate': start.strftime('%Y/%m/%d'),
        'enddate': end.strftime('%Y/%m/%d'),
        'output': "csv"
    }
    return params


def build_url(symbol, start, end):
    params = get_params(symbol, start, end)
    return BASE + '?' + urlencode(params)

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime.today()
sym = 'SPY'
url = build_url(sym, start, end)
data = requests.get(url).text
data = pd.read_csv(StringIO(data), index_col='Date', parse_dates=True)
print(data.head())
#               Open    High     Low   Close     Volume
# Date
# 2017-11-30  263.76  266.05  263.67  265.01  127894389
# 2017-11-29  263.02  263.63  262.20  262.71   77512102
# 2017-11-28  260.76  262.90  260.66  262.87   98971719
# 2017-11-27  260.41  260.75  260.00  260.23   52274922
# 2017-11-24  260.32  260.48  260.16  260.36   27856514
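Why this workaround sidesteps the error, as far as I can tell: requests' Response.text decodes the body with the encoding declared in the response headers (or a guessed one) and with errors="replace", so it never raises UnicodeDecodeError the way a strict UTF-8 decode does. A minimal offline sketch of that same decode-then-parse step; read_price_csv is a hypothetical helper, the UTF-8 fallback is an assumption, and the sample rows are copied from the output printed above:

```python
from io import StringIO
from typing import Optional

import pandas as pd


def read_price_csv(raw: bytes, encoding: Optional[str] = None) -> pd.DataFrame:
    """Decode a CSV body leniently, then parse it with 'Date' as the index.

    Assumption: fall back to UTF-8 when no encoding is known. With
    errors="replace", undecodable bytes become U+FFFD instead of raising.
    """
    text = raw.decode(encoding or "utf-8", errors="replace")
    return pd.read_csv(StringIO(text), index_col="Date", parse_dates=True)


# Rows taken from the printed output above
sample = (
    b"Date,Open,High,Low,Close,Volume\n"
    b"2017-11-30,263.76,266.05,263.67,265.01,127894389\n"
    b"2017-11-29,263.02,263.63,262.20,262.71,77512102\n"
)
df = read_price_csv(sample)
print(df.head())

# A stray invalid byte no longer aborts the parse; it shows up as U+FFFD.
# ("Note" is a hypothetical column used only for this demonstration.)
messy = read_price_csv(b"Date,Note\n2017-11-30,ok\xec\n")
print(messy)
```

With requests, the call would look like read_price_csv(resp.content, resp.encoding), where resp is the Response object; passing .text straight to StringIO, as the fix above does, achieves the same thing.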
Check if …
That's very strange, @qmpzqmpz, because the URL seems to be correct in the source, at least in 0.5.0. But when I test, the url attribute shows the "old" URL.
@bsolomon1124, your fix works well.
Can someone write a pull request to fix the bugs?
It's a busy week for me, but I can try to submit this over the weekend. That said, it looks like some other commits have been failing the Travis CI build.
Tested your fix and it works nicely, thanks. With pandas-datareader, I pulled pricing for a list of tickers (e.g. LON:BARC, LON:KGF, LON:BLND) and the result was a Panel holding a DataFrame for each stock. If I feed a list to the fix above, the API doesn't like it. I know I can iterate through the list, but I wanted to do it in one call and get back a Panel. Am I missing something obvious?
@nzd31155 Yeah, the pandas-datareader code for reading multiple symbols is a loop that reads each one individually and then returns a Panel. You can find it here: https://github.com/pydata/pandas-datareader/blob/master/pandas_datareader/base.py#L189
The class structure is like this: …
Just FYI, Panel is deprecated as of pandas 0.20 and emits a deprecation warning. A MultiIndex DataFrame would be a good alternative. I'm not an active developer on pandas-datareader, but I'm glad to take a deeper look when I have a moment and try to get something going that passes the build tests.
Just an update regarding the URL: it's correct in the GitHub repo, but outdated in the PyPI release with the same version number. (Go figure...) To check:
>>> import pandas_datareader as pdr
>>> test = pdr.google.daily.GoogleDailyReader('')
>>> test.url??
Type: property
String form: <property object at 0x10bdb39a8>
Source:
# test.url.fget
@property
def url(self):
    return 'http://www.google.com/finance/historical'
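Since the problem here was a PyPI release that differed from the GitHub source at the same version number, it can help to confirm which installed copy Python is actually importing. A small sketch using only the standard library; where_installed is a hypothetical helper, and importlib.metadata needs Python 3.8+, newer than the 3.5/3.6 used in this thread. Installing straight from the repository (pip accepts git+https:// URLs) is one way to pick up fixes that are in master but not yet released.

```python
import importlib.metadata
import importlib.util
from typing import Optional, Tuple


def where_installed(module: str, dist: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed version, file Python would import) for a package.

    Either element is None when the distribution metadata or the module
    cannot be found.
    """
    try:
        version: Optional[str] = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    spec = importlib.util.find_spec(module)
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    # Module name and PyPI distribution name differ (underscore vs hyphen)
    print(where_installed("pandas_datareader", "pandas-datareader"))
```

Comparing the reported file path with the source on GitHub shows whether the installed copy carries the newer URL.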
This appears to be fixed in master, so I'm closing this for now. Reopen it if the problem persists after 0.6.0.
|
from pandas_datareader import data
I still have the same issue.
|
When I run the following code, I get
UnicodeDecodeError: 'utf-8' codec can't decode byte [x] in position [y]: invalid continuation byte
where [x] and [y] vary depending on the requested stock data or source. I've tested various sources (3 so far, on 4 or 5 stocks) and I consistently get this error.
I'm running this in a conda environment with Python 3.6 and pandas_datareader 0.5.0. Could someone point out what the issue is here?