Analysis results:

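Usage:

Download this project.

Run python crawler.py to produce the data file info.json. If possible, use the 100 records I have already crawled in ./info.json; crawling them yourself takes at least a few minutes.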

The JSON data is structured as follows:

{
  'user1Id':{
    'nikeName':nikeName,
    'fans':['fans1','fans2'], // first 20 entries of the fan list
    'level':level,
    'songsAllRank':{'song1Id':'song1Name','song2Id':'song2Name'} // top 100 songs of the all-time listening ranking
  }
}
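
To make the structure above concrete, here is a minimal sketch that walks a dictionary of this shape and reports a few counts per user. The sample IDs, names, and the helper name summarize are made up for illustration; only the field names come from the structure described above.

```python
# Hypothetical sample in the shape described above; IDs and names are made up.
sample_info = {
    "10001": {
        "nikeName": "alice",
        "fans": ["20001", "20002"],
        "level": 7,
        "songsAllRank": {"s1": "Song One", "s2": "Song Two"},
    },
    "10002": {"nikeName": "bob", "fans": [], "level": 3, "songsAllRank": {}},
}


def summarize(info):
    """Return (user_id, nickname, fan count, ranked song count) for each user."""
    return [
        (uid, user["nikeName"], len(user["fans"]), len(user["songsAllRank"]))
        for uid, user in info.items()
    ]


for row in summarize(sample_info):
    print(row)
```

The same function should work on the real data once info.json has been loaded into a dictionary.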

Run python cluster.py to generate the plot.
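
This README does not say how cluster.py measures similarity between users. Purely as an illustration, and not as the author's method, the sketch below computes a pairwise Jaccard index over the song IDs in songsAllRank. The function names jaccard and similarity_matrix and the sample data are assumptions.

```python
def jaccard(a, b):
    """Jaccard index of two collections: shared items / all items, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(info):
    """Return (user_ids, matrix) of pairwise Jaccard scores over ranked song IDs."""
    ids = list(info)
    songs = {uid: set(info[uid]["songsAllRank"]) for uid in ids}
    return ids, [[jaccard(songs[u], songs[v]) for v in ids] for u in ids]


# Made-up sample in the info.json shape (only the field used here).
sample = {
    "u1": {"songsAllRank": {"a": "A", "b": "B", "c": "C"}},
    "u2": {"songsAllRank": {"b": "B", "c": "C", "d": "D"}},
}
ids, matrix = similarity_matrix(sample)
print(ids, matrix)
```

A matrix like this could be handed to any clustering routine that accepts precomputed similarities; which routine the project actually uses is not stated here.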

Requirements:

Firefox browser
pip install selenium
Browser driver:
- Firefox: https://github.com/mozilla/geckodriver/releases
  - chmod +x geckodriver
  - sudo cp geckodriver /usr/bin
- Chrome: chromedriver
  - chmod +x chromedriver
  - sudo cp chromedriver /usr/bin

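Things you may need to change for a customized run:

In crawler.py, replace Ids with the user IDs you need, or leave it unchanged.
In crawler.py, change the 100 in crawler(100) to the number of users you want to crawl; the default is 100.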

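Obstacles encountered:

Scraping data inside an iframe

# Switch into g_iframe to read the elements inside it
# (switch_to_frame is the older spelling, removed in Selenium 4)
driver.switch_to.frame('g_iframe')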
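
As a hedged sketch of the iframe step, the helper below switches into a frame, reads one element's text, and always switches back to the top-level document. The helper name text_inside_frame is an assumption; it expects a Selenium 4 style driver, and the string "css selector" is the value of selenium's By.CSS_SELECTOR constant.

```python
def text_inside_frame(driver, frame_name, css_selector):
    """Switch into a frame, read one element's text, then switch back out."""
    driver.switch_to.frame(frame_name)
    try:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        return driver.find_element("css selector", css_selector).text
    finally:
        # Always return to the top-level document, even if the lookup fails.
        driver.switch_to.default_content()


# Usage with a real browser session (not run here):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   print(text_inside_frame(driver, "g_iframe", "#songsall"))
```

Switching back in a finally block keeps later lookups from silently running inside the wrong frame.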

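Selenium raised an error when clicking a span:

</iframe> is not clickable at point
Solution:

    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By

    # Before: clicking through ActionChains raised the error above
    songsAll = driver.find_element(By.CSS_SELECTOR, '#songsall')
    action_chains = ActionChains(driver)
    action_chains.click(songsAll)
    action_chains.perform()

    # After: trigger the click through JavaScript instead
    songsAll = driver.find_element(By.CSS_SELECTOR, '#songsall')
    driver.execute_script('arguments[0].click();', songsAll)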

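When an element cannot be found:

# Selenium implicit wait of 2 seconds
driver.implicitly_wait(2)

pandas.read_json() automatically converts values to timestamps (the pandas approach is no longer used; the json module is used directly)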
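
An implicit wait applies to every element lookup. When only one condition needs waiting on, polling for it explicitly is an alternative; Selenium ships WebDriverWait for this. The sketch below is not Selenium's API but a plain-Python helper (the name wait_until and its defaults are assumptions) that shows the same idea.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.1):
    """Call predicate() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Usage with a real driver (not run here):
#   rows = wait_until(lambda: driver.find_elements("css selector", "li"), timeout=5)
```

Because find_elements returns an empty list when nothing matches, it works well as a predicate: the helper keeps polling until the list is non-empty.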
```
# Disable the automatic conversion
pd.read_json(json.dumps(UserDict),convert_axes=False)
```
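
Since the project now uses the json module directly, a small sketch of that route may help. It round-trips a dictionary through JSON text with the standard library and shows that the keys come back as plain strings. The sample record and the helper name roundtrip are made up for illustration.

```python
import json

# Hypothetical record; the numeric-looking user ID is made up for illustration.
user_dict = {"100000001": {"nikeName": "alice", "level": 7}}


def roundtrip(data):
    """Serialize to JSON text and parse it back with the standard library."""
    return json.loads(json.dumps(data, ensure_ascii=False))


restored = roundtrip(user_dict)
print(list(restored))  # the keys are still plain strings
```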

Dictionary filtering:

def dataCleaning(data):
    # Filter the dict: keep only users whose all-time listening ranking (top 100 songs) is not empty
    return {k: v for k, v in data.items() if len(v['songsAllRank']) != 0}
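
As a usage sketch of the same filtering idea, the variant below takes the minimum number of ranked songs as a parameter; with min_songs=1 it keeps exactly the users with a non-empty ranking. The name filter_by_song_count, the min_songs parameter, and the sample data are assumptions added for illustration.

```python
def filter_by_song_count(data, min_songs=1):
    """Keep users whose songsAllRank holds at least min_songs entries."""
    return {
        uid: user
        for uid, user in data.items()
        if len(user["songsAllRank"]) >= min_songs
    }


# Made-up sample: one user with two ranked songs, one with none.
sample = {
    "u1": {"songsAllRank": {"s1": "Song One", "s2": "Song Two"}},
    "u2": {"songsAllRank": {}},
}
print(sorted(filter_by_song_count(sample)))
```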

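Chinese legend text in matplotlib did not display correctly (before the fix, see figureCantChinese.png). Solution:
- Step 1: download the font msyh.ttf (Microsoft YaHei) and put it in the system font folder /usr/share/fonts
  
  Also copy it into matplotlib's font folder, under /fonts/ttf
- Step 2: edit the matplotlib configuration file in the directory shown in the figure above (matplotlib目录.png): remove the # in front of the font.family and font.sans-serif lines, set font.family to Microsoft YaHei, and add the Chinese font Microsoft YaHei after font.sans-serif, as shown in the figure (changeMatplotlibrc.png):
- Step 3: delete the file fontList.py3k.cache under ~/.cache/matplotlib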
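
Step 2 above can also be scripted. The sketch below takes the text of a matplotlibrc file and returns a patched copy, uncommenting the two font lines and putting Microsoft YaHei first in the font.sans-serif list. The function name patch_matplotlibrc is an assumption, and the sketch assumes the two lines use the key : value form without trailing comments.

```python
import re

FONT = "Microsoft YaHei"


def patch_matplotlibrc(text, font=FONT):
    """Return matplotlibrc text with font.family and font.sans-serif set for font."""
    out = []
    for line in text.splitlines():
        # Match an optionally commented-out "key : value" line for the two keys.
        m = re.match(r"^\s*#?\s*(font\.family|font\.sans-serif)\s*:\s*(.*)$", line)
        if m is None:
            out.append(line)
        elif m.group(1) == "font.family":
            out.append(f"font.family: {font}")
        else:
            # Put the Chinese font first and keep the existing fallbacks after it.
            names = [name.strip() for name in m.group(2).split(",")]
            rest = [name for name in names if name and name != font]
            out.append("font.sans-serif: " + ", ".join([font] + rest))
    return "\n".join(out) + "\n"


sample_rc = (
    "#font.family         : sans-serif\n"
    "#font.sans-serif     : DejaVu Sans, Arial\n"
    "lines.linewidth : 1.5\n"
)
print(patch_matplotlibrc(sample_rc))
```

Steps 1 and 3 still apply: the font file has to be installed, and the font cache removed, before matplotlib picks up the change.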


gao-lex/NeteaseUserSimilarity
