Titles cannot be fetched for some links, and some are fetched incorrectly #3

Closed
wongchenv opened this issue Apr 21, 2021 · 4 comments

Comments

Owner

zolrath commented Apr 22, 2021

The mp.weixin.qq.com pages set their title with JavaScript after the page loads, so a plain fetch of the HTML doesn't contain it.
With a headless browser or something similar, I could let the JavaScript execute and grab the title once the page has loaded, but that wouldn't work on mobile.
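
To illustrate why a plain fetch comes back without a title, here is a minimal sketch that reads `<title>` from static HTML without running any scripts. The two sample pages and the names `TitleParser` and `extract_static_title` are made up for illustration; this is not the plugin's code, and the real WeChat markup may look different.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in static HTML."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Script bodies also arrive here, but only title text is kept.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_static_title(html):
    """Return the <title> text as served, without executing JavaScript."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Hypothetical page whose title is present in the served HTML.
STATIC_PAGE = "<html><head><title>Example Article</title></head><body></body></html>"

# Hypothetical page that fills in its title from a script at load time.
SCRIPTED_PAGE = (
    "<html><head><title></title>"
    "<script>document.title = 'Example Article';</script>"
    "</head><body></body></html>"
)

print(repr(extract_static_title(STATIC_PAGE)))
print(repr(extract_static_title(SCRIPTED_PAGE)))
```

A parser like this only sees what the server sent, so the second page yields an empty string. Getting the real title needs something that executes the script, which is what the headless browser would be for.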

Hypothetically, if this project had some funding, I could build a new CORS proxy backed by a headless browser and use that for URL fetching. That would fix both the JavaScript-loaded pages and the encoding issues, but since I'd need to pay to host it publicly, it's not something I'm considering at the moment.

Alternatively, that browser/API could be run locally by the user and enabled in settings.
I'll think on this as well!

Owner

zolrath commented Apr 23, 2021

I've got a local scraping solution working on desktop. It still doesn't get a title out of weixin.qq.com, but it succeeds on the other two. I still need to test on mobile.

[Title Unknown](https://mp.weixin.qq.com/mp/appmsgalbum?__biz=Mzg4MjAwNTUwNw==&action=getalbum&album_id=1448541657456295937&scene=173&from_msgid=2247484083&from_itemidx=1&count=10#wechat_redirect&scene=0&subscene=90&sessionid=1606652573&enterid=1606653138)

[弱点 (豆瓣)](https://movie.douban.com/subject/3552028/)

[手绘100张,耗时1个月,我终于破解了【达芬奇密码书】的全部秘密!_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili](https://www.bilibili.com/video/BV1qy4y1t7fn?spm_id_from=333.851.b_7265636f6d6d656e64.3)

Owner

zolrath commented Apr 23, 2021

Fixed for desktop in 1.2.0.
Mobile still relies on the CORS proxy, which doesn't support these characters.
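
The thread doesn't say exactly how the proxy mishandles these characters. One common cause of garbled non-ASCII titles is decoding the response bytes with the wrong charset. The sketch below shows one way a fetcher could pick the charset before decoding, assuming it has access to the raw bytes and the `Content-Type` header; `sniff_charset` and `decode_body` are hypothetical names, not part of the plugin or the proxy.

```python
import re


def sniff_charset(content_type, body):
    """Pick a charset from the Content-Type header, then a <meta> tag, else UTF-8."""
    match = re.search(r"charset=([\w-]+)", content_type or "", re.IGNORECASE)
    if match:
        return match.group(1)
    # Only look at the start of the document; drop non-ASCII bytes for the search.
    head = body[:2048].decode("ascii", errors="ignore")
    match = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', head, re.IGNORECASE)
    if match:
        return match.group(1)
    return "utf-8"


def decode_body(content_type, body):
    """Decode response bytes with the sniffed charset, falling back to UTF-8."""
    charset = sniff_charset(content_type, body)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back rather than fail.
        return body.decode("utf-8", errors="replace")


# Hypothetical responses: one UTF-8 with a header charset, one GBK declared in <meta>.
utf8_body = "<html><head><title>弱点 (豆瓣)</title></head></html>".encode("utf-8")
gbk_body = '<html><head><meta charset="gbk"><title>弱点</title></head></html>'.encode("gbk")

print(decode_body("text/html; charset=utf-8", utf8_body))
print(decode_body("text/html", gbk_body))

# Decoding UTF-8 bytes with the wrong charset yields unreadable text.
print(utf8_body.decode("latin-1") == utf8_body.decode("utf-8"))
```

This is a simplification of what browsers do when sniffing encodings, and it only helps if the fetcher receives the raw bytes rather than text that was already decoded incorrectly upstream.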

zolrath closed this as completed Apr 23, 2021

DDDOH commented Dec 5, 2023

WeChat links are failing again. Example link: https://mp.weixin.qq.com/s/nVilywouNxnZlb-l3Buj3w

Would it be possible to define a rule so that, for websites matching it, the whole page is fetched and the title is taken from that?
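
As a rough sketch of what such a rule could look like, the snippet below matches a URL's hostname against a user-supplied list of glob patterns and picks a fetch strategy from that. The rule list, the strategy names, and the functions `needs_full_page` and `choose_strategy` are all hypothetical; the plugin has no such setting as far as this thread shows.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical user setting: hostnames (glob patterns allowed) that need a full page load.
FULL_PAGE_RULES = ["mp.weixin.qq.com", "*.example.org"]


def needs_full_page(url, rules=FULL_PAGE_RULES):
    """Return True if the URL's hostname matches any rule."""
    host = (urlsplit(url).hostname or "").lower()
    return any(fnmatchcase(host, rule.lower()) for rule in rules)


def choose_strategy(url, rules=FULL_PAGE_RULES):
    """Pick a (hypothetical) fetch strategy name for the URL."""
    return "full_page" if needs_full_page(url, rules) else "default"


print(choose_strategy("https://mp.weixin.qq.com/s/nVilywouNxnZlb-l3Buj3w"))
print(choose_strategy("https://movie.douban.com/subject/3552028/"))
```

Matching on the hostname rather than the full URL keeps the rules short, at the cost of not being able to target individual paths on a site.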
