
Mention any other sites that offer free proxies here, and I'll add them to the project #71

Closed
jhao104 opened this issue Sep 21, 2017 · 32 comments


jhao104 commented Sep 21, 2017

There aren't many proxy sites at the moment, so the number of usable proxy IPs is small. I've also tried the scanning approach, but it's fairly inefficient.

@CokkyWoo


jhao104 commented Oct 16, 2017

@CokkyWoo Both of those sites are already included.

@qinyongliang


1yzz commented Feb 24, 2019

@abc1763613206

https://ip.ihuan.me

@llliuwenjie

http://www.xiladaili.com/ (Xila Proxy)


dota2heqiuzhi commented Nov 19, 2019

@jhao104 Hi. I see people here have suggested several proxy sites outside the wall, and they look decent. Could you add them when you get a chance? (I couldn't manage it myself: some have anti-scraping, and some open fine in a browser but requests can't connect to...)
Thanks.
http://free-proxy.cz/zh/proxylist/country/US/https/ping/all
http://www.gatherproxy.com
http://proxydb.net/?protocol=https&anonlvl=4

At the moment there are only 3 out-of-wall proxy sources, so very few proxies get collected.

One more question: why can the browser open these pages while requests fails to connect?
(screenshots omitted)
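On the browser-vs-requests question above, one frequent cause is that these sites reject clients that don't send browser-like headers. A minimal stdlib sketch; the header values and helper names are illustrative, not from the project:

```python
import urllib.request

# browser-like headers; many free-proxy sites refuse the default
# Python client signature (values here are for illustration only)
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept': 'text/html,application/xhtml+xml',
    'Accept-Language': 'en-US,en;q=0.9',
}

def build_request(url):
    # attach the headers to a stdlib Request object
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

def fetch(url, timeout=10):
    # perform the GET; decode defensively since these pages vary in encoding
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode('utf-8', errors='replace')
```

If the site still refuses the connection, the block is likely deeper (TLS fingerprinting or IP-based), and a headless-browser approach is needed instead.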


jhao104 commented Nov 19, 2019

@dota2heqiuzhi

@jhao104 Do you have time to add these sites? If not, I'll figure it out myself...


jhao104 commented Nov 19, 2019

> @jhao104 Do you have time to add these sites? If not, I'll figure it out myself...

You can take a first pass at the out-of-wall sites yourself.


dota2heqiuzhi commented Nov 21, 2019

> @jhao104 Do you have time to add these sites? If not, I'll figure it out myself...
>
> You can take a first pass at the out-of-wall sites yourself.

Both of these sites have anti-scraping measures... I can't crack them.
When you have some time, could you solve one of them so I can learn from it?
http://free-proxy.cz/zh/proxylist/country/US/https/ping/all
http://proxydb.net/?protocol=https&anonlvl=4

@jhao104


jhao104 commented Nov 25, 2019

> Both of these sites have anti-scraping measures... I can't crack them.
> When you have some time, could you solve one of them so I can learn from it?
> http://free-proxy.cz/zh/proxylist/country/US/https/ping/all
> http://proxydb.net/?protocol=https&anonlvl=4

(screenshots omitted)

It's just generated dynamically by JS. Pull that JS out and execute it with PyV8 or PyExecJS, and you'll get the value.


dota2heqiuzhi commented Nov 25, 2019 via email


1yzz commented Dec 5, 2019

http://proxydb.net/?protocol=https&anonlvl=4

    @staticmethod
    def proxyDBNet():
        # needs `import re` and the project's WebRequest helper
        urls = [
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=CN',
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=',
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=SG',
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=US',
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=CZ',
            'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=AR',
        ]
        request = WebRequest()

        for url in urls:
            r = request.get(url, timeout=20)
            # pull every ip:port pair out of the page source
            proxies = re.findall(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+)', r.text)
            for proxy in proxies:
                yield proxy


dota2heqiuzhi commented Dec 5, 2019 via email


1yzz commented Dec 5, 2019

> This site also has anti-scraping (JS). Your scraping logic probably won't capture any data, will it?

 <td>
                        <script>
                            var  q =
                             '32.5.301'.split('').reverse().join('');
                            var yxy = /* */ atob('\x4d\x69\x34\x78\x4e\x44\x59\x3d'.replace(/\\x([0-9A-Fa-f]{2})/g,function(){return String.fromCharCode(parseInt(arguments[1], 16))}));
                            var  pp =  (8080 - ([]+[]))/**//**/ +  (+document.querySelector('[data-rnnumg]').getAttribute('data-rnnumg'))-[]+[];
                            document.write('<a href="/' + q + yxy + '/' + pp + '#http">' + q + yxy + String.fromCharCode(58) + pp + '</a>');
                        </script>
                    </td>

Find this element, wrap the contents of the script in a function, and run it with PyV8. The value of document.querySelector('[data-rnnumg]').getAttribute('data-rnnumg') is also present in the DOM, so you can parse the DOM tree and substitute it in.
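For this particular snippet you don't strictly need a JS engine: the three obfuscation steps (string reverse, atob, arithmetic on data-rnnumg) can be mirrored in plain Python. A sketch against the sample script above; the helper names are mine, and the rnnumg value would come from the DOM at scrape time:

```python
import base64

def decode_ip(reversed_prefix, b64_suffix):
    # mirrors: q = '...'.split('').reverse().join('')  and  yxy = atob('...')
    return reversed_prefix[::-1] + base64.b64decode(b64_suffix).decode()

def decode_port(rnnumg, base=8080):
    # mirrors: pp = (8080 - 0) + (+data-rnnumg)
    return base + rnnumg

# values taken from the sample above ('\x4d\x69...' unescapes to 'Mi4xNDY=')
ip = decode_ip('32.5.301', 'Mi4xNDY=')  # '103.5.23' + '2.146'
```

This only works as long as the site keeps the same obfuscation shape; if the script changes, the PyV8/PyExecJS route is more robust.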


1yzz commented Dec 5, 2019

> This site also has anti-scraping (JS). Your scraping logic probably won't capture any data, will it?

There is also https://github.com/scrapinghub/splash: throw the HTML at it and you're done. I'm just not sure whether it would hurt crawler throughput.
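If Splash is used, its HTTP API does the rendering: the render.html endpoint returns the page HTML after JavaScript has executed. A sketch of building the request URL; the local Splash address is an assumption:

```python
import urllib.parse

def splash_render_url(target_url, splash='http://localhost:8050', wait=2.0):
    # Splash's render.html endpoint returns HTML after JS execution;
    # `wait` gives the page time to run its obfuscation scripts
    qs = urllib.parse.urlencode({'url': target_url, 'wait': wait})
    return '{}/render.html?{}'.format(splash, qs)
```

Fetching that URL (with any HTTP client) yields the post-JS HTML, which the existing regex or XPath extraction can then run against unchanged.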


hanjackcyw commented Jan 16, 2020

I see a lot of people asking for proxies from this site, and I happened to have scraped it recently, so here's the code. It needs the scrapy package installed, mainly because I'm used to scrapy; any other XPath-parsing library would work just as well.

    @staticmethod
    def freeProxy21():
        # needs: import re, base64, scrapy (plus the project's WebRequest)
        url = 'http://free-proxy.cz/en/proxylist'

        request = WebRequest()
        r = request.get(url, timeout=10)

        sel = scrapy.Selector(text=r.text)

        # highest page number shown in the paginator
        max_page = max(int(v) for v in sel.xpath('//div[@class="paginator"]/a/text()').extract() if v.isdigit())

        for page in range(1, max_page + 1):
            r = request.get(url + '/main/{}'.format(page), timeout=10)

            sel = scrapy.Selector(text=r.text)

            # IPs are hidden as Base64 inside inline <script> calls to decode()
            proxies = sel.xpath('//table[@id="proxy_list"]/tbody/tr/td/script[contains(text(),"decode")]/text()').extract()
            ports = sel.xpath('//table[@id="proxy_list"]/tbody/tr/td/span/text()').extract()

            for index, value in enumerate(proxies):
                try:
                    proxy_ip = re.search(r'.*decode\("(.*)"\)', value).group(1)
                    if proxy_ip:
                        yield '{}:{}'.format(base64.b64decode(proxy_ip).decode('utf-8'), ports[index])
                except Exception:
                    continue

@hailiang-wang

@hanjackcyw Pulling in scrapy adds a lot of weight. If you only need Selector, you can use the lower-level library that Scrapy itself builds on:

https://parsel.readthedocs.io/en/latest/
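For free-proxy.cz specifically, even parsel is optional: the Base64-in-script trick can be undone with only re and base64. A dependency-free sketch over a fabricated page fragment; the markup shape matches what hanjackcyw's XPath targets, but the sample data is mine:

```python
import re
import base64

# fabricated fragment mimicking free-proxy.cz's table markup
SAMPLE = '''<table id="proxy_list"><tbody>
<tr><td><script>document.write(Base64.decode("MTAzLjUuMjMyLjE0Ng=="))</script></td>
<td><span class="fport">8080</span></td></tr>
</tbody></table>'''

def extract(html):
    # IPs are Base64 inside decode("..."), ports are plain text in <span>
    ips = [base64.b64decode(m).decode()
           for m in re.findall(r'decode\("([^"]+)"\)', html)]
    ports = re.findall(r'<span[^>]*>(\d+)</span>', html)
    return ['{}:{}'.format(ip, port) for ip, port in zip(ips, ports)]
```

The zip pairing assumes one port span per decode() call, which is how the real table is laid out; a DOM parser is safer if the site reorders columns.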

@dpawsbear

This proxy source also looks decent: https://proxy.mimvp.com/freeopen

@lyonLeeLPL

https://www.feizhuip.com/News-getInfo-id-1307.html might also be worth a look


TophTab commented Dec 23, 2020

Take a look at this one, Qingting's free list:
https://proxy.horocn.com/free-china-proxy/all.html?page=lr&max_id=2N

Also, a question: when deploying with Docker on a cloud server, should the redis in the command be changed to point at my own instance?


jwdeaa commented Feb 16, 2021

A new proxy list: http://pzzqz.com/


jhao104 commented Apr 2, 2021

> A new proxy list: http://pzzqz.com/

Added.

@jingshaoqi


jhao104 commented Dec 27, 2021

> https://zhimahttp.com/?utm-source=bdtg&utm-keyword=?400359 (Zhima free proxies)

Their free offering is pretty poor; the listed update times are all very stale.


@jhao104 jhao104 closed this as completed Aug 2, 2022

xswwxx commented Dec 31, 2022

How should a plain-text source like https://openproxylist.xyz/http.txt be added?

@CaoYunzhou

> How should a plain-text source like https://openproxylist.xyz/http.txt be added?

@staticmethod
def freeProxy17():
    # plain-text sources: each response body is one ip:port per line
    urls = [
        'https://openproxylist.xyz/http.txt',
        'http://pubproxy.com/api/proxy?limit=3&format=txt&http=true&type=https',
        'https://www.proxy-list.download/api/v1/get?type=https',
        'https://raw.githubusercontent.com/shiftytr/proxy-list/master/proxy.txt'
    ]
    request = WebRequest()
    for url in urls:
        r = request.get(url, timeout=20)
        for proxy in r.text.split('\n'):
            if proxy:  # skip empty lines
                yield proxy
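Lines from these raw lists aren't always clean (blank lines, junk rows, \r\n endings), so validating each line before yielding it is worth a small regex. A sketch; the helper name is mine:

```python
import re

# a well-formed ip:port and nothing else on the line
_PROXY_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$')

def parse_plain_list(text):
    # tolerate \r\n endings, blank lines, and non-proxy rows
    for line in text.splitlines():
        line = line.strip()
        if _PROXY_RE.match(line):
            yield line
```

Dropping malformed lines here keeps garbage out of the validation queue instead of failing later during the proxy check.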


djme0 commented Jun 22, 2023

https://uu-proxy.com/
https://www.proxyscan.io/
http://www.kxdaili.com/dailiip.html
https://www.xsdaili.cn/

@xiumao-cat

https://ip.uqidata.com/free/index.html
https://www.69ip.cn/?page=3
https://proxy.ip3366.net/free/
https://www.binglx.cn
