
[Feature]: allow proxy parameter to load models/embeddings #3080

Closed
Guust-Franssens opened this issue Feb 3, 2023 · 2 comments
Labels: feature (A new feature)

Guust-Franssens commented Feb 3, 2023

Problem statement

At the time of writing, I don't see a way to download models when the machine sits behind a proxy server.
This has been raised before (#1249), but that issue was closed as stale and never fixed.

Solution

For methods that trigger a download, allow a proxy configuration to be passed in, which can then be forwarded to the method that handles downloading from a URL.

I believe the following snippet handles the downloading, but correct me if I am wrong:

flair/flair/file_utils.py (lines 223 to 269 in a22e70a)

def get_from_cache(url: str, cache_dir: Path) -> Path:
    """
    Given a URL, look for the corresponding dataset in the local cache.
    If it's not there, download it. Then return the path to the cached file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)

    filename = re.sub(r".+/", "", url)
    # get cache path to put the file
    cache_path = cache_dir / filename
    if cache_path.exists():
        return cache_path

    # make HEAD request to check ETag
    response = requests.head(url, headers={"User-Agent": "Flair"}, allow_redirects=True)
    if response.status_code != 200:
        raise IOError(f"HEAD request failed for url {url} with status code {response.status_code}.")

    # add ETag to filename if it exists
    # etag = response.headers.get("ETag")

    if not cache_path.exists():
        # Download to temporary file, then copy to cache dir once finished.
        # Otherwise you get corrupt cache entries if the download gets interrupted.
        fd, temp_filename = tempfile.mkstemp()
        logger.info("%s not found in cache, downloading to %s", url, temp_filename)

        # GET file object
        req = requests.get(url, stream=True, headers={"User-Agent": "Flair"})
        content_length = req.headers.get("Content-Length")
        total = int(content_length) if content_length is not None else None
        progress = Tqdm.tqdm(unit="B", total=total, unit_scale=True, unit_divisor=1024)
        with open(temp_filename, "wb") as temp_file:
            for chunk in req.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    progress.update(len(chunk))
                    temp_file.write(chunk)

        progress.close()

        logger.info("copying %s to cache at %s", temp_filename, cache_path)
        shutil.copyfile(temp_filename, str(cache_path))
        logger.info("removing temp file %s", temp_filename)
        os.close(fd)
        os.remove(temp_filename)

    return cache_path

Allowing for a proxy is straightforward with requests. However, the proxy parameter would either need to be added to every method that downloads over a URL, or it would need to be settable through some kind of configuration; a sketch of how this could look for get_from_cache follows the example below.

http_proxy  = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy,
}

r = requests.get(url, headers=headers, proxies=proxies)

source: https://stackoverflow.com/questions/8287628/proxies-with-python-requests-module
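
For illustration, a minimal sketch of how an optional proxies argument could be threaded through get_from_cache. The parameter name and default are my own suggestion (not an existing Flair API or the actual change in any PR), and logging plus the progress bar are omitted for brevity:

import os
import re
import shutil
import tempfile
import typing
from pathlib import Path

import requests


def get_from_cache(url: str, cache_dir: Path,
                   proxies: typing.Optional[typing.Dict[str, str]] = None) -> Path:
    """Like the snippet above, but every HTTP request forwards the optional `proxies` mapping."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path = cache_dir / re.sub(r".+/", "", url)
    if cache_path.exists():
        return cache_path

    # the HEAD request goes through the proxy as well
    response = requests.head(url, headers={"User-Agent": "Flair"},
                             allow_redirects=True, proxies=proxies)
    if response.status_code != 200:
        raise IOError(f"HEAD request failed for url {url} with status code {response.status_code}.")

    # download to a temporary file first, then copy it into the cache
    fd, temp_filename = tempfile.mkstemp()
    req = requests.get(url, stream=True, headers={"User-Agent": "Flair"}, proxies=proxies)
    with open(temp_filename, "wb") as temp_file:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                temp_file.write(chunk)

    shutil.copyfile(temp_filename, str(cache_path))
    os.close(fd)
    os.remove(temp_filename)
    return cache_path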

I might try to contribute this later if I have time.

Guust-Franssens added the feature label on Feb 3, 2023
helpmefindaname (Collaborator) commented

Hi @Guust-Franssens, I can see how this adds value, so I implemented a draft for it: #3082. However, I have no proxy setup, so I cannot fully test it. Can you try it out?

Guust-Franssens (Author) commented

Hey @helpmefindaname,

Tested it and it works!
Code used for testing:

import typing

import requests


def set_proxies(proxies: typing.Dict[str, str]) -> None:
    """
    Allows data downloaded from URLs to be forwarded through a proxy,
    see https://requests.readthedocs.io/en/latest/user/advanced/#proxies
    :param proxies: A dictionary of proxies according to the requests documentation.
    :return: None
    """
    global url_proxies
    url_proxies = proxies


set_proxies({"http": "...", "https": "..."})

# test on a random flair embedding
req = requests.get(
    "https://flair.informatik.hu-berlin.de/resources/embeddings/flair/lm-pt-forward.pt",
    stream=True,
    headers={"User-Agent": "Flair"},
    proxies=url_proxies,
)

print(req)

>>> <Response [200]>
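
As a side note (my own addition, not part of the draft being tested): requests also honors the standard proxy environment variables by default (trust_env=True), so the same download can be sanity-checked without passing proxies= explicitly:

import os

import requests

# with trust_env=True (the default), requests picks up HTTP_PROXY / HTTPS_PROXY
os.environ["HTTP_PROXY"] = "http://10.10.1.10:3128"    # example addresses from above
os.environ["HTTPS_PROXY"] = "https://10.10.1.11:1080"

req = requests.get(
    "https://flair.informatik.hu-berlin.de/resources/embeddings/flair/lm-pt-forward.pt",
    stream=True,
    headers={"User-Agent": "Flair"},
)
print(req)  # expect <Response [200]> if the proxy allows the request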
