
[Feature]: allow proxy parameter to load models/embeddings #3080

Closed
Guust-Franssens opened this issue Feb 3, 2023 · 2 comments
Labels: feature (A new feature)

Guust-Franssens commented Feb 3, 2023

Problem statement

At the time of writing, I don't see a way to download models when the machine sits behind a proxy server.
This has been raised before (#1249), but that issue was closed as stale and never fixed.

Solution

For methods that trigger a download, allow a proxy configuration to be passed in, which can then be forwarded to the method that handles downloading from a URL.

I believe the following snippet handles the downloading, but correct me if I am wrong:

flair/flair/file_utils.py (lines 223 to 269 in a22e70a)

def get_from_cache(url: str, cache_dir: Path) -> Path:
    """
    Given a URL, look for the corresponding dataset in the local cache.
    If it's not there, download it. Then return the path to the cached file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)

    filename = re.sub(r".+/", "", url)
    # get cache path to put the file
    cache_path = cache_dir / filename
    if cache_path.exists():
        return cache_path

    # make HEAD request to check ETag
    response = requests.head(url, headers={"User-Agent": "Flair"}, allow_redirects=True)
    if response.status_code != 200:
        raise IOError(f"HEAD request failed for url {url} with status code {response.status_code}.")

    # add ETag to filename if it exists
    # etag = response.headers.get("ETag")

    if not cache_path.exists():
        # Download to temporary file, then copy to cache dir once finished.
        # Otherwise you get corrupt cache entries if the download gets interrupted.
        fd, temp_filename = tempfile.mkstemp()
        logger.info("%s not found in cache, downloading to %s", url, temp_filename)

        # GET file object
        req = requests.get(url, stream=True, headers={"User-Agent": "Flair"})
        content_length = req.headers.get("Content-Length")
        total = int(content_length) if content_length is not None else None
        progress = Tqdm.tqdm(unit="B", total=total, unit_scale=True, unit_divisor=1024)
        with open(temp_filename, "wb") as temp_file:
            for chunk in req.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    progress.update(len(chunk))
                    temp_file.write(chunk)

        progress.close()

        logger.info("copying %s to cache at %s", temp_filename, cache_path)
        shutil.copyfile(temp_filename, str(cache_path))
        logger.info("removing temp file %s", temp_filename)
        os.close(fd)
        os.remove(temp_filename)

    return cache_path

Allowing for a proxy is straightforward with requests. However, the proxy parameter would either need to be added to every method that downloads over a URL, or it would need to be settable through some kind of configuration; a sketch of how this could look for get_from_cache follows the example below.

http_proxy  = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy,
}

r = requests.get(url, headers=headers, proxies=proxies)

source: https://stackoverflow.com/questions/8287628/proxies-with-python-requests-module
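
For illustration, a minimal sketch of how an optional proxies argument could be threaded through get_from_cache. The parameter name and default are my own suggestion (not an existing Flair API or the actual change in any PR), and logging plus the progress bar are omitted for brevity:

import os
import re
import shutil
import tempfile
import typing
from pathlib import Path

import requests


def get_from_cache(url: str, cache_dir: Path,
                   proxies: typing.Optional[typing.Dict[str, str]] = None) -> Path:
    """Like the snippet above, but every HTTP request forwards the optional `proxies` mapping."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path = cache_dir / re.sub(r".+/", "", url)
    if cache_path.exists():
        return cache_path

    # the HEAD request goes through the proxy as well
    response = requests.head(url, headers={"User-Agent": "Flair"},
                             allow_redirects=True, proxies=proxies)
    if response.status_code != 200:
        raise IOError(f"HEAD request failed for url {url} with status code {response.status_code}.")

    # download to a temporary file first, then copy it into the cache
    fd, temp_filename = tempfile.mkstemp()
    req = requests.get(url, stream=True, headers={"User-Agent": "Flair"}, proxies=proxies)
    with open(temp_filename, "wb") as temp_file:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                temp_file.write(chunk)

    shutil.copyfile(temp_filename, str(cache_path))
    os.close(fd)
    os.remove(temp_filename)
    return cache_path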

I might try to contribute this later if I have time.

Guust-Franssens added the feature label on Feb 3, 2023
helpmefindaname (Collaborator) commented

Hi @Guust-Franssens, I can see how this adds value, so I implemented a draft for it: #3082. However, I have no proxy setup, so I cannot fully test it. Can you try it out?

Guust-Franssens (Author) commented

Hey @helpmefindaname,

Tested it and it works!
Code used for testing:

import typing

import requests


def set_proxies(proxies: typing.Dict[str, str]) -> None:
    """
    Allows data downloaded from URLs to be forwarded through a proxy,
    see https://requests.readthedocs.io/en/latest/user/advanced/#proxies
    :param proxies: A dictionary of proxies according to the requests documentation.
    :return: None
    """
    global url_proxies
    url_proxies = proxies


set_proxies({"http": "...", "https": "..."})

# test on a random flair embedding
req = requests.get(
    "https://flair.informatik.hu-berlin.de/resources/embeddings/flair/lm-pt-forward.pt",
    stream=True,
    headers={"User-Agent": "Flair"},
    proxies=url_proxies,
)

print(req)

>>> <Response [200]>
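
As a side note (my own addition, not part of the draft being tested): requests also honors the standard proxy environment variables by default (trust_env=True), so the same download can be sanity-checked without passing proxies= explicitly:

import os

import requests

# with trust_env=True (the default), requests picks up HTTP_PROXY / HTTPS_PROXY
os.environ["HTTP_PROXY"] = "http://10.10.1.10:3128"    # example addresses from above
os.environ["HTTPS_PROXY"] = "https://10.10.1.11:1080"

req = requests.get(
    "https://flair.informatik.hu-berlin.de/resources/embeddings/flair/lm-pt-forward.pt",
    stream=True,
    headers={"User-Agent": "Flair"},
)
print(req)  # expect <Response [200]> if the proxy allows the request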
