APIs
Each API has the following usage limits (thresholds). If you start receiving the HTTP response status "429 Too Many Requests", check whether you are exceeding these limits:
- Arquivo.pt API (Full-text & URL search): 250 requests per minute
- Image Search API v1.1 (beta version): 400 requests per minute
- CDX-server API (URL search): 250 requests per minute
- Memento API (URL search): 400 requests per minute
- Training module on Automatic processing of information preserved from the Web (module C)
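The per-minute limits listed above can be respected on the client side by spacing requests out. The following is a minimal sketch, not an official Arquivo.pt client: the class `MinIntervalThrottle` and the function `backoff_delay` are hypothetical names introduced here for illustration, and the exponential backoff values are assumptions rather than documented recommendations.

```python
import time


class MinIntervalThrottle:
    """Spaces out calls so that at most `max_per_minute` start per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # minimum seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to pause after the `attempt`-th consecutive HTTP 429 (0-based).

    The base and cap values are illustrative assumptions.
    """
    return min(cap, base * (2 ** attempt))
```

A caller would create one throttle per API, for example `MinIntervalThrottle(250)` for the CDX-server API, call `wait()` before each request, and pause for `backoff_delay(n)` seconds whenever a response comes back with status 429.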
If you need to download a large amount of web-archived resources, such as all the URLs archived from a large website over time, we suggest the following methodology:
- Download the CDXJ index files (what is CDXJ?) of the Arquivo.pt collections you selected to process. To find them, check "column A: Collection ID" and the corresponding CDXJ index files in "column H: Collection CDXJ File".
- Create a list of the selected URLs to download, extracted from the CDXJ index files (e.g. using the Linux grep command).
- Download the web-archived resources for the selected URLs from Arquivo.pt, either by using the above APIs or by building links that access the web-archived resources directly. These links are shown under Technical details in the Options menu at the top right of a web-archived page. For instance, for the URL http://publico.pt/ with timestamp 20120201160355 extracted from the CDXJ index file, build the following links to download the:
  - original file of the web-archived page (loses replay quality because the original internal links are not rewritten to reference web-archived images or stylesheets): https://arquivo.pt/noFrame/replay/20120201160355id_/http://publico.pt/
  - web-archived page without the Arquivo.pt UI frame (internal links are rewritten to reference web-archived resources): https://arquivo.pt/noFrame/replay/20120201160355/http://publico.pt/

Both kinds of link are served by the endpoint https://arquivo.pt/noFrame/replay/, which is limited to 4437 requests per minute. A client that exceeds this limit receives the error "HTTP 429 Too Many Requests" and should decrease its download rate.
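The last two steps of this methodology can be sketched in a few lines of Python. This is an illustration under stated assumptions: it assumes each CDXJ line has the layout `<sort key> <timestamp> <JSON object>` with a `url` field in the JSON part, the sample line is made up for the publico.pt example above, and the helper names `parse_cdxj_line` and `replay_links` are hypothetical. Check the layout against the index files you actually download.

```python
import json

REPLAY_BASE = "https://arquivo.pt/noFrame/replay/"  # endpoint named in the text


def parse_cdxj_line(line):
    """Split one CDXJ line into (timestamp, original URL).

    Assumed layout: '<sort key> <timestamp> <JSON object with a "url" field>'.
    """
    _key, timestamp, payload = line.strip().split(" ", 2)
    return timestamp, json.loads(payload)["url"]


def replay_links(timestamp, url):
    """Return the two download links described above for one capture."""
    return {
        # "id_" after the timestamp requests the original, unrewritten file
        "original": f"{REPLAY_BASE}{timestamp}id_/{url}",
        # without "id_", internal links point at web-archived resources
        "rewritten": f"{REPLAY_BASE}{timestamp}/{url}",
    }


# Hypothetical sample line, constructed for illustration only
sample = 'pt,publico)/ 20120201160355 {"url": "http://publico.pt/", "status": "200"}'
timestamp, url = parse_cdxj_line(sample)
links = replay_links(timestamp, url)
print(links["original"])
print(links["rewritten"])
```

For the sample line, the two printed links are the same as the ones given in the example above. When downloading many links, keep the request rate below the limit of the noFrame endpoint.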
If you have any trouble using our APIs, please contact us so that we can try to help you.
Short link to this page: arquivo.pt/api