Refactoring of robotparser-rs #20
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
target | ||
Cargo.lock | ||
.vscode/ | ||
.idea/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
//! # Supported libraries | ||
//! To enable support for an HTTP library, add the corresponding feature to your `Cargo.toml`. | ||
//! Currently only one library is supported: `reqwest`. | ||
//! You can add support for other libraries by implementing `RobotsTxtClient` for their client type. | ||
|
||
use url::Origin; | ||
#[cfg(feature = "reqwest")] | ||
/// Support for reqwest library. | ||
pub mod reqwest; | ||
|
||
/// User agent of this crate. | ||
pub const DEFAULT_USER_AGENT: &str = "robotparser-rs (https://crates.io/crates/robotparser)"; | ||
|
||
/// Trait to fetch and parse the robots.txt file. | ||
/// Must be implemented for an HTTP client. | ||
pub trait RobotsTxtClient { | ||
type Result; | ||
fn fetch_robots_txt(&self, origin: Origin) -> Self::Result; | ||
} |
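The associated `Result` type is what lets one trait cover both blocking and asynchronous clients: each implementation chooses what `fetch_robots_txt` returns. The following is a minimal sketch of a custom implementation, not code from this pull request. To stay runnable with the standard library alone it uses a hypothetical `Origin` struct as a stand-in for `url::Origin` and a hypothetical in-memory `StaticClient`; only the trait itself mirrors the diff.

```rust
// Hypothetical stand-in for `url::Origin`, so the sketch needs no external crates.
#[derive(Clone, Debug, PartialEq)]
pub struct Origin {
    pub scheme: String,
    pub host: String,
    pub port: u16,
}

impl Origin {
    /// Serializes the origin as `scheme://host:port` (simplified; assumption).
    pub fn serialization(&self) -> String {
        format!("{}://{}:{}", self.scheme, self.host, self.port)
    }
}

// Same shape as the trait introduced in the diff.
pub trait RobotsTxtClient {
    type Result;
    fn fetch_robots_txt(&self, origin: Origin) -> Self::Result;
}

/// Hypothetical client that serves a fixed body instead of doing network I/O.
pub struct StaticClient {
    pub body: String,
}

impl RobotsTxtClient for StaticClient {
    // A blocking-style result: the requested URL and the body, or an error message.
    type Result = Result<(String, String), String>;

    fn fetch_robots_txt(&self, origin: Origin) -> Self::Result {
        // Build the robots.txt URL the same way the diff does: origin + "/robots.txt".
        let url = format!("{}/robots.txt", origin.serialization());
        Ok((url, self.body.clone()))
    }
}
```

An asynchronous client would follow the same pattern but set `type Result` to a future type, as the `reqwest::Client` implementation in this pull request does with `RobotsTxtResponse`.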
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
mod sync_reqwest; | ||
pub use self::sync_reqwest::*; | ||
mod async_reqwest; | ||
pub use self::async_reqwest::*; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
use reqwest::{Client, Request}; | ||
use reqwest::{Method, Error}; | ||
use reqwest::header::HeaderValue; | ||
use url::{Origin, Url}; | ||
use reqwest::header::USER_AGENT; | ||
use crate::http::{RobotsTxtClient, DEFAULT_USER_AGENT}; | ||
use crate::parser::{ParseResult, parse_fetched_robots_txt}; | ||
use crate::model::FetchedRobotsTxt; | ||
use std::pin::Pin; | ||
use futures::task::{Context, Poll}; | ||
use futures::Future; | ||
use futures::future::TryFutureExt; | ||
use futures::future::ok as future_ok; | ||
|
||
type FetchFuture = Box<dyn Future<Output=Result<(ResponseInfo, String), Error>>>; | ||
|
||
impl RobotsTxtClient for Client { | ||
type Result = RobotsTxtResponse; | ||
fn fetch_robots_txt(&self, origin: Origin) -> Self::Result { | ||
let url = format!("{}/robots.txt", origin.unicode_serialization()); | ||
let url = Url::parse(&url).expect("Unable to parse robots.txt url"); | ||
Review comment: I know this is not currently tested, but maybe we can add a test for that. What do you think? |
||
let mut request = Request::new(Method::GET, url); | ||
let _ = request.headers_mut().insert(USER_AGENT, HeaderValue::from_static(DEFAULT_USER_AGENT)); | ||
let response = self | ||
.execute(request) | ||
.and_then(|response| { | ||
let response_info = ResponseInfo {status_code: response.status().as_u16()}; | ||
return response.text().and_then(|response_text| { | ||
return future_ok((response_info, response_text)); | ||
}); | ||
}); | ||
let response: Pin<FetchFuture> = Box::pin(response); | ||
return RobotsTxtResponse { | ||
origin, | ||
response, | ||
} | ||
} | ||
} | ||
|
||
struct ResponseInfo { | ||
status_code: u16, | ||
} | ||
|
||
/// Future for fetching robots.txt result. | ||
pub struct RobotsTxtResponse { | ||
origin: Origin, | ||
response: Pin<FetchFuture>, | ||
} | ||
|
||
impl RobotsTxtResponse { | ||
/// Returns origin of robots.txt | ||
pub fn get_origin(&self) -> &Origin { | ||
return &self.origin; | ||
} | ||
} | ||
|
||
impl Future for RobotsTxtResponse { | ||
type Output = Result<ParseResult<FetchedRobotsTxt>, Error>; | ||
|
||
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> { | ||
let self_mut = self.get_mut(); | ||
let response_pin = self_mut.response.as_mut(); | ||
match response_pin.poll(cx) { | ||
Poll::Ready(Ok((response_info, text))) => { | ||
let robots_txt = parse_fetched_robots_txt(self_mut.origin.clone(), response_info.status_code, &text); | ||
return Poll::Ready(Ok(robots_txt)); | ||
}, | ||
Poll::Ready(Err(error)) => { | ||
return Poll::Ready(Err(error)); | ||
}, | ||
Poll::Pending => { | ||
return Poll::Pending; | ||
}, | ||
} | ||
} | ||
} |
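`RobotsTxtResponse` follows a common pattern: a struct owns a pinned, boxed inner future and its `poll` forwards to that future, transforming the value once it is ready. The sketch below reproduces the pattern with the standard library only. The names `Wrapped`, `InnerFuture`, `NoopWaker`, and `block_on` are hypothetical, and the string formatting stands in for the call to `parse_fetched_robots_txt`; it is an illustration of the technique, not the crate's code.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Inner future yielding (status code, body) or an error message.
type InnerFuture = Pin<Box<dyn Future<Output = Result<(u16, String), String>>>>;

/// Wrapper future: holds extra context (`label`) plus the pinned inner future.
pub struct Wrapped {
    label: String,
    inner: InnerFuture,
}

impl Wrapped {
    pub fn new(label: &str, inner: InnerFuture) -> Self {
        Wrapped { label: label.to_string(), inner }
    }
}

impl Future for Wrapped {
    type Output = Result<String, String>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Wrapped` is `Unpin` (all its fields are), so `get_mut` is allowed here.
        let this = self.get_mut();
        match this.inner.as_mut().poll(cx) {
            // Post-process the ready value; the real code parses robots.txt at this point.
            Poll::Ready(Ok((status, body))) => {
                Poll::Ready(Ok(format!("{} {} {}", this.label, status, body)))
            }
            Poll::Ready(Err(error)) => Poll::Ready(Err(error)),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Waker that does nothing; enough for futures that are ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny demonstration executor that polls in a loop. It spins instead of
/// sleeping, so it is suitable for examples only, not for real workloads.
pub fn block_on<F: Future + Unpin>(mut future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = Pin::new(&mut future).poll(&mut cx) {
            return output;
        }
    }
}
```

In real code the future would be driven by an async runtime rather than the spinning `block_on` shown here.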
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
use reqwest::blocking::{Client, Request}; | ||
use reqwest::{Method, Error}; | ||
use reqwest::header::HeaderValue; | ||
use url::{Origin, Url}; | ||
use reqwest::header::USER_AGENT; | ||
use crate::http::{RobotsTxtClient, DEFAULT_USER_AGENT}; | ||
use crate::parser::{ParseResult, parse_fetched_robots_txt}; | ||
use crate::model::FetchedRobotsTxt; | ||
|
||
impl RobotsTxtClient for Client { | ||
type Result = Result<ParseResult<FetchedRobotsTxt>, Error>; | ||
fn fetch_robots_txt(&self, origin: Origin) -> Self::Result { | ||
let url = format!("{}/robots.txt", origin.unicode_serialization()); | ||
let url = Url::parse(&url).expect("Unable to parse robots.txt url"); | ||
let mut request = Request::new(Method::GET, url); | ||
let _ = request.headers_mut().insert(USER_AGENT, HeaderValue::from_static(DEFAULT_USER_AGENT)); | ||
let response = self.execute(request)?; | ||
let status_code = response.status().as_u16(); | ||
let text = response.text()?; | ||
let robots_txt = parse_fetched_robots_txt(origin, status_code, &text); | ||
return Ok(robots_txt); | ||
} | ||
} |
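Both implementations build the request URL by appending `/robots.txt` to the serialized origin, and the review comment above asks whether that step could be tested. One way to make it testable, sketched here under assumptions, is to isolate the string construction in a small function. The helper `robots_txt_url` is hypothetical and takes the already-serialized origin as a plain string so that the sketch needs no external crates; the trailing-slash trimming is an added defensive choice, not behavior taken from the diff.

```rust
/// Hypothetical helper: builds the robots.txt URL from a serialized origin
/// such as "https://example.com" or "http://example.com:8080".
pub fn robots_txt_url(serialized_origin: &str) -> String {
    // Trim a trailing slash defensively so the result never contains "//robots.txt".
    format!("{}/robots.txt", serialized_origin.trim_end_matches('/'))
}
```

A unit test can then assert, for example, that `robots_txt_url("https://example.com")` equals `"https://example.com/robots.txt"` without touching the network.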
Review comment: Usually I let the maintainer decide the next version release.