Elizaveta Shved - 0 #38
base: master
Changes from 17 commits
f7267d5
b6ebd01
4bc09fb
d9e582a
7b2122c
a2e931a
0b070ae
3dfc96c
934fe1d
56baf47
3acc415
4bf4ee0
67cc7fd
f863674
83692eb
7a64cd2
e552a14
40edb65
23476dc
f0db675
428a448
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,7 @@ | ||
.DS_Store | ||
*.mov | ||
.idea | ||
test.db | ||
/data/ | ||
data_parser.log | ||
|
||
Your pull request MUST NOT change global gitignore
This |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
ruby-2.5.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
<html> | ||
<head><title>404 Not Found</title></head> | ||
<body> | ||
<center><h1>404 Not Found</h1></center> | ||
<hr><center>nginx</center> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
require 'rubygems' | ||
require 'date' | ||
require 'fileutils' | ||
require 'roo' | ||
require 'roo-xls' | ||
require 'sqlite3' | ||
require 'logger' | ||
|
||
COST_INDEX = 3 | ||
REGION_INDEX = 2 | ||
DATE_FOR_DENOMIZATION = 1_482_181_200 | ||
AMOUNT_FOR_DENOMIZATION = 10_000.0 | ||
REGIONS = [{ region_name: 'Брестская область', row_number: 6 }, | ||
{ region_name: 'Витебская область', row_number: 8 }, | ||
{ region_name: 'Гомельская область', row_number: 10 }, | ||
{ region_name: 'Гродненская область', row_number: 12 }, | ||
{ region_name: 'Минск', row_number: 14 }, | ||
{ region_name: 'Минская область', row_number: 16 }, | ||
{ region_name: 'Могилевская область', row_number: 18 }].freeze | ||
|
||
ROW_DATA_TYPES = { '0': String, '6': Numeric, '8': Numeric, '10': Numeric, | ||
'12': Numeric, '14': Numeric, '16': Numeric, '18': Numeric }.freeze | ||
|
||
class DataParser | ||
def perform_files | ||
xls_files = Dir['./data/*.xls'] | ||
xlsx_files = Dir['./data/*.xlsx'] | ||
puts 'Perform data:' | ||
xlsx_files.each do |path| | ||
print '.' | ||
date = convert_to_unix_date(path) | ||
table = Roo::Spreadsheet.open(path) | ||
perform_rows(table, date) | ||
rescue Zip::Error => e | ||
logger.warn "Problem with parser, error: #{e}, file - #{path} " | ||
end | ||
|
||
xls_files.each do |path| | ||
print '.' | ||
date = convert_to_unix_date(path) | ||
table = Roo::Excel.new(path) | ||
perform_rows(table, date) | ||
rescue Zip::Error, SQLite3::SQLException, Ole::Storage::FormatError => e | ||
logger.warn "Problem with parser, error: #{e}, file - #{path} " | ||
end | ||
end | ||
|
||
def perform_rows(table, date) | ||
(9..table.last_row).each do |number| | ||
row = table.row(number) | ||
check_data = validate_row_data(row) | ||
next unless check_data | ||
|
||
product_name = row[0].downcase | ||
regions_cost = REGIONS.map { |region| row[region[:row_number]] } | ||
|
||
if date < DATE_FOR_DENOMIZATION | ||
regions_cost = regions_cost.map { |region_cost| region_cost / AMOUNT_FOR_DENOMIZATION unless region_cost.nil? } | ||
end | ||
|
||
REGIONS.each_with_index do |elem, index| | ||
# Bind the values instead of interpolating them, so a quote in a product name cannot break the statement | ||
@db.execute 'INSERT INTO Items (name, region, price, date) VALUES (?, ?, ?, ?)', | ||
[product_name, elem[:region_name], regions_cost[index], date] | ||
end | ||
end | ||
end | ||
|
||
def perform_data | ||
FileUtils.touch 'site_parsing_data.db' | ||
@db = SQLite3::Database.open 'site_parsing_data.db' | ||
@db.execute 'CREATE TABLE IF NOT EXISTS Items(Id INTEGER PRIMARY KEY AUTOINCREMENT, | ||
Name Text, Region TEXT, Price REAL, Date INTEGER )' | ||
perform_files | ||
rescue SQLite3::Exception => e | ||
puts 'Exception occurred' | ||
puts e | ||
ensure | ||
@db&.close | ||
end | ||
|
||
def convert_to_unix_date(path) | ||
date_keys = File.basename(path, '.*').split('_') | ||
month = date_keys[0].to_i | ||
year = date_keys[1].to_i | ||
Date.new(year, month, 1).to_time.to_i | ||
end | ||
|
||
def logger | ||
@logger ||= Logger.new('data_parser.log') | ||
end | ||
|
||
private | ||
|
||
def validate_row_data(row) | ||
# True only when every checked column holds the expected type | ||
ROW_DATA_TYPES.all? { |key, data_type| row[key.to_s.to_i].is_a?(data_type) } | ||
end | ||
end |
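A note on `convert_to_unix_date` and the denomination check in `perform_rows`: `Date#to_time` uses the machine's local time zone, so the timestamp compared against `DATE_FOR_DENOMIZATION` can shift between hosts. The following is a minimal sketch of a zone-independent variant, assuming file names follow the `MM_YYYY` pattern written by `DownloadSheets`. The names `month_start_utc`, `denominate`, `DENOMINATION_CUTOFF` and `DENOMINATION_FACTOR` are hypothetical; the two constant values are copied from the diff above.

```ruby
# Values copied from the diff; the constant and method names are hypothetical.
DENOMINATION_CUTOFF = 1_482_181_200
DENOMINATION_FACTOR = 10_000.0

# Parse "MM_YYYY.ext" into the Unix timestamp of the month's first day in UTC.
def month_start_utc(path)
  month, year = File.basename(path, '.*').split('_').map(&:to_i)
  Time.utc(year, month, 1).to_i
end

# Divide prices dated before the cut-off by the factor; nil prices stay nil.
def denominate(prices, timestamp)
  return prices if timestamp >= DENOMINATION_CUTOFF

  prices.map { |price| price && price / DENOMINATION_FACTOR }
end
```

With these helpers, `denominate(regions_cost, month_start_utc(path))` would stand in for the `if date < DATE_FOR_DENOMIZATION` branch.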
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
require 'rubygems' | ||
require 'sqlite3' | ||
|
||
class DbRequester | ||
FILE_NAME = 'site_parsing_data.db'.freeze | ||
def initialize(product_name) | ||
@product_name = product_name | ||
@db = SQLite3::Database.open FILE_NAME | ||
end | ||
|
||
def request | ||
if check_record | ||
last_time | ||
lowest_cost_item | ||
maximim_cost_item | ||
similar_price | ||
else | ||
puts "#{@product_name} can not be found in database." | ||
end | ||
rescue SQLite3::Exception => e | ||
puts 'Exception occurred' | ||
puts e | ||
ensure | ||
@db&.close | ||
end | ||
|
||
private | ||
|
||
def lowest_cost_item | ||
str = @product_name.downcase | ||
response = @db.execute 'SELECT * FROM Items WHERE Name LIKE ? ORDER BY Price ASC LIMIT 1', ["#{str} %"] | ||
record = response.last | ||
date = revert_unix_date(record.last) | ||
cost = record[COST_INDEX] | ||
region = record[REGION_INDEX] | ||
puts "Lowest was on #{date.year}/#{date.month} at price #{cost} BYN in #{region}" | ||
end | ||
|
||
def maximim_cost_item | ||
str = @product_name.downcase | ||
response = @db.execute 'SELECT * FROM Items WHERE Name LIKE ? ORDER BY Price DESC LIMIT 1', ["#{str} %"] | ||
record = response.last | ||
date = revert_unix_date(record.last) | ||
cost = record[COST_INDEX] | ||
region = record[REGION_INDEX] | ||
puts "Maximum was on #{date.year}/#{date.month} at price #{cost} BYN in #{region}" | ||
end | ||
|
||
def similar_price | ||
# Bind the bounds as numbers rather than quoted strings | ||
response = @db.execute 'SELECT DISTINCT Name FROM Items WHERE Price < ? AND Price > ? LIMIT 2', | ||
[@last_time_cost + 0.5, @last_time_cost - 0.5] | ||
# Fewer than two matches would make response[1] nil below | ||
return if response.size < 2 | ||
|
||
puts "For similar price you also can afford #{response[1][0].capitalize} and #{response[0][0].capitalize}" | ||
end | ||
|
||
def check_record | ||
str = @product_name.downcase | ||
response = @db.execute 'SELECT * FROM Items WHERE Name LIKE ? ORDER BY Date DESC LIMIT 1', ["#{str} %"] | ||
!response.empty? | ||
end | ||
|
||
def last_time | ||
str = @product_name.downcase | ||
response = @db.execute 'SELECT * FROM Items WHERE Name LIKE ? ORDER BY Date DESC LIMIT 1', ["#{str} %"] | ||
@last_time_cost = response.last[COST_INDEX] | ||
region = response.last[REGION_INDEX] | ||
puts "'#{@product_name.capitalize}' is #{@last_time_cost} BYN in #{region} these days." | ||
end | ||
|
||
def revert_unix_date(unix_date) | ||
Time.at(unix_date) | ||
end | ||
end |
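A note on the `LIKE` queries in `DbRequester`: the user's input goes straight into the pattern, so a `%` or `_` typed by the user acts as a wildcard. The following is a minimal sketch of a pattern builder, assuming the query is run with a bound parameter and an `ESCAPE '\'` clause. The helper name `like_prefix` is hypothetical and not part of the PR.

```ruby
# Hypothetical helper: builds the "name followed by a space" LIKE pattern,
# escaping the LIKE wildcards % and _ as well as the escape character itself.
def like_prefix(term)
  escaped = term.downcase.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "#{escaped} %"
end
```

The result would be passed as the bound value of a statement such as `SELECT * FROM Items WHERE Name LIKE ? ESCAPE '\' ORDER BY Date DESC LIMIT 1`.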
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
require 'rubygems' | ||
require 'faraday' | ||
require 'nokogiri' | ||
require 'open-uri' | ||
|
||
MONTHES = { 'январь': '01', 'февраль': '02', 'март': '03', 'апрель': '04', 'май': '05', 'июнь': '06', | ||
'июль': '07', 'август': '08', 'сентябрь': '09', 'октябрь': '10', 'ноябрь': '11', 'декабрь': '12' }.freeze | ||
PARSING_URL = 'http://www.belstat.gov.by/ofitsialnaya-statistika/makroekonomika-i-okruzhayushchaya-sreda/tseny/operativnaya-informatsiya_4/srednie-tseny-na-potrebitelskie-tovary-i-uslugi-po-respublike-belarus/'.freeze | ||
SITE_URL = 'http://www.belstat.gov.by'.freeze | ||
|
||
class DownloadSheets | ||
Metrics/ClassLength: Class has too many lines. [155/100] |
||
def perform_data | ||
puts 'Download files:' | ||
links = get_data | ||
links.each do |date, link| | ||
puts "Perform #{date}" | ||
type = link.include?('xlsx') ? 'xlsx' : 'xls' | ||
download_file(date, generate_link(link), type) | ||
end | ||
end | ||
|
||
private | ||
|
||
def generate_link(link) | ||
link = "#{SITE_URL}#{link}" unless link.include?(SITE_URL) | ||
# URI.encode is obsolete and was removed in Ruby 3.0 | ||
URI::DEFAULT_PARSER.escape(link) | ||
end | ||
|
||
def get_data | ||
Metrics/MethodLength: Method has too many lines. [11/10] |
||
file_links = get_file_links | ||
data_links = {} | ||
file_links.each do |link| | ||
file_link = link.attributes['href'].value | ||
year = find_year(file_link) | ||
next if year.nil? | ||
|
||
month_key = link.children.first.text[3..-1] | ||
month_value = MONTHES[month_key.to_sym] | ||
data_links["#{month_value}_#{year}"] = file_link | ||
end | ||
data_links | ||
end | ||
|
||
def get_file_links | ||
site_link = PARSING_URL | ||
page = Faraday.get(site_link) | ||
html = Nokogiri::HTML(page.body) | ||
html.css('.l-main').css('.table').first.css('a') | ||
end | ||
|
||
def find_year(link) | ||
# First occurrence of "20" followed by two digits, or nil when the link has none | ||
link[/20\d{2}/] | ||
end | ||
|
||
def download_file(date, link, type) | ||
response = Faraday.get(link) | ||
File.write("./data/#{date}.#{type}", response.body) | ||
end | ||
end |
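A note on `find_year`: it looks for the first `20` followed by two digits, which also matches inside longer digit runs (for example an upload id such as `120345`). The following is a minimal sketch of a stricter search, assuming the year always appears as a standalone four-digit group in the link. The constant `YEAR_PATTERN` and the boundary rule are assumptions, not something the PR or the site guarantees.

```ruby
# Hypothetical pattern: a 20xx year that is not part of a longer digit run.
YEAR_PATTERN = /(?<!\d)20\d{2}(?!\d)/

# Returns the year as a string, or nil when the link contains none.
def find_year(link)
  link[YEAR_PATTERN]
end
```

`String#[]` with a regular expression returns the matched text or `nil`, so the `next if year.nil?` guard in `get_data` keeps working unchanged.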
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
require 'rubygems' | ||
require 'fileutils' | ||
require 'sqlite3' | ||
require './data_parser.rb' | ||
require './download_sheets.rb' | ||
require './db_requester.rb' | ||
|
||
FILES_COUNT = 100 | ||
|
||
class Waiter | ||
def check_files | ||
FileUtils.mkdir_p './data' | ||
DownloadSheets.new.perform_data if Dir['./data/*.*'].count < FILES_COUNT | ||
end | ||
|
||
def check_data | ||
# SQLite3::Database.open creates the file when it is missing, so no separate File.new is needed | ||
@db = SQLite3::Database.open 'site_parsing_data.db' | ||
@db.execute 'CREATE TABLE IF NOT EXISTS Items(Id INTEGER PRIMARY KEY AUTOINCREMENT, | ||
Name Text, Region TEXT, Price REAL, Date INTEGER )' | ||
count = @db.execute 'SELECT count(*) FROM Items' | ||
DataParser.new.perform_data unless count[0][0].positive? | ||
ensure | ||
@db&.close | ||
end | ||
|
||
def looping | ||
check_files | ||
check_data | ||
loop do | ||
puts 'What price are you looking for?' | ||
DbRequester.new(gets.chomp).request | ||
end | ||
end | ||
end | ||
Waiter.new.looping |
Do NOT touch global gitignore