Elizaveta Shved - 0 #38

Open
wants to merge 21 commits into base: master
Changes from 17 commits
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
.DS_Store
*.mov
.idea
test.db
Collaborator

Do NOT touch global gitignore

/data/
data_parser.log

Collaborator

Your pull request MUST NOT change global gitignore

Collaborator

This

1 change: 1 addition & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/.ruby-version
@@ -0,0 +1 @@
ruby-2.5.1
Binary files not shown (15 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/02_2014.xls
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
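
The file above is an nginx 404 page saved under a spreadsheet name. A minimal sketch of a check that could reject such bodies before they are written to disk, assuming the downloader has the HTTP status and the raw body at hand. The helper name `spreadsheet_body?` and both constants are hypothetical; the byte signatures are the standard ZIP (.xlsx) and OLE2 (.xls) file headers.

```ruby
# Leading bytes of the two container formats.
XLSX_MAGIC = "PK\x03\x04".b                        # ZIP archive (.xlsx)
XLS_MAGIC  = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b  # OLE2 compound file (.xls)

# Hypothetical helper: true only for a 200 response whose body starts
# with one of the spreadsheet signatures.
def spreadsheet_body?(status, body)
  return false unless status == 200

  bytes = body.to_s.b
  bytes.start_with?(XLSX_MAGIC) || bytes.start_with?(XLS_MAGIC)
end

error_page = "<html>\n<head><title>404 Not Found</title></head>"
puts spreadsheet_body?(404, error_page)
puts spreadsheet_body?(200, "PK\x03\x04...")
```

With a guard like this in place, an HTML error page would be skipped even if the server answered with status 200.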
Binary file not shown.
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/02_2016.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Binary files not shown (9 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/03_2016.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Binary files not shown (21 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/05_2018.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Binary files not shown (19 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/07_2018.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Binary files not shown (36 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/11_2015.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
Binary files not shown (11 files).
7 changes: 7 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data/12_2018.xlsx
@@ -0,0 +1,7 @@
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
100 changes: 100 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/data_parser.rb
@@ -0,0 +1,100 @@
require 'rubygems'
require 'date'
require 'fileutils'
require 'roo'
require 'roo-xls'
require 'sqlite3'
require 'logger'

# Positions of the Price and Region columns in a row of the Items table.
COST_INDEX = 3
REGION_INDEX = 2
# Unix timestamp; prices dated before it are divided by AMOUNT_FOR_DENOMIZATION.
DATE_FOR_DENOMIZATION = 1_482_181_200
AMOUNT_FOR_DENOMIZATION = 10_000.0
# row_number is the index of the region's price cell within a spreadsheet row.
REGIONS = [{ region_name: 'Брестская область', row_number: 6 },
           { region_name: 'Витебская область', row_number: 8 },
           { region_name: 'Гомельская область', row_number: 10 },
           { region_name: 'Гродненская область', row_number: 12 },
           { region_name: 'Минск', row_number: 14 },
           { region_name: 'Минская область', row_number: 16 },
           { region_name: 'Могилевская область', row_number: 18 }].freeze

# Expected class of the cell at each index of a valid data row.
ROW_DATA_TYPES = { '0': String, '6': Numeric, '8': Numeric, '10': Numeric,
                   '12': Numeric, '14': Numeric, '16': Numeric, '18': Numeric }.freeze

class DataParser
  # Parses every downloaded spreadsheet in ./data and stores its rows.
  # A `rescue` inside a do/end block needs Ruby 2.5 or newer.
  def perform_files
    xls_files = Dir['./data/*.xls']
    xlsx_files = Dir['./data/*.xlsx']
    puts 'Perform data:'
    xlsx_files.each do |path|
      print '.'
      date = convert_to_unix_date(path)
      table = Roo::Spreadsheet.open(path)
      perform_rows(table, date)
    rescue Zip::Error => e
      logger.warn "Problem with parser, error: #{e}, file - #{path}"
    end

    xls_files.each do |path|
      print '.'
      date = convert_to_unix_date(path)
      table = Roo::Excel.new(path)
      perform_rows(table, date)
    rescue Zip::Error, SQLite3::SQLException, Ole::Storage::FormatError => e
      logger.warn "Problem with parser, error: #{e}, file - #{path}"
    end
  end

  # Inserts one record per region for every valid data row (rows 9 onwards).
  def perform_rows(table, date)
    (9..table.last_row).each do |number|
      row = table.row(number)
      next unless validate_row_data(row)

      product_name = row[0].downcase
      regions_cost = REGIONS.map { |region| row[region[:row_number]] }

      if date < DATE_FOR_DENOMIZATION
        regions_cost = regions_cost.map do |region_cost|
          region_cost / AMOUNT_FOR_DENOMIZATION unless region_cost.nil?
        end
      end

      REGIONS.each_with_index do |elem, index|
        # Bind parameters keep quotes in product names from breaking the statement.
        @db.execute 'INSERT INTO Items (name, region, price, date) VALUES (?, ?, ?, ?)',
                    [product_name, elem[:region_name], regions_cost[index], date]
      end
    end
  end

  # Creates the database if needed and loads all spreadsheets into it.
  def perform_data
    FileUtils.touch 'site_parsing_data.db'
    @db = SQLite3::Database.open 'site_parsing_data.db'
    @db.execute 'CREATE TABLE IF NOT EXISTS Items(Id INTEGER PRIMARY KEY AUTOINCREMENT,
      Name TEXT, Region TEXT, Price REAL, Date INTEGER )'
    perform_files
  rescue SQLite3::Exception => e
    puts 'Exception occurred'
    puts e
  ensure
    @db&.close
  end

  # './data/02_2014.xls' -> Unix timestamp of 1 February 2014 (local time).
  def convert_to_unix_date(path)
    date_keys = File.basename(path, '.*').split('_')
    month = date_keys[0].to_i
    year = date_keys[1].to_i
    Date.new(year, month, 1).to_time.to_i
  end

  def logger
    @logger ||= Logger.new('data_parser.log')
  end

  private

  # True when every cell listed in ROW_DATA_TYPES has the expected class.
  def validate_row_data(row)
    ROW_DATA_TYPES.all? { |key, data_type| row[key.to_s.to_i].is_a?(data_type) }
  end
end
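
The date and price handling in this file can be checked in isolation. A minimal sketch, assuming the file names follow the `MM_YYYY` pattern seen in the diff. The helper names `unix_date_from_path` and `normalize_prices` are hypothetical; only the two constants are taken from data_parser.rb. UTC is used here so the result does not depend on the machine's time zone, whereas the original goes through `Date#to_time`, which uses local time.

```ruby
# Values copied from data_parser.rb.
DATE_FOR_DENOMIZATION = 1_482_181_200
AMOUNT_FOR_DENOMIZATION = 10_000.0

# "./data/02_2014.xls" -> Unix timestamp of 2014-02-01 00:00 UTC.
# Assumption: UTC instead of the local time used by the original.
def unix_date_from_path(path)
  month, year = File.basename(path, '.*').split('_').map(&:to_i)
  Time.utc(year, month, 1).to_i
end

# Divides prices recorded before the cutoff by 10,000; nil prices stay nil.
def normalize_prices(prices, date)
  return prices unless date < DATE_FOR_DENOMIZATION

  prices.map { |price| price && price / AMOUNT_FOR_DENOMIZATION }
end

old_date = unix_date_from_path('./data/02_2014.xls')
new_date = unix_date_from_path('./data/05_2018.xlsx')
p normalize_prices([25_000, nil], old_date)
p normalize_prices([2.5, nil], new_date)
```

Keeping both steps as pure functions would let them be tested without opening a spreadsheet or a database.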
71 changes: 71 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/db_requester.rb
@@ -0,0 +1,71 @@
require 'rubygems'
require 'sqlite3'

class DbRequester
  FILE_NAME = 'site_parsing_data.db'.freeze

  def initialize(product_name)
    @product_name = product_name
    @db = SQLite3::Database.open FILE_NAME
  end

  # Prints the latest, lowest and highest price of the product, followed by
  # other products in the same price range.
  def request
    if check_record
      last_time
      lowest_cost_item
      maximum_cost_item
      similar_price
    else
      puts "#{@product_name} cannot be found in the database."
    end
  rescue SQLite3::Exception => e
    puts 'Exception occurred'
    puts e
  ensure
    @db&.close
  end

  private

  # First row whose name starts with the product name, in the given order.
  # order_clause is always a literal from this class; the user's input only
  # reaches SQLite through the bind parameter.
  def find_record(order_clause)
    pattern = "#{@product_name.downcase} %"
    sql = "SELECT * FROM Items WHERE Name LIKE ? ORDER BY #{order_clause} LIMIT 1"
    @db.execute(sql, [pattern]).first
  end

  def lowest_cost_item
    print_extreme('Lowest', find_record('Price ASC'))
  end

  def maximum_cost_item
    print_extreme('Maximum', find_record('Price DESC'))
  end

  def print_extreme(label, record)
    date = revert_unix_date(record.last)
    puts "#{label} was on #{date.year}/#{date.month} at price #{record[COST_INDEX]} BYN in #{record[REGION_INDEX]}"
  end

  # Prints up to two products priced within 0.5 BYN of the latest price.
  def similar_price
    response = @db.execute 'SELECT DISTINCT Name FROM Items WHERE Price < ? AND Price > ? LIMIT 2',
                           [@last_time_cost + 0.5, @last_time_cost - 0.5]
    names = response.flatten.map(&:capitalize)
    return if names.empty?

    puts "For a similar price you can also afford #{names.join(' and ')}"
  end

  def check_record
    !find_record('Date DESC').nil?
  end

  def last_time
    record = find_record('Date DESC')
    @last_time_cost = record[COST_INDEX]
    puts "'#{@product_name.capitalize}' is #{@last_time_cost} BYN in #{record[REGION_INDEX]} these days."
  end

  def revert_unix_date(unix_date)
    Time.at(unix_date)
  end
end
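
This class reads result rows by position through `COST_INDEX` and `REGION_INDEX`, which are defined in data_parser.rb. A minimal sketch of one alternative, assuming rows keep the column order of the `CREATE TABLE` statement in the diff (Id, Name, Region, Price, Date). The `Item` struct, the `describe_extreme` helper and the sample row are hypothetical and only serve as illustration; UTC is used to keep the output independent of the local time zone.

```ruby
# Column order follows the CREATE TABLE statement: Id, Name, Region, Price, Date.
Item = Struct.new(:id, :name, :region, :price, :date)

# Hypothetical formatter; `row` is one array as returned by a SELECT * query.
def describe_extreme(label, row)
  item = Item.new(*row)
  time = Time.at(item.date).utc # assumption: UTC, the original uses local time
  "#{label} was on #{time.year}/#{time.month} at price #{item.price} BYN in #{item.region}"
end

# Made-up sample row, not taken from the real data set.
row = [1, 'sample product', 'Минск', 1.25, 1_391_212_800]
puts describe_extreme('Lowest', row)
```

Named fields would remove the dependency on constants from another file and make a change in column order fail in one place only.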
68 changes: 68 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/download_sheets.rb
@@ -0,0 +1,68 @@
require 'rubygems'
require 'faraday'
require 'nokogiri'
require 'open-uri'

# Russian month names, looked up from the link texts on the statistics page.
MONTHES = { 'январь': '01', 'февраль': '02', 'март': '03', 'апрель': '04', 'май': '05', 'июнь': '06',
            'июль': '07', 'август': '08', 'сентябрь': '09', 'октябрь': '10', 'ноябрь': '11', 'декабрь': '12' }.freeze
PARSING_URL = 'http://www.belstat.gov.by/ofitsialnaya-statistika/makroekonomika-i-okruzhayushchaya-sreda/tseny/operativnaya-informatsiya_4/srednie-tseny-na-potrebitelskie-tovary-i-uslugi-po-respublike-belarus/'.freeze
SITE_URL = 'http://www.belstat.gov.by'.freeze

class DownloadSheets

Metrics/ClassLength: Class has too many lines. [155/100]

  def perform_data
    puts 'Download files:'
    links = get_data
    links.each do |date, link|
      puts "Perform #{date}"
      type = link.include?('xlsx') ? 'xlsx' : 'xls'
      download_file(date, generate_link(link), type)
    end
  end

  private

  def generate_link(link)
    link = "#{SITE_URL}#{link}" unless link.include?(SITE_URL)
    # URI.encode is obsolete and was removed in Ruby 3.0; it is still
    # available on the Ruby 2.5.1 pinned in .ruby-version.
    URI.encode(link)
  end

  def get_data

Metrics/MethodLength: Method has too many lines. [11/10]
Naming/AccessorMethodName: Do not prefix reader method names with get_.

    file_links = get_file_links
    data_links = {}
    file_links.each do |link|
      file_link = link.attributes['href'].value
      year = find_year(file_link)
      next if year.nil?

      month_key = link.children.first.text[3..-1]
      month_value = MONTHES[month_key.to_sym]
      data_links["#{month_value}_#{year}"] = file_link
    end
    data_links
  end

  def get_file_links
    page = Faraday.get(PARSING_URL)
    html = Nokogiri::HTML(page.body)
    html.css('.l-main').css('.table').first.css('a')
  end

  # Returns the first four-digit year starting with "20" found in the link,
  # or nil when there is none.
  def find_year(link)
    link[/20\d{2}/]
  end

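  def download_file(date, link, type)
    response = Faraday.get(link)
    # Skip failed requests; otherwise the error page is saved as a spreadsheet.
    return puts("Skip #{date}: HTTP #{response.status}") unless response.success?

    File.binwrite("./data/#{date}.#{type}", response.body)
  end
end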
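
The `get_data` method builds file names of the form `MM_YYYY` from a link's text and href. A minimal sketch of that step as a standalone function that returns nil instead of a malformed key such as `_2016`. It assumes the link text ends with the month name (the original drops the first three characters of the text instead). The names `MONTH_NUMBERS` and `sheet_key` and the sample hrefs are hypothetical.

```ruby
# Hypothetical copy of the month table, keyed by strings instead of symbols.
MONTH_NUMBERS = {
  'январь' => '01', 'февраль' => '02', 'март' => '03', 'апрель' => '04',
  'май' => '05', 'июнь' => '06', 'июль' => '07', 'август' => '08',
  'сентябрь' => '09', 'октябрь' => '10', 'ноябрь' => '11', 'декабрь' => '12'
}.freeze

# Builds the "MM_YYYY" file name stem, or returns nil when the month or the
# year cannot be recognised.
def sheet_key(link_text, href)
  last_word = link_text.to_s.strip.split.last.to_s.downcase
  month = MONTH_NUMBERS[last_word]
  year = href.to_s[/20\d{2}/]
  return nil if month.nil? || year.nil?

  "#{month}_#{year}"
end

p sheet_key('за февраль', '/files/2016/prices.xlsx')
p sheet_key('за unknown', '/files/2016/prices.xlsx')
```

Skipping links for which this returns nil would avoid writing files whose names cannot be parsed back into a date later.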
33 changes: 33 additions & 0 deletions Elizaveta_Shved/Elizaveta Shved : 0 :/run.rb
@@ -0,0 +1,33 @@
require 'rubygems'
require 'fileutils'
require 'sqlite3'
require_relative 'data_parser'
require_relative 'download_sheets'
require_relative 'db_requester'

FILES_COUNT = 100

class Waiter
  # Downloads the spreadsheets unless ./data already holds enough files.
  def check_files
    FileUtils.mkdir_p './data'
    DownloadSheets.new.perform_data if Dir['./data/*.*'].count < FILES_COUNT
  end

  # Fills the database from the spreadsheets when the Items table is empty.
  def check_data
    FileUtils.touch 'site_parsing_data.db'
    @db = SQLite3::Database.open 'site_parsing_data.db'
    @db.execute 'CREATE TABLE IF NOT EXISTS Items(Id INTEGER PRIMARY KEY AUTOINCREMENT,
      Name TEXT, Region TEXT, Price REAL, Date INTEGER )'
    count = @db.execute 'SELECT count(*) FROM Items'
    DataParser.new.perform_data unless count[0][0].positive?
  ensure
    @db&.close
  end

  def looping
    check_files
    check_data
    loop do
      puts 'What price are you looking for?'
      input = gets
      break if input.nil? # end of input (Ctrl-D)

      DbRequester.new(input.chomp).request
    end
  end
end
Waiter.new.looping
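
The prompt loop reads from standard input directly, which makes it hard to exercise in a test. A minimal sketch of a variant that takes its streams as arguments and hands each query to a block. The method name `prompt_loop` and the `exit` command are assumptions and not part of the script above.

```ruby
require 'stringio'

# Hypothetical variant of the prompt loop: input and output streams are
# injected, and each non-empty query is yielded to the caller.
def prompt_loop(input: $stdin, output: $stdout)
  loop do
    output.puts 'What price are you looking for?'
    line = input.gets
    break if line.nil? # end of input

    query = line.strip
    break if query == 'exit' # assumed quit command
    next if query.empty?

    yield query
  end
end

seen = []
fake_input = StringIO.new("milk\n\nexit\nbread\n")
prompt_loop(input: fake_input, output: StringIO.new) { |query| seen << query }
p seen
```

In the script itself the block could be `{ |name| DbRequester.new(name).request }`, with the default streams left in place.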