Added threads for download files #621

Merged 4 commits on Sep 15, 2023
1 change: 1 addition & 0 deletions Dockerfile
@@ -4,6 +4,7 @@ ENV S3_KEY ""
ENV S3_PRIVATE_KEY ""
ENV FlAG "all"
ENV EXT ""
ENV THREADS "8"

RUN mkdir -pv ~/.s3
RUN gem install bundler
1 change: 1 addition & 0 deletions Gemfile
@@ -4,6 +4,7 @@ Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

source 'https://rubygems.org'
gem 'concurrent-ruby'
gem 'onlyoffice_s3_wrapper', '>= 0.6.0'
gem 'rake'
gem 'webrick'
3 changes: 3 additions & 0 deletions Gemfile.lock
@@ -19,6 +19,7 @@ GEM
aws-sigv4 (1.6.0)
aws-eventstream (~> 1, >= 1.0.2)
base64 (0.1.1)
concurrent-ruby (1.2.2)
jmespath (1.6.2)
json (2.6.3)
language_server-protocol (3.17.0.3)
@@ -71,9 +72,11 @@ GEM
webrick (1.8.1)

PLATFORMS
arm64-darwin-22
x86_64-linux

DEPENDENCIES
concurrent-ruby
onlyoffice_s3_wrapper (>= 0.6.0)
rake
rubocop
22 changes: 14 additions & 8 deletions README.md
@@ -31,25 +31,28 @@ the files will be downloaded to Downloads\s3_files folder
#### Additional options for downloading files, add to the start command

``-e EXT=<download extension>`` - To download files by extension.

``-e THREADS=<number of threads>`` - Sets the number of threads used to download files.
Defaults to 8.
An example of a startup with additional options

* For linux

```bash
docker run -v <the path to the downloaded files>:/downloader/tmp \
-e S3_KEY=<is a public s3 key for getting files> \
-e S3_PRIVATE_KEY=<is a private s3 key for getting files> \
-e EXT=ppt onlyoffice/s3_file_downloader:latest
-e S3_PRIVATE_KEY=<is a private s3 key for getting files> \
-e EXT=ppt \
-e THREADS=12 onlyoffice/s3_file_downloader:latest
```

* For Windows

```bash
docker run -v <the path to the downloaded files>:/downloader/tmp ^
-e S3_KEY=<is a public s3 key for getting files> ^
-e S3_PRIVATE_KEY=<is a private s3 key for getting files> ^
-e EXT=ppt onlyoffice/s3_file_downloader:latest
-e S3_PRIVATE_KEY=<is a private s3 key for getting files> ^
-e EXT=ppt ^
-e THREADS=12 onlyoffice/s3_file_downloader:latest
```

## Running without a docker
@@ -64,18 +67,21 @@ An example of a startup with additional options

### Usage

To change the number of file download threads,
change the `threads` parameter in `./config.json`.

``rake download[all]`` - To download all files

``rake download[file]`` - To download files. Reading the array of files comes
from "./data/files_to_download.list"
from `./data/files_to_download.list`

``rake download[ext,your extension]`` - To download files by extension.
You must specify the extension, the second parameter.

``rake download[arrext]`` - To download files by extensions from array.
The array is in the file "./data/static_data.rb". Change "EXTENSION_ARRAY"
The array is in the file `./config.json`. Change `extensions_array`.

``rake download[arrfile]`` - To download files by file names from array.
The array is in the file "./data/static_data.rb". Change "FILE_NAMES_ARRAY".')
The array is in the file ``./config.json``. Change `file_names_array`.

### the files will be downloaded to the tmp folder
6 changes: 4 additions & 2 deletions Rakefile
@@ -7,7 +7,8 @@ desc 'Download files'
task :download, :download_flag, :extension do |_t, args|
download_flag = args[:download_flag].to_sym
extension = args[:extension].to_s
Downloader.new.download_with_options(download_flag, extension)
threads = JSON.load_file(File.join(Dir.pwd, 'config.json'))['threads'].to_i
Downloader.new(threads_count: threads).download_with_options(download_flag, extension)
end

desc 'Download files for docker'
@@ -18,6 +19,7 @@ task :docker do |_t|
:ext
end
extension = ENV['EXT'].to_s
threads = ENV['THREADS'].to_i
Config.init_s3_from_env
Downloader.new.download_with_options(download_flag, extension)
Downloader.new(threads_count: threads).download_with_options(download_flag, extension)
end
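
Review note: `ENV['THREADS'].to_i` evaluates to 0 when the variable is unset, empty, or non-numeric, and 0 is not a usable pool size. A minimal sketch of a guard follows. The helper name `resolve_threads` and the fallback of 8 (chosen to mirror the Dockerfile default above) are assumptions for illustration, not part of this PR.

```ruby
# Hypothetical helper, not part of this PR: turn a raw setting (ENV string,
# JSON value, or nil) into a usable positive thread count.
DEFAULT_THREADS = 8 # assumption: mirrors `ENV THREADS "8"` in the Dockerfile

def resolve_threads(raw, default: DEFAULT_THREADS)
  # Integer(..., exception: false) returns nil for '', 'abc', or '4.5'
  count = Integer(raw.to_s, 10, exception: false)
  (count && count.positive?) ? count : default
end
```

With this in place, both tasks could call `resolve_threads(ENV['THREADS'])` or `resolve_threads(config['threads'])` before constructing `Downloader`.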
5 changes: 5 additions & 0 deletions config.json
@@ -0,0 +1,5 @@
{
"threads": "8",
"extensions_array": [],
"file_names_array": []
}
4 changes: 2 additions & 2 deletions data/static_data.rb
@@ -2,6 +2,6 @@

# class with some constants and static data
class StaticData
EXTENSION_ARRAY = %w[doc csv].freeze
FILE_NAMES_ARRAY = ['doc/(NS)-CHAUZIMU-MWA-CHILENGEDWE.doc', 'doc/01 - Font (2).doc', 'doc/01 - Font.doc'].freeze
EXTENSION_ARRAY = JSON.load_file(File.join(Dir.pwd, 'config.json'))['extensions_array'].freeze
FILE_NAMES_ARRAY = JSON.load_file(File.join(Dir.pwd, 'config.json'))['file_names_array'].freeze
end
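
Review note: `config.json` is read and parsed once per constant here, and again in the Rakefile. One possible tidy-up is to parse it in a single place and merge it over defaults, so a missing key does not produce `nil`. The sketch below assumes a hypothetical `AppConfig` module and default values taken from the `config.json` added in this PR. It uses `JSON.parse(File.read(...))`, which is a swapped-in equivalent of `JSON.load_file` for this purpose.

```ruby
require 'json'

# Hypothetical module, not part of this PR: parse the config in one place
# and fall back to defaults for keys the file does not define.
module AppConfig
  DEFAULTS = {
    'threads' => 8,
    'extensions_array' => [],
    'file_names_array' => []
  }.freeze

  # Merges the parsed JSON text over DEFAULTS.
  def self.from_json(text)
    DEFAULTS.merge(JSON.parse(text))
  end

  # Reads the file at `path` and merges it over DEFAULTS.
  def self.load(path)
    from_json(File.read(path))
  end
end
```

`StaticData` and the Rakefile could then share one result, for example `CONFIG = AppConfig.load(File.join(Dir.pwd, 'config.json'))`, and read `CONFIG['extensions_array']` from it.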
76 changes: 42 additions & 34 deletions lib/main.rb
@@ -3,50 +3,36 @@
require 'logger'
require 'onlyoffice_s3_wrapper'
require_relative '../data/static_data'
require 'concurrent-ruby'

# Methods for download files
class Downloader
def initialize
def initialize(threads_count: 1)
@threads_count = threads_count
@tmp_dir = './tmp'
FileUtils.makedirs(@tmp_dir)
@logger = Logger.new("#{@tmp_dir}/Failed_download_log")
@logger_stdout = Logger.new($stdout)
end

def s3
@s3 ||= OnlyofficeS3Wrapper::AmazonS3Wrapper.new(bucket_name: 'conversion-testing-files', region: 'us-east-1')
end

# The method checks the existence of the directory,
# and if it does not exist, creates a new one using the name as a parameter
def create_dir(dir_name)
return if File.exist? dir_name

FileUtils.makedirs(dir_name)
puts "Directory #{dir_name} created"
end

def download(array_of_files)
array_of_files.each do |filename|
dir_name = filename.split('/')[0]
if File.exist? "#{@tmp_dir}/#{filename}"
@logger_stdout.info("File `#{filename}` already downloaded")
else
create_dir("#{@tmp_dir}/#{dir_name}")
@logger_stdout.info("Starting to download a file: #{filename}")
begin
s3.download_file_by_name(filename, "#{@tmp_dir}/#{dir_name}")
rescue StandardError => e
@logger_stdout.error("Error: '#{e}' happened while downloading #{filename}")
@logger.error("Error: '#{e}' happened while downloading #{filename}")
end
def download_file(filename)
dir_name = filename.split('/')[0]
if File.exist? "#{@tmp_dir}/#{filename}"
@logger_stdout.info("File `#{filename}` already downloaded")
else
create_dir("#{@tmp_dir}/#{dir_name}")
@logger_stdout.info("Starting to download a file: #{filename}")
begin
s3.download_file_by_name(filename, "#{@tmp_dir}/#{dir_name}")
rescue StandardError => e
@logger_stdout.error("Error: '#{e}' happened while downloading #{filename}")
@logger.error("Error: '#{e}' happened while downloading #{filename}")
end
end
end

def download_all
array_of_files = s3.get_files_by_prefix
download(array_of_files)
download(s3.get_files_by_prefix)
end

def download_from_file
@@ -61,14 +47,12 @@ def download_from_file
end

def download_by_extension(extension)
array_of_files = s3.files_from_folder(extension.to_s)
download(array_of_files)
download(s3.files_from_folder(extension.to_s))
end

def download_by_array_extension
StaticData::EXTENSION_ARRAY.each do |extension|
array_of_files = s3.files_from_folder(extension.to_s)
download(array_of_files)
download(s3.files_from_folder(extension.to_s))
end
end

@@ -96,4 +80,28 @@ def download_with_options(download_flag, extension)
puts(message)
end
end

private

def s3
@s3 ||= OnlyofficeS3Wrapper::AmazonS3Wrapper.new(bucket_name: 'conversion-testing-files', region: 'us-east-1')
end

# The method checks the existence of the directory,
# and if it does not exist, creates a new one using the name as a parameter
def create_dir(dir_name)
return if File.exist? dir_name

FileUtils.makedirs(dir_name)
puts "Directory #{dir_name} created"
end

def download(array_of_files)
thread_pool = Concurrent::FixedThreadPool.new(@threads_count.to_i)
array_of_files.each do |filename|
thread_pool.post { download_file(filename) }
end
thread_pool.shutdown
thread_pool.wait_for_termination
end
end
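
Review note: `download` submits every file to a fixed-size pool, then calls `shutdown` and `wait_for_termination`, so at most `@threads_count` downloads run at once. For readers unfamiliar with the pattern, the same idea can be sketched with only the standard library's `Queue` and `Thread`. This is a swapped-in illustration, not the `Concurrent::FixedThreadPool` used in the PR, and `run_in_pool` is a hypothetical name. The sketch also collects errors raised by workers, since `download_file` only rescues around the S3 call.

```ruby
# Stdlib-only sketch of a fixed-size worker pool (hypothetical helper).
# Calls the block once per item on at most `size` threads and returns
# an array of [item, error] pairs for the items whose block raised.
def run_in_pool(items, size, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, pop returns nil and the workers exit

  failures = Queue.new # Queue is thread-safe, so workers can push freely
  workers = Array.new(size) do
    Thread.new do
      while (item = queue.pop)
        begin
          block.call(item)
        rescue StandardError => e
          failures << [item, e]
        end
      end
    end
  end
  workers.each(&:join) # plays the role of shutdown + wait_for_termination

  Array.new(failures.size) { failures.pop }
end
```

A call such as `run_in_pool(array_of_files, @threads_count) { |name| download_file(name) }` would behave like the pooled `download` above, with the returned pairs available for logging.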