Reduce Memory Size of Roo::Excelx::Cell::* classes #449

Merged: 1 commit into roo-rb:master from chopraanmol1:reduce_memsize_of_excelx_cell on Sep 16, 2018

chopraanmol1
Member

One of the factors that determines an object's memsize is its total number of instance variables.

Reducing the number of instance variables can therefore reduce memsize. To do so, I've done two major things:

  1. Removed the unnecessary instance variable link
  2. Created the method attr_reader_with_default
    i. With this method we can remove instance variables whose value is static across a class, e.g. type and, in some cases, cell_type
    ii. We can use a better default to reduce memory, e.g. style
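
To illustrate the idea behind attr_reader_with_default, here is a minimal sketch, not the code from this patch. It assumes the reader falls back to a class-level default whenever the instance variable was never assigned. The module name AttrReaderWithDefault and the class SketchCell are hypothetical and exist only for this example.

```ruby
# Hypothetical sketch: a reader that returns a class-level default
# when the instance variable has never been set on the object.
module AttrReaderWithDefault
  def attr_reader_with_default(attr_hash)
    attr_hash.each do |attr_name, default_value|
      ivar = :"@#{attr_name}"
      define_method(attr_name) do
        # Only objects that deviate from the default carry the ivar.
        instance_variable_defined?(ivar) ? instance_variable_get(ivar) : default_value
      end
    end
  end
end

# Hypothetical stand-in for a cell class; not part of Roo.
class SketchCell
  extend AttrReaderWithDefault
  attr_reader_with_default type: :string, style: 1

  def initialize(value, style = 1)
    @value = value
    # Skip the assignment when the value equals the default,
    # so the common case allocates one ivar fewer.
    @style = style unless style == 1
  end
end

plain  = SketchCell.new('a')
styled = SketchCell.new('b', 2)

puts plain.style                       # 1 (default, no ivar stored)
puts plain.instance_variables.inspect  # [:@value]
puts styled.style                      # 2
puts styled.instance_variables.inspect # [:@value, :@style]
```

Because type is never assigned in the sketch, it costs nothing per object, and style only costs an instance variable on cells that use a non-default style.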

Script

require 'roo'
require 'objspace'

coordinate = Roo::Excelx::Coordinate.new 1,1
base_date = Date.new(1899, 12, 30)

cells = []
formulas = [nil, "something"]#.reverse
styles = [1,2]#.reverse # Changing the order of these changes the results
formulas.each do |formula|
  styles.each do |style|
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::String.new('Sample String',formula, style, nil, coordinate)]
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::Boolean.new('1',formula, style,nil,coordinate)]
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::Number.new('20.25', formula, [:numeric_or_formula, '%.2f'], style, nil, coordinate)]
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::Date.new(5, formula, [:numeric_or_formula, 'yyyy-mm-ddd'], style,nil,base_date,coordinate)]
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::DateTime.new(5.25, formula, [:numeric_or_formula, 'yyyy-mm-ddd hh:mm'], style,nil,base_date,coordinate)]
    cells << ["formula=#{formula};style=#{style}", Roo::Excelx::Cell::Time.new(0.25, formula, [:numeric_or_formula, 'hh:mm'], style,nil,base_date,coordinate)]
    cells << ["", Roo::Excelx::Cell::Empty.new(coordinate)]
  end
end

cells = cells.collect{|variant,cell| ["#{cell.class}(#{variant})",cell] }.uniq(&:first).sort_by(&:first)

cells.each do |text, cell|
  puts "#{text} => #{ObjectSpace.memsize_of(cell)} (#{cell.instance_variables.length})"
end

Result on Master

Roo::Excelx::Cell::Boolean(formula=;style=1) => 120 (8)
Roo::Excelx::Cell::Boolean(formula=;style=2) => 104 (8)
Roo::Excelx::Cell::Boolean(formula=something;style=1) => 104 (8)
Roo::Excelx::Cell::Boolean(formula=something;style=2) => 104 (8)
Roo::Excelx::Cell::Date(formula=;style=1) => 120 (9)
Roo::Excelx::Cell::Date(formula=;style=2) => 112 (9)
Roo::Excelx::Cell::Date(formula=something;style=1) => 112 (9)
Roo::Excelx::Cell::Date(formula=something;style=2) => 112 (9)
Roo::Excelx::Cell::DateTime(formula=;style=1) => 120 (9)
Roo::Excelx::Cell::DateTime(formula=;style=2) => 112 (9)
Roo::Excelx::Cell::DateTime(formula=something;style=1) => 112 (9)
Roo::Excelx::Cell::DateTime(formula=something;style=2) => 112 (9)
Roo::Excelx::Cell::Empty() => 96 (7)
Roo::Excelx::Cell::Number(formula=;style=1) => 120 (9)
Roo::Excelx::Cell::Number(formula=;style=2) => 112 (9)
Roo::Excelx::Cell::Number(formula=something;style=1) => 112 (9)
Roo::Excelx::Cell::Number(formula=something;style=2) => 112 (9)
Roo::Excelx::Cell::String(formula=;style=1) => 120 (8)
Roo::Excelx::Cell::String(formula=;style=2) => 104 (8)
Roo::Excelx::Cell::String(formula=something;style=1) => 104 (8)
Roo::Excelx::Cell::String(formula=something;style=2) => 104 (8)
Roo::Excelx::Cell::Time(formula=;style=1) => 120 (10)
Roo::Excelx::Cell::Time(formula=;style=2) => 120 (10)
Roo::Excelx::Cell::Time(formula=something;style=1) => 120 (10)
Roo::Excelx::Cell::Time(formula=something;style=2) => 120 (10)

Result after this patch

Roo::Excelx::Cell::Boolean(formula=;style=1) => 40 (3)
Roo::Excelx::Cell::Boolean(formula=;style=2) => 80 (4)
Roo::Excelx::Cell::Boolean(formula=something;style=1) => 88 (4)
Roo::Excelx::Cell::Boolean(formula=something;style=2) => 80 (5)
Roo::Excelx::Cell::Date(formula=;style=1) => 80 (5)
Roo::Excelx::Cell::Date(formula=;style=2) => 96 (6)
Roo::Excelx::Cell::Date(formula=something;style=1) => 104 (6)
Roo::Excelx::Cell::Date(formula=something;style=2) => 96 (7)
Roo::Excelx::Cell::DateTime(formula=;style=1) => 80 (5)
Roo::Excelx::Cell::DateTime(formula=;style=2) => 96 (6)
Roo::Excelx::Cell::DateTime(formula=something;style=1) => 104 (6)
Roo::Excelx::Cell::DateTime(formula=something;style=2) => 96 (7)
Roo::Excelx::Cell::Empty() => 40 (1)
Roo::Excelx::Cell::Number(formula=;style=1) => 80 (5)
Roo::Excelx::Cell::Number(formula=;style=2) => 96 (6)
Roo::Excelx::Cell::Number(formula=something;style=1) => 104 (6)
Roo::Excelx::Cell::Number(formula=something;style=2) => 96 (7)
Roo::Excelx::Cell::String(formula=;style=1) => 40 (3)
Roo::Excelx::Cell::String(formula=;style=2) => 80 (4)
Roo::Excelx::Cell::String(formula=something;style=1) => 88 (4)
Roo::Excelx::Cell::String(formula=something;style=2) => 80 (5)
Roo::Excelx::Cell::Time(formula=;style=1) => 96 (6)
Roo::Excelx::Cell::Time(formula=;style=2) => 104 (7)
Roo::Excelx::Cell::Time(formula=something;style=1) => 120 (7)
Roo::Excelx::Cell::Time(formula=something;style=2) => 104 (8)

NOTE: For now this is a proof of concept and is still WIP. Many factors affect memsize, so the results vary simply by changing the order in which the cells are created. But overall it will reduce allocated and retained memory.
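
The link between instance variable count and memsize can also be checked in isolation. The sketch below uses two hypothetical classes (FewIvars and ManyIvars, not part of Roo) and prints whatever ObjectSpace.memsize_of reports. The exact byte counts depend on the Ruby version and allocator, so no expected numbers are given.

```ruby
require 'objspace'

# Hypothetical classes that differ only in how many ivars they set.
class FewIvars
  def initialize
    @a = 1
  end
end

class ManyIvars
  def initialize
    @a = @b = @c = @d = @e = @f = @g = @h = 1
  end
end

few  = FewIvars.new
many = ManyIvars.new

few_size  = ObjectSpace.memsize_of(few)
many_size = ObjectSpace.memsize_of(many)

# Sizes are reported by the running interpreter and vary by version.
puts "FewIvars  => #{few_size} (#{few.instance_variables.length})"
puts "ManyIvars => #{many_size} (#{many.instance_variables.length})"
```

On the interpreters I would expect this to run on, the object with more instance variables should not report a smaller size than the one with fewer.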

@coveralls
coveralls commented Sep 13, 2018

Coverage Status

Coverage increased (+0.03%) to 94.028% when pulling ca31e0e on chopraanmol1:reduce_memsize_of_excelx_cell into 4f9b166 on roo-rb:master.

Contributor

@tgturner tgturner left a comment

LGTM. Let me know when you think it's ready to merge. Maybe add a test case for the new attr_reader_with_default?

@@ -4,10 +4,26 @@ class Cell
class Base
attr_reader :cell_type, :cell_value, :value

# TODO: Extract it to module
def self.attr_reader_with_default(attr_hash)
Contributor

I like this, interesting implementation to avoid creating more instance variables.

@tgturner
Contributor

@chopraanmol1 Looks like this needs to be rebased against master with the new changes from #434

@chopraanmol1 chopraanmol1 force-pushed the reduce_memsize_of_excelx_cell branch from 214ee80 to ca31e0e Compare September 16, 2018 13:11
@tgturner tgturner self-assigned this Sep 16, 2018
@tgturner
Contributor

@chopraanmol1 Great changes here!

@tgturner tgturner merged commit 782420b into roo-rb:master Sep 16, 2018
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 20, 2019
pkgsrc change: add "USE_LANGUAGES= # none".

##  [2.8.0] 2019-01-18
### Fixed
- Fixed inconsistent column length for CSV [375](roo-rb/roo#375)
- Fixed formatted_value with `%` for Excelx [416](roo-rb/roo#416)
- Improved Memory consumption and performance [434](roo-rb/roo#434) [449](roo-rb/roo#449) [454](roo-rb/roo#454) [456](roo-rb/roo#456) [458](roo-rb/roo#458) [462](roo-rb/roo#462) [466](roo-rb/roo#466)
- Accept both Transitional and Strict Type for Excelx's worksheets [441](roo-rb/roo#441)
- Fixed ruby warnings [442](roo-rb/roo#442) [476](roo-rb/roo#476)
- Restore support for URL as file identifier for CSV [462](roo-rb/roo#462)
- Fixed missing location for Excelx's links [482](roo-rb/roo#482)

### Changed / Added
- Drop support for ruby 2.2.x and lower
- Updated rubyzip version for fixing security issue. Now minimal version is 1.2.1
- Roo::Excelx::Coordinate now inherits Array [458](roo-rb/roo#458)
- Improved Roo::HeaderRowNotFoundError exception's message [461](roo-rb/roo#461)
- Added `empty_cell` option which by default disable allocation for Roo::Excelx::Cell::Empty [464](roo-rb/roo#464)
- Added support for variable number of decimals for Excelx's formatted_value [387](roo-rb/roo#387)
- Added `disable_html_injection` option to disable html injection for shared string in `Roo::Excelx` [392](roo-rb/roo#392)
- Added image extraction for Excelx [414](roo-rb/roo#414) [397](roo-rb/roo#397)
- Added support for `1e6` as scientific notation for Excelx [433](roo-rb/roo#433)
- Added support for Integer as 0 based index for Excelx's `sheet_for` [455](roo-rb/roo#455)
- Extended `no_hyperlinks` option for non streaming Excelx methods [459](roo-rb/roo#459)
- Added `empty_cell` option to disable Roo::Excelx::Cell::Empty allocation for Excelx [464](roo-rb/roo#464)
- Added support for Integer with leading zero for Roo::Excelx [479](roo-rb/roo#479)
- Refactored Excelx code [453](roo-rb/roo#453) [477](roo-rb/roo#477) [483](roo-rb/roo#483) [484](roo-rb/roo#484)

### Deprecations
- Roo::Excelx::Sheet#present_cells is deprecated [454](roo-rb/roo#454)
- Roo::Utils.split_coordinate is deprecated [458](roo-rb/roo#458)
- Roo::Excelx::Cell::Base#link is deprecated [457](roo-rb/roo#457)
aravindm pushed a commit to chobiwa/roo that referenced this pull request Jun 18, 2019