Switch List implementation to use Hash-based lookup #133

Merged 14 commits on Feb 10, 2017


weppos (Owner) commented Feb 9, 2017

This is a major refactoring of the List internals (the way the list is stored) and of the find algorithm. The goal was to reduce the memory footprint and speed up lookups.

This is part of a study I am conducting on data structures and algorithms. The individual commits contain extra information about the various changes and optimizations.


Before

$ ruby test/benchmarks/bm_find.rb

                                user     system      total        real
NAME_SHORT                  1.540000   0.000000   1.540000 (  1.560285)
NAME_MEDIUM                 1.740000   0.020000   1.760000 (  1.774570)
NAME_LONG                   2.050000   0.010000   2.060000 (  2.101608)
NAME_WILD                   0.630000   0.010000   0.640000 (  0.633376)
NAME_EXCP                   0.660000   0.000000   0.660000 (  0.663655)
IAAA                        0.710000   0.000000   0.710000 (  0.712431)
IZZZ                        0.620000   0.000000   0.620000 (  0.621207)
PAAA                        6.900000   0.060000   6.960000 (  7.105149)
PZZZ                        0.930000   0.000000   0.930000 (  0.932058)
JP                         51.190000   0.430000  51.620000 ( 52.718784)
IT                          9.110000   0.030000   9.140000 (  9.183792)
COM                         7.580000   0.010000   7.590000 (  7.591188)

$ ruby test/profilers/list_profsize.rb

   301,518   PublicSuffix::List size
   247,194   Size of rules
    54,287   Size of indexes

$ ruby test/profilers/initialization_profiler.rb

Total allocated: 6525680 bytes (72086 objects)
Total retained:  1020309 bytes (19234 objects)

allocated memory by class
-----------------------------------
   3819072  Hash
   1826448  String
    557440  Array
    320080  PublicSuffix::Rule::Normal
      2040  PublicSuffix::Rule::Wildcard
       320  PublicSuffix::Rule::Exception
       240  File
        40  PublicSuffix::List

allocated objects by class
-----------------------------------
     38284  String
     16124  Hash
      9615  Array
      8002  PublicSuffix::Rule::Normal
        51  PublicSuffix::Rule::Wildcard
         8  PublicSuffix::Rule::Exception
         1  File
         1  PublicSuffix::List

retained memory by class
-----------------------------------
    389541  String
    320080  PublicSuffix::Rule::Normal
    229560  Array
     78728  Hash
      2040  PublicSuffix::Rule::Wildcard
       320  PublicSuffix::Rule::Exception
        40  PublicSuffix::List

retained objects by class
-----------------------------------
      9617  String
      8002  PublicSuffix::Rule::Normal
      1554  Array
        51  PublicSuffix::Rule::Wildcard
         8  PublicSuffix::Rule::Exception
         1  Hash
         1  PublicSuffix::List

Allocated String Report
-----------------------------------
      1796  "jp"
      1712  ""
       761  "no"
            ...

Retained String Report
-----------------------------------
         2  "aaa"
         2  "aarp"
         2  "abarth"
            ...

$ ruby test/profilers/find_profiler.rb

Total allocated: 31472 bytes (691 objects)
Total retained:  0 bytes (0 objects)

allocated memory by class
-----------------------------------
     26640  String
      2840  Array
       584  Hash
       584  RubyVM::Env
       400  Proc
       288  Enumerator::Lazy
        48  Enumerator::Generator
        48  Enumerator::Yielder
        40  PublicSuffix::Rule::Wildcard

allocated objects by class
-----------------------------------
       666  String
         5  Array
         5  Hash
         5  Proc
         5  RubyVM::Env
         2  Enumerator::Lazy
         1  Enumerator::Generator
         1  Enumerator::Yielder
         1  PublicSuffix::Rule::Wildcard

retained memory by class
-----------------------------------
NO DATA

retained objects by class
-----------------------------------
NO DATA

After

$ ruby test/benchmarks/bm_find.rb

                                user     system      total        real
NAME_SHORT                  0.370000   0.000000   0.370000 (  0.376614)
NAME_MEDIUM                 0.480000   0.000000   0.480000 (  0.489633)
NAME_LONG                   0.590000   0.010000   0.600000 (  0.603704)
NAME_WILD                   0.570000   0.000000   0.570000 (  0.577077)
NAME_EXCP                   0.700000   0.010000   0.710000 (  0.709454)
IAAA                        0.400000   0.000000   0.400000 (  0.406585)
IZZZ                        0.440000   0.000000   0.440000 (  0.436526)
PAAA                        0.790000   0.010000   0.800000 (  0.833797)
PZZZ                        0.740000   0.000000   0.740000 (  0.758879)
JP                          0.760000   0.010000   0.770000 (  0.777570)
IT                          0.400000   0.000000   0.400000 (  0.394240)
COM                         0.400000   0.000000   0.400000 (  0.399312)

$ ruby test/profilers/list_profsize.rb

   263,481   PublicSuffix::List size
   263,451   Size of rules

$ ruby test/profilers/initialization_profiler.rb

Total allocated: 6205052 bytes (60280 objects)
Total retained:  1052326 bytes (16127 objects)

allocated memory by class
-----------------------------------
   4143744  Hash
   1416148  String
    322440  PublicSuffix::Rule::Entry
    320080  PublicSuffix::Rule::Normal
      2040  PublicSuffix::Rule::Wildcard
       320  PublicSuffix::Rule::Exception
       240  File
        40  PublicSuffix::List

allocated objects by class
-----------------------------------
     28032  String
     16124  Hash
      8061  PublicSuffix::Rule::Entry
      8002  PublicSuffix::Rule::Normal
        51  PublicSuffix::Rule::Wildcard
         8  PublicSuffix::Rule::Exception
         1  File
         1  PublicSuffix::List

retained memory by class
-----------------------------------
    403400  Hash
    326446  String
    322440  PublicSuffix::Rule::Entry
        40  PublicSuffix::List

retained objects by class
-----------------------------------
      8064  String
      8061  PublicSuffix::Rule::Entry
         1  Hash
         1  PublicSuffix::List

Retained String Report
-----------------------------------
         1  "*.compute.amazonaws.com.cn"
         1  "*.githubcloudusercontent.com"
         1  "0.bg"
            ...

$ ruby test/profilers/find_profiler.rb

Total allocated: 1728 bytes (24 objects)
Total retained:  0 bytes (0 objects)

allocated memory by class
-----------------------------------
      1048  Hash
       520  String
        80  Array
        40  PublicSuffix::Rule::Normal
        40  PublicSuffix::Rule::Wildcard

allocated objects by class
-----------------------------------
        13  String
         7  Hash
         2  Array
         1  PublicSuffix::Rule::Normal
         1  PublicSuffix::Rule::Wildcard

retained memory by class
-----------------------------------
NO DATA

retained objects by class
-----------------------------------
NO DATA

The Hash doesn't require manual reindexing when new rules are added.
Moreover, the Hash-based algorithm has almost O(1) lookup time.

Actually, the lookup time is O(k), where k is the number of parts in
the input string.

    find("example.com") -> k = 2
    find("www.example.com") -> k = 3
    find("www.subdomain.example.com") -> k = 4

It's fair to consider that the average number of parts is 3, and
hostnames longer than 5 parts are quite uncommon.
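
The O(k) behavior described above can be illustrated with a minimal
sketch. This is not the gem's actual code: the RULES table, the method
names, and the restriction to plain (non-wildcard, non-exception) rules
are all assumptions made for illustration. It only shows that a k-part
name needs one Hash probe per part.

```ruby
# Hypothetical rule table (not the real Public Suffix List data).
# Only plain rules are modeled; wildcard and exception rules are omitted.
RULES = {
  "com"   => :normal,
  "uk"    => :normal,
  "co.uk" => :normal,
}.freeze

# Build the k candidate suffixes of a k-part name, shortest first.
# "www.example.com" -> ["com", "example.com", "www.example.com"]
def candidate_suffixes(name)
  parts = name.split(".")
  (1..parts.length).map { |n| parts.last(n).join(".") }
end

# Probe the Hash once per candidate (k probes) and keep the longest hit.
def longest_matching_rule(name, rules = RULES)
  candidate_suffixes(name).select { |suffix| rules.key?(suffix) }.last
end

longest_matching_rule("www.example.co.uk") # => "co.uk"
longest_matching_rule("example.test")      # => nil
```

The number of probes depends only on the number of parts in the input,
not on the number of rules in the table.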

Note that the Hash-based lookup is heavily influenced by the underlying
Hash implementation provided by the programming language. A Perfect Hash
would be preferable in terms of lookup time, as it offers true O(1)
worst-case lookup (whereas a dynamic Hash is O(1) on average). However,
a Perfect Hash requires computing a perfect hashing function up front,
and it would not allow the flexibility of adding/removing rules at
runtime.
➜  publicsuffix-ruby git:(thesis-hash) ✗ ruby benchmarks/bm_parts.rb
Warming up --------------------------------------
          tokenizer1    26.384k i/100ms
          tokenizer2    26.571k i/100ms
          tokenizer3    32.293k i/100ms
          tokenizer4    27.595k i/100ms
Calculating -------------------------------------
          tokenizer1    310.488k (± 6.6%) i/s -      1.557M in 5.035961s
          tokenizer2    308.801k (± 8.3%) i/s -      1.541M in 5.027643s
          tokenizer3    378.716k (± 5.3%) i/s -      1.905M in 5.045422s
          tokenizer4    305.493k (± 9.6%) i/s -      1.518M in 5.018550s

Comparison:
          tokenizer3:   378716.5 i/s
          tokenizer1:   310488.3 i/s - 1.22x  slower
          tokenizer2:   308800.6 i/s - 1.23x  slower
          tokenizer4:   305493.5 i/s - 1.24x  slower

After finally realizing why the benchmarks were still using the old
code, and fixing the issue in 5ed8d00, here are the new benchmarks that
compare the existing implementation with the new lookup based on Hash.

Using the naive indexing:

    ➜  publicsuffix-ruby git:(master) ruby benchmarks/bm_find.rb
    Rehearsal -------------------------------------------------------------
    NAME_SHORT                  1.550000   0.010000   1.560000 (  1.563616)
    NAME_SHORT (noprivate)      2.060000   0.020000   2.080000 (  2.117548)
    NAME_MEDIUM                 1.720000   0.020000   1.740000 (  1.760489)
    NAME_MEDIUM (noprivate)     2.430000   0.020000   2.450000 (  2.649166)
    NAME_LONG                   1.630000   0.000000   1.630000 (  1.643268)
    NAME_LONG (noprivate)       2.210000   0.020000   2.230000 (  2.262352)
    NAME_WILD                   0.600000   0.000000   0.600000 (  0.601043)
    NAME_WILD (noprivate)       1.320000   0.070000   1.390000 (  1.475682)
    NAME_EXCP                   0.940000   0.060000   1.000000 (  1.071000)
    NAME_EXCP (noprivate)       1.120000   0.010000   1.130000 (  1.136978)
    IAAA                        0.690000   0.000000   0.690000 (  0.694769)
    IAAA (noprivate)            1.010000   0.010000   1.020000 (  1.011105)
    IZZZ                        0.560000   0.000000   0.560000 (  0.569191)
    IZZZ (noprivate)            0.900000   0.000000   0.900000 (  0.895128)
    PAAA                        7.310000   0.090000   7.400000 (  8.036596)
    PAAA (noprivate)            7.910000   0.080000   7.990000 (  8.450394)
    PZZZ                        1.060000   0.000000   1.060000 (  1.109186)
    PZZZ (noprivate)            1.390000   0.010000   1.400000 (  1.411946)
    JP                         50.590000   0.390000  50.980000 ( 52.698865)
    JP (noprivate)             49.840000   0.230000  50.070000 ( 50.385524)
    IT                          9.440000   0.020000   9.460000 (  9.502403)
    IT (noprivate)              9.940000   0.030000   9.970000 ( 10.008055)
    COM                         8.610000   0.030000   8.640000 (  8.657849)
    COM (noprivate)             9.330000   0.130000   9.460000 (  9.700029)
    -------------------------------------------------- total: 175.410000sec

                                   user     system      total        real
    NAME_SHORT                  1.580000   0.000000   1.580000 (  1.588811)
    NAME_SHORT (noprivate)      2.000000   0.010000   2.010000 (  2.024544)
    NAME_MEDIUM                 1.960000   0.020000   1.980000 (  2.012659)
    NAME_MEDIUM (noprivate)     2.150000   0.020000   2.170000 (  2.193273)
    NAME_LONG                   1.660000   0.000000   1.660000 (  1.666938)
    NAME_LONG (noprivate)       2.010000   0.000000   2.010000 (  2.018177)
    NAME_WILD                   0.600000   0.000000   0.600000 (  0.601061)
    NAME_WILD (noprivate)       0.920000   0.000000   0.920000 (  0.920315)
    NAME_EXCP                   0.700000   0.010000   0.710000 (  0.708406)
    NAME_EXCP (noprivate)       1.260000   0.010000   1.270000 (  1.298971)
    IAAA                        0.810000   0.010000   0.820000 (  0.829160)
    IAAA (noprivate)            1.180000   0.000000   1.180000 (  1.207569)
    IZZZ                        0.640000   0.010000   0.650000 (  0.646752)
    IZZZ (noprivate)            1.020000   0.000000   1.020000 (  1.037327)
    PAAA                        6.180000   0.020000   6.200000 (  6.227082)
    PAAA (noprivate)            6.970000   0.050000   7.020000 (  7.089971)
    PZZZ                        0.930000   0.000000   0.930000 (  0.937254)
    PZZZ (noprivate)            1.310000   0.010000   1.320000 (  1.324235)
    JP                         47.930000   0.200000  48.130000 ( 48.440196)
    JP (noprivate)             48.440000   0.260000  48.700000 ( 49.110888)
    IT                          9.660000   0.090000   9.750000 (  9.874755)
    IT (noprivate)              9.950000   0.070000  10.020000 ( 10.163920)
    COM                         7.930000   0.020000   7.950000 (  7.986893)
    COM (noprivate)             8.170000   0.010000   8.180000 (  8.186619)

Using Hash:

    ➜  publicsuffix-ruby git:(thesis-hash) ruby benchmarks/bm_find.rb
    Rehearsal -------------------------------------------------------------
    NAME_SHORT                  0.310000   0.000000   0.310000 (  0.363447)
    NAME_SHORT (noprivate)      0.360000   0.000000   0.360000 (  0.402509)
    NAME_MEDIUM                 0.320000   0.000000   0.320000 (  0.317237)
    NAME_MEDIUM (noprivate)     0.410000   0.000000   0.410000 (  0.413092)
    NAME_LONG                   0.400000   0.000000   0.400000 (  0.396608)
    NAME_LONG (noprivate)       0.510000   0.000000   0.510000 (  0.510915)
    NAME_WILD                   0.390000   0.000000   0.390000 (  0.393804)
    NAME_WILD (noprivate)       0.510000   0.010000   0.520000 (  0.507487)
    NAME_EXCP                   0.400000   0.000000   0.400000 (  0.401723)
    NAME_EXCP (noprivate)       0.520000   0.000000   0.520000 (  0.525549)
    IAAA                        0.240000   0.000000   0.240000 (  0.244243)
    IAAA (noprivate)            0.360000   0.000000   0.360000 (  0.359558)
    IZZZ                        0.250000   0.000000   0.250000 (  0.249716)
    IZZZ (noprivate)            0.360000   0.000000   0.360000 (  0.356862)
    PAAA                        0.440000   0.000000   0.440000 (  0.445464)
    PAAA (noprivate)            0.590000   0.000000   0.590000 (  0.591834)
    PZZZ                        0.450000   0.000000   0.450000 (  0.446044)
    PZZZ (noprivate)            0.520000   0.000000   0.520000 (  0.524458)
    JP                          0.320000   0.000000   0.320000 (  0.327063)
    JP (noprivate)              0.430000   0.000000   0.430000 (  0.430906)
    IT                          0.270000   0.000000   0.270000 (  0.265015)
    IT (noprivate)              0.340000   0.000000   0.340000 (  0.345299)
    COM                         0.250000   0.000000   0.250000 (  0.244028)
    COM (noprivate)             0.340000   0.010000   0.350000 (  0.343862)
    ---------------------------------------------------- total: 9.310000sec

                                   user     system      total        real
    NAME_SHORT                  0.220000   0.000000   0.220000 (  0.221509)
    NAME_SHORT (noprivate)      0.320000   0.000000   0.320000 (  0.329044)
    NAME_MEDIUM                 0.290000   0.000000   0.290000 (  0.296088)
    NAME_MEDIUM (noprivate)     0.390000   0.000000   0.390000 (  0.393592)
    NAME_LONG                   0.420000   0.000000   0.420000 (  0.419251)
    NAME_LONG (noprivate)       0.500000   0.000000   0.500000 (  0.499873)
    NAME_WILD                   0.420000   0.000000   0.420000 (  0.421002)
    NAME_WILD (noprivate)       0.480000   0.000000   0.480000 (  0.485180)
    NAME_EXCP                   0.400000   0.000000   0.400000 (  0.401010)
    NAME_EXCP (noprivate)       0.510000   0.000000   0.510000 (  0.506889)
    IAAA                        0.250000   0.000000   0.250000 (  0.257035)
    IAAA (noprivate)            0.350000   0.000000   0.350000 (  0.352895)
    IZZZ                        0.250000   0.000000   0.250000 (  0.250804)
    IZZZ (noprivate)            0.350000   0.010000   0.360000 (  0.352272)
    PAAA                        0.440000   0.000000   0.440000 (  0.444238)
    PAAA (noprivate)            0.540000   0.000000   0.540000 (  0.549019)
    PZZZ                        0.440000   0.000000   0.440000 (  0.449137)
    PZZZ (noprivate)            0.550000   0.000000   0.550000 (  0.559688)
    JP                          0.330000   0.000000   0.330000 (  0.337413)
    JP (noprivate)              0.450000   0.010000   0.460000 (  0.458545)
    IT                          0.240000   0.000000   0.240000 (  0.247337)
    IT (noprivate)              0.350000   0.000000   0.350000 (  0.351233)
    COM                         0.260000   0.000000   0.260000 (  0.261882)
    COM (noprivate)             0.340000   0.000000   0.340000 (  0.347857)
Using the naive indexing:

    ➜  publicsuffix-ruby git:(master) ruby test/profilers/execution_profiler.rb
    Total allocated: 204162 bytes (4420 objects)
    Total retained:  0 bytes (0 objects)

    allocated memory by gem
    -----------------------------------
        204002  publicsuffix-ruby/lib
           160  other

    allocated memory by class
    -----------------------------------
        177036  String
         18416  Array
          2560  Hash
          2134  Regexp
          1168  RubyVM::Env
          1120  MatchData
           800  Proc
           576  Enumerator::Lazy
            96  Enumerator::Generator
            96  Enumerator::Yielder
            80  PublicSuffix::Domain
            80  PublicSuffix::Rule::Wildcard

    allocated objects by gem
    -----------------------------------
          4416  publicsuffix-ruby/lib
             4  other

    allocated objects by class
    -----------------------------------
          4332  String
            32  Array
            16  Hash
            10  Proc
            10  RubyVM::Env
             4  Enumerator::Lazy
             4  MatchData
             4  Regexp
             2  Enumerator::Generator
             2  Enumerator::Yielder
             2  PublicSuffix::Domain
             2  PublicSuffix::Rule::Wildcard

    retained memory by gem
    -----------------------------------
    NO DATA

    retained memory by file
    -----------------------------------
    NO DATA

    retained memory by location
    -----------------------------------
    NO DATA

    retained memory by class
    -----------------------------------
    NO DATA

    retained objects by gem
    -----------------------------------
    NO DATA

    retained objects by file
    -----------------------------------
    NO DATA

    retained objects by location
    -----------------------------------
    NO DATA

    retained objects by class
    -----------------------------------
    NO DATA

Using Hash:

    ➜  publicsuffix-ruby git:(thesis-hash) ruby test/profilers/execution_profiler.rb
    Total allocated: 15170 bytes (160 objects)
    Total retained:  0 bytes (0 objects)

    allocated memory by gem
    -----------------------------------
         15010  publicsuffix-ruby/lib
           160  other

    allocated memory by class
    -----------------------------------
          8076  String
          2560  Hash
          2134  Regexp
          1120  Array
          1120  MatchData
            80  PublicSuffix::Domain
            80  PublicSuffix::Rule::Wildcard

    allocated objects by gem
    -----------------------------------
           156  publicsuffix-ruby/lib
             4  other

    allocated objects by class
    -----------------------------------
           108  String
            24  Array
            16  Hash
             4  MatchData
             4  Regexp
             2  PublicSuffix::Domain
             2  PublicSuffix::Rule::Wildcard

    retained memory by gem
    -----------------------------------
    NO DATA

    retained memory by file
    -----------------------------------
    NO DATA

    retained memory by location
    -----------------------------------
    NO DATA

    retained memory by class
    -----------------------------------
    NO DATA

    retained objects by gem
    -----------------------------------
    NO DATA

    retained objects by file
    -----------------------------------
    NO DATA

    retained objects by location
    -----------------------------------
    NO DATA

    retained objects by class
    -----------------------------------
    NO DATA
When the rule is stored, we can remove the value from the Rule, as
the value is effectively the key of the Hash.
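
A minimal sketch of that idea, under assumptions: the Entry field names,
the ENTRIES table, and the describe helper are hypothetical and chosen
for illustration only. The point is that the stored entry carries no
copy of the rule value, because the Hash key already holds it.

```ruby
# Hypothetical compact entry: note that there is no `value` field.
Entry = Struct.new(:type, :length, :private)

# The rule value is stored exactly once, as the Hash key.
ENTRIES = {
  "com"   => Entry.new(:normal, 1, false),
  "co.uk" => Entry.new(:normal, 2, false),
}.freeze

# Rebuild a full rule description on demand from key + entry.
def describe(entries, value)
  entry = entries[value]
  return nil if entry.nil?

  { value: value, type: entry.type, length: entry.length, private: entry.private }
end

describe(ENTRIES, "co.uk")
# => {:value=>"co.uk", :type=>:normal, :length=>2, :private=>false}
```

Keeping the value only in the key avoids retaining a second String per
rule, which is in line with the lower retained totals reported for this
change.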

    ➜  publicsuffix-ruby git:(before) ruby test/profilers/initialization_profiler.rb
    Total allocated: 5882690 bytes (52219 objects)
    Total retained:  1375819 bytes (24188 objects)

    ➜  publicsuffix-ruby git:(before) ruby test/profilers/execution_profiler.rb
    Total allocated: 15170 bytes (160 objects)
    Total retained:  0 bytes (0 objects)

    ➜  publicsuffix-ruby git:(after) ✗ ruby test/profilers/initialization_profiler.rb
    Total allocated: 6205130 bytes (60280 objects)
    Total retained:  1052404 bytes (16127 objects)

    ➜  publicsuffix-ruby git:(after) ✗ ruby test/profilers/execution_profiler.rb
    Total allocated: 15330 bytes (164 objects)
    Total retained:  0 bytes (0 objects)

compared to master

    ➜  publicsuffix-ruby git:(master) ruby test/profilers/initialization_profiler.rb
    Total allocated: 6525758 bytes (72086 objects)
    Total retained:  1020387 bytes (19234 objects)

    ➜  publicsuffix-ruby git:(master) ruby test/profilers/execution_profiler.rb
    Total allocated: 204162 bytes (4420 objects)
    Total retained:  0 bytes (0 objects)

Execution time is unchanged.

    ➜  publicsuffix-ruby git:(before) ruby test/benchmarks/bm_find.rb

                                   user     system      total        real
    NAME_SHORT                  0.260000   0.000000   0.260000 (  0.262684)
    NAME_SHORT (noprivate)      0.370000   0.010000   0.380000 (  0.372534)
    NAME_MEDIUM                 0.330000   0.000000   0.330000 (  0.335683)
    NAME_MEDIUM (noprivate)     0.490000   0.000000   0.490000 (  0.494590)
    NAME_LONG                   0.510000   0.010000   0.520000 (  0.519750)
    NAME_LONG (noprivate)       0.590000   0.000000   0.590000 (  0.594626)
    NAME_WILD                   0.480000   0.000000   0.480000 (  0.490432)
    NAME_WILD (noprivate)       0.580000   0.010000   0.590000 (  0.594776)
    NAME_EXCP                   0.460000   0.000000   0.460000 (  0.470119)
    NAME_EXCP (noprivate)       0.590000   0.010000   0.600000 (  0.601316)
    IAAA                        0.300000   0.000000   0.300000 (  0.305301)
    IAAA (noprivate)            0.400000   0.000000   0.400000 (  0.410586)
    IZZZ                        0.280000   0.000000   0.280000 (  0.283711)
    IZZZ (noprivate)            0.400000   0.010000   0.410000 (  0.408137)
    PAAA                        0.490000   0.000000   0.490000 (  0.501869)
    PAAA (noprivate)            0.600000   0.000000   0.600000 (  0.612187)
    PZZZ                        0.510000   0.010000   0.520000 (  0.519206)
    PZZZ (noprivate)            0.590000   0.000000   0.590000 (  0.600264)
    JP                          0.390000   0.000000   0.390000 (  0.404432)
    JP (noprivate)              0.540000   0.010000   0.550000 (  0.558351)
    IT                          0.290000   0.000000   0.290000 (  0.298931)
    IT (noprivate)              0.410000   0.000000   0.410000 (  0.420742)
    COM                         0.290000   0.010000   0.300000 (  0.300935)
    COM (noprivate)             0.400000   0.000000   0.400000 (  0.409309)

    ➜  publicsuffix-ruby git:(after) ✗ ruby test/benchmarks/bm_find.rb

                                   user     system      total        real
    NAME_SHORT                  0.320000   0.000000   0.320000 (  0.320201)
    NAME_SHORT (noprivate)      0.430000   0.000000   0.430000 (  0.443678)
    NAME_MEDIUM                 0.380000   0.000000   0.380000 (  0.388169)
    NAME_MEDIUM (noprivate)     0.490000   0.010000   0.500000 (  0.491073)
    NAME_LONG                   0.480000   0.000000   0.480000 (  0.483376)
    NAME_LONG (noprivate)       0.620000   0.010000   0.630000 (  0.634896)
    NAME_WILD                   0.570000   0.020000   0.590000 (  0.628489)
    NAME_WILD (noprivate)       0.700000   0.030000   0.730000 (  0.769070)
    NAME_EXCP                   0.580000   0.020000   0.600000 (  0.618683)
    NAME_EXCP (noprivate)       0.740000   0.030000   0.770000 (  0.799244)
    IAAA                        0.410000   0.030000   0.440000 (  0.474761)
    IAAA (noprivate)            0.550000   0.040000   0.590000 (  0.645329)
    IZZZ                        0.380000   0.020000   0.400000 (  0.432898)
    IZZZ (noprivate)            0.520000   0.020000   0.540000 (  0.579073)
    PAAA                        0.680000   0.040000   0.720000 (  0.760276)
    PAAA (noprivate)            0.720000   0.020000   0.740000 (  0.773864)
    PZZZ                        0.700000   0.040000   0.740000 (  0.782113)
    PZZZ (noprivate)            0.650000   0.010000   0.660000 (  0.664647)
    JP                          0.470000   0.000000   0.470000 (  0.478473)
    JP (noprivate)              0.580000   0.010000   0.590000 (  0.589827)
    IT                          0.360000   0.000000   0.360000 (  0.379309)
    IT (noprivate)              0.450000   0.010000   0.460000 (  0.471794)
    COM                         0.330000   0.010000   0.340000 (  0.334253)
    COM (noprivate)             0.530000   0.030000   0.560000 (  0.592813)
Using the new benchmarks introduced in dec53e6,
the allocation is clearly lower even during execution time.

    ➜  publicsuffix-ruby git:(master) ✗ ruby test/profilers/find_profiler.rb
    Total allocated: 31472 bytes (691 objects)
    Total retained:  0 bytes (0 objects)

    ➜  publicsuffix-ruby git:(master) ✗ ruby test/profilers/domain_profiler.rb
    Total allocated: 37410 bytes (744 objects)
    Total retained:  0 bytes (0 objects)

vs

    ➜  publicsuffix-ruby git:(thesis-hash) ruby test/profilers/find_profiler.rb
    Total allocated: 1264 bytes (22 objects)
    Total retained:  0 bytes (0 objects)

    ➜  publicsuffix-ruby git:(thesis-hash) ruby test/profilers/domain_profiler.rb
    Total allocated: 7202 bytes (75 objects)
    Total retained:  0 bytes (0 objects)
.new now takes all parameters, as you would create a completely new
instance when you have the data.

A new method called .build is used to create a new Rule from a rule
content.
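
As a rough illustration of this split (a sketch, not the gem's code):
SketchRule and its attributes are hypothetical names. The constructor
receives every attribute explicitly, while the build class method
derives them from the raw rule content.

```ruby
# Hypothetical class illustrating the .new / .build split.
class SketchRule
  attr_reader :value, :length, :private

  # .new takes all parameters: use it when the data is already known.
  def initialize(value:, length:, private: false)
    @value   = value
    @length  = length
    @private = private
  end

  # .build creates a rule from raw rule content, deriving the rest.
  def self.build(content, private: false)
    value = content.to_s
    new(value: value, length: value.count(".") + 1, private: private)
  end
end

rule = SketchRule.build("co.uk")
rule.value  # => "co.uk"
rule.length # => 2
```
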
Better distinguish between a Rule (public API) and an Entry (internal
API).
It doesn't support keyword arguments with no default, and proper memory
profiling.
casperisfine (Contributor) commented

I tested it on my app. The gem now loads 3 times faster (~95ms -> ~31ms).

👏

cc @burke

weppos (Owner, Author) commented Feb 10, 2017

Thanks for the feedback @casperisfine. I have some more research going on to use a modification of a Trie or a DAFSA to reduce the memory allocation. That said, I'm quite happy with the speed right now.

@weppos weppos merged commit c45dc94 into master Feb 10, 2017
@weppos weppos deleted the thesis-hash branch February 10, 2017 09:51
weppos added a commit that referenced this pull request Aug 4, 2017
roback added a commit to twingly/twingly-url that referenced this pull request Feb 9, 2018
Unfortunately it doesn't look like this fixes any of our issues,
but since it made the profiling run a bit faster (and the fact that
the tests didn't break) I made a PR of this anyway.

(Profiling total run: 1.6663s -> 1.4801s).

Some related links:
* weppos/publicsuffix-ruby#130
* weppos/publicsuffix-ruby#133
* sporkmonger/addressable#267