Let's try adding some new methods to MRI. This document shows you how to add a new method, step by step. Please follow along in your own environment.
Let's add an Array#second method. Array#first returns the first element of an Array. Array#second will return the second element of an Array.
Here is a definition in Ruby:
# specification written in Ruby
class Array
  def second
    self[1]
  end
end
Steps:
- Open array.c in your editor.
- Add an ary_second() function definition to array.c. A good place to add it is just before Init_Array().
- Add the statement rb_define_method(rb_cArray, "second", ary_second, 0); to the body of the Init_Array() function.
- Write some sample code to try your new method in ruby/test.rb, then build and run it with make run.
- Add a test in ruby/test/ruby/test_array.rb. These tests are written in the minitest format.
  - $ make test-all will run the test code you wrote. However, it runs a tremendous number of Ruby tests, so you may want to run only the Array-related tests.
  - $ make test-all TESTS='ruby/test_array.rb' will test only ruby/test/ruby/test_array.rb.
  - $ make test-all TESTS='-j8' will run in parallel with 8 processes.
- Add rdoc documentation for Array#second, using the documentation of other methods in array.c as a reference.
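As a rough illustration of the sample code step, ruby/test.rb could look something like the sketch below. The fallback definition at the top is an assumption added only so that the script also runs on an unmodified ruby; with your own build it is skipped because Array#second already exists.

```ruby
# Fallback for an unmodified ruby (assumption: not needed once MRI is rebuilt
# with ary_second() registered in Init_Array()).
unless Array.method_defined?(:second)
  class Array
    def second
      self[1]
    end
  end
end

p [1, 2, 3].second  # the second element: 2
p [1].second        # no second element: nil
p [].second         # empty array: nil
```

Checking the short and empty arrays is useful because the C version relies on rb_ary_entry() to handle those cases.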
One possible implementation of ary_second() is shown below. Line numbers may differ because array.c is likely to have changed since this document was written.
diff --git a/array.c b/array.c
index bd24216af3..79c1c1d334 100644
--- a/array.c
+++ b/array.c
@@ -6131,6 +6131,12 @@ rb_ary_sum(int argc, VALUE *argv, VALUE ary)
  *
  */
 
+static VALUE
+ary_second(VALUE self)
+{
+    return rb_ary_entry(self, 1);
+}
+
 void
 Init_Array(void)
 {
@@ -6251,6 +6257,8 @@ Init_Array(void)
     rb_define_method(rb_cArray, "dig", rb_ary_dig, -1);
     rb_define_method(rb_cArray, "sum", rb_ary_sum, -1);
 
+    rb_define_method(rb_cArray, "second", ary_second, 0);
+
     id_cmp = rb_intern("<=>");
     id_random = rb_intern("random");
     id_div = rb_intern("div");
A brief explanation follows:
- ary_second() is the implementation of the method.
  - VALUE is the C type that represents a Ruby object, and self is the method's receiver (i.e. for ary.second, the receiver is ary). All Ruby methods return a Ruby object, so the type of the return value should also be VALUE.
  - rb_ary_entry(self, n) does the same thing as self[n] in Ruby. Therefore, rb_ary_entry(self, 1) returns the second element (note: C uses 0-based indexing).
- The function Init_Array is invoked by the interpreter at launch time.
- The statement rb_define_method(rb_cArray, "second", ary_second, 0); defines the second method on the Array class.
  - rb_cArray points to the Array class object. The rb_ prefix indicates that it is something Ruby-related, and the c means "Class". Therefore, we can infer that rb_cArray is Ruby's Array class object. Incidentally, the module object prefix is m (e.g. rb_mEnumerable == Enumerable module object) and the error class prefix is e (e.g. rb_eArgError == ArgumentError object).
  - rb_define_method is a function that defines instance methods.
  - This statement can be read as: "Define an instance method second on rb_cArray. When Array#second is called, call the ary_second C function. This method accepts 0 arguments".
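The argument count passed to rb_define_method() can be observed from Ruby through Method#arity. The sketch below looks at four methods whose registrations appear in the diffs in this document; it assumes a ruby in which these methods are still registered with the same counts.

```ruby
# Each comment shows the last argument of the corresponding rb_define_method() call.
p Array.instance_method(:sum).arity           # registered with -1
p Integer.instance_method(:digits).arity      # registered with -1
p String.instance_method(:ascii_only?).arity  # registered with 0
p Time.instance_method(:strftime).arity       # registered with 1
```

After rebuilding MRI with the change above, Array.instance_method(:second).arity should report 0 in the same way.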
Let's define a method String#palindrome? that checks whether the string is a palindrome.
The following code is a sample Ruby implementation of String#palindrome? along with some tests.
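class String
  def palindrome?
    # Keep only ASCII letters, digits, hiragana and katakana.
    # (A-z would also match the punctuation between "Z" and "a", such as "_" and "^".)
    chars = self.gsub(/[^A-Za-z0-9\p{hiragana}\p{katakana}]/, '').downcase
    # p chars
    !chars.empty? && chars == chars.reverse
  end
end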

# Small sample program
# Sample palindrome from https://en.wikipedia.org/wiki/Palindrome
[ # OK
  "Sator Arepo Tenet Opera Rotas",
  "A man, a plan, a canal - Panama!",
  "Madam, I'm Adam",
  "NisiOisiN",
  "わかみかものとかなかとのもかみかわ",
  "アニマルマニア",
  # NG
  "",
  "ab",
].each{|str|
  p [str, str.palindrome?]
}
Translate the above Ruby code into C code.
Please recall the procedure for implementing Array#second, and use the same procedure to implement String#palindrome? in MRI.
Below is one possible solution for implementing String#palindrome?.
diff --git a/string.c b/string.c
index c140148778..0f170bd20b 100644
--- a/string.c
+++ b/string.c
@@ -10062,6 +10062,18 @@ rb_to_symbol(VALUE name)
     return rb_str_intern(name);
 }
 
+static VALUE
+str_palindrome_p(VALUE self)
+{
+    const char *pat = "[^A-Za-z0-9\\p{hiragana}\\p{katakana}]";
+    VALUE argv[2] = {rb_reg_regcomp(rb_utf8_str_new_cstr(pat)),
+                     rb_str_new_cstr("")};
+    VALUE filtered_str = rb_str_downcase(0, NULL, str_gsub(2, argv, self, FALSE));
+    return rb_str_empty(filtered_str) ? Qfalse :
+           rb_str_equal(filtered_str, rb_str_reverse(filtered_str));
+
+}
+
 /*
  * A <code>String</code> object holds and manipulates an arbitrary sequence of
  * bytes, typically representing characters. String objects may be created
@@ -10223,6 +10235,8 @@ Init_String(void)
     rb_define_method(rb_cString, "valid_encoding?", rb_str_valid_encoding_p, 0);
     rb_define_method(rb_cString, "ascii_only?", rb_str_is_ascii_only_p, 0);
 
+    rb_define_method(rb_cString, "palindrome?", str_palindrome_p, 0);
+
     rb_fs = Qnil;
     rb_define_hooked_variable("$;", &rb_fs, 0, rb_fs_setter);
     rb_define_hooked_variable("$-F", &rb_fs, 0, rb_fs_setter);
Explanation:
- The suffix _p indicates a predicate function that returns true or false.
- rb_reg_regcomp() compiles the pattern into a Regexp object. It takes a Ruby string, so the pat C string is first converted with rb_utf8_str_new_cstr().
- rb_str_new_cstr("") generates an empty Ruby string.
- str_gsub() does the same replacement as String#gsub.
- rb_str_downcase() does the same conversion as String#downcase.
- rb_str_empty() does the same check as String#empty?.
- rb_str_reverse() does the same reordering as String#reverse.
- rb_str_equal() does the same comparison as String#==.
Hopefully, you can see how the C implementation corresponds to the Ruby implementation.
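To make the correspondence easier to follow, here is a sketch that unrolls the Ruby version into one statement per C call. The method name palindrome_steps and the intermediate variable names are made up for this illustration, and the pattern uses A-Za-z for ASCII letters as an assumption.

```ruby
# Hypothetical helper: one line per C-level call in str_palindrome_p().
pattern = /[^A-Za-z0-9\p{hiragana}\p{katakana}]/  # rb_reg_regcomp(...)

def palindrome_steps(str, pattern)
  replaced = str.gsub(pattern, '')  # str_gsub(2, argv, self, FALSE)
  filtered = replaced.downcase      # rb_str_downcase(0, NULL, ...)
  return false if filtered.empty?   # rb_str_empty(filtered_str) ? Qfalse : ...
  filtered == filtered.reverse      # rb_str_equal(..., rb_str_reverse(...))
end

p palindrome_steps("Madam, I'm Adam", pattern)  # true
p palindrome_steps("ab", pattern)               # false
p palindrome_steps("", pattern)                 # false
```

Each intermediate value here is a new String object, just as each C call in the diff returns a new VALUE.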
Add a method Integer#add(n) which returns the result of adding n.
Ruby example definition:
class Integer
  def add n
    self + n
  end
end

p 1.add(3) #=> 4
p 1.add(4.5) #=> 5.5
Below is one possible solution for implementing Integer#add:
Index: numeric.c
===================================================================
--- numeric.c (Revision 59647)
+++ numeric.c (Working copy)
@@ -5238,6 +5238,12 @@
     }
 }
 
+static VALUE
+int_add(VALUE self, VALUE n)
+{
+    return rb_int_plus(self, n);
+}
+
 /*
  * Document-class: ZeroDivisionError
  *
@@ -5449,6 +5455,8 @@
     rb_define_method(rb_cInteger, "bit_length", rb_int_bit_length, 0);
     rb_define_method(rb_cInteger, "digits", rb_int_digits, -1);
 
+    rb_define_method(rb_cInteger, "add", int_add, 1);
+
 #ifndef RUBY_INTEGER_UNIFICATION
     rb_cFixnum = rb_cInteger;
 #endif
This method should accept 1 argument, so the last argument of rb_define_method() is 1 and the definition of int_add() takes one parameter, VALUE n, in addition to self.
The actual addition is performed in rb_int_plus(), so we don't need to write any complex code.
Let's try to modify this code to use our own implementation of addition if a given parameter is a Fixnum (numbers represented by Fixnum are small and can be easily translated both to and from a C int).
Note that Ruby 2.4 unified the Fixnum and Bignum classes into a single Integer class. However, MRI still uses Fixnum and Bignum as internal data structures for performance reasons. For example, FIXNUM_P(bignum) returns false.
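The unification is easy to confirm from the Ruby side. The following sketch assumes Ruby 2.4 or later; the variable names are only for illustration.

```ruby
small = 1      # fits in MRI's internal Fixnum representation
large = 2**64  # too large for a Fixnum, stored internally as a Bignum

p small.class                 # Integer
p large.class                 # Integer
p small.class == large.class  # true
```

The difference between the two values is therefore invisible at the Ruby level and only shows up in C through checks such as FIXNUM_P().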
Index: numeric.c
===================================================================
--- numeric.c (Revision 59647)
+++ numeric.c (Working copy)
@@ -5238,6 +5238,22 @@
     }
 }
 
+static VALUE
+int_add(VALUE self, VALUE n)
+{
+    if (FIXNUM_P(self) && FIXNUM_P(n)) {
+        /* c = a + b */
+        int a = FIX2INT(self);
+        int b = FIX2INT(n);
+        int c = a + b;
+        VALUE result = INT2NUM(c);
+        return result;
+    }
+    else {
+        return rb_int_plus(self, n);
+    }
+}
+
 /*
  * Document-class: ZeroDivisionError
  *
FIXNUM_P(self) && FIXNUM_P(n) checks whether self and n are both Fixnum.
If they are, they are converted into C int values with FIX2INT(), and the addition is then performed on those C int values. The result is converted from a C integer value back into a Ruby Integer with INT2NUM().
Note: This definition has a bug. See the next document.
Add a method to the Time class that returns the time n days earlier (with a default value for n of 1).
Here is an example definition in Ruby. It returns a result with the time reduced by the number of seconds in 24 hours * n. This is not a complete solution because it will occasionally be incorrect (e.g. when there are leap seconds, daylight saving time, etc). We'll ignore these problems here because this is simply an illustrative example.
class Time
  def day_before n = 1
    Time.at(self.to_i - (24 * 60 * 60 * n))
  end
end

p Time.now               #=> 2017-08-24 14:48:44 +0900
p Time.now.day_before    #=> 2017-08-23 14:48:44 +0900
p Time.now.day_before(3) #=> 2017-08-21 14:48:44 +0900
Here is a definition written in C:
Index: time.c
===================================================================
--- time.c (Revision 59647)
+++ time.c (Working copy)
@@ -4717,6 +4717,22 @@
     return time;
 }
 
+static VALUE
+time_day_before(int argc, VALUE *argv, VALUE self)
+{
+    VALUE nth;
+    int n, sec, day_before_sec;
+
+    rb_scan_args(argc, argv, "01", &nth);
+    if (nth == Qnil) nth = INT2FIX(1);
+    n = NUM2INT(nth);
+
+    sec = NUM2INT(time_to_i(self));
+    day_before_sec = sec - (60 * 60 * 24 * n);
+
+    return rb_funcall(rb_cTime, rb_intern("at"), 1, INT2NUM(day_before_sec));
+}
+
 /*
  * Time is an abstraction of dates and times. Time is stored internally as
  * the number of seconds with fraction since the _Epoch_, January 1, 1970
@@ -4896,6 +4912,8 @@
 
     rb_define_method(rb_cTime, "strftime", time_strftime, 1);
 
+    rb_define_method(rb_cTime, "day_before", time_day_before, -1);
+
     /* methods for marshaling */
     rb_define_private_method(rb_cTime, "_dump", time_dump, -1);
     rb_define_private_method(rb_singleton_class(rb_cTime), "_load", time_load, 1);
Explanation:
- To define a method that accepts optional arguments, -1 is specified as the last argument of rb_define_method(). This means the function does not know how many arguments it will receive until it is called.
- The function time_day_before(int argc, VALUE *argv, VALUE self) is the definition of the method. argc is the number of arguments given when it was called, and argv is a pointer to a C array of argc objects of type VALUE.
- rb_scan_args() is called to check the method arguments. "01" means that the number of required parameters is 0 and the number of optional parameters is 1. This means that this method accepts 0 or 1 arguments. If 1 argument is passed, it is stored in nth. If there are no arguments, nth will contain Qnil (the C representation of Ruby's nil).
- To call Ruby's method Time.at(), rb_funcall(recv, mid, argc, ...) is used.
  - The first argument is the method's receiver (recv in recv.mid(...)). In the case of Time.at, the receiver is Time.
  - The name of the method called by rb_funcall is specified by its ID, which corresponds to a Symbol. To generate the ID in C, we use rb_intern("..."). An ID is a unique value for a C string in a Ruby process. In Ruby it is called a Symbol, and in Java it is an interned string.
  - We want to call Time.at with 1 argument, so we specify 1 and pass the actual argument INT2NUM(day_before_sec) as the final parameter.
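The argument handling and the rb_funcall() call can be mirrored in Ruby to see what each C statement does. This is only a sketch: the method name day_before_sketch is hypothetical, and Object#send stands in for rb_funcall() as an approximation.

```ruby
class Time
  # Hypothetical Ruby mirror of time_day_before(int argc, VALUE *argv, VALUE self).
  def day_before_sketch(*argv)
    # rb_scan_args(argc, argv, "01", &nth): 0 required, 1 optional argument.
    if argv.size > 1
      raise ArgumentError, "wrong number of arguments (given #{argv.size}, expected 0..1)"
    end
    nth = argv[0]        # nil when the argument is omitted, like Qnil in C
    nth = 1 if nth.nil?  # default value
    day_before_sec = to_i - (60 * 60 * 24 * nth)
    # rb_funcall(rb_cTime, rb_intern("at"), 1, INT2NUM(day_before_sec))
    Time.send("at".to_sym, day_before_sec)
  end
end

t = Time.at(1_503_553_724)
p t.day_before_sketch.to_i == t.to_i - 86_400      # true
p t.day_before_sketch(3).to_i == t.to_i - 259_200  # true
p "at".to_sym.equal?(:at)                          # true: one Symbol per name
```

The last line illustrates the uniqueness property described above: converting the same name twice yields the very same Symbol object, which is what makes an ID usable as a method name key.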
There are a number of problems with this implementation. Try comparing it with Ruby's actual implementation and see if you can understand the differences.
C extension libraries allow us to extend the functionality of MRI without modifying MRI itself. We can make C extension libraries using almost the same process as we use to hack on MRI internals.
For example, let's make an extension library to add the Array#second method instead of modifying MRI itself.
Steps to make an .so extension library file (or .bundle on macOS):
- Make a directory named array_second/.
- Make a file named array_second/extconf.rb.
  - In this file, require 'mkmf' to enable the mkmf library. We can use mkmf to generate a Makefile and perform any configuration needed for the library.
  - After adding configuration (in this case, we don't have any configuration), call create_makefile('array_second'). This method creates a Makefile.
- Make a file named array_second.c.
  - Add the line #include <ruby/ruby.h> to the top of the file to enable the MRI C-API.
  - This file should contain (1) the method body and (2) code that adds the method to Array.
  - (1) is the same as ary_second() written earlier.
  - (2) should be the Init_array_second() function, which calls rb_define_method(). The name Init_array_second is derived from the argument passed to create_makefile in extconf.rb.
- Run $ ruby extconf.rb to generate the Makefile.
- Run $ make to build array_second.so (or array_second.bundle on macOS). You will then be able to require this file. Example: $ ruby -r ./array_second -e 'p [1, 2].second' will show 2. On macOS: $ ruby -r ./array_second/array_second.bundle -e 'p [1, 2].second'
- $ make install installs the .so file into the install directory.
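For reference, the extconf.rb described in the steps above can be as small as the following sketch. It assumes that the ruby development headers are installed, since mkmf needs them to generate a working Makefile.

```ruby
# array_second/extconf.rb (minimal sketch; no extra configuration)
require 'mkmf'

# Generates a Makefile that builds array_second.so (or array_second.bundle).
# The argument also determines the name of the init function that ruby looks
# for when the library is required: Init_array_second().
create_makefile('array_second')
```

Any checks for headers or libraries that a larger extension needs would go between the require line and the create_makefile call.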
A sample array_second directory is available in this repository for you to reference.
Except for extconf.rb and the installation steps, Ruby extensions are defined in exactly the same way as Ruby's embedded methods and classes.
To distribute an extension library, the minimum requirement is to package the files made in the second and third steps above (extconf.rb and array_second.c). It's probably more convenient for your users if you package your extension as a RubyGem.
Please refer to https://docs.ruby-lang.org/en/2.5.0/extension_rdoc.html for a detailed explanation of writing Ruby extensions.
Browse through the MRI source code to find methods which perform functions that are similar to what you want to add.
When you write Ruby programs, you probably already use p(obj) to inspect objects. In C, you can use rb_p(obj) to do the same thing.
If you can use gdb, breakpoints will help you.
If you add the line #include "vm_debug.h", you will be able to use the bp() macro to set a breakpoint. make gdb will stop on this macro, similar to when you use binding.pry or binding.irb.
gdb allows you to use p expr to show the value of expr (for example, you can see the value of a variable foo with p foo). The type VALUE is just an integer value in C, so it may be difficult to determine what kind of object it is and what data it represents. The special command rp for gdb (defined in ruby/.gdbinit) is provided to give a human-readable representation of VALUE-type data.
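The idea that a VALUE is just an integer can be glimpsed from Ruby as well. The sketch below assumes CRuby (MRI), where the object_id of a small Integer n mirrors its tagged VALUE, 2n + 1. This is an implementation detail and other Ruby implementations may behave differently.

```ruby
# Assumption: CRuby. A Fixnum n is encoded directly in the VALUE as (n << 1) | 1.
[0, 1, 2, 100].each do |n|
  p [n, n.object_id, n.object_id == 2 * n + 1]
end
```

This encoding is why a raw VALUE printed with p in gdb looks like an unrelated number, and why a helper such as rp is convenient.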
Try solving the following challenges. grep will help you to find similar implementations in the source code of MRI.
- Implement Integer#sub(n), which subtracts n from an integer value.
- Array#second returns nil if there is no second element. This is because rb_ary_entry() returns nil when the specified index is out of range. Instead, raise an exception when there is no second element. Use the rb_raise() function to raise an error.
- String#palindrome? is an inefficient implementation. Identify which part is inefficient and consider how to resolve the inefficiency. Try implementing a solution to improve its performance.
- Time#day_before is an awkward name. Think of a better method name.
- Let's play a trick on MRI. For example, change the behaviour of Integer#+ to perform subtraction instead. This hack will break your ruby build, so make a new git branch and experiment to see what happens.
- Use your imagination and try to add an interesting new method.
The following topics are discussed in the next chapter, but try to explore them yourself before proceeding:
- We noted earlier that Integer#add(n) has a bug.
  - Write a test which fails due to this bug.
  - Solve the issue and make the test pass.
- What is the problem with our implementation of Time#day_before? There is a similar problem in Integer#add(n).