-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Julia interface for custom Unicode normalization #30275
Conversation
You seem to have been developing on the |
@maleadt the CI seems to fail on a unrelated test, I guess there might be no other problem in the changes. |
`pointer_from_objref(x)` require the `x` is a mutable. wrap the immutable object as a list before passing to the `custom_data`
base/strings/unicode.jl
Outdated
lump::Bool=false, | ||
stripmark::Bool=false, | ||
|
||
function utf8proc_map(str::String, options::Integer, custom_func, custom_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Julia side you should just pass custom_func
— there is no need for custom_data
in Julia because that is just a crude way to simulate a closure in C.
Then write
custom_func_wrapper(codepoint::Int32, custom_func::Any) = Int32(custom_func(codepoint))
and in utf8proc_map
do
ccustom_func = @cfunction(custom_func_wrapper, Int32, (Int32, Any))
and pass ccustom_func, custom_func
to utf8proc_decompose_custom
. (That is, the Julia custom_func
is actually passed as the data to utf8proc_decompose_custom
. The corresponding argument in the ccall
should be defined as Any
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, That is surprising. If it is the case, do we really need to make custom_func
a closure?
But beside the origin of the custom_data
, will it be more convenient to pass extra data for the custom_func
? I read the c code at utf8proc. It seems that the custom_func
can only do a codepoint mapping? I original thought that the custom_data
is something like a pre-define table for mapping the codepoint and can change the behavior of utf8proc_iterate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don’t need an extra argument to pass extra data to a function argument in Julia. You just pass an anonymous function that references whatever local data you want.
In general, these “data” arguments are only for languages like C that don’t have closures. Google “closure computer science” if you don’t know what I’m talking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wondering whether using closure is a good design. Suppose we have a struct called MappingRule
and a method mapping(m::MappingRule, codepoing)
. Then every time you call the normalize
you will need to pass a x->mapping(m, x)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Or you just make the MappingRule
a callable object.
base/strings/unicode.jl
Outdated
(Ptr{UInt8}, Int, Ptr{UInt8}, Int, Cint, Ptr{Cvoid}, Ptr{Cvoid}), | ||
str, sizeof(str), C_NULL, 0, options, ccustom_func, data_ptr) | ||
nwords < 0 && utf8proc_error(nwords) | ||
buffer = Base.StringVector(nwords*4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*sizeof(Int32)
would be clearer.
closed by #42561 |
I found there was a
utf8proc_decompose_custom
function under the c call ofutf8proc_map
insideBase.Unicode.utf8proc_map
but not expose to the Julia interface. So I made anotherBase.Unicode.utf8proc_map
method that support thecustom func
andcustom data
with native Julia function and data type*there is another PR that seems to make a custom unicode normalization, but I didn't found a Julia function to it.