ucs4/ISO10646 characters #1
Comments
Zaak, you anticipated me: I was about to ask you and Jacob how to support non-ASCII characters... Wonderful references! Cheers

P.S. Today I cannot send you the log of the OpenCoarrays install script: on Friday I have a test for a permanent position, fingers crossed!
Zaak, what do you think about a generic interface along the following lines?

module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: echo
  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif
  interface echo
    module procedure echo_ascii
#ifdef UCS4
    module procedure echo_ucs4
#endif
  endinterface echo
contains
  subroutine echo_ascii(string)
    character(len=*, kind=ascii), intent(in) :: string
    print '(A)', 'I am echo_ascii'
    print '(A)', string
  endsubroutine echo_ascii
  subroutine echo_ucs4(string)
    character(len=*, kind=ucs4), intent(in) :: string
    print '(A)', 'I am echo_ucs4'
    print '(A)', string
  endsubroutine echo_ucs4
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4) :: string_ucs4
  string_ascii = 'abc' ; call echo(string_ascii)
  string_ucs4 = 'ABC' ; call echo(string_ucs4 )
endprogram test

Upon execution:

stefano@thor(06:17 AM Thu Jun 08)
~ 21 files, 28Mb
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
stefano@thor(06:17 AM Thu Jun 08)
~ 21 files, 28Mb
→ a.out
I am echo_ascii
abc
I am echo_ascii
ABC
stefano@thor(06:17 AM Thu Jun 08)
~ 21 files, 28Mb
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
stefano@thor(06:17 AM Thu Jun 08)
~ 21 files, 28Mb
→ a.out
I am echo_ascii
abc
I am echo_ucs4
ABC

Do you think such an approach could be viable? Cheers
Sorry... the following is more tailored to what I have in mind:

module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: ck
  public :: convert
  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif
  integer, parameter :: ck = ucs4
  interface convert
    module procedure convert_from_ascii
#ifdef UCS4
    module procedure convert_from_ucs4
#endif
  endinterface convert
contains
  function convert_from_ascii(string) result(conv)
    character(len=*, kind=ascii), intent(in) :: string
    character(len=len(string), kind=ck) :: conv
    print '(A)', 'I am convert_from_ascii'
    conv = string
  endfunction convert_from_ascii
  function convert_from_ucs4(string) result(conv)
    character(len=*, kind=ucs4), intent(in) :: string
    character(len=len(string), kind=ck) :: conv
    print '(A)', 'I am convert_from_ucs4'
    conv = string
  endfunction convert_from_ucs4
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4) :: string_ucs4
  string_ascii = 'abc' ; print '(A)', convert(string_ascii)
  string_ucs4 = 'ABC' ; print '(A)', convert(string_ucs4 )
endprogram test

stefano@thor(06:35 AM Thu Jun 08)
~ 21 files, 28Mb
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
stefano@thor(06:36 AM Thu Jun 08)
~ 21 files, 28Mb
→ a.out
I am convert_from_ascii
abc
I am convert_from_ascii
ABC
stefano@thor(06:36 AM Thu Jun 08)
~ 21 files, 28Mb
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
stefano@thor(06:36 AM Thu Jun 08)
~ 21 files, 28Mb
→ a.out
I am convert_from_ascii
abc
I am convert_from_ucs4
ABC
I realized that, to make forbear UCS4-enabled, I only have to catch the character kind of the input:

module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: ck
  public :: initialize
  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif
  integer, parameter :: ck = ucs4
contains
  subroutine initialize(input, output)
    class(*), intent(in) :: input
    character(len=*, kind=ck), intent(inout) :: output
    select type(input)
    type is(character(len=*, kind=ascii))
      print '(A)', 'ascii input'
      output = input
#ifdef UCS4
    type is(character(len=*, kind=ucs4))
      print '(A)', 'ucs4 input'
      output = input
#endif
    class default
      error stop 'error: input must be of class character'
    endselect
  endsubroutine initialize
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4) :: string_ucs4
  character(len=3, kind=ck) :: string_ck
  character(len=3) :: string_nk
  character(len=3, kind=ck) :: string
  string_ascii = 'abc'
  call initialize(input=string_ascii, output=string)
  print '(A)', string
  string_ucs4 = 'ABC'
  call initialize(input=string_ucs4, output=string)
  print '(A)', string
  string_ck = 'aBc'
  call initialize(input=string_ck, output=string)
  print '(A)', string
  string_nk = 'AbC'
  call initialize(input=string_nk, output=string)
  print '(A)', string
  call initialize(input=1, output=string)
endprogram test

gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
ascii input
abc
ascii input
ABC
ascii input
aBc
ascii input
AbC
ERROR STOP error: input must be of class character
gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
ascii input
abc
ucs4 input
ABC
ucs4 input
aBc
ascii input
AbC
ERROR STOP error: input must be of class character

Cheers
Stefano, this looks perfect! The only reason we bothered with the complicated wrappers, etc. in JSON-Fortran was to eliminate redundant code as much as possible, because the library does some heavy text processing, parsing and manipulation.

The tricky part is how to handle user inputs. If at all possible, you should allow arbitrary user string inputs. If the only possible character user inputs are in an [...]

Also, FYI, I think conversion from ASCII to UCS4/ISO 10646 happens automatically on assignment. I'm not sure if this is part of the standard, or just common practice among compiler vendors. And, obviously, conversion from UCS4/ISO 10646 to ASCII is, in general, not safe, since ASCII is only a subset of UCS4/ISO 10646. One could write a routine that checks whether each UCS4 character exists in ASCII before converting it, and throws an error if it does not.

Here is a relevant excerpt from MRC:
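The checked UCS4-to-ASCII conversion described above could look roughly like the following sketch. It assumes a compiler that supports the ISO 10646 kind (gfortran does); the module name `ucs4_to_ascii_sketch` and the function `to_ascii` are hypothetical names chosen for illustration, not part of forbear or JSON-Fortran.

```fortran
module ucs4_to_ascii_sketch
  implicit none
  private
  public :: ascii, ucs4, to_ascii
  integer, parameter :: ascii = selected_char_kind('ascii')
  ! Assumption: the compiler supports ISO 10646; otherwise this kind is -1
  ! and the declarations below do not compile.
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
contains
  ! Convert a UCS4 string to ASCII, stopping with an error if any
  ! character lies outside the 7-bit ASCII range (code points 0..127).
  function to_ascii(input) result(output)
    character(len=*, kind=ucs4), intent(in) :: input
    character(len=len(input), kind=ascii)   :: output
    integer :: i, code
    do i = 1, len(input)
      code = ichar(input(i:i))
      if (code > 127) error stop 'error: character outside the ASCII range'
      output(i:i) = achar(code, kind=ascii)
    enddo
  endfunction to_ascii
endmodule ucs4_to_ascii_sketch

program test_to_ascii
  use ucs4_to_ascii_sketch
  implicit none
  character(len=3, kind=ucs4) :: wide
  wide = ucs4_'abc'
  print '(A)', to_ascii(wide)
endprogram test_to_ascii
```

Returning a status flag instead of calling `error stop` would be a reasonable alternative if the caller should be able to recover from non-ASCII input.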
Zaak, thank you very much for your insight, it is much appreciated.
Exactly, this is why I ended up with the last one. FYI, I am planning to add better support to FoBiS for introspective tests of compilers' character-kind support, see this. Edit: I just saw that you have already seen the FoBiS proposal...
Dear Zaak, I added support for UCS4 and now forbear provides 40 different spinners. Others will be very easy to add, feel free to suggest new ones.

I have only one concern for now: I added the spinners via a quick and dirty encoding in the sources, namely the forbear.F90 source contains Unicode characters... and I think this is illegal, although GNU gfortran does not complain... what do you think? Cheers
I'm guessing it works because your terminal is UTF-8... not 100% sure. I think GFortran has a flag to specify special characters via [...] Yes, I just checked.
This is probably a safer way to do this, but it will be a pain to convert... I would assume Intel provides a similar flag, but I can't confirm right now what it may be...
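One way to keep the source file pure ASCII, sketched here under assumptions, is to build each character from its code point with the standard `char(i, kind)` intrinsic, which needs no compiler flag. gfortran also documents a `-fbackslash` option that enables `\unnnn` escapes in string literals; whether that is the flag meant in the comment above is an assumption. The program name, the `frames` variable, and the choice of glyphs are hypothetical.

```fortran
program spinner_codepoints
  use iso_fortran_env, only: output_unit
  implicit none
  ! Assumption: the compiler supports the ISO 10646 character kind.
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
  character(len=4, kind=ucs4) :: frames
  integer :: i

  ! Quadrant block characters U+2596, U+2598, U+259D, U+2597,
  ! written as code points so the source contains only ASCII.
  frames = char(int(z'2596'), ucs4) // char(int(z'2598'), ucs4) // &
           char(int(z'259D'), ucs4) // char(int(z'2597'), ucs4)

  ! Ask the runtime to encode UCS4 output as UTF-8.
  open(output_unit, encoding='utf-8')
  do i = 1, len(frames)
    write(output_unit, '(A)') frames(i:i)
  enddo
endprogram spinner_codepoints
```

Whether the glyphs display correctly still depends on the terminal being UTF-8 capable.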
Hi Stefano,
Very cool project!
I noticed you were interested in adding a spinner (in the README.md TODO list) and I was thinking it would be very nice to have ISO 10646 character support, then you can have fancy characters in your spinners and progress bars like this:
It's a bit of a pain though, because you may need to detect if ISO 10646 support is available on the "system" (compiler + machine) and then create overloaded interface wrappers that will accept ASCII, and convert to ISO 10646 characters and pass them to the routines. (Or you could just force everyone to pass in ISO 10646 chars but that may not be practical.)
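The detection step mentioned above can be sketched with `selected_char_kind`, which the standard defines to return -1 when the requested character set is not supported. The program and constant names below are hypothetical.

```fortran
program detect_ucs4
  implicit none
  ! selected_char_kind returns -1 if the named character set is unsupported.
  integer, parameter :: ucs4_kind = selected_char_kind('iso_10646')

  if (ucs4_kind < 0) then
    print '(A)', 'ISO 10646 characters are not supported by this compiler'
  else
    print '(A,I0)', 'ISO 10646 character kind value: ', ucs4_kind
  endif
endprogram detect_ucs4
```

Because a declaration such as `character(kind=-1)` does not compile, a check like this is probably most useful in a build-time probe, with the result passed to the library sources as a preprocessor definition.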
Here is some code to declare an ISO 10646 variable:
See also: