Printing unicode block elements using syscall #28

Closed
TaylorZowtuk opened this issue May 16, 2019 · 6 comments

@TaylorZowtuk
Contributor

TaylorZowtuk commented May 16, 2019

I am porting a graphics library from MIPS to RISC-V, and I need to print the Unicode full block element using syscalls. Does RARS not support printing unicode characters? I load the character into memory using `.asciz "█"` and perform a printString ecall, but while debugging I get a generic-looking box outline, the same one shown for any non-printable ASCII character. Do you have any ideas on how to fix this? I apologize if this is the wrong place to ask this.

@TheThirdOne
Owner

> Does RARS not support printing unicode characters?

I didn't intentionally change Unicode support when I built RARS off of MARS, and I don't remember seeing anything that explicitly supports Unicode, so there is a good chance it just isn't supported.

> Do you have any ideas on how to fix this?

Have you tried MARS to see if the behavior is present there too? If it isn't, it's probably pretty easy for me to fix Unicode support.

> I apologize if this is the wrong place to ask this.

This is a fine place to ask this.

@TheThirdOne
Owner

I looked into this: the character is not fully stored in memory when the program loads, so it would be impossible to print the correct character later on. This behavior is inherited from MARS.

I am unsure if it really makes sense to support unicode. I'll check to see what other simulators do.

@TaylorZowtuk
Contributor Author

I haven't had a chance to look further into this since yesterday, but I will continue looking today.

The MIPS version of my library uses SPIM, and it works fully with that simulator, so SPIM might be a good place to start looking at how Unicode is handled.

I guess I never really looked at what was being put into memory at program start, which I should have; I was focused on what was being printed (silly of me). When I get to work today, I will check exactly what you mean by the character not being fully put into memory. I had also tried loading the character's hexadecimal encoding as individual bytes, followed by a null terminator (0), and the MIPS/SPIM version was able to print using this too. Maybe that is a way around RARS not fully loading the character? When I tried this in RARS, it printed a French u character followed by two generic boxes (as I mentioned earlier).

Thanks for all your help at looking into this.

@TheThirdOne
Owner

TheThirdOne commented May 17, 2019

I took a look at the relevant code. The string directives assume every character is 1 byte, so they load only the bottom byte of each character into memory. The write syscall looks correct as long as the default encoding for Java Strings is UTF-8. The printString syscall also assumes that every character is 1 byte.
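To make the truncation concrete, here is a standalone Java sketch (class and method names are hypothetical) contrasting what the directive path keeps for █ (U+2588) with the character's UTF-8 encoding. It assumes, as described above, that storing `(int) theChar` with `DataTypes.CHAR_SIZE` keeps only the low 8 bits, and models that with a `(byte)` cast.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncation {
    // Models the directive path: storing a char with a 1-byte size keeps only its low 8 bits.
    static byte[] storeLowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // narrowing cast drops the high byte
        }
        return out;
    }

    // The bytes a UTF-8-aware loader would need to place in memory instead.
    static byte[] storeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "E2 96 88".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fullBlock = "\u2588"; // the full block character from the issue
        System.out.println("low byte only: " + hex(storeLowBytes(fullBlock))); // 88
        System.out.println("UTF-8 bytes:   " + hex(storeUtf8(fullBlock)));     // E2 96 88
    }
}
```

One byte survives where three are needed, which is why no later syscall can recover the character.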

> Maybe that is a way around RARS not fully loading the character?

Yeah, manually moving the correct bytes into place would get around the problems with the directives.
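Working out those bytes by hand is error-prone, so a small hypothetical Java helper can generate the `.byte` line from a string's UTF-8 encoding, ready to paste into the `.data` section in place of the `.asciz` directive. Based on the analysis above, this only gets the bytes into memory correctly; printString still treats each byte as a separate character, so the write syscall is the more likely route to readable output.

```java
import java.nio.charset.StandardCharsets;

public class ByteDirective {
    // Hypothetical helper: builds a ".byte" directive holding the UTF-8 encoding of s
    // plus a trailing 0, so the string is still null-terminated.
    static String toByteDirective(String s) {
        StringBuilder sb = new StringBuilder(".byte ");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("0x%02X, ", b & 0xFF));
        }
        sb.append('0'); // null terminator
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toByteDirective("\u2588")); // .byte 0xE2, 0x96, 0x88, 0
    }
}
```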

The way strings are stored is handled in:

```java
private void storeStrings(TokenList tokens, Directives direct, ErrorList errors) {
    Token token;
    // Correctly handles case where this is a "directive continuation" line.
    int tokenStart = 0;
    if (tokens.get(0).getType() == TokenTypes.DIRECTIVE) {
        tokenStart = 1;
    }
    for (int i = tokenStart; i < tokens.size(); i++) {
        token = tokens.get(i);
        if (token.getType() != TokenTypes.QUOTED_STRING) {
            errors.add(new ErrorMessage(token.getSourceProgram(), token.getSourceLine(),
                    token.getStartPos(), "\"" + token.getValue()
                    + "\" is not a valid character string"));
        } else {
            String quote = token.getValue();
            char theChar;
            for (int j = 1; j < quote.length() - 1; j++) {
                theChar = quote.charAt(j);
                if (theChar == '\\') {
                    theChar = quote.charAt(++j);
                    switch (theChar) {
                        case 'n':
                            theChar = '\n';
                            break;
                        case 't':
                            theChar = '\t';
                            break;
                        case 'r':
                            theChar = '\r';
                            break;
                        case '\\':
                            theChar = '\\';
                            break;
                        case '\'':
                            theChar = '\'';
                            break;
                        case '"':
                            theChar = '"';
                            break;
                        case 'b':
                            theChar = '\b';
                            break;
                        case 'f':
                            theChar = '\f';
                            break;
                        case '0':
                            theChar = '\0';
                            break;
                        // Not implemented: \ n = octal character (n is number)
                        // \ x n = hex character (n is number)
                        // \ u n = unicode character (n is number)
                        // There are of course no spaces in these escape
                        // codes...
                    }
                }
                try {
                    Globals.memory.set(this.dataAddress.get(), (int) theChar,
                            DataTypes.CHAR_SIZE);
                } catch (AddressErrorException e) {
                    errors.add(new ErrorMessage(token.getSourceProgram(), token
                            .getSourceLine(), token.getStartPos(), "\""
                            + this.dataAddress.get() + "\" is not a valid data segment address"));
                }
                this.dataAddress.increment(DataTypes.CHAR_SIZE);
            }
            if (direct == Directives.ASCIZ || direct == Directives.STRING) {
                try {
                    Globals.memory.set(this.dataAddress.get(), 0, DataTypes.CHAR_SIZE);
                } catch (AddressErrorException e) {
                    errors.add(new ErrorMessage(token.getSourceProgram(), token
                            .getSourceLine(), token.getStartPos(), "\""
                            + this.dataAddress.get() + "\" is not a valid data segment address"));
                }
                this.dataAddress.increment(DataTypes.CHAR_SIZE);
            }
        }
    }
} // storeStrings()
```
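One possible shape for UTF-8 support here, sketched outside RARS: unescape the quoted token the same way the loop above does, then encode the whole result as UTF-8 and store every resulting byte. Encoding the whole string at once, rather than char by char, also keeps surrogate pairs (such as emoji) together. The class and method names are hypothetical, and the sketch returns the bytes instead of calling `Globals.memory.set`; in RARS each byte would presumably be stored with `DataTypes.CHAR_SIZE` and the address incremented as above. It is not taken from the RARS source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8StringStore {
    // Unescapes a quoted directive token the way the storeStrings loop does, then
    // returns the UTF-8 bytes to store, plus a 0 byte for .asciz/.string.
    static byte[] encodeDirectiveString(String quote, boolean nullTerminate) {
        StringBuilder unescaped = new StringBuilder();
        for (int j = 1; j < quote.length() - 1; j++) { // skip the surrounding quotes
            char c = quote.charAt(j);
            if (c == '\\') {
                c = quote.charAt(++j);
                switch (c) {
                    case 'n': c = '\n'; break;
                    case 't': c = '\t'; break;
                    case 'r': c = '\r'; break;
                    case 'b': c = '\b'; break;
                    case 'f': c = '\f'; break;
                    case '0': c = '\0'; break;
                    default: break; // backslash, quotes, and unknown escapes keep the escaped char
                }
            }
            unescaped.append(c);
        }
        // Encode the whole string at once so surrogate pairs stay together.
        byte[] encoded = unescaped.toString().getBytes(StandardCharsets.UTF_8);
        // Arrays.copyOf pads the extra slot with 0, which serves as the terminator.
        return nullTerminate ? Arrays.copyOf(encoded, encoded.length + 1) : encoded;
    }

    public static void main(String[] args) {
        byte[] stored = encodeDirectiveString("\"\u2588\"", true);
        System.out.println(stored.length + " bytes"); // 4 bytes: E2 96 88 00
    }
}
```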

Strings are written by the write syscall in:

```java
public static int writeToFile(int fd, byte[] myBuffer, int lengthRequested) {
    /////////////// DPS 8-Jan-2013 ////////////////////////////////////////////////////
    /// Write to STDOUT or STDERR file descriptor while using IDE - write to Messages pane.
    if ((fd == STDOUT || fd == STDERR) && Globals.getGui() != null) {
        String data = new String(myBuffer);
        Globals.getGui().getMessagesPane().postRunMessage(data);
        return data.length();
    }
    ///////////////////////////////////////////////////////////////////////////////////
    //// When running in command mode, code below works for either regular file or STDOUT/STDERR
    if (!FileIOData.fdInUse(fd, 1)) // Check the existence of the "write" fd
    {
        fileErrorString = "File descriptor " + fd + " is not open for writing";
        return -1;
    }
    // retrieve FileOutputStream from storage
    OutputStream outputStream = (OutputStream) FileIOData.getStreamInUse(fd);
    try {
        // Oct. 9 2005 Ken Vollmar
        // Observation: made a call to outputStream.write(myBuffer, 0, lengthRequested)
        // with myBuffer containing 6(ten) 32-bit-words <---> 24(ten) bytes, where the
        // words are MIPS integers with values such that many of the bytes are ZEROES.
        // The effect is apparently that the write stops after encountering a zero-valued
        // byte. (The method write does not return a value and so this can't be verified
        // by the return value.)
        // Writes up to lengthRequested bytes of data to this output stream from an array of bytes.
        // outputStream.write(myBuffer, 0, lengthRequested); // write is a void method -- no verification value returned
        // Oct. 9 2005 Ken Vollmar Force the write statement to write exactly
        // the number of bytes requested, even though those bytes include many ZERO values.
        for (int ii = 0; ii < lengthRequested; ii++) {
            outputStream.write(myBuffer[ii]);
        }
        outputStream.flush();// DPS 7-Jan-2013
    } catch (IOException e) {
        fileErrorString = "IO Exception on write of file with fd " + fd;
        return -1;
    } catch (IndexOutOfBoundsException e) {
        fileErrorString = "IndexOutOfBoundsException on write of file with fd" + fd;
        return -1;
    }
    return lengthRequested;
} // end writeToFile
```
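In the GUI branch, `new String(myBuffer)` decodes with the platform's default charset (UTF-8 by default only from Java 18 onward), which is why the write syscall is correct only when that default is UTF-8. A minimal sketch of an explicit-charset alternative follows; the class and method names are hypothetical, and decoding only the first `lengthRequested` bytes is a defensive assumption, since it is not clear from the excerpt whether `myBuffer` can be longer than the request. It also shows that `data.length()` counts chars, not bytes, once multi-byte characters appear.

```java
import java.nio.charset.StandardCharsets;

public class GuiWriteSketch {
    // Decodes exactly lengthRequested bytes with an explicit charset, rather than
    // new String(myBuffer), whose result depends on the platform default charset.
    static String decodeForMessagesPane(byte[] myBuffer, int lengthRequested) {
        return new String(myBuffer, 0, lengthRequested, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buf = {(byte) 0xE2, (byte) 0x96, (byte) 0x88}; // UTF-8 for the full block
        String data = decodeForMessagesPane(buf, buf.length);
        // data.length() counts chars, not bytes, so it differs from lengthRequested here.
        System.out.println(data.length() + " char(s) from " + buf.length + " bytes"); // 1 char(s) from 3 bytes
    }
}
```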

Strings are written by the printString syscall in:

```java
public static String get(ProgramStatement statement, String reg) throws ExitingException {
    String message = "";
    int byteAddress = RegisterFile.getValue(reg);
    char ch[] = {' '}; // Need an array to convert to String
    try {
        ch[0] = (char) Globals.memory.getByte(byteAddress);
        while (ch[0] != 0) // only uses single location ch[0]
        {
            message = message.concat(new String(ch)); // parameter to String constructor is a char[] array
            byteAddress++;
            ch[0] = (char) Globals.memory.getByte(byteAddress);
        }
    } catch (AddressErrorException e) {
        throw new ExitingException(statement, e);
    }
    return message;
}
```
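Because this loop turns each byte into its own `char`, a multi-byte character comes out as several unrelated characters. A UTF-8-aware variant could instead collect bytes up to the NUL terminator and decode them in one step. In this sketch a plain `byte[]` stands in for `Globals.memory` (an assumption to keep it self-contained), the names are hypothetical, and there is no address check, so the array must contain a terminating 0.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PrintStringSketch {
    // Hypothetical stand-in for the printString helper: a plain byte array replaces
    // Globals.memory, and the string must be NUL-terminated within the array.
    static String readNulTerminatedUtf8(byte[] memory, int byteAddress) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        while (memory[byteAddress] != 0) {
            bytes.write(memory[byteAddress]); // write(int) stores the low 8 bits
            byteAddress++;
        }
        // Decode once at the end so multi-byte sequences are not split into separate chars.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] memory = {(byte) 0xE2, (byte) 0x96, (byte) 0x88, 0};
        System.out.println(readNulTerminatedUtf8(memory, 0));
    }
}
```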

@TheThirdOne
Owner

Ripes and Venus don't support Unicode either. Ripes puts UTF-8-encoded bytes into memory but has trouble outputting them correctly. Venus detects Unicode in `.string` directives and reports an error, and it doesn't output the characters correctly if the bytes are loaded manually.

The minimum I will accept to close this issue is an error for trying to load unicode in a directive. A pull request adding UTF-8 support would be greatly appreciated.
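The minimum fix, rejecting non-ASCII text in a string directive, could start from a check like the hypothetical one below. In RARS the message would presumably be reported through the same `errors.add(new ErrorMessage(...))` pattern that `storeStrings` already uses; the message text here is made up for illustration.

```java
public class AsciiDirectiveCheck {
    // Returns the index of the first char outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAscii(String quote) {
        for (int i = 0; i < quote.length(); i++) {
            if (quote.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical error text; returns null when the directive string is plain ASCII.
    static String check(String quote) {
        int i = firstNonAscii(quote);
        return i < 0 ? null : "non-ASCII character at position " + i + " of " + quote;
    }

    public static void main(String[] args) {
        System.out.println(check("\"ok\""));              // null
        System.out.println(check("\"\u2588\"") != null); // true
    }
}
```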

@TheThirdOne
Owner

A few issues not yet mentioned were found and marked as TODOs in 1d5cdc3. Their fixes should all follow a structure similar to the fixes made in #29.

TheThirdOne added a commit that referenced this issue Jul 16, 2019