suggestion how to read binary input directly with gawk #7

Open
mogando668 opened this issue Feb 3, 2022 · 1 comment

Comments

@mogando668

Hmm, what do you mean by that? Here's a fully functional hex encoder for gawk (sorry for the poor formatting, I dug it up from my pile).

Even in gawk's Unicode mode (i.e. not byte mode), I got it to hex-encode two different binary MP3 files with ease and without any error messages popping up. Try not to use it in gawk's -P (POSIX) mode; all kinds of weird behavior may bubble up. I think the octal encoder also works, but I haven't tested it lately. Let me know if this works or not?

If the offset 8^8 doesn't work, use 0xDC00 instead. If that also fails, try -4^4 as a last resort.
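All three offsets are multiples of 256 (8^8 = 2^24 = 16777216, 0xDC00 = 220 * 256 = 56320, and -4^4 = -256, since awk's ^ binds tighter than unary minus), so x + offset always keeps x as its low byte. My reading, which the comment above doesn't state, is that the trick depends on gawk's %c falling back to that low byte when x + offset is not a character the locale can encode. A small sketch for checking which offset works on a given setup; byte_via_offset is a made-up helper name, and it assumes gawk and od are installed.

```shell
#!/bin/sh
# Hypothetical helper: emit byte $1 the way the encoder builds its table,
# via printf "%c" with offset $2 added, and show what came out as hex.
byte_via_offset() {
  gawk -v x="$1" -v off="$2" 'BEGIN { printf "%c", x + off }' |
    od -An -tx1 | tr -d ' \n'
  echo
}

# An offset "works" in the current locale if its line shows just c8 (byte 200).
for off in 16777216 56320 -256; do
  printf '%s: ' "$off"
  byte_via_offset 200 "$off"
done
```

Run it in the locale you normally use (e.g. a UTF-8 one). In the C locale gawk skips the multibyte conversion for %c entirely, so the check only tells you something in a multibyte locale.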

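gawk -e '
function hexencode(str,   chr) {
  # Skip keys with an alphanumeric, % or backslash (so %, /, \, ^, [, ] stay
  # as-is); that also stops the inserted \xHH text from being re-encoded.
  for (chr in b2hex)
    if (chr !~ /[[:alnum:]%\\]/)
      gsub(chr, b2hex[chr], str)
  return str
}
function octencode(str,   chr) {
  # Untested and not called below; BEGIN moves b2oct["\\"] to another key.
  gsub(/\\/, b2oct["\\"], str)
  gsub(/[0-7]/, "\06&", str)
  for (chr in b2oct)
    if (chr !~ /[0-7\\]/)
      gsub(chr, b2oct[chr], str)
  return str
}
BEGIN {
  # offset is a multiple of 256: relies on %c emitting the low byte, x.
  offset = 8^8
  for (x = 0; x < 256; x++) {
    byte = sprintf("%c", x + offset)
    b2hex[byte] = sprintf("\\x%.2X", x)
    b2oct[byte] = sprintf("\\%03o", x)
  }
  # Re-key punctuation as \X or [X] so gsub() matches it literally.
  spc1 = "/\\^[]"
  spc2 = "~!@#%&_-{}:;\42\47\140 <>,$.|()*+=?"
  for (x = length(spc1); x; x--) {
    byte = substr(spc1, x, 1)
    b2hex["\\" byte] = b2hex[byte]; b2oct["\\" byte] = b2oct[byte]
    delete b2hex[byte]; delete b2oct[byte]
  }
  for (x = length(spc2); x; x--) {
    byte = substr(spc2, x, 1)
    b2hex["[" byte "]"] = b2hex[byte]; b2oct["[" byte "]"] = b2oct[byte]
    delete b2hex[byte]; delete b2oct[byte]
  }
}
# One unsplit record for the whole input, no output separators.
BEGIN { RS = FS = "^$"; OFS = ""; ORS = "" }
END { print hexencode($0) }'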
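The piece that actually reads the binary input is the last BEGIN block: with RS = "^$" gawk treats the whole input as a single record, which END then encodes. A rough way to confirm that gawk sees every byte, NUL and 0xFF included, is to compare its count with wc -c. This sketch forces the C locale so that length() counts bytes rather than characters; gawk_byte_count is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read all of stdin as one record (RS = "^$") and
# print its length. LC_ALL=C makes length() count bytes, not characters.
# "n + 0" prints 0 for empty input, where no record is read at all.
gawk_byte_count() {
  LC_ALL=C gawk 'BEGIN { RS = "^$" } { n = length($0) } END { print n + 0 }'
}

# Five bytes of input: "a", "b", NUL, 0xFF and a newline.
printf 'ab\000\377\n' | gawk_byte_count
printf 'ab\000\377\n' | wc -c
```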

This encoder may not follow the URL-encoding spec 100%; it's just something I quickly slapped together a while back. It writes \xHH rather than %HH escapes, and it skips the alphanumerics plus %, /, \, ^, [ and ] (its filter regex rejects their re-keyed table entries), so it also encodes punctuation the spec doesn't require encoding. Feel free to modify it.
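If you want output closer to the spec, one possible modification (a separate sketch, not the encoder above) keeps only RFC 3986's unreserved characters (A-Z, a-z, 0-9, -, ., _ and ~) and writes every other byte as %HH. It sidesteps the offset and regex-key tricks by running gawk in the C locale, where a byte-to-escape table can be indexed one substr() at a time; the function name urlencode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode stdin, keeping only the RFC 3986
# unreserved characters A-Z a-z 0-9 - . _ ~ and turning every other byte
# into %HH. LC_ALL=C makes substr() and the table keys byte-based.
urlencode() {
  LC_ALL=C gawk '
    BEGIN {
      RS = "^$"                        # whole input as one record
      for (i = 0; i < 256; i++)        # byte -> "%HH" lookup table
        enc[sprintf("%c", i)] = sprintf("%%%02X", i)
    }
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        out = out (c ~ /[A-Za-z0-9._~-]/ ? c : enc[c])
      }
      printf "%s", out
    }'
}

printf 'a b/c~\n' | urlencode; echo
```

Walking the input byte by byte means no replacement can be re-encoded by a later substitution, so the ordering problem the gsub() version works around by skipping keys does not arise, at the cost of one table lookup per byte.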

rethab changed the title from "awk cannot read binary ??" to "suggestion how to read binary input directly with gawk" on Feb 7, 2022
rethab (Owner) commented Feb 7, 2022

Thanks for the hint, @mogando668! I'll take a look when I find some time 👍
