-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV/TSV parsing issues #3
Comments
So I did some digging with nodobjc on osx (https://npmjs.org/package/NodObjC). Basically, there are multiple types of clipboard data (and excel and friends add to multiple clipboard types). On OSX at least, you can get/set the content for various types. You could theoretically load an image and the corresponding ALT text so that pasting on the command line gives you the text but pasting in photoshop gives you the image. Sample sheet is https://www.dropbox.com/s/khrr6l0g7s3dho4/CopyPaste.xlsx and I tried copying all 8 cells
The TSV part looked fine (I replaced the windows line ending with a normal \n): 1581:36:00 [h]:mm:ss
94896:00 [mm]:ss
5693760 [ss]
09 PM 9 hh AM/PM h Then there's the HTML monstrosity. In this case, I was testing with various number formats and those are explicitly plugged in the style section: .xl66
{font-weight:700;
mso-number-format:"hh\\ AM\/PM\\ h";} I have a very basic component to handle those number formats (handles date formats but is missing some stuff), so those formats can be parsed explicitly. (https://github.com/SheetJS/ssf) <html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 14">
<link id=Main-File rel=Main-File
href="file://localhost/Users/SheetJS/Library/Caches/TemporaryItems/msoclip/0/clip.htm">
<link rel=File-List
href="file://localhost/Users/SheetJS/Library/Caches/TemporaryItems/msoclip/0/clip_filelist.xml">
<style>
<!--table
{mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";}
@page
{margin:1.0in .75in 1.0in .75in;
mso-header-margin:.5in;
mso-footer-margin:.5in;}
td
{padding-top:1px;
padding-right:1px;
padding-left:1px;
mso-ignore:padding;
color:black;
font-size:12.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:Calibri, sans-serif;
mso-font-charset:0;
mso-number-format:General;
text-align:general;
vertical-align:bottom;
border:none;
mso-background-source:auto;
mso-pattern:auto;
mso-protection:locked visible;
white-space:nowrap;
mso-rotate:0;}
.xl63
{font-weight:700;
mso-number-format:"\[h\]\:mm\:ss";}
.xl64
{font-weight:700;
mso-number-format:"\[mm\]\:ss";}
.xl65
{font-weight:700;
mso-number-format:"\[ss\]";}
.xl66
{font-weight:700;
mso-number-format:"hh\\ AM\/PM\\ h";}
-->
</style>
</head>
<body link=blue vlink=purple>
<table border=0 cellpadding=0 cellspacing=0 width=130 style='border-collapse:
collapse;width:130pt'>
<!--StartFragment-->
<col width=65 span=2 style='width:65pt'>
<tr height=15 style='height:15.0pt'>
<td height=15 class=xl63 align=right width=65 style='height:15.0pt;
width:65pt'>1581:36:00</td>
<td width=65 style='width:65pt'>[h]:mm:ss</td>
</tr>
<tr height=15 style='height:15.0pt'>
<td height=15 class=xl64 align=right style='height:15.0pt'>94896:00</td>
<td>[mm]:ss</td>
</tr>
<tr height=15 style='height:15.0pt'>
<td height=15 class=xl65 align=right style='height:15.0pt'>5693760</td>
<td>[ss]</td>
</tr>
<tr height=15 style='height:15.0pt'>
<td height=15 class=xl66 align=right style='height:15.0pt'>09 PM 9</td>
<td>hh AM/PM h</td>
</tr>
<!--EndFragment-->
</table>
</body>
</html> |
I put up the code I used to grab that content in a separate repo: https://github.com/SheetJS/pb |
Exactly, also on Windows Excel knows more about the copied cells than there is transported over TSV in plain text clipboard. Unfortunately, in web browser I only have access to the text clipboard, AFAIK... |
So on chrome, you can look at event.clipboardData: Copy from a spreadsheet and paste into the box. Even though the content pasted is in text form, you can see both the text and the html version in the log |
This appears to be a far simpler exercise: just generate/parse the HTML output. http://jsbin.com/uwuvan/5 shows you the HTML clipboard data in an iframe As for loading data, the following seems to work:
(this populates values 1,2 in a row). The right approach would be
|
There are some issues when parsing strings copied from Excel that contain double quotes.
For example, when you write
"A
in one cell and"B
in the next one, the text copied by Excel is"A "B
. When you copy that to Notepad and then back to Excel, even Excel does not know what to do!I have tried other JavaScript CSV parsing engines and they are also confused:
I haven't tried this one because it seems NodeJS only:
I am writing this issue because I want to get back to this at some point. Anyone cares to help? Pull request #2 was a good start but more work is requied apparently.
The text was updated successfully, but these errors were encountered: