<span id="io"></span><h1><span class="yiyi-st" id="yiyi-176">IO Tools (Text, CSV, HDF5, ...)</span></h1>
<blockquote>
<p>Original: <a href="http://pandas.pydata.org/pandas-docs/stable/io.html">http://pandas.pydata.org/pandas-docs/stable/io.html</a></p>
<p>Translators: <a href="https://github.com/wizardforcel">飞龙</a> <a href="http://usyiyi.cn/">UsyiyiCN</a></p>
<p>Proofreader: (position open)</p>
</blockquote>
<p><span class="yiyi-st" id="yiyi-177">The pandas I/O API is a set of top-level <code class="docutils literal"><span class="pre">reader</span></code> functions, accessed like <code class="docutils literal"><span class="pre">pd.read_csv()</span></code>, that generally return a <code class="docutils literal"><span class="pre">pandas</span></code> object.</span></p>
<blockquote>
<div><ul class="simple">
<li><span class="yiyi-st" id="yiyi-178"><a class="reference internal" href="#io-read-csv-table"><span class="std std-ref">read_csv</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-179"><a class="reference internal" href="#io-excel-reader"><span class="std std-ref">read_excel</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-180"><a class="reference internal" href="#io-hdf5"><span class="std std-ref">read_hdf</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-181"><a class="reference internal" href="#io-sql"><span class="std std-ref">read_sql</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-182"><a class="reference internal" href="#io-json-reader"><span class="std std-ref">read_json</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-183"><a class="reference internal" href="#io-msgpack"><span class="std std-ref">read_msgpack</span></a> (experimental)</span></li>
<li><span class="yiyi-st" id="yiyi-184"><a class="reference internal" href="#io-read-html"><span class="std std-ref">read_html</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-185"><a class="reference internal" href="#io-bigquery-reader"><span class="std std-ref">read_gbq</span></a> (experimental)</span></li>
<li><span class="yiyi-st" id="yiyi-186"><a class="reference internal" href="#io-stata-reader"><span class="std std-ref">read_stata</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-187"><a class="reference internal" href="#io-sas-reader"><span class="std std-ref">read_sas</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-188"><a class="reference internal" href="#io-clipboard"><span class="std std-ref">read_clipboard</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-189"><a class="reference internal" href="#io-pickle"><span class="std std-ref">read_pickle</span></a></span></li>
</ul>
</div></blockquote>
<p><span class="yiyi-st" id="yiyi-190">The corresponding <code class="docutils literal"><span class="pre">writer</span></code> functions are object methods, accessed like <code class="docutils literal"><span class="pre">df.to_csv()</span></code>.</span></p>
<blockquote>
<div><ul class="simple">
<li><span class="yiyi-st" id="yiyi-191"><a class="reference internal" href="#io-store-in-csv"><span class="std std-ref">to_csv</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-192"><a class="reference internal" href="#io-excel-writer"><span class="std std-ref">to_excel</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-193"><a class="reference internal" href="#io-hdf5"><span class="std std-ref">to_hdf</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-194"><a class="reference internal" href="#io-sql"><span class="std std-ref">to_sql</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-195"><a class="reference internal" href="#io-json-writer"><span class="std std-ref">to_json</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-196"><a class="reference internal" href="#io-msgpack"><span class="std std-ref">to_msgpack</span></a> (experimental)</span></li>
<li><span class="yiyi-st" id="yiyi-197"><a class="reference internal" href="#io-html"><span class="std std-ref">to_html</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-198"><a class="reference internal" href="#io-bigquery-writer"><span class="std std-ref">to_gbq</span></a> (experimental)</span></li>
<li><span class="yiyi-st" id="yiyi-199"><a class="reference internal" href="#io-stata-writer"><span class="std std-ref">to_stata</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-200"><a class="reference internal" href="#io-clipboard"><span class="std std-ref">to_clipboard</span></a></span></li>
<li><span class="yiyi-st" id="yiyi-201"><a class="reference internal" href="#io-pickle"><span class="std std-ref">to_pickle</span></a></span></li>
</ul>
</div></blockquote>
<p><span class="yiyi-st" id="yiyi-202"><a class="reference internal" href="#io-perf"><span class="std std-ref">Here</span></a> is an informal performance comparison for some of these IO methods.</span></p>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-203">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-204">For examples that use the <code class="docutils literal"><span class="pre">StringIO</span></code> class, make sure you import it according to your Python version, i.e. <code class="docutils literal"><span class="pre">from</span> <span class="pre">StringIO</span> <span class="pre">import</span> <span class="pre">StringIO</span></code> for Python 2 and <code class="docutils literal"><span class="pre">from</span> <span class="pre">io</span> <span class="pre">import</span> <span class="pre">StringIO</span></code> for Python 3.</span></p>
</div>
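<p>To make the reader/writer pairing above concrete, here is a minimal sketch of a CSV round trip. It assumes Python 3 and an installed pandas; the sample text and the variable names are invented for illustration only.</p>

```python
from io import StringIO  # Python 3 import, as described in the note above

import pandas as pd

# Hypothetical sample data, used only for illustration.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# Reader: a top-level function that returns a pandas object (here a DataFrame).
df = pd.read_csv(StringIO(csv_text))

# Writer: a method on the object. With no path given, to_csv() returns the
# CSV text as a string instead of writing a file.
round_trip = df.to_csv(index=False)
```

<p>Passing <code class="docutils literal"><span class="pre">index=False</span></code> keeps the automatically generated row labels out of the output, so the text written matches the text that was read.</p>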
<div class="section" id="csv-text-files">
<span id="io-read-csv-table"></span><h2><span class="yiyi-st" id="yiyi-205">CSV &amp; Text files</span></h2>
<p><span class="yiyi-st" id="yiyi-206">The two workhorse functions for reading text files (a.k.a.</span> <span class="yiyi-st" id="yiyi-207">flat files) are <a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a> and <a class="reference internal" href="generated/pandas.read_table.html#pandas.read_table" title="pandas.read_table"><code class="xref py py-func docutils literal"><span class="pre">read_table()</span></code></a>.</span> <span class="yiyi-st" id="yiyi-208">They both use the same parsing code to intelligently convert tabular data into a DataFrame object.</span> <span class="yiyi-st" id="yiyi-209">See the <a class="reference internal" href="cookbook.html#cookbook-csv"><span class="std std-ref">cookbook</span></a> for some advanced strategies.</span></p>
<div class="section" id="parsing-options">
<h3><span class="yiyi-st" id="yiyi-210">Parsing options</span></h3>
<p><span class="yiyi-st" id="yiyi-211"><a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a> and <a class="reference internal" href="generated/pandas.read_table.html#pandas.read_table" title="pandas.read_table"><code class="xref py py-func docutils literal"><span class="pre">read_table()</span></code></a> accept the following arguments:</span></p>
<div class="section" id="basic">
<h4><span class="yiyi-st" id="yiyi-212">Basic</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-213">filepath_or_buffer</span></dt><span class="yiyi-st" id="yiyi-227"><span class="classifier-delimiter">:</span> <span class="classifier">various</span></span><dd><span class="yiyi-st" id="yiyi-214">Either a path to a file (a <a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.6)"><code class="docutils literal"><span class="pre">str</span></code></a>, <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#pathlib.Path" title="(in Python v3.6)"><code class="docutils literal"><span class="pre">pathlib.Path</span></code></a>, or <code class="xref py py-class docutils literal"><span class="pre">py._path.local.LocalPath</span></code>), a URL (including http, ftp, and S3 locations), or any object with a <code class="docutils literal"><span class="pre">read()</span></code> method (such as an open file or <a class="reference external" href="https://docs.python.org/3/library/io.html#io.StringIO" title="(in Python v3.6)"><code class="xref py py-class docutils literal"><span class="pre">StringIO</span></code></a>).</span></dd>
<dt><span class="yiyi-st" id="yiyi-215">sep</span></dt><span class="yiyi-st" id="yiyi-228"> <span class="classifier-delimiter">:</span> <span class="classifier">str, defaults to <code class="docutils literal"><span class="pre">','</span></code> for <a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a>, <code class="docutils literal"><span class="pre">\t</span></code> for <a class="reference internal" href="generated/pandas.read_table.html#pandas.read_table" title="pandas.read_table"><code class="xref py py-func docutils literal"><span class="pre">read_table()</span></code></a></span></span><dd><span class="yiyi-st" id="yiyi-216">Delimiter to use.</span> <span class="yiyi-st" id="yiyi-217">If sep is <code class="docutils literal"><span class="pre">None</span></code>, the parser will try to determine it automatically.</span> <span class="yiyi-st" id="yiyi-218">Separators longer than 1 character and different from <code class="docutils literal"><span class="pre">'\s+'</span></code> will be interpreted as regular expressions, will force use of the Python parsing engine, and will ignore quotes in the data.</span> <span class="yiyi-st" id="yiyi-219">Regex example: <code class="docutils literal"><span class="pre">'\\r\\t'</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-220">delimiter</span></dt><span class="yiyi-st" id="yiyi-229"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-221">Alternative argument name for sep.</span></dd>
<dt><span class="yiyi-st" id="yiyi-222">delim_whitespace</span></dt><span class="yiyi-st" id="yiyi-230"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default False</span></span><dd><p class="first"><span class="yiyi-st" id="yiyi-223">Specifies whether or not whitespace (e.g. <code class="docutils literal"><span class="pre">'</span> <span class="pre">'</span></code> or <code class="docutils literal"><span class="pre">'\t'</span></code>) will be used as the delimiter.</span> <span class="yiyi-st" id="yiyi-224">Equivalent to setting <code class="docutils literal"><span class="pre">sep='\s+'</span></code>.</span> <span class="yiyi-st" id="yiyi-225">If this option is set to True, nothing should be passed in for the <code class="docutils literal"><span class="pre">delimiter</span></code> parameter.</span></p>
<div class="last versionadded">
<p><span class="yiyi-st" id="yiyi-226"><span class="versionmodified">New in version 0.18.1:</span> support for the Python parser.</span></p>
</div>
</dd>
</dl>
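<p>The separator options above can be sketched as follows. This is an illustrative example rather than a reference: the sample strings and variable names are made up, and it uses <code class="docutils literal"><span class="pre">sep='\s+'</span></code> instead of <code class="docutils literal"><span class="pre">delim_whitespace</span></code>, which the text above describes as equivalent.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: the same table with three different separators.
comma_text = "a,b\n1,2\n3,4\n"
space_text = "a   b\n1 2\n3     4\n"
semicolon_text = "a;b\n1;2\n3;4\n"

# Default separator for read_csv is a comma.
df_comma = pd.read_csv(StringIO(comma_text))

# Any run of whitespace acts as one delimiter.
df_space = pd.read_csv(StringIO(space_text), sep=r"\s+")

# sep=None asks the parser to detect the delimiter; this needs the
# Python parsing engine, so it is requested explicitly here.
df_sniffed = pd.read_csv(StringIO(semicolon_text), sep=None, engine="python")
```

<p>All three calls should produce the same two-column frame. Delimiter detection is a heuristic, so passing <code class="docutils literal"><span class="pre">sep</span></code> explicitly is the safer choice when the format is known.</p>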
</div>
<div class="section" id="column-and-index-locations-and-names">
<h4><span class="yiyi-st" id="yiyi-231">Column and Index Locations and Names</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-232">header</span></dt><span class="yiyi-st" id="yiyi-265"><span class="classifier-delimiter">:</span> <span class="classifier">int or list of ints, default <code class="docutils literal"><span class="pre">'infer'</span></code></span></span><dd><span class="yiyi-st" id="yiyi-233">Row number(s) to use as the column names, and the start of the data.</span> <span class="yiyi-st" id="yiyi-234">If no <code class="docutils literal"><span class="pre">names</span></code> are passed, the default behavior is as if <code class="docutils literal"><span class="pre">header=0</span></code>; otherwise it is as if <code class="docutils literal"><span class="pre">header=None</span></code>.</span> <span class="yiyi-st" id="yiyi-235">Explicitly pass <code class="docutils literal"><span class="pre">header=0</span></code> to be able to replace existing names.</span> <span class="yiyi-st" id="yiyi-236">The header can be a list of ints that specify row locations for a MultiIndex on the columns, e.g. <code class="docutils literal"><span class="pre">[0,1,3]</span></code>.</span> <span class="yiyi-st" id="yiyi-237">Intervening rows that are not specified will be skipped (e.g. row 2 in this example is skipped).</span> <span class="yiyi-st" id="yiyi-238">Note that this parameter ignores commented lines and empty lines if <code class="docutils literal"><span class="pre">skip_blank_lines=True</span></code>, so header=0 denotes the first line of data rather than the first line of the file.</span></dd>
<dt><span class="yiyi-st" id="yiyi-239">names</span></dt><span class="yiyi-st" id="yiyi-266"><span class="classifier-delimiter">:</span> <span class="classifier">array-like, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-240">List of column names to use.</span> <span class="yiyi-st" id="yiyi-241">If the file contains no header row, you should explicitly pass <code class="docutils literal"><span class="pre">header=None</span></code>.</span> <span class="yiyi-st" id="yiyi-242">Duplicates in this list are not allowed unless <code class="docutils literal"><span class="pre">mangle_dupe_cols=True</span></code>, which is the default.</span></dd>
<dt><span class="yiyi-st" id="yiyi-243">index_col</span></dt><span class="yiyi-st" id="yiyi-267"><span class="classifier-delimiter">:</span> <span class="classifier">int or sequence or <code class="docutils literal"><span class="pre">False</span></code>, default <code class="docutils literal"><span class="pre">None</span></code> </span></span><dd><span class="yiyi-st" id="yiyi-244">Column to use as the row labels of the DataFrame.</span> <span class="yiyi-st" id="yiyi-245">If a sequence is given, a MultiIndex is used.</span> <span class="yiyi-st" id="yiyi-246">If you have a malformed file with delimiters at the end of each line, you might consider <code class="docutils literal"><span class="pre">index_col=False</span></code> to force pandas <em>not</em> to use the first column as the index.</span></dd>
<dt><span class="yiyi-st" id="yiyi-247">usecols</span></dt><span class="yiyi-st" id="yiyi-268"><span class="classifier-delimiter">:</span> <span class="classifier">array-like, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-248">Return a subset of the columns.</span> <span class="yiyi-st" id="yiyi-249">All elements in this array must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in <cite>names</cite> or inferred from the document header row(s).</span> <span class="yiyi-st" id="yiyi-250">For example, a valid <cite>usecols</cite> parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].</span> <span class="yiyi-st" id="yiyi-251">Using this parameter results in faster parsing and lower memory usage.</span></dd>
<dt><span class="yiyi-st" id="yiyi-252">as_recarray</span></dt><span class="yiyi-st" id="yiyi-269"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><p class="first"><span class="yiyi-st" id="yiyi-253">DEPRECATED: this argument will be removed in a future version.</span> <span class="yiyi-st" id="yiyi-254">Please call <code class="docutils literal"><span class="pre">pd.read_csv(...).to_records()</span></code> instead.</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-255">Return a NumPy recarray instead of a DataFrame after parsing the data.</span> <span class="yiyi-st" id="yiyi-256">If set to <code class="docutils literal"><span class="pre">True</span></code>, this option takes precedence over the <code class="docutils literal"><span class="pre">squeeze</span></code> parameter.</span> <span class="yiyi-st" id="yiyi-257">In addition, as row indices are not available in such a format, the <code class="docutils literal"><span class="pre">index_col</span></code> parameter will be ignored.</span></p>
</dd>
<dt><span class="yiyi-st" id="yiyi-258">squeeze</span></dt><span class="yiyi-st" id="yiyi-270"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-259">If the parsed data only contains one column, return a Series.</span></dd>
<dt><span class="yiyi-st" id="yiyi-260">prefix</span></dt><span class="yiyi-st" id="yiyi-271"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-261">Prefix to add to column numbers when there is no header, e.g. 'X' for X0, X1, ...</span></dd>
<dt><span class="yiyi-st" id="yiyi-262">mangle_dupe_cols</span></dt><span class="yiyi-st" id="yiyi-272"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-263">Duplicate columns will be specified as 'X.0'...'X.N', rather than 'X'...'X'.</span> <span class="yiyi-st" id="yiyi-264">Passing in False will cause data to be overwritten if there are duplicate names in the columns.</span></dd>
</dl>
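<p>A short sketch of how <code class="docutils literal"><span class="pre">header</span></code>, <code class="docutils literal"><span class="pre">names</span></code>, <code class="docutils literal"><span class="pre">index_col</span></code> and <code class="docutils literal"><span class="pre">usecols</span></code> interact. The data and the column names are hypothetical. The example leaves out <code class="docutils literal"><span class="pre">as_recarray</span></code>, <code class="docutils literal"><span class="pre">squeeze</span></code>, <code class="docutils literal"><span class="pre">prefix</span></code> and <code class="docutils literal"><span class="pre">mangle_dupe_cols</span></code>, on the assumption that their availability depends on the installed pandas version.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, with and without a header row.
with_header = "id,label,value\n10,x,1.5\n20,y,2.5\n"
no_header = "10,x,1.5\n20,y,2.5\n"

# No header row in the file: pass header=None and supply the names.
df_no_header = pd.read_csv(
    StringIO(no_header), header=None, names=["id", "label", "value"]
)

# Header row present but unwanted: header=0 consumes it, names replaces it.
df_renamed = pd.read_csv(
    StringIO(with_header), header=0, names=["key", "tag", "amount"]
)

# Keep only two columns and use one of them as the row labels.
df_subset = pd.read_csv(
    StringIO(with_header), index_col="id", usecols=["id", "value"]
)
```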
</div>
<div class="section" id="general-parsing-configuration">
<h4><span class="yiyi-st" id="yiyi-273">General Parsing Configuration</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-274">dtype</span></dt><span class="yiyi-st" id="yiyi-315"><span class="classifier-delimiter">:</span> <span class="classifier">Type name or dict of column -&gt; type, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-275">Data type for data or columns.</span> <span class="yiyi-st" id="yiyi-276">E.g. <code class="docutils literal"><span class="pre">{'a':</span> <span class="pre">np.float64,</span> <span class="pre">'b':</span> <span class="pre">np.int32}</span></code> (unsupported with <code class="docutils literal"><span class="pre">engine='python'</span></code>).</span> <span class="yiyi-st" id="yiyi-277">Use <cite>str</cite> or <cite>object</cite> to preserve the values and not interpret the dtype.</span></dd>
<dt><span class="yiyi-st" id="yiyi-278">engine</span></dt><span class="yiyi-st" id="yiyi-316"><span class="classifier-delimiter">:</span> <span class="classifier">{<code class="docutils literal"><span class="pre">'c'</span></code>, <code class="docutils literal"><span class="pre">'python'</span></code>}</span></span><dd><span class="yiyi-st" id="yiyi-279">Parser engine to use.</span> <span class="yiyi-st" id="yiyi-280">The C engine is faster, while the Python engine is currently more feature-complete.</span></dd>
<dt><span class="yiyi-st" id="yiyi-281">converters</span></dt><span class="yiyi-st" id="yiyi-317"><span class="classifier-delimiter">:</span> <span class="classifier">dict, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-282">Dict of functions for converting values in certain columns.</span> <span class="yiyi-st" id="yiyi-283">Keys can be either integers or column labels.</span></dd>
<dt><span class="yiyi-st" id="yiyi-284">true_values</span></dt><span class="yiyi-st" id="yiyi-318"><span class="classifier-delimiter">:</span> <span class="classifier">list, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-285">Values to consider as <code class="docutils literal"><span class="pre">True</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-286">false_values</span></dt><span class="yiyi-st" id="yiyi-319"><span class="classifier-delimiter">:</span> <span class="classifier">list, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-287">Values to consider as <code class="docutils literal"><span class="pre">False</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-288">skipinitialspace</span></dt><span class="yiyi-st" id="yiyi-320"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-289">Skip spaces after the delimiter.</span></dd>
<dt><span class="yiyi-st" id="yiyi-290">skiprows</span></dt><span class="yiyi-st" id="yiyi-321"><span class="classifier-delimiter">:</span> <span class="classifier">list-like or integer, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-291">Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.</span></dd>
<dt><span class="yiyi-st" id="yiyi-292">skipfooter</span></dt><span class="yiyi-st" id="yiyi-322"><span class="classifier-delimiter">:</span> <span class="classifier">int, default <code class="docutils literal"><span class="pre">0</span></code></span></span><dd><span class="yiyi-st" id="yiyi-293">Number of lines at the bottom of the file to skip (unsupported with engine='c').</span></dd>
<dt><span class="yiyi-st" id="yiyi-294">skip_footer</span></dt><span class="yiyi-st" id="yiyi-323"><span class="classifier-delimiter">:</span> <span class="classifier">int, default <code class="docutils literal"><span class="pre">0</span></code></span></span><dd><span class="yiyi-st" id="yiyi-295">DEPRECATED: use the <code class="docutils literal"><span class="pre">skipfooter</span></code> parameter instead, as the two are identical.</span></dd>
<dt><span class="yiyi-st" id="yiyi-296">nrows</span></dt><span class="yiyi-st" id="yiyi-324"><span class="classifier-delimiter">:</span> <span class="classifier">int, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-297">Number of rows of the file to read.</span> <span class="yiyi-st" id="yiyi-298">Useful for reading pieces of large files.</span></dd>
<dt><span class="yiyi-st" id="yiyi-299">low_memory</span></dt><span class="yiyi-st" id="yiyi-325"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-300">Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference.</span> <span class="yiyi-st" id="yiyi-301">To ensure no mixed types, either set this to <code class="docutils literal"><span class="pre">False</span></code> or specify the type with the <code class="docutils literal"><span class="pre">dtype</span></code> parameter.</span> <span class="yiyi-st" id="yiyi-302">Note that the entire file is read into a single DataFrame regardless; use the <code class="docutils literal"><span class="pre">chunksize</span></code> or <code class="docutils literal"><span class="pre">iterator</span></code> parameter to return the data in chunks.</span> <span class="yiyi-st" id="yiyi-303">(Only valid with the C parser.)</span></dd>
<dt><span class="yiyi-st" id="yiyi-304">buffer_lines</span></dt><span class="yiyi-st" id="yiyi-326"><span class="classifier-delimiter">:</span> <span class="classifier">int, default None</span></span><dd><span class="yiyi-st" id="yiyi-305">DEPRECATED: this argument will be removed in a future version because its value is not respected by the parser.</span></dd>
<dt><span class="yiyi-st" id="yiyi-306">compact_ints</span></dt><span class="yiyi-st" id="yiyi-327"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default False</span></span><dd><p class="first"><span class="yiyi-st" id="yiyi-307">DEPRECATED: this argument will be removed in a future version.</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-308">If <code class="docutils literal"><span class="pre">compact_ints</span></code> is <code class="docutils literal"><span class="pre">True</span></code>, then for any column that is of integer dtype, the parser will attempt to cast it as the smallest integer <code class="docutils literal"><span class="pre">dtype</span></code> possible, either signed or unsigned depending on the specification from the <code class="docutils literal"><span class="pre">use_unsigned</span></code> parameter.</span></p>
</dd>
<dt><span class="yiyi-st" id="yiyi-309">use_unsigned</span></dt><span class="yiyi-st" id="yiyi-328"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default False</span></span><dd><p class="first"><span class="yiyi-st" id="yiyi-310">DEPRECATED: this argument will be removed in a future version.</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-311">If integer columns are being compacted (i.e. <code class="docutils literal"><span class="pre">compact_ints=True</span></code>), specify whether the column should be compacted to the smallest signed or unsigned integer dtype.</span></p>
</dd>
<dt><span class="yiyi-st" id="yiyi-312">memory_map</span></dt><span class="yiyi-st" id="yiyi-329"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default False</span></span><dd><span class="yiyi-st" id="yiyi-313">If a file path is provided for <code class="docutils literal"><span class="pre">filepath_or_buffer</span></code>, map the file object directly onto memory and access the data directly from there.</span> <span class="yiyi-st" id="yiyi-314">Using this option can improve performance because there is no longer any I/O overhead.</span></dd>
</dl>
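<p>Several of the general parsing options can be combined in one call. The sketch below is illustrative only: the file contents, the column names and the scaling factor in the converter are invented, and it avoids the options marked DEPRECATED above.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: one junk line, a header row, then three records.
data = (
    "# exported by a hypothetical tool\n"
    "code,amount,flag\n"
    "001,1.5,yes\n"
    "002,2.5,no\n"
    "003,3.5,yes\n"
)

df = pd.read_csv(
    StringIO(data),
    skiprows=1,                 # skip the first line of the file
    nrows=2,                    # read only the first two data rows
    dtype={"code": str},        # keep leading zeros instead of parsing ints
    converters={"amount": lambda s: float(s) * 100},  # per-column function
    true_values=["yes"],        # strings to read as True
    false_values=["no"],        # strings to read as False
)
```

<p>Here <code class="docutils literal"><span class="pre">dtype</span></code> and <code class="docutils literal"><span class="pre">converters</span></code> are applied to different columns, which keeps the two options from competing over the same values.</p>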
</div>
<div class="section" id="na-and-missing-data-handling">
<h4><span class="yiyi-st" id="yiyi-330">NA and Missing Data Handling</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-331">na_values</span></dt><span class="yiyi-st" id="yiyi-344"><span class="classifier-delimiter">:</span> <span class="classifier">scalar, str, list-like, or dict, default <code class="docutils literal"><span class="pre">None</span></code> </span></span><dd><span class="yiyi-st" id="yiyi-332">Additional strings to recognize as NA/NaN.</span> <span class="yiyi-st" id="yiyi-333">If a dict is passed, it specifies per-column NA values.</span> <span class="yiyi-st" id="yiyi-334">By default the following values are interpreted as NaN: <code class="docutils literal"><span class="pre">'-1.#IND',</span> <span class="pre">'1.#QNAN',</span> <span class="pre">'1.#IND',</span> <span class="pre">'-1.#QNAN',</span> <span class="pre">'#N/A</span> <span class="pre">N/A',</span> <span class="pre">'#N/A',</span> <span class="pre">'N/A',</span> <span class="pre">'NA',</span> <span class="pre">'#NA',</span> <span class="pre">'NULL',</span> <span class="pre">'NaN',</span> <span class="pre">'-NaN',</span> <span class="pre">'nan',</span> <span class="pre">'-nan',</span> <span class="pre">''</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-335">keep_default_na</span></dt><span class="yiyi-st" id="yiyi-345"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-336">If na_values are specified and keep_default_na is <code class="docutils literal"><span class="pre">False</span></code>, the default NaN values are overridden; otherwise na_values are appended to the defaults.</span></dd>
<dt><span class="yiyi-st" id="yiyi-337">na_filter</span></dt><span class="yiyi-st" id="yiyi-346"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-338">Detect missing value markers (empty strings and the value of na_values).</span> <span class="yiyi-st" id="yiyi-339">In data without any NAs, passing <code class="docutils literal"><span class="pre">na_filter=False</span></code> can improve the performance of reading a large file.</span></dd>
<dt><span class="yiyi-st" id="yiyi-340">verbose</span></dt><span class="yiyi-st" id="yiyi-347"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-341">Indicate the number of NA values placed in non-numeric columns.</span></dd>
<dt><span class="yiyi-st" id="yiyi-342">skip_blank_lines</span></dt><span class="yiyi-st" id="yiyi-348"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-343">If <code class="docutils literal"><span class="pre">True</span></code>, skip over blank lines rather than interpreting them as NaN values.</span></dd>
</dl>
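<p>The way <code class="docutils literal"><span class="pre">na_values</span></code>, <code class="docutils literal"><span class="pre">keep_default_na</span></code> and <code class="docutils literal"><span class="pre">na_filter</span></code> combine is easiest to see side by side. This is a sketch under stated assumptions: the data is made up, and the marker <code class="docutils literal"><span class="pre">'missing'</span></code> is a hypothetical placeholder a data source might use.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: 'NA' is a default marker, 'missing' is not.
data = "name,score\nalice,10\nbob,NA\ncarol,missing\n"

# Defaults only: 'NA' becomes NaN, 'missing' stays a string.
df_default = pd.read_csv(StringIO(data))

# Extra marker appended to the defaults: both become NaN.
df_extra = pd.read_csv(StringIO(data), na_values=["missing"])

# Extra marker replaces the defaults: 'NA' is kept as a string.
df_only = pd.read_csv(
    StringIO(data), na_values=["missing"], keep_default_na=False
)

# No missing-value detection at all: every field is kept as text.
df_raw = pd.read_csv(StringIO(data), na_filter=False)
```

<p>Only <code class="docutils literal"><span class="pre">df_extra</span></code> ends up with a numeric <code class="docutils literal"><span class="pre">score</span></code> column, because it is the only variant in which every non-numeric field is treated as missing.</p>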
</div>
<div class="section" id="datetime-handling">
<h4><span class="yiyi-st" id="yiyi-349">Datetime Handling</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-350">parse_dates</span></dt><span class="yiyi-st" id="yiyi-366"><span class="classifier-delimiter">:</span> <span class="classifier">boolean or list of ints or names or list of lists or dict, default <code class="docutils literal"><span class="pre">False</span></code>.</span></span><dd><ul class="first last simple">
<li><span class="yiyi-st" id="yiyi-351">If <code class="docutils literal"><span class="pre">True</span></code> -&gt; try parsing the index.</span></li>
<li><span class="yiyi-st" id="yiyi-352">If <code class="docutils literal"><span class="pre">[1,</span> <span class="pre">2,</span> <span class="pre">3]</span></code> -&gt; try parsing columns 1, 2, 3 each as a separate date column.</span></li>
<li><span class="yiyi-st" id="yiyi-353">If <code class="docutils literal"><span class="pre">[[1,</span> <span class="pre">3]]</span></code> -&gt; combine columns 1 and 3 and parse as a single date column.</span></li>
<li><span class="yiyi-st" id="yiyi-354">If <code class="docutils literal"><span class="pre">{'foo'</span> <span class="pre">:</span> <span class="pre">[1,</span> <span class="pre">3]}</span></code> -&gt; parse columns 1, 3 as date and call the result 'foo'.</span> <span class="yiyi-st" id="yiyi-355">A fast path exists for iso8601-formatted dates.</span></li>
</ul>
</dd>
<dt><span class="yiyi-st" id="yiyi-356">infer_datetime_format</span></dt><span class="yiyi-st" id="yiyi-367"><span class="classifier-delimiter">:</span> <span class="classifier">boolean,默认<code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-357">If <code class="docutils literal"><span class="pre">True</span></code> and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing.</span></dd>
<dt><span class="yiyi-st" id="yiyi-358">keep_date_col</span></dt><span class="yiyi-st" id="yiyi-368"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-359">If <code class="docutils literal"><span class="pre">True</span></code> and parse_dates specifies combining multiple columns, keep the original columns.</span></dd>
<dt><span class="yiyi-st" id="yiyi-360">date_parser</span></dt><span class="yiyi-st" id="yiyi-369"><span class="classifier-delimiter">:</span> <span class="classifier">function, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-361">Function to use for converting a sequence of string columns to an array of datetime instances.</span> <span class="yiyi-st" id="yiyi-362">The default uses <code class="docutils literal"><span class="pre">dateutil.parser.parser</span></code> to do the conversion.</span> <span class="yiyi-st" id="yiyi-363">Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.</span></dd>
<dt><span class="yiyi-st" id="yiyi-364">dayfirst</span></dt><span class="yiyi-st" id="yiyi-370"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-365">DD/MM format dates, international and European format.</span></dd>
</dl>
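<p>As a minimal sketch of how <code class="docutils literal"><span class="pre">dayfirst</span></code> changes the interpretation of ambiguous dates, the following reads a small, hypothetical CSV string (the column names and values are invented for illustration) whose dates are written as DD/MM/YYYY:</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: 01/02/2009 could mean 1 February or 2 January.
data = "date,value\n01/02/2009,1\n03/02/2009,2\n"

# dayfirst=True asks the parser to treat the first number as the day.
df = pd.read_csv(StringIO(data), parse_dates=["date"], dayfirst=True)

# Both rows should now fall in February.
print(df["date"].dt.month.tolist())
```

<p>Without <code class="docutils literal"><span class="pre">dayfirst=True</span></code>, the same strings would normally be read month-first.</p>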
</div>
<div class="section" id="iteration">
<h4><span class="yiyi-st" id="yiyi-371">Iteration</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-372">iterator</span></dt><span class="yiyi-st" id="yiyi-377"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-373">Return a <cite>TextFileReader</cite> object for iteration or for getting chunks with <code class="docutils literal"><span class="pre">get_chunk()</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-374">chunksize</span></dt><span class="yiyi-st" id="yiyi-378"><span class="classifier-delimiter">:</span> <span class="classifier">int, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-375">Return a <cite>TextFileReader</cite> object for iteration.</span> <span class="yiyi-st" id="yiyi-376">See <a class="reference internal" href="#io-chunking"><span class="std std-ref">iterating and chunking</span></a> below.</span></dd>
</dl>
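<p>The two options can be sketched as follows. The CSV string is a made-up example; <code class="docutils literal"><span class="pre">chunksize</span></code> yields DataFrames of at most that many rows, while <code class="docutils literal"><span class="pre">iterator=True</span></code> lets you request a chunk of a chosen size with <code class="docutils literal"><span class="pre">get_chunk()</span></code>:</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical five-row dataset.
data = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# chunksize=2: iterate over DataFrames holding at most two rows each.
reader = pd.read_csv(StringIO(data), chunksize=2)
chunk_lengths = [len(chunk) for chunk in reader]
print(chunk_lengths)

# iterator=True: pull an explicitly sized chunk on demand.
reader = pd.read_csv(StringIO(data), iterator=True)
first_three = reader.get_chunk(3)
print(first_three)
```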
</div>
<div class="section" id="quoting-compression-and-file-format">
<h4><span class="yiyi-st" id="yiyi-379">Quoting, Compression, and File Format</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-380">compression</span></dt><span class="yiyi-st" id="yiyi-422"> <span class="classifier-delimiter">:</span> <span class="classifier">{<code class="docutils literal"><span class="pre">'infer'</span></code>, <code class="docutils literal"><span class="pre">'gzip'</span></code>, <code class="docutils literal"><span class="pre">'bz2'</span></code>, <code class="docutils literal"><span class="pre">'zip'</span></code>, <code class="docutils literal"><span class="pre">'xz'</span></code>, <code class="docutils literal"><span class="pre">None</span></code>}, default <code class="docutils literal"><span class="pre">'infer'</span></code></span></span><dd><p class="first"><span class="yiyi-st" id="yiyi-381">For on-the-fly decompression of on-disk data.</span> <span class="yiyi-st" id="yiyi-382">If 'infer', then use gzip, bz2, zip, or xz if filepath_or_buffer is a string ending in '.gz', '.bz2', '.zip', or '.xz', respectively, and no decompression otherwise.</span> <span class="yiyi-st" id="yiyi-383">If using 'zip', the ZIP file must contain only one data file to be read in.</span> <span class="yiyi-st" id="yiyi-384">Set to <code class="docutils literal"><span class="pre">None</span></code> for no decompression.</span></p>
<div class="last versionadded">
<p><span class="yiyi-st" id="yiyi-385"><span class="versionmodified">New in version 0.18.1: </span>support for 'zip' and 'xz' compression.</span></p>
</div>
</dd>
<dt><span class="yiyi-st" id="yiyi-386">thousands</span></dt><span class="yiyi-st" id="yiyi-423"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-387">Thousands separator.</span></dd>
<dt><span class="yiyi-st" id="yiyi-388">decimal</span></dt><span class="yiyi-st" id="yiyi-424"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">'.'</span></code></span></span><dd><span class="yiyi-st" id="yiyi-389">Character to recognize as the decimal point.</span> <span class="yiyi-st" id="yiyi-390">E.g. use <code class="docutils literal"><span class="pre">','</span></code> for European data.</span></dd>
<dt><span class="yiyi-st" id="yiyi-391">float_precision</span></dt><span class="yiyi-st" id="yiyi-425"><span class="classifier-delimiter">:</span> <span class="classifier">string, default None</span></span><dd><span class="yiyi-st" id="yiyi-392">Specifies which converter the C engine should use for floating-point values.</span> <span class="yiyi-st" id="yiyi-393">The options are <code class="docutils literal"><span class="pre">None</span></code> for the ordinary converter, <code class="docutils literal"><span class="pre">high</span></code> for the high-precision converter, and <code class="docutils literal"><span class="pre">round_trip</span></code> for the round-trip converter.</span></dd>
<dt><span class="yiyi-st" id="yiyi-394">lineterminator</span></dt><span class="yiyi-st" id="yiyi-426"><span class="classifier-delimiter">:</span> <span class="classifier">str (length 1), default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-395">Character used to break the file into lines.</span> <span class="yiyi-st" id="yiyi-396">Only valid with the C parser.</span></dd>
<dt><span class="yiyi-st" id="yiyi-397">quotechar</span></dt><span class="yiyi-st" id="yiyi-427"><span class="classifier-delimiter">:</span> <span class="classifier">str (length 1)</span></span><dd><span class="yiyi-st" id="yiyi-398">The character used to denote the start and end of a quoted item.</span> <span class="yiyi-st" id="yiyi-399">Quoted items can include the delimiter, and it will be ignored.</span></dd>
<dt><span class="yiyi-st" id="yiyi-400">quoting</span></dt><span class="yiyi-st" id="yiyi-428"><span class="classifier-delimiter">:</span> <span class="classifier">int or <code class="docutils literal"><span class="pre">csv.QUOTE_*</span></code> instance, default <code class="docutils literal"><span class="pre">0</span></code> </span></span><dd><span class="yiyi-st" id="yiyi-401">Control field quoting behavior per the <code class="docutils literal"><span class="pre">csv.QUOTE_*</span></code> constants.</span> <span class="yiyi-st" id="yiyi-402">Use one of <code class="docutils literal"><span class="pre">QUOTE_MINIMAL</span></code> (0), <code class="docutils literal"><span class="pre">QUOTE_ALL</span></code> (1), <code class="docutils literal"><span class="pre">QUOTE_NONNUMERIC</span></code> (2) or <code class="docutils literal"><span class="pre">QUOTE_NONE</span></code> (3).</span></dd>
<dt><span class="yiyi-st" id="yiyi-403">doublequote</span></dt><span class="yiyi-st" id="yiyi-429"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-404">When <code class="docutils literal"><span class="pre">quotechar</span></code> is specified and <code class="docutils literal"><span class="pre">quoting</span></code> is not <code class="docutils literal"><span class="pre">QUOTE_NONE</span></code>, indicate whether or not to interpret two consecutive <code class="docutils literal"><span class="pre">quotechar</span></code> elements <strong>inside</strong> a field as a single <code class="docutils literal"><span class="pre">quotechar</span></code> element.</span></dd>
<dt><span class="yiyi-st" id="yiyi-405">escapechar</span></dt><span class="yiyi-st" id="yiyi-430"><span class="classifier-delimiter">:</span> <span class="classifier">str (length 1), default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-406">One-character string used to escape the delimiter when quoting is <code class="docutils literal"><span class="pre">QUOTE_NONE</span></code>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-407">comment</span></dt><span class="yiyi-st" id="yiyi-431"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-408">Indicates that the remainder of the line should not be parsed.</span> <span class="yiyi-st" id="yiyi-409">If found at the beginning of a line, the line will be ignored altogether.</span> <span class="yiyi-st" id="yiyi-410">This parameter must be a single character.</span> <span class="yiyi-st" id="yiyi-411">Like empty lines (as long as <code class="docutils literal"><span class="pre">skip_blank_lines=True</span></code>), fully commented lines are ignored by the parameter <cite>header</cite> but not by <cite>skiprows</cite>.</span> <span class="yiyi-st" id="yiyi-412">For example, if <code class="docutils literal"><span class="pre">comment='#'</span></code>, parsing '#empty\na,b,c\n1,2,3' with <cite>header=0</cite> will result in 'a,b,c' being treated as the header.</span></dd>
<dt><span class="yiyi-st" id="yiyi-413">encoding</span></dt><span class="yiyi-st" id="yiyi-432"><span class="classifier-delimiter">:</span> <span class="classifier">str, default <code class="docutils literal"><span class="pre">None</span></code></span></span><dd><span class="yiyi-st" id="yiyi-414">Encoding to use for UTF when reading/writing (e.g. <code class="docutils literal"><span class="pre">'utf-8'</span></code>).</span> <span class="yiyi-st" id="yiyi-415"><a class="reference external" href="https://docs.python.org/3/library/codecs.html#standard-encodings">List of Python standard encodings</a>.</span></dd>
<dt><span class="yiyi-st" id="yiyi-416">dialect</span></dt><span class="yiyi-st" id="yiyi-433"><span class="classifier-delimiter">:</span> <span class="classifier">str or <a class="reference external" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="(in Python v3.6)"><code class="docutils literal"><span class="pre">csv.Dialect</span></code></a> instance, default <code class="docutils literal"><span class="pre">None</span></code> </span></span><dd><span class="yiyi-st" id="yiyi-417">If <code class="docutils literal"><span class="pre">None</span></code>, defaults to the Excel dialect.</span> <span class="yiyi-st" id="yiyi-418">Ignored if sep is longer than 1 character.</span> <span class="yiyi-st" id="yiyi-419">See the <a class="reference external" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="(in Python v3.6)"><code class="docutils literal"><span class="pre">csv.Dialect</span></code></a> documentation for more details.</span></dd>
<dt><span class="yiyi-st" id="yiyi-420">tupleize_cols</span></dt><span class="yiyi-st" id="yiyi-434"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">False</span></code></span></span><dd><span class="yiyi-st" id="yiyi-421">Leave a list of tuples on the columns as is (the default is to convert them to a MultiIndex on the columns).</span></dd>
</dl>
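<p>A short sketch of three of these options working together may help. The input strings below are hypothetical: the first uses European-style numbers with <code class="docutils literal"><span class="pre">'.'</span></code> as the thousands separator and <code class="docutils literal"><span class="pre">','</span></code> as the decimal point, and the second reuses the commented-header example from the <code class="docutils literal"><span class="pre">comment</span></code> description:</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical European-formatted data, separated by semicolons.
european = "amount;label\n1.234,5;x\n2.000,25;y\n"
df = pd.read_csv(StringIO(european), sep=";", thousands=".", decimal=",")
print(df["amount"].tolist())

# A fully commented first line is skipped before the header is located.
commented = "#empty\na,b,c\n1,2,3"
df2 = pd.read_csv(StringIO(commented), comment="#", header=0)
print(list(df2.columns))
```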
</div>
<div class="section" id="error-handling">
<h4><span class="yiyi-st" id="yiyi-435">Error Handling</span></h4>
<dl class="docutils">
<dt><span class="yiyi-st" id="yiyi-436">error_bad_lines</span></dt><span class="yiyi-st" id="yiyi-442"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-437">By default, lines with too many fields (e.g. a csv line with too many commas) will cause an exception to be raised, and no DataFrame will be returned.</span> <span class="yiyi-st" id="yiyi-438">If <code class="docutils literal"><span class="pre">False</span></code>, then these “bad lines” will be dropped from the DataFrame that is returned (only valid with the C parser).</span> <span class="yiyi-st" id="yiyi-439">See <a class="reference internal" href="#io-bad-lines"><span class="std std-ref">bad lines</span></a> below.</span></dd>
<dt><span class="yiyi-st" id="yiyi-440">warn_bad_lines</span></dt><span class="yiyi-st" id="yiyi-443"><span class="classifier-delimiter">:</span> <span class="classifier">boolean, default <code class="docutils literal"><span class="pre">True</span></code></span></span><dd><span class="yiyi-st" id="yiyi-441">If error_bad_lines is <code class="docutils literal"><span class="pre">False</span></code> and warn_bad_lines is <code class="docutils literal"><span class="pre">True</span></code>, a warning for each “bad line” will be output (only valid with the C parser).</span></dd>
</dl>
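<p>The default behavior can be sketched with a hypothetical three-column CSV string in which one row carries an extra field. The example only relies on the default (raise on a bad line) and catches <code class="docutils literal"><span class="pre">pandas.errors.ParserError</span></code>, which is assumed to be available in your pandas version; it deliberately does not pass <code class="docutils literal"><span class="pre">error_bad_lines</span></code> or <code class="docutils literal"><span class="pre">warn_bad_lines</span></code>, since later pandas releases replaced them with <code class="docutils literal"><span class="pre">on_bad_lines</span></code>:</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical data: the second data row has four fields instead of three.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

try:
    pd.read_csv(StringIO(data))
    raised = False
except pd.errors.ParserError as exc:
    # By default the malformed row aborts parsing; no DataFrame is returned.
    raised = True
    print(exc)
```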
<p><span class="yiyi-st" id="yiyi-444">Consider a typical CSV file containing, in this case, some time series data:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">date,A,B,C</span>
<span class="go">20090101,a,1,2</span>
<span class="go">20090102,b,3,4</span>
<span class="go">20090103,c,4,5</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-445">The default for <cite>read_csv</cite> is to create a DataFrame with simple numbered rows:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [2]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">)</span>
<span class="gr">Out[2]: </span>
<span class="go"> date A B C</span>
<span class="go">0 20090101 a 1 2</span>
<span class="go">1 20090102 b 3 4</span>
<span class="go">2 20090103 c 4 5</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-446">In the case of indexed data, you can pass the column number or column name you wish to use as the index:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [3]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gr">Out[3]: </span>
<span class="go"> A B C</span>
<span class="go">date </span>
<span class="go">20090101 a 1 2</span>
<span class="go">20090102 b 3 4</span>
<span class="go">20090103 c 4 5</span>
</pre></div>
</div>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [4]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="s1">'date'</span><span class="p">)</span>
<span class="gr">Out[4]: </span>
<span class="go"> A B C</span>
<span class="go">date </span>
<span class="go">20090101 a 1 2</span>
<span class="go">20090102 b 3 4</span>
<span class="go">20090103 c 4 5</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-447">You can also use a list of columns to create a hierarchical index:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [5]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'A'</span><span class="p">])</span>
<span class="gr">Out[5]: </span>
<span class="go"> B C</span>
<span class="go">date A </span>
<span class="go">20090101 a 1 2</span>
<span class="go">20090102 b 3 4</span>
<span class="go">20090103 c 4 5</span>
</pre></div>
</div>
<p id="io-dialect"><span class="yiyi-st" id="yiyi-448">The <code class="docutils literal"><span class="pre">dialect</span></code> keyword gives greater flexibility in specifying the file format.</span> <span class="yiyi-st" id="yiyi-449">By default it uses the Excel dialect, but you can specify either the dialect name or a <a class="reference external" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="(in Python v3.6)"><code class="docutils literal"><span class="pre">csv.Dialect</span></code></a> instance.</span></p>
<p><span class="yiyi-st" id="yiyi-450">Suppose you had data with unenclosed quotes:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [6]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">label1,label2,label3</span>
<span class="go">index1,"a,c,e</span>
<span class="go">index2,b,d,f</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-451">By default, <code class="docutils literal"><span class="pre">read_csv</span></code> uses the Excel dialect and treats the double quote as the quote character, which causes it to fail when it finds a newline before it finds the closing double quote.</span></p>
<p><span class="yiyi-st" id="yiyi-452">We can get around this using <code class="docutils literal"><span class="pre">dialect</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [7]: </span><span class="n">dia</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">excel</span><span class="p">()</span>
<span class="gp">In [8]: </span><span class="n">dia</span><span class="o">.</span><span class="n">quoting</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONE</span>
<span class="gp">In [9]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dia</span><span class="p">)</span>
<span class="gr">Out[9]: </span>
<span class="go"> label1 label2 label3</span>
<span class="go">index1 "a c e</span>
<span class="go">index2 b d f</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-453">All of the dialect options can be specified separately by keyword arguments:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c~1,2,3~4,5,6'</span>
<span class="gp">In [11]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">lineterminator</span><span class="o">=</span><span class="s1">'~'</span><span class="p">)</span>
<span class="gr">Out[11]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
</pre></div>
</div>
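<p>In the same spirit, the unclosed-quote data from the dialect example can likely be read by passing <code class="docutils literal"><span class="pre">quoting</span></code> directly instead of building a dialect object. This is a sketch: the <code class="docutils literal"><span class="pre">data</span></code> string below is reconstructed from the printed output shown earlier and is an assumption about the original input.</p>

```python
import csv
from io import StringIO

import pandas as pd

# Reconstructed input: the quote opened on the second line is never closed.
data = 'label1,label2,label3\nindex1,"a,c,e\nindex2,b,d,f'

# QUOTE_NONE turns off quote handling, so '"' is kept as an ordinary character.
df = pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE)
print(df)
```

<p>Because each data row has one more field than the header, the first field of each row is used as the index.</p>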
<p><span class="yiyi-st" id="yiyi-454">Another common dialect option is <code class="docutils literal"><span class="pre">skipinitialspace</span></code>, which skips any whitespace after a delimiter:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [12]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a, b, c</span><span class="se">\n</span><span class="s1">1, 2, 3</span><span class="se">\n</span><span class="s1">4, 5, 6'</span>
<span class="gp">In [13]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">a, b, c</span>
<span class="go">1, 2, 3</span>
<span class="go">4, 5, 6</span>
<span class="gp">In [14]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">skipinitialspace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gr">Out[14]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
</pre></div>
</div>
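<p><span class="yiyi-st" id="yiyi-455">The parsers make every attempt to “do the right thing” and not be very fragile.</span> <span class="yiyi-st" id="yiyi-456">Type inference is a pretty big deal.</span> <span class="yiyi-st" id="yiyi-457">So if a column can be coerced to integer dtype without altering the contents, it will do so.</span> <span class="yiyi-st" id="yiyi-458">Any non-numeric columns will come through as object dtype, as with the rest of pandas objects.</span></p>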
</div>
</div>
<div class="section" id="specifying-column-data-types">
<span id="io-dtypes"></span><h3><span class="yiyi-st" id="yiyi-459">Specifying column data types</span></h3>
<p><span class="yiyi-st" id="yiyi-460">Starting with v0.10, you can indicate the data type for the whole DataFrame or for individual columns:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [15]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n</span><span class="s1">1,2,3</span><span class="se">\n</span><span class="s1">4,5,6</span><span class="se">\n</span><span class="s1">7,8,9'</span>
<span class="gp">In [16]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">a,b,c</span>
<span class="go">1,2,3</span>
<span class="go">4,5,6</span>
<span class="go">7,8,9</span>
<span class="gp">In [17]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">)</span>
<span class="gp">In [18]: </span><span class="n">df</span>
<span class="gr">Out[18]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
<span class="go">2 7 8 9</span>
<span class="gp">In [19]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'a'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="gr">Out[19]: </span><span class="s1">'1'</span>
<span class="gp">In [20]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="p">{</span><span class="s1">'b'</span><span class="p">:</span> <span class="nb">object</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">})</span>
<span class="gp">In [21]: </span><span class="n">df</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[21]: </span>
<span class="go">a int64</span>
<span class="go">b object</span>
<span class="go">c float64</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-461">Fortunately, <code class="docutils literal"><span class="pre">pandas</span></code> offers more than one way to ensure that your column(s) contain only one <code class="docutils literal"><span class="pre">dtype</span></code>.</span> <span class="yiyi-st" id="yiyi-462">If you’re unfamiliar with these concepts, you can see <a class="reference internal" href="basics.html#basics-dtypes"><span class="std std-ref">here</span></a> to learn more about dtypes, and <a class="reference internal" href="basics.html#basics-object-conversion"><span class="std std-ref">here</span></a> to learn more about <code class="docutils literal"><span class="pre">object</span></code> conversion in <code class="docutils literal"><span class="pre">pandas</span></code>.</span></p>
<p><span class="yiyi-st" id="yiyi-463">For instance, you can use the <code class="docutils literal"><span class="pre">converters</span></code> argument of <a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [22]: </span><span class="n">data</span> <span class="o">=</span> <span class="s2">"col_1</span><span class="se">\n</span><span class="s2">1</span><span class="se">\n</span><span class="s2">2</span><span class="se">\n</span><span class="s2">'A'</span><span class="se">\n</span><span class="s2">4.22"</span>
<span class="gp">In [23]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">converters</span><span class="o">=</span><span class="p">{</span><span class="s1">'col_1'</span><span class="p">:</span><span class="nb">str</span><span class="p">})</span>
<span class="gp">In [24]: </span><span class="n">df</span>
<span class="gr">Out[24]: </span>
<span class="go"> col_1</span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 'A'</span>
<span class="go">3 4.22</span>
<span class="gp">In [25]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="gr">Out[25]: </span>
<span class="go">&lt;type 'str'&gt; 4</span>
<span class="go">Name: col_1, dtype: int64</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-464">Or you can use the <a class="reference internal" href="generated/pandas.to_numeric.html#pandas.to_numeric" title="pandas.to_numeric"><code class="xref py py-func docutils literal"><span class="pre">to_numeric()</span></code></a> function to coerce the dtypes after reading in the data,</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [26]: </span><span class="n">df2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gp">In [27]: </span><span class="n">df2</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">to_numeric</span><span class="p">(</span><span class="n">df2</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">],</span> <span class="n">errors</span><span class="o">=</span><span class="s1">'coerce'</span><span class="p">)</span>
<span class="gp">In [28]: </span><span class="n">df2</span>
<span class="gr">Out[28]: </span>
<span class="go"> col_1</span>
<span class="go">0 1.00</span>
<span class="go">1 2.00</span>
<span class="go">2 NaN</span>
<span class="go">3 4.22</span>
<span class="gp">In [29]: </span><span class="n">df2</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="gr">Out[29]: </span>
<span class="go">&lt;type 'float'&gt; 4</span>
<span class="go">Name: col_1, dtype: int64</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-465">which converts every value that parses successfully to a float and leaves the values that fail to parse as <code class="docutils literal"><span class="pre">NaN</span></code>.</span></p>
<p><span class="yiyi-st" id="yiyi-466">Ultimately, how you deal with reading in columns containing mixed dtypes depends on your specific needs.</span> <span class="yiyi-st" id="yiyi-467">In the case above, if you wanted to <code class="docutils literal"><span class="pre">NaN</span></code> out the data anomalies, then <a class="reference internal" href="generated/pandas.to_numeric.html#pandas.to_numeric" title="pandas.to_numeric"><code class="xref py py-func docutils literal"><span class="pre">to_numeric()</span></code></a> is probably your best option.</span> <span class="yiyi-st" id="yiyi-468">However, if you wanted all the data to be coerced, no matter the type, then using the <code class="docutils literal"><span class="pre">converters</span></code> argument of <a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a> would certainly be worth trying.</span></p>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-469">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-470">The <code class="docutils literal"><span class="pre">dtype</span></code> option is currently only supported by the C engine.</span> <span class="yiyi-st" id="yiyi-471">Specifying <code class="docutils literal"><span class="pre">dtype</span></code> with an <code class="docutils literal"><span class="pre">engine</span></code> other than 'c' raises a <code class="docutils literal"><span class="pre">ValueError</span></code>.</span></p>
</div>
<div class="admonition note">
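<p class="first admonition-title"><span class="yiyi-st" id="yiyi-472">Note</span></p>
<p><span class="yiyi-st" id="yiyi-473">In some cases, reading in abnormal data with columns containing mixed dtypes will result in an inconsistent dataset.</span> <span class="yiyi-st" id="yiyi-474">If you rely on pandas to infer the dtypes of your columns, the parsing engine will infer the dtypes for different chunks of the data, rather than for the whole dataset at once.</span> <span class="yiyi-st" id="yiyi-475">Consequently, you can end up with column(s) with mixed dtypes.</span> <span class="yiyi-st" id="yiyi-476">For example,</span></p>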
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [30]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">'col_1'</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">500000</span><span class="p">))</span> <span class="o">+</span> <span class="p">[</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">]</span> <span class="o">+</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">500000</span><span class="p">))})</span>
<span class="gp">In [31]: </span><span class="n">df</span><span class="o">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)</span>
<span class="gp">In [32]: </span><span class="n">mixed_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)</span>
<span class="gp">In [33]: </span><span class="n">mixed_df</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="gr">Out[33]: </span>
<span class="go">&lt;type 'int'&gt; 737858</span>
<span class="go">&lt;type 'str'&gt; 262144</span>
<span class="go">Name: col_1, dtype: int64</span>
<span class="gp">In [34]: </span><span class="n">mixed_df</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span><span class="o">.</span><span class="n">dtype</span>
<span class="gr">Out[34]: </span><span class="n">dtype</span><span class="p">(</span><span class="s1">'O'</span><span class="p">)</span>
</pre></div>
</div>
<p class="last"><span class="yiyi-st" id="yiyi-477">This results in <cite>mixed_df</cite> containing an <code class="docutils literal"><span class="pre">int</span></code> dtype for certain chunks of the column and <code class="docutils literal"><span class="pre">str</span></code> for others, because of the mixed dtypes in the data that was read in.</span> <span class="yiyi-st" id="yiyi-478">It is important to note that the overall column will be marked with a <code class="docutils literal"><span class="pre">dtype</span></code> of <code class="docutils literal"><span class="pre">object</span></code>, which is used for columns with mixed dtypes.</span></p>
</div>
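<p>One way to sidestep chunk-by-chunk inference, sketched here under the assumption that reading the column as strings is acceptable for your use case, is to state the <code class="docutils literal"><span class="pre">dtype</span></code> explicitly. The small dataset below is hypothetical and far too short to trigger the chunking effect itself; it only shows that an explicit <code class="docutils literal"><span class="pre">dtype</span></code> yields a single Python type across the column:</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical column mixing numeric-looking and non-numeric entries.
data = "col_1\n1\n2\na\nb\n3\n"

# Declaring the dtype up front means pandas does not infer it from the data.
df = pd.read_csv(StringIO(data), dtype={"col_1": str})

# Collect the distinct Python types present in the column.
types = set(df["col_1"].map(type))
print(types)
```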
</div>
<div class="section" id="specifying-categorical-dtype">
<span id="io-categorical"></span><h3><span class="yiyi-st" id="yiyi-479">Specifying Categorical dtype</span></h3>
<div class="versionadded">
<p><span class="yiyi-st" id="yiyi-480"><span class="versionmodified">New in version 0.19.0.</span></span></p>
</div>
<p><span class="yiyi-st" id="yiyi-481"><code class="docutils literal"><span class="pre">Categorical</span></code> columns can be parsed directly by specifying <code class="docutils literal"><span class="pre">dtype='category'</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [35]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'col1,col2,col3</span><span class="se">\n</span><span class="s1">a,b,1</span><span class="se">\n</span><span class="s1">a,b,2</span><span class="se">\n</span><span class="s1">c,d,3'</span>
<span class="gp">In [36]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[36]: </span>
<span class="go"> col1 col2 col3</span>
<span class="go">0 a b 1</span>
<span class="go">1 a b 2</span>
<span class="go">2 c d 3</span>
<span class="gp">In [37]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[37]: </span>
<span class="go">col1 object</span>
<span class="go">col2 object</span>
<span class="go">col3 int64</span>
<span class="go">dtype: object</span>
<span class="gp">In [38]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'category'</span><span class="p">)</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[38]: </span>
<span class="go">col1 category</span>
<span class="go">col2 category</span>
<span class="go">col3 category</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-482">Individual columns can be parsed as a <code class="docutils literal"><span class="pre">Categorical</span></code> using a dict specification:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [39]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="p">{</span><span class="s1">'col1'</span><span class="p">:</span> <span class="s1">'category'</span><span class="p">})</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[39]: </span>
<span class="go">col1 category</span>
<span class="go">col2 object</span>
<span class="go">col3 int64</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-483">Note</span></p>
<p><span class="yiyi-st" id="yiyi-484">The resulting categories will always be parsed as strings (object dtype).</span> <span class="yiyi-st" id="yiyi-485">If the categories are numeric, they can be converted using the <a class="reference internal" href="generated/pandas.to_numeric.html#pandas.to_numeric" title="pandas.to_numeric"><code class="xref py py-func docutils literal"><span class="pre">to_numeric()</span></code></a> function or, where appropriate, another converter such as <a class="reference internal" href="generated/pandas.to_datetime.html#pandas.to_datetime" title="pandas.to_datetime"><code class="xref py py-func docutils literal"><span class="pre">to_datetime()</span></code></a>.</span></p>
<div class="last highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [40]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'category'</span><span class="p">)</span>
<span class="gp">In [41]: </span><span class="n">df</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[41]: </span>
<span class="go">col1 category</span>
<span class="go">col2 category</span>
<span class="go">col3 category</span>
<span class="go">dtype: object</span>
<span class="gp">In [42]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'col3'</span><span class="p">]</span>
<span class="gr">Out[42]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">Name: col3, dtype: category</span>
<span class="go">Categories (3, object): [1, 2, 3]</span>
<span class="gp">In [43]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'col3'</span><span class="p">]</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">to_numeric</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s1">'col3'</span><span class="p">]</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span><span class="p">)</span>
<span class="gp">In [44]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'col3'</span><span class="p">]</span>
<span class="gr">Out[44]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">Name: col3, dtype: category</span>
<span class="go">Categories (3, int64): [1, 2, 3]</span>
</pre></div>
</div>
</div>
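<p>The note above assigns to <code class="docutils literal"><span class="pre">.cat.categories</span></code> directly, which reflects the pandas version this page documents. As a hedged alternative, the sketch below performs the same conversion with <code class="docutils literal"><span class="pre">rename_categories()</span></code>, which returns a new categorical instead of mutating the existing one. The variable name <code class="docutils literal"><span class="pre">numeric_categories</span></code> is illustrative only.</p>

```python
from io import StringIO

import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Parse every column as a categorical; the categories start out as strings.
df = pd.read_csv(StringIO(data), dtype='category')

# Convert the string categories to numbers, then swap them in.
numeric_categories = pd.to_numeric(df['col3'].cat.categories)
df['col3'] = df['col3'].cat.rename_categories(numeric_categories)

print(df['col3'].cat.categories)
```

<p>Only the category labels are converted here; the column itself stays categorical.</p>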
</div>
<div class="section" id="naming-and-using-columns">
<h3><span class="yiyi-st" id="yiyi-486">Naming and Using Columns</span></h3>
<div class="section" id="handling-column-names">
<span id="io-headers"></span><h4><span class="yiyi-st" id="yiyi-487">Handling column names</span></h4>
<p><span class="yiyi-st" id="yiyi-488">A file may or may not have a header row.</span> <span class="yiyi-st" id="yiyi-489">By default, pandas assumes the first row should be used as the column names:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [45]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n</span><span class="s1">1,2,3</span><span class="se">\n</span><span class="s1">4,5,6</span><span class="se">\n</span><span class="s1">7,8,9'</span>
<span class="gp">In [46]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">a,b,c</span>
<span class="go">1,2,3</span>
<span class="go">4,5,6</span>
<span class="go">7,8,9</span>
<span class="gp">In [47]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[47]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
<span class="go">2 7 8 9</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-490">By specifying the <code class="docutils literal"><span class="pre">names</span></code> argument in conjunction with <code class="docutils literal"><span class="pre">header</span></code>, you can indicate other names to use and whether or not to throw away the header row (if any):</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [48]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">a,b,c</span>
<span class="go">1,2,3</span>
<span class="go">4,5,6</span>
<span class="go">7,8,9</span>
<span class="gp">In [49]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">,</span> <span class="s1">'baz'</span><span class="p">],</span> <span class="n">header</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gr">Out[49]: </span>
<span class="go"> foo bar baz</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
<span class="go">2 7 8 9</span>
<span class="gp">In [50]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">,</span> <span class="s1">'baz'</span><span class="p">],</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
<span class="gr">Out[50]: </span>
<span class="go"> foo bar baz</span>
<span class="go">0 a b c</span>
<span class="go">1 1 2 3</span>
<span class="go">2 4 5 6</span>
<span class="go">3 7 8 9</span>
</pre></div>
</div>
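<p>For a file that has no header row at all, the same two arguments apply. The following is a minimal sketch; the <code class="docutils literal"><span class="pre">headerless</span></code> string is made-up sample data, not taken from this guide.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical data with no header row.
headerless = '1,2,3\n4,5,6\n7,8,9'

# header=None keeps the first line as data; names supplies the column labels.
df = pd.read_csv(StringIO(headerless), names=['foo', 'bar', 'baz'], header=None)

print(df)
```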
<p><span class="yiyi-st" id="yiyi-491">If the header is in a row other than the first, pass the row number to <code class="docutils literal"><span class="pre">header</span></code>.</span> <span class="yiyi-st" id="yiyi-492">This will skip the preceding rows:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [51]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'skip this skip it</span><span class="se">\n</span><span class="s1">a,b,c</span><span class="se">\n</span><span class="s1">1,2,3</span><span class="se">\n</span><span class="s1">4,5,6</span><span class="se">\n</span><span class="s1">7,8,9'</span>
<span class="gp">In [52]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">header</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gr">Out[52]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
<span class="go">2 7 8 9</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="duplicate-names-parsing">
<span id="io-dupe-names"></span><h3><span class="yiyi-st" id="yiyi-493">Duplicate names parsing</span></h3>
<p><span class="yiyi-st" id="yiyi-494">If the file or header contains duplicate names, pandas by default will deduplicate these names so as to prevent data from being overwritten:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [53]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,a</span><span class="se">\n</span><span class="s1">0,1,2</span><span class="se">\n</span><span class="s1">3,4,5'</span>
<span class="gp">In [54]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[54]: </span>
<span class="go"> a b a.1</span>
<span class="go">0 0 1 2</span>
<span class="go">1 3 4 5</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-495">There is no more duplicate data because <code class="docutils literal"><span class="pre">mangle_dupe_cols=True</span></code> by default, which renames a series of duplicate columns 'X', ..., 'X' to 'X', 'X.1', ..., 'X.N'.</span> <span class="yiyi-st" id="yiyi-496">If <code class="docutils literal"><span class="pre">mangle_dupe_cols=False</span></code>, duplicate data can arise:</span></p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,a</span><span class="se">\n</span><span class="s1">0,1,2</span><span class="se">\n</span><span class="s1">3,4,5'</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">mangle_dupe_cols</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">3</span><span class="p">]:</span>
<span class="n">a</span> <span class="n">b</span> <span class="n">a</span>
<span class="mi">0</span> <span class="mi">2</span> <span class="mi">1</span> <span class="mi">2</span>
<span class="mi">1</span> <span class="mi">5</span> <span class="mi">4</span> <span class="mi">5</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-497">To prevent users from encountering this problem with duplicate data, a <code class="docutils literal"><span class="pre">ValueError</span></code> exception is raised if <code class="docutils literal"><span class="pre">mangle_dupe_cols</span> <span class="pre">!=</span> <span class="pre">True</span></code>:</span></p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,a</span><span class="se">\n</span><span class="s1">0,1,2</span><span class="se">\n</span><span class="s1">3,4,5'</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">mangle_dupe_cols</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="o">...</span>
<span class="ne">ValueError</span><span class="p">:</span> <span class="n">Setting</span> <span class="n">mangle_dupe_cols</span><span class="o">=</span><span class="bp">False</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">supported</span> <span class="n">yet</span>
</pre></div>
</div>
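<p>Because the default behavior already deduplicates names, a common follow-up is simply to rename the mangled column afterwards. The sketch below relies only on the default; the replacement name <code class="docutils literal"><span class="pre">a_second</span></code> is a hypothetical choice.</p>

```python
from io import StringIO

import pandas as pd

data = 'a,b,a\n0,1,2\n3,4,5'

# With the default settings the second 'a' column is read as 'a.1'.
df = pd.read_csv(StringIO(data))
print(list(df.columns))

# Give the mangled column a more descriptive (hypothetical) name.
df = df.rename(columns={'a.1': 'a_second'})
print(list(df.columns))
```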
<div class="section" id="filtering-columns-usecols">
<span id="io-usecols"></span><h4><span class="yiyi-st" id="yiyi-498">Filtering columns (<code class="docutils literal"><span class="pre">usecols</span></code>)</span></h4>
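<p><span class="yiyi-st" id="yiyi-499">The <code class="docutils literal"><span class="pre">usecols</span></code> argument allows you to select any subset of the columns in a file, using either the column names or their position numbers:</span></p>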
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [55]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c,d</span><span class="se">\n</span><span class="s1">1,2,3,foo</span><span class="se">\n</span><span class="s1">4,5,6,bar</span><span class="se">\n</span><span class="s1">7,8,9,baz'</span>
<span class="gp">In [56]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[56]: </span>
<span class="go"> a b c d</span>
<span class="go">0 1 2 3 foo</span>
<span class="go">1 4 5 6 bar</span>
<span class="go">2 7 8 9 baz</span>
<span class="gp">In [57]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">usecols</span><span class="o">=</span><span class="p">[</span><span class="s1">'b'</span><span class="p">,</span> <span class="s1">'d'</span><span class="p">])</span>
<span class="gr">Out[57]: </span>
<span class="go"> b d</span>
<span class="go">0 2 foo</span>
<span class="go">1 5 bar</span>
<span class="go">2 8 baz</span>
<span class="gp">In [58]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">usecols</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="gr">Out[58]: </span>
<span class="go"> a c d</span>
<span class="go">0 1 3 foo</span>
<span class="go">1 4 6 bar</span>
<span class="go">2 7 9 baz</span>
</pre></div>
</div>
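<p>One detail worth illustrating: to the best of our understanding, <code class="docutils literal"><span class="pre">usecols</span></code> selects columns but does not reorder them, so the result follows the order in the file. The sketch below shows this and reorders explicitly afterwards; the variable names are illustrative.</p>

```python
from io import StringIO

import pandas as pd

data = 'a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz'

# The order of names in usecols is not preserved; file order wins.
file_order = pd.read_csv(StringIO(data), usecols=['d', 'b'])
print(list(file_order.columns))

# Index with a list of labels to get the order you asked for.
requested_order = file_order[['d', 'b']]
print(list(requested_order.columns))
```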
</div>
</div>
<div class="section" id="comments-and-empty-lines">
<h3><span class="yiyi-st" id="yiyi-500">Comments and Empty Lines</span></h3>
<div class="section" id="ignoring-line-comments-and-empty-lines">
<span id="io-skiplines"></span><h4><span class="yiyi-st" id="yiyi-501">Ignoring line comments and empty lines</span></h4>
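<p><span class="yiyi-st" id="yiyi-502">If the <code class="docutils literal"><span class="pre">comment</span></code> parameter is specified, then completely commented lines will be ignored.</span> <span class="yiyi-st" id="yiyi-503">By default, completely blank lines will be ignored as well.</span> <span class="yiyi-st" id="yiyi-504">Both of these are API changes introduced in version 0.15.</span></p>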
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [59]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">a,b,c</span><span class="se">\n</span><span class="s1"> </span><span class="se">\n</span><span class="s1"># commented line</span><span class="se">\n</span><span class="s1">1,2,3</span><span class="se">\n\n</span><span class="s1">4,5,6'</span>
<span class="gp">In [60]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go"></span>
<span class="go">a,b,c</span>
<span class="go"> </span>
<span class="c"># commented line</span>
<span class="go">1,2,3</span>
<span class="go"></span>
<span class="go">4,5,6</span>
<span class="gp">In [61]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">comment</span><span class="o">=</span><span class="s1">'#'</span><span class="p">)</span>
<span class="gr">Out[61]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
<span class="go">1 4 5 6</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-505">If <code class="docutils literal"><span class="pre">skip_blank_lines=False</span></code>, then <code class="docutils literal"><span class="pre">read_csv</span></code> will not ignore blank lines:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [62]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n\n</span><span class="s1">1,2,3</span><span class="se">\n\n\n</span><span class="s1">4,5,6'</span>
<span class="gp">In [63]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">skip_blank_lines</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="gr">Out[63]: </span>
<span class="go"> a b c</span>
<span class="go">0 NaN NaN NaN</span>
<span class="go">1 1.0 2.0 3.0</span>
<span class="go">2 NaN NaN NaN</span>
<span class="go">3 NaN NaN NaN</span>
<span class="go">4 4.0 5.0 6.0</span>
</pre></div>
</div>
<div class="admonition warning">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-506">Warning</span></p>
<p><span class="yiyi-st" id="yiyi-507">The presence of ignored lines might create ambiguities involving line numbers; the parameter <code class="docutils literal"><span class="pre">header</span></code> uses row numbers (ignoring commented/empty lines), while <code class="docutils literal"><span class="pre">skiprows</span></code> uses line numbers (including commented/empty lines):</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [64]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'#comment</span><span class="se">\n</span><span class="s1">a,b,c</span><span class="se">\n</span><span class="s1">A,B,C</span><span class="se">\n</span><span class="s1">1,2,3'</span>
<span class="gp">In [65]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">comment</span><span class="o">=</span><span class="s1">'#'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gr">Out[65]: </span>
<span class="go"> A B C</span>
<span class="go">0 1 2 3</span>
<span class="gp">In [66]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'A,B,C</span><span class="se">\n</span><span class="s1">#comment</span><span class="se">\n</span><span class="s1">a,b,c</span><span class="se">\n</span><span class="s1">1,2,3'</span>
<span class="gp">In [67]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">comment</span><span class="o">=</span><span class="s1">'#'</span><span class="p">,</span> <span class="n">skiprows</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="gr">Out[67]: </span>
<span class="go"> a b c</span>
<span class="go">0 1 2 3</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-508">If both <code class="docutils literal"><span class="pre">header</span></code> and <code class="docutils literal"><span class="pre">skiprows</span></code> are specified, <code class="docutils literal"><span class="pre">header</span></code> will be relative to the end of <code class="docutils literal"><span class="pre">skiprows</span></code>.</span> <span class="yiyi-st" id="yiyi-509">For example:</span></p>
<div class="last highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [68]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'# empty</span><span class="se">\n</span><span class="s1"># second empty line</span><span class="se">\n</span><span class="s1"># third empty'</span> \
<span class="gp">   ....: </span><span class="s1">'line</span><span class="se">\n</span><span class="s1">X,Y,Z</span><span class="se">\n</span><span class="s1">1,2,3</span><span class="se">\n</span><span class="s1">A,B,C</span><span class="se">\n</span><span class="s1">1,2.,4.</span><span class="se">\n</span><span class="s1">5.,NaN,10.0'</span>
<span class="gp">In [69]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="c"># empty</span>
<span class="c"># second empty line</span>
<span class="c"># third emptyline</span>
<span class="go">X,Y,Z</span>
<span class="go">1,2,3</span>
<span class="go">A,B,C</span>
<span class="go">1,2.,4.</span>
<span class="go">5.,NaN,10.0</span>
<span class="gp">In [70]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">comment</span><span class="o">=</span><span class="s1">'#'</span><span class="p">,</span> <span class="n">skiprows</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gr">Out[70]: </span>
<span class="go"> A B C</span>
<span class="go">0 1.0 2.0 4.0</span>
<span class="go">1 5.0 NaN 10.0</span>
</pre></div>
</div>
</div>
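<p>To make the difference in counting concrete, the sketch below applies both parameters to the same input. It assumes the behavior described in the warning: <code class="docutils literal"><span class="pre">header</span></code> does not count the comment line, while <code class="docutils literal"><span class="pre">skiprows</span></code> does. The variable names are illustrative.</p>

```python
from io import StringIO

import pandas as pd

data = '#comment\na,b,c\nA,B,C\n1,2,3'

# header counts rows after comment lines are dropped:
# row 0 is 'a,b,c' and row 1 is 'A,B,C'.
by_header = pd.read_csv(StringIO(data), comment='#', header=1)

# skiprows counts physical lines, so 2 skips '#comment' and 'a,b,c'.
by_skiprows = pd.read_csv(StringIO(data), comment='#', skiprows=2)

print(by_header)
print(by_skiprows)
```

<p>Both calls end up using <code class="docutils literal"><span class="pre">A,B,C</span></code> as the header, even though the numbers passed differ.</p>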
</div>
<div class="section" id="comments">
<span id="io-comments"></span><h4><span class="yiyi-st" id="yiyi-510">Comments</span></h4>
<p><span class="yiyi-st" id="yiyi-511">Sometimes comments or metadata may be included in a file:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [71]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">ID,level,category</span>
<span class="go">Patient1,123000,x # really unpleasant</span>
<span class="go">Patient2,23000,y # wouldn't take his medicine</span>
<span class="go">Patient3,1234018,z # awesome</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-512">By default, the parser includes the comments in the output:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [72]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span>
<span class="gp">In [73]: </span><span class="n">df</span>
<span class="gr">Out[73]: </span>
<span class="go"> ID level category</span>
<span class="go">0 Patient1 123000 x # really unpleasant</span>
<span class="go">1 Patient2 23000 y # wouldn't take his medicine</span>
<span class="go">2 Patient3 1234018 z # awesome</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-513">We can suppress the comments using the <code class="docutils literal"><span class="pre">comment</span></code> keyword:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [74]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">comment</span><span class="o">=</span><span class="s1">'#'</span><span class="p">)</span>
<span class="gp">In [75]: </span><span class="n">df</span>
<span class="gr">Out[75]: </span>
<span class="go"> ID level category</span>
<span class="go">0 Patient1 123000 x </span>
<span class="go">1 Patient2 23000 y </span>
<span class="go">2 Patient3 1234018 z </span>
</pre></div>
</div>
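<p>The examples in this subsection read from <code class="docutils literal"><span class="pre">tmp.csv</span></code>. The sketch below is a self-contained variant that uses an in-memory string with the same contents instead. It also strips the whitespace that is left in front of the removed comment, which is an extra step we add for illustration.</p>

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for the file contents printed above.
raw = (
    "ID,level,category\n"
    "Patient1,123000,x # really unpleasant\n"
    "Patient2,23000,y # wouldn't take his medicine\n"
    "Patient3,1234018,z # awesome\n"
)

# Everything from '#' to the end of each line is discarded.
df = pd.read_csv(StringIO(raw), comment='#')

# The space that preceded '#' is still part of the field; strip it.
df['category'] = df['category'].str.strip()

print(df)
```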
</div>
</div>
<div class="section" id="dealing-with-unicode-data">
<span id="io-unicode"></span><h3><span class="yiyi-st" id="yiyi-514">Dealing with Unicode Data</span></h3>
<p><span class="yiyi-st" id="yiyi-515">The <code class="docutils literal"><span class="pre">encoding</span></code> argument should be used for encoded unicode data, which will result in byte strings being decoded to unicode in the result:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [76]: </span><span class="n">data</span> <span class="o">=</span> <span class="n">b</span><span class="s1">'word,length</span><span class="se">\n</span><span class="s1">Tr</span><span class="se">\xc3\xa4</span><span class="s1">umen,7</span><span class="se">\n</span><span class="s1">Gr</span><span class="se">\xc3\xbc\xc3\x9f</span><span class="s1">e,5'</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf8'</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'latin-1'</span><span class="p">)</span>
<span class="gp">In [77]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">BytesIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'latin-1'</span><span class="p">)</span>
<span class="gp">In [78]: </span><span class="n">df</span>
<span class="gr">Out[78]: </span>
<span class="go"> word length</span>
<span class="go">0 Träumen 7</span>
<span class="go">1 Grüße 5</span>
<span class="gp">In [79]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'word'</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
<span class="gr">Out[79]: </span><span class="s1">u'Gr</span><span class="se">\xfc\xdf</span><span class="s1">e'</span>
</pre></div>
</div>
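<p>The session above shows Python 2 style output (the <code class="docutils literal"><span class="pre">u'...'</span></code> repr). As a hedged Python 3 sketch of the same idea, the text is encoded to Latin-1 bytes and then decoded by <code class="docutils literal"><span class="pre">read_csv</span></code>; the non-ASCII characters are written as escapes so the snippet does not depend on the source file encoding.</p>

```python
from io import BytesIO

import pandas as pd

# 'Tr\u00e4umen' and 'Gr\u00fc\u00dfe' are the two words from the example above.
text = 'word,length\nTr\u00e4umen,7\nGr\u00fc\u00dfe,5'

# Simulate a file stored as Latin-1 bytes.
data = text.encode('latin-1')

# Tell the parser how to decode the bytes.
df = pd.read_csv(BytesIO(data), encoding='latin-1')

print(df)
```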
<p><span class="yiyi-st" id="yiyi-516">Some formats that encode all characters as multiple bytes, such as UTF-16, will not parse correctly at all unless the encoding is specified.</span> <span class="yiyi-st" id="yiyi-517"><a class="reference external" href="https://docs.python.org/3/library/codecs.html#standard-encodings">Full list of Python standard encodings</a></span></p>
</div>
<div class="section" id="index-columns-and-trailing-delimiters">
<span id="io-index-col"></span><h3><span class="yiyi-st" id="yiyi-518">Index columns and trailing delimiters</span></h3>
<p><span class="yiyi-st" id="yiyi-519">If a file has one more column of data than the number of column names, the first column will be used as the DataFrame's row names:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [80]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n</span><span class="s1">4,apple,bat,5.7</span><span class="se">\n</span><span class="s1">8,orange,cow,10'</span>
<span class="gp">In [81]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[81]: </span>
<span class="go"> a b c</span>
<span class="go">4 apple bat 5.7</span>
<span class="go">8 orange cow 10.0</span>
</pre></div>
</div>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [82]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'index,a,b,c</span><span class="se">\n</span><span class="s1">4,apple,bat,5.7</span><span class="se">\n</span><span class="s1">8,orange,cow,10'</span>
<span class="gp">In [83]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gr">Out[83]: </span>
<span class="go"> a b c</span>
<span class="go">index </span>
<span class="go">4 apple bat 5.7</span>
<span class="go">8 orange cow 10.0</span>
</pre></div>
</div>
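<p><span class="yiyi-st" id="yiyi-520">Ordinarily, you can achieve this behavior using the <code class="docutils literal"><span class="pre">index_col</span></code> option.</span></p>
<p><span class="yiyi-st" id="yiyi-521">There are some exception cases when a file has been prepared with delimiters at the end of each data line, confusing the parser.</span> <span class="yiyi-st" id="yiyi-522">To explicitly disable the index column inference and discard the last column, pass <code class="docutils literal"><span class="pre">index_col=False</span></code>:</span></p>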
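<p>Besides a position, <code class="docutils literal"><span class="pre">index_col</span></code> also accepts a column label. The following minimal sketch reuses the data from the example above and selects the index column by name rather than by number.</p>

```python
from io import StringIO

import pandas as pd

data = 'index,a,b,c\n4,apple,bat,5.7\n8,orange,cow,10'

# Select the index column by its label instead of its position.
df = pd.read_csv(StringIO(data), index_col='index')

print(df)
```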
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [84]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n</span><span class="s1">4,apple,bat,</span><span class="se">\n</span><span class="s1">8,orange,cow,'</span>
<span class="gp">In [85]: </span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">a,b,c</span>
<span class="go">4,apple,bat,</span>
<span class="go">8,orange,cow,</span>
<span class="gp">In [86]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="gr">Out[86]: </span>
<span class="go"> a b c</span>
<span class="go">4 apple bat NaN</span>
<span class="go">8 orange cow NaN</span>
<span class="gp">In [87]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">index_col</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="gr">Out[87]: </span>
<span class="go"> a b c</span>
<span class="go">0 4 apple bat</span>
<span class="go">1 8 orange cow</span>
</pre></div>
</div>
</div>
<div class="section" id="date-handling">
<span id="io-parse-dates"></span><h3><span class="yiyi-st" id="yiyi-523">Date Handling</span></h3>
<div class="section" id="specifying-date-columns">
<h4><span class="yiyi-st" id="yiyi-524">Specifying Date Columns</span></h4>
<p><span class="yiyi-st" id="yiyi-525">To make working with datetime data easier, <a class="reference internal" href="generated/pandas.read_csv.html#pandas.read_csv" title="pandas.read_csv"><code class="xref py py-func docutils literal"><span class="pre">read_csv()</span></code></a> and <a class="reference internal" href="generated/pandas.read_table.html#pandas.read_table" title="pandas.read_table"><code class="xref py py-func docutils literal"><span class="pre">read_table()</span></code></a> use the keyword arguments <code class="docutils literal"><span class="pre">parse_dates</span></code> and <code class="docutils literal"><span class="pre">date_parser</span></code> to let users specify a variety of columns and date/time formats for turning the input text data into <code class="docutils literal"><span class="pre">datetime</span></code> objects.</span></p>
<p><span class="yiyi-st" id="yiyi-526">The simplest case is to just pass in <code class="docutils literal"><span class="pre">parse_dates=True</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="c"># Use a column as an index, and parse it as dates.</span>
<span class="gp">In [88]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [89]: </span><span class="n">df</span>
<span class="gr">Out[89]: </span>
<span class="go"> A B C</span>
<span class="go">date </span>
<span class="go">2009-01-01 a 1 2</span>
<span class="go">2009-01-02 b 3 4</span>
<span class="go">2009-01-03 c 4 5</span>
<span class="c"># The index is a DatetimeIndex (dtype datetime64[ns])</span>
<span class="gp">In [90]: </span><span class="n">df</span><span class="o">.</span><span class="n">index</span>
<span class="gr">Out[90]: </span><span class="n">DatetimeIndex</span><span class="p">([</span><span class="s1">'2009-01-01'</span><span class="p">,</span> <span class="s1">'2009-01-02'</span><span class="p">,</span> <span class="s1">'2009-01-03'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'datetime64[ns]'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">u'date'</span><span class="p">,</span> <span class="n">freq</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
</pre></div>
</div>
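<p>The contents of <code class="docutils literal"><span class="pre">foo.csv</span></code> are not shown in this section, so the following is a minimal self-contained sketch that reads an in-memory stand-in instead. The variable <code class="docutils literal"><span class="pre">csv_text</span></code> and its ISO-formatted date strings are assumptions made for illustration.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for 'foo.csv'; the ISO date strings are an assumption.
csv_text = (
    "date,A,B,C\n"
    "2009-01-01,a,1,2\n"
    "2009-01-02,b,3,4\n"
    "2009-01-03,c,4,5\n"
)

# index_col=0 turns the first column into the index;
# parse_dates=True asks the parser to parse that index as dates.
df = pd.read_csv(StringIO(csv_text), index_col=0, parse_dates=True)

print(type(df.index).__name__)
print(df.index.dtype)
```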
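<p><span class="yiyi-st" id="yiyi-527">We often want to store date and time data separately, or to store the various date fields separately.</span><span class="yiyi-st" id="yiyi-528">The <code class="docutils literal"><span class="pre">parse_dates</span></code> keyword can be used to specify a combination of columns from which to parse the dates and/or times.</span></p>
<p><span class="yiyi-st" id="yiyi-529">You can pass a list of column lists to <code class="docutils literal"><span class="pre">parse_dates</span></code>. The resulting date columns are prepended to the output (so that the existing column order is not affected), and each new column name is the concatenation of its component column names:</span></p>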
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [91]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">KORD,19990127, 19:00:00, 18:56:00, 0.8100</span>
<span class="go">KORD,19990127, 20:00:00, 19:56:00, 0.0100</span>
<span class="go">KORD,19990127, 21:00:00, 20:56:00, -0.5900</span>
<span class="go">KORD,19990127, 21:00:00, 21:18:00, -0.9900</span>
<span class="go">KORD,19990127, 22:00:00, 21:56:00, -0.5900</span>
<span class="go">KORD,19990127, 23:00:00, 22:56:00, -0.5900</span>
<span class="gp">In [92]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">]])</span>
<span class="gp">In [93]: </span><span class="n">df</span>
<span class="gr">Out[93]: </span>
<span class="go"> 1_2 1_3 0 4</span>
<span class="go">0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81</span>
<span class="go">1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01</span>
<span class="go">2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59</span>
<span class="go">3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99</span>
<span class="go">4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59</span>
<span class="go">5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-530">By default the parser removes the component date columns, but you can choose to retain them via the <code class="docutils literal"><span class="pre">keep_date_col</span></code> keyword:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [94]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">]],</span>
<span class="gp"> ....:</span> <span class="n">keep_date_col</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp"> ....:</span>
<span class="gp">In [95]: </span><span class="n">df</span>
<span class="gr">Out[95]: </span>
<span class="go"> 1_2 1_3 0 1 2 \</span>
<span class="go">0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 19990127 19:00:00 </span>
<span class="go">1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 19990127 20:00:00 </span>
<span class="go">2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD 19990127 21:00:00 </span>
<span class="go">3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD 19990127 21:00:00 </span>
<span class="go">4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD 19990127 22:00:00 </span>
<span class="go">5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD 19990127 23:00:00 </span>
<span class="go"> 3 4 </span>
<span class="go">0 18:56:00 0.81 </span>
<span class="go">1 19:56:00 0.01 </span>
<span class="go">2 20:56:00 -0.59 </span>
<span class="go">3 21:18:00 -0.99 </span>
<span class="go">4 21:56:00 -0.59 </span>
<span class="go">5 22:56:00 -0.59 </span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-531">Note that if you wish to combine multiple columns into a single date column, a nested list must be used.</span><span class="yiyi-st" id="yiyi-532">In other words, <code class="docutils literal"><span class="pre">parse_dates=[1,</span> <span class="pre">2]</span></code> indicates that the second and third columns should each be parsed as separate date columns while <code class="docutils literal"><span class="pre">parse_dates=[[1,</span> <span class="pre">2]]</span></code> means the two columns should be parsed into a single column.</span></p>
<p><span class="yiyi-st" id="yiyi-533">You can also use a dict to specify custom names for the resulting columns:</span></p>
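<p>To make the idea of "several text columns parsed into a single date column" concrete, the sketch below builds the combined columns by hand after loading, using string concatenation and <code class="docutils literal"><span class="pre">pd.to_datetime()</span></code>. This is a swapped-in technique for illustration, not what the parser does internally; the in-memory <code class="docutils literal"><span class="pre">raw</span></code> text and the variable names are assumptions.</p>

```python
from io import StringIO

import pandas as pd

# Two rows in the same layout as the 'tmp.csv' sample (assumed for illustration).
raw = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
)

# Keep column 1 as text so it can be concatenated, and drop the space after each comma.
df = pd.read_csv(StringIO(raw), header=None, dtype={1: str}, skipinitialspace=True)

# Columns 1 and 2 form one datetime column; columns 1 and 3 form another.
nominal = pd.to_datetime(df[1] + " " + df[2], format="%Y%m%d %H:%M:%S")
actual = pd.to_datetime(df[1] + " " + df[3], format="%Y%m%d %H:%M:%S")

print(nominal)
print(actual)
```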
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [96]: </span><span class="n">date_spec</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'nominal'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="s1">'actual'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">]}</span>
<span class="gp">In [97]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="n">date_spec</span><span class="p">)</span>
<span class="gp">In [98]: </span><span class="n">df</span>
<span class="gr">Out[98]: </span>
<span class="go"> nominal actual 0 4</span>
<span class="go">0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81</span>
<span class="go">1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01</span>
<span class="go">2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59</span>
<span class="go">3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99</span>
<span class="go">4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59</span>
<span class="go">5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59</span>
</pre></div>
</div>
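<p><span class="yiyi-st" id="yiyi-534">It is important to remember that if multiple text columns are parsed into a single date column, a new column is prepended to the data.</span><span class="yiyi-st" id="yiyi-535">The <cite>index_col</cite> specification is based on this new set of columns rather than on the original data columns:</span></p>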
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [99]: </span><span class="n">date_spec</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'nominal'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="s1">'actual'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">]}</span>
<span class="gp">In [100]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="n">date_spec</span><span class="p">,</span>
<span class="gp"> .....:</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="c1">#index is the nominal column</span>
<span class="gp"> .....:</span>
<span class="gp">In [101]: </span><span class="n">df</span>
<span class="gr">Out[101]: </span>
<span class="go"> actual 0 4</span>
<span class="go">nominal </span>
<span class="go">1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81</span>
<span class="go">1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01</span>
<span class="go">1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59</span>
<span class="go">1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99</span>
<span class="go">1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59</span>
<span class="go">1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59</span>
</pre></div>
</div>
<div class="admonition note">
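<p class="first admonition-title"><span class="yiyi-st" id="yiyi-536">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-537">read_csv has a fast_path for parsing datetime strings in iso8601 format, e.g. "2000-01-01T00:01:02+00:00" and similar variations.</span><span class="yiyi-st" id="yiyi-538">If you can arrange for your data to store datetimes in this format, load times will be significantly faster; speedups of about 20x have been observed.</span></p>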
</div>
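<p>One way to end up with ISO 8601 style strings in a file is to let pandas write the datetimes itself. The following is a small round-trip sketch under that assumption; the frame contents and the names <code class="docutils literal"><span class="pre">frame</span></code>, <code class="docutils literal"><span class="pre">text</span></code> and <code class="docutils literal"><span class="pre">loaded</span></code> are made up for illustration, and no timing is measured here.</p>

```python
from io import StringIO

import pandas as pd

# Illustrative data: a datetime column and a value column.
frame = pd.DataFrame({
    "ts": pd.to_datetime(["2000-01-01 00:01:02", "2000-01-02 03:04:05"]),
    "value": [1, 2],
})

# Write to an in-memory buffer; datetimes are written as 'YYYY-MM-DD HH:MM:SS'.
buf = StringIO()
frame.to_csv(buf, index=False)
text = buf.getvalue()
print(text)

# Read the text back and parse the 'ts' column as dates.
loaded = pd.read_csv(StringIO(text), parse_dates=["ts"])
print(loaded.dtypes)
```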
<div class="admonition note">
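<p class="first admonition-title"><span class="yiyi-st" id="yiyi-539">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-540">When passing a dict as the <cite>parse_dates</cite> argument, the order of the prepended columns is not guaranteed, because <cite>dict</cite> objects do not impose an ordering on their keys.</span><span class="yiyi-st" id="yiyi-541">On Python 2.7+ you may use <cite>collections.OrderedDict</cite> instead of a regular <cite>dict</cite> if this matters to you.</span><span class="yiyi-st" id="yiyi-542">Because of this, when using a dict for 'parse_dates' in conjunction with the <cite>index_col</cite> argument, it is best to specify <cite>index_col</cite> as a column label rather than as a position in the resulting frame.</span></p>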
</div>
</div>
<div class="section" id="date-parsing-functions">
<h4><span class="yiyi-st" id="yiyi-543">Date Parsing Functions</span></h4>
<p><span class="yiyi-st" id="yiyi-544">Finally, the parser allows you to specify a custom <code class="docutils literal"><span class="pre">date_parser</span></code> function to take full advantage of the flexibility of the date parsing API:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [102]: </span><span class="kn">import</span> <span class="nn">pandas.io.date_converters</span> <span class="kn">as</span> <span class="nn">conv</span>
<span class="gp">In [103]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="n">date_spec</span><span class="p">,</span>
<span class="gp"> .....:</span> <span class="n">date_parser</span><span class="o">=</span><span class="n">conv</span><span class="o">.</span><span class="n">parse_date_time</span><span class="p">)</span>
<span class="gp"> .....:</span>
<span class="gp">In [104]: </span><span class="n">df</span>
<span class="gr">Out[104]: </span>
<span class="go"> nominal actual 0 4</span>
<span class="go">0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81</span>
<span class="go">1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01</span>
<span class="go">2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59</span>
<span class="go">3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99</span>
<span class="go">4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59</span>
<span class="go">5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-545">Pandas will try to call the <code class="docutils literal"><span class="pre">date_parser</span></code> function in three different ways.</span><span class="yiyi-st" id="yiyi-546">If an exception is raised, the next one is tried:</span></p>
<ol class="arabic simple">
<li><span class="yiyi-st" id="yiyi-547"><code class="docutils literal"><span class="pre">date_parser</span></code> is first called with one or more arrays as arguments, as defined using <cite>parse_dates</cite> (e.g., <code class="docutils literal"><span class="pre">date_parser(['2013',</span> <span class="pre">'2013'],</span> <span class="pre">['1',</span> <span class="pre">'2'])</span></code>)</span></li>
<li><span class="yiyi-st" id="yiyi-548">If #1 fails, <code class="docutils literal"><span class="pre">date_parser</span></code> is called with all the columns concatenated row-wise into a single array (e.g., <code class="docutils literal"><span class="pre">date_parser(['2013</span> <span class="pre">1',</span> <span class="pre">'2013</span> <span class="pre">2'])</span></code>)</span></li>
<li><span class="yiyi-st" id="yiyi-549">If #2 fails, <code class="docutils literal"><span class="pre">date_parser</span></code> is called once for every row with one or more string arguments from the columns indicated with <cite>parse_dates</cite> (e.g., <code class="docutils literal"><span class="pre">date_parser('2013',</span> <span class="pre">'1')</span></code> for the first row, <code class="docutils literal"><span class="pre">date_parser('2013',</span> <span class="pre">'2')</span></code> for the second, etc.)</span></li>
</ol>
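<p>The three calling conventions listed above can be pictured with a small pure-Python sketch. The helper <code class="docutils literal"><span class="pre">call_date_parser</span></code> is hypothetical and only mimics the documented fallback order; it is not the code pandas runs internally.</p>

```python
from datetime import datetime


def call_date_parser(date_parser, *columns):
    """Hypothetical helper that mimics the documented three-step fallback."""
    # 1. Pass one array per column.
    try:
        return list(date_parser(*columns))
    except Exception:
        pass
    # 2. Pass a single array of row-wise concatenated strings.
    joined = [" ".join(row) for row in zip(*columns)]
    try:
        return list(date_parser(joined))
    except Exception:
        pass
    # 3. Call the parser once per row with scalar string arguments.
    return [date_parser(*row) for row in zip(*columns)]


def row_parser(year, month):
    # Only understands scalar strings, so it is reached at step 3.
    return datetime(int(year), int(month), 1)


def joined_parser(values):
    # Only understands one array of "YYYY M" strings, so it is reached at step 2.
    return [datetime.strptime(v, "%Y %m") for v in values]


years, months = ["2013", "2013"], ["1", "2"]
by_row = call_date_parser(row_parser, years, months)
by_joined = call_date_parser(joined_parser, years, months)
print(by_row)
print(by_joined)
```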
<p><span class="yiyi-st" id="yiyi-550">Note that, performance-wise, you should try these methods of parsing dates in order:</span></p>
<ol class="arabic simple">
<li><span class="yiyi-st" id="yiyi-551">Try to infer the format using <code class="docutils literal"><span class="pre">infer_datetime_format=True</span></code> (see the section below)</span></li>
<li><span class="yiyi-st" id="yiyi-552">If you know the format, use <code class="docutils literal"><span class="pre">pd.to_datetime()</span></code>: <code class="docutils literal"><span class="pre">date_parser=lambda</span> <span class="pre">x:</span> <span class="pre">pd.to_datetime(x,</span> <span class="pre">format=...)</span></code></span></li>
<li><span class="yiyi-st" id="yiyi-553">If you have a really non-standard format, use a custom <code class="docutils literal"><span class="pre">date_parser</span></code> function.</span><span class="yiyi-st" id="yiyi-554">For optimal performance, this should be vectorized, i.e., it should accept arrays as arguments.</span></li>
</ol>
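<p>When the format is known, the vectorized <code class="docutils literal"><span class="pre">pd.to_datetime()</span></code> call from item 2 can also be applied to the column after loading. The sketch below takes that route; the sample text and the <code class="docutils literal"><span class="pre">%d.%m.%Y</span></code> format are assumptions chosen for illustration.</p>

```python
from io import StringIO

import pandas as pd

# Illustrative data with day.month.year dates (an assumed format).
raw = "date,value\n27.01.1999,0.81\n28.01.1999,0.01\n"

# Load the date column as plain text first.
df = pd.read_csv(StringIO(raw))

# Convert the whole column in one vectorized call with an explicit format.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

print(df)
```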
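<p><span class="yiyi-st" id="yiyi-555">You can explore the date parsing functionality in <code class="docutils literal"><span class="pre">date_converters.py</span></code> and add your own.</span><span class="yiyi-st" id="yiyi-556">We would love to turn this module into a community-supported set of date/time parsers.</span><span class="yiyi-st" id="yiyi-557">To get you started, <code class="docutils literal"><span class="pre">date_converters.py</span></code> contains functions to parse dual date and time columns, year/month/day columns, and year/month/day/hour/minute/second columns.</span><span class="yiyi-st" id="yiyi-558">It also contains a <code class="docutils literal"><span class="pre">generic_parser</span></code> function so you can curry it with a function that deals with a single date rather than the entire array.</span></p>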
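<p>The idea of currying a single-date function so that it works on whole columns can be sketched in plain Python. The names <code class="docutils literal"><span class="pre">lift_to_columns</span></code> and <code class="docutils literal"><span class="pre">parse_ymd</span></code> are hypothetical and do not reproduce the signature of <code class="docutils literal"><span class="pre">generic_parser</span></code>; they only illustrate the pattern.</p>

```python
from datetime import datetime


def lift_to_columns(parse_one):
    """Wrap a per-row parser so that it accepts one sequence per column."""
    def parse_columns(*columns):
        # zip(*columns) walks the columns row by row.
        return [parse_one(*fields) for fields in zip(*columns)]
    return parse_columns


def parse_ymd(year, month, day):
    # Deals with a single date given as three strings.
    return datetime(int(year), int(month), int(day))


parse_ymd_columns = lift_to_columns(parse_ymd)
result = parse_ymd_columns(["2013", "2014"], ["1", "2"], ["15", "28"])
print(result)
```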
</div>
<div class="section" id="inferring-datetime-format">
<span id="io-dayfirst"></span><h4><span class="yiyi-st" id="yiyi-559">Inferring Datetime Format</span></h4>
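<p><span class="yiyi-st" id="yiyi-560">If you have <code class="docutils literal"><span class="pre">parse_dates</span></code> enabled for some or all of your columns, and your datetime strings are all formatted the same way, you may get a large speedup by setting <code class="docutils literal"><span class="pre">infer_datetime_format=True</span></code>.</span><span class="yiyi-st" id="yiyi-561">If set, pandas will attempt to guess the format of your datetime strings and then use a faster means of parsing them.</span><span class="yiyi-st" id="yiyi-562">Parsing speedups of 5-10x have been observed.</span><span class="yiyi-st" id="yiyi-563">pandas falls back to the usual parsing if either the format cannot be guessed or the guessed format cannot properly parse the entire column of strings.</span><span class="yiyi-st" id="yiyi-564">So in general, enabling <code class="docutils literal"><span class="pre">infer_datetime_format</span></code> should not have any negative consequences.</span></p>
<p><span class="yiyi-st" id="yiyi-565">Here are some examples of datetime strings that can be guessed (all representing December 30th, 2011 at 00:00:00):</span></p>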
<ul class="simple">
<li><span class="yiyi-st" id="yiyi-566">“20111230”</span></li>
<li><span class="yiyi-st" id="yiyi-567">“2011/12/30”</span></li>
<li><span class="yiyi-st" id="yiyi-568">“20111230 00:00:00”</span></li>
<li><span class="yiyi-st" id="yiyi-569">“12/30/2011 00:00:00”</span></li>
<li><span class="yiyi-st" id="yiyi-570">“30/Dec/2011 00:00:00”</span></li>
<li><span class="yiyi-st" id="yiyi-571">“30/December/2011 00:00:00”</span></li>
</ul>
<p><span class="yiyi-st" id="yiyi-572"><code class="docutils literal"><span class="pre">infer_datetime_format</span></code> is sensitive to <code class="docutils literal"><span class="pre">dayfirst</span></code>.</span><span class="yiyi-st" id="yiyi-573">With <code class="docutils literal"><span class="pre">dayfirst=True</span></code>, it will guess “01/12/2011” to be December 1st.</span><span class="yiyi-st" id="yiyi-574">With <code class="docutils literal"><span class="pre">dayfirst=False</span></code> (the default), it will guess “01/12/2011” to be January 12th.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="c"># Try to infer the format for the index column</span>
<span class="gp">In [105]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'foo.csv'</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="gp"> .....:</span> <span class="n">infer_datetime_format</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp"> .....:</span>
<span class="gp">In [106]: </span><span class="n">df</span>
<span class="gr">Out[106]: </span>
<span class="go"> A B C</span>
<span class="go">date </span>
<span class="go">2009-01-01 a 1 2</span>
<span class="go">2009-01-02 b 3 4</span>
<span class="go">2009-01-03 c 4 5</span>
</pre></div>
</div>
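<p>The effect of <code class="docutils literal"><span class="pre">dayfirst</span></code> on an ambiguous string such as “01/12/2011” can be checked directly with <code class="docutils literal"><span class="pre">pd.to_datetime()</span></code>, which accepts the same keyword. This is a minimal sketch; the variable names are illustrative.</p>

```python
import pandas as pd

ambiguous = "01/12/2011"

# dayfirst=True reads the string as day/month/year: 1 December 2011.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# dayfirst=False (the default) reads it as month/day/year: 12 January 2011.
month_first = pd.to_datetime(ambiguous, dayfirst=False)

print(day_first.date(), month_first.date())
```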
</div>
<div class="section" id="international-date-formats">
<h4><span class="yiyi-st" id="yiyi-575">International Date Formats</span></h4>
<p><span class="yiyi-st" id="yiyi-576">While US date formats tend to be MM/DD/YYYY, many international formats use DD/MM/YYYY instead.</span><span class="yiyi-st" id="yiyi-577">For convenience, a <code class="docutils literal"><span class="pre">dayfirst</span></code> keyword is provided:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [107]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">date,value,cat</span>
<span class="go">1/6/2000,5,a</span>
<span class="go">2/6/2000,10,b</span>
<span class="go">3/6/2000,15,c</span>
<span class="gp">In [108]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="gr">Out[108]: </span>
<span class="go"> date value cat</span>
<span class="go">0 2000-01-06 5 a</span>
<span class="go">1 2000-02-06 10 b</span>
<span class="go">2 2000-03-06 15 c</span>
<span class="gp">In [109]: </span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">dayfirst</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">parse_dates</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="gr">Out[109]: </span>
<span class="go"> date value cat</span>
<span class="go">0 2000-06-01 5 a</span>
<span class="go">1 2000-06-02 10 b</span>
<span class="go">2 2000-06-03 15 c</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="specifying-method-for-floating-point-conversion">
<span id="io-float-precision"></span><h3><span class="yiyi-st" id="yiyi-578">Specifying method for floating-point conversion</span></h3>
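<p><span class="yiyi-st" id="yiyi-579">The parameter <code class="docutils literal"><span class="pre">float_precision</span></code> can be specified in order to use a specific floating-point converter during parsing with the C engine.</span><span class="yiyi-st" id="yiyi-580">The options are the ordinary converter, the high-precision converter, and the round-trip converter (which is guaranteed to round-trip values after writing to a file).</span><span class="yiyi-st" id="yiyi-581">For example:</span></p>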
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [110]: </span><span class="n">val</span> <span class="o">=</span> <span class="s1">'0.3066101993807095471566981359501369297504425048828125'</span>
<span class="gp">In [111]: </span><span class="n">data</span> <span class="o">=</span> <span class="s1">'a,b,c</span><span class="se">\n</span><span class="s1">1,2,{0}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
<span class="gp">In [112]: </span><span class="nb">abs</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">engine</span><span class="o">=</span><span class="s1">'c'</span><span class="p">,</span> <span class="n">float_precision</span><span class="o">=</span><span class="bp">None</span><span class="p">)[</span><span class="s1">'c'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="nb">float</span><span class="p">(</span><span class="n">val</span><span class="p">))</span>
<span class="gr">Out[112]: </span><span class="mf">1.1102230246251565e-16</span>
<span class="gp">In [113]: </span><span class="nb">abs</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">engine</span><span class="o">=</span><span class="s1">'c'</span><span class="p">,</span> <span class="n">float_precision</span><span class="o">=</span><span class="s1">'high'</span><span class="p">)[</span><span class="s1">'c'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="nb">float</span><span class="p">(</span><span class="n">val</span><span class="p">))</span>
<span class="gr">Out[113]: </span><span class="mf">5.5511151231257827e-17</span>
<span class="gp">In [114]: </span><span class="nb">abs</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="n">engine</span><span class="o">=</span><span class="s1">'c'</span><span class="p">,</span> <span class="n">float_precision</span><span class="o">=</span><span class="s1">'round_trip'</span><span class="p">)[</span><span class="s1">'c'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="nb">float</span><span class="p">(</span><span class="n">val</span><span class="p">))</span>
<span class="gr">Out[114]: </span><span class="mf">0.0</span>
</pre></div>
</div>
</div>
<div class="section" id="thousand-separators">
<span id="io-thousands"></span><h3><span class="yiyi-st" id="yiyi-582">Thousand Separators</span></h3>
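<p><span class="yiyi-st" id="yiyi-583">For large numbers that have been written with a thousands separator, you can set the <code class="docutils literal"><span class="pre">thousands</span></code> keyword to a string of length 1 so that integers will be parsed correctly:</span></p>
<p><span class="yiyi-st" id="yiyi-584">By default, numbers with a thousands separator will be parsed as strings:</span></p>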
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [115]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">ID|level|category</span>
<span class="go">Patient1|123,000|x</span>
<span class="go">Patient2|23,000|y</span>
<span class="go">Patient3|1,234,018|z</span>
<span class="gp">In [116]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'|'</span><span class="p">)</span>
<span class="gp">In [117]: </span><span class="n">df</span>
<span class="gr">Out[117]: </span>
<span class="go"> ID level category</span>
<span class="go">0 Patient1 123,000 x</span>
<span class="go">1 Patient2 23,000 y</span>
<span class="go">2 Patient3 1,234,018 z</span>
<span class="gp">In [118]: </span><span class="n">df</span><span class="o">.</span><span class="n">level</span><span class="o">.</span><span class="n">dtype</span>
<span class="gr">Out[118]: </span><span class="n">dtype</span><span class="p">(</span><span class="s1">'O'</span><span class="p">)</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-585">The <code class="docutils literal"><span class="pre">thousands</span></code> keyword allows integers to be parsed correctly:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [119]: </span><span class="k">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">ID|level|category</span>
<span class="go">Patient1|123,000|x</span>
<span class="go">Patient2|23,000|y</span>
<span class="go">Patient3|1,234,018|z</span>
<span class="gp">In [120]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'tmp.csv'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'|'</span><span class="p">,</span> <span class="n">thousands</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
<span class="gp">In [121]: </span><span class="n">df</span>
<span class="gr">Out[121]: </span>
<span class="go"> ID level category</span>
<span class="go">0 Patient1 123000 x</span>
<span class="go">1 Patient2 23000 y</span>
<span class="go">2 Patient3 1234018 z</span>
<span class="gp">In [122]: </span><span class="n">df</span><span class="o">.</span><span class="n">level</span><span class="o">.</span><span class="n">dtype</span>
<span class="gr">Out[122]: </span><span class="n">dtype</span><span class="p">(</span><span class="s1">'int64'</span><span class="p">)</span>
</pre></div>
</div>
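<p>The same comparison can be run without a file on disk. The sketch below feeds the sample text through <code class="docutils literal"><span class="pre">StringIO</span></code>; the variable names are illustrative.</p>

```python
from io import StringIO

import pandas as pd

text = (
    "ID|level|category\n"
    "Patient1|123,000|x\n"
    "Patient2|23,000|y\n"
    "Patient3|1,234,018|z\n"
)

# Without thousands=, the 'level' column keeps its commas and stays text.
as_text = pd.read_csv(StringIO(text), sep="|")

# With thousands=',' the commas are dropped and the column becomes integer.
as_int = pd.read_csv(StringIO(text), sep="|", thousands=",")

print(as_text["level"].tolist())
print(as_int["level"].tolist())
```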
</div>
<div class="section" id="na-values">
<span id="io-na-values"></span><h3><span class="yiyi-st" id="yiyi-586">NA Values</span></h3>
<p><span class="yiyi-st" id="yiyi-587">To control which values are parsed as missing values (signified by <code class="docutils literal"><span class="pre">NaN</span></code>), specify a string in <code class="docutils literal"><span class="pre">na_values</span></code>.</span><span class="yiyi-st" id="yiyi-588">If you specify a list of strings, then all values in it are considered to be missing values.</span><span class="yiyi-st" id="yiyi-589">If you specify a number (a <code class="docutils literal"><span class="pre">float</span></code>, like <code class="docutils literal"><span class="pre">5.0</span></code>, or an <code class="docutils literal"><span class="pre">integer</span></code>, like <code class="docutils literal"><span class="pre">5</span></code>), the corresponding equivalent values will also imply a missing value (in this case effectively <code class="docutils literal"><span class="pre">[5.0,5]</span></code> are recognized as <code class="docutils literal"><span class="pre">NaN</span></code>).</span></p>
<p><span class="yiyi-st" id="yiyi-590">To completely override the default values that are recognized as missing, specify <code class="docutils literal"><span class="pre">keep_default_na=False</span></code>.</span><span class="yiyi-st" id="yiyi-591">The default <code class="docutils literal"><span class="pre">NaN</span></code> recognized values are <code class="docutils literal"><span class="pre">['-1.#IND',</span> <span class="pre">'1.#QNAN',</span> <span class="pre">'1.#IND',</span> <span class="pre">'-1.#QNAN',</span> <span class="pre">'#N/A','N/A',</span> <span class="pre">'NA',</span> <span class="pre">'#NA',</span> <span class="pre">'NULL',</span> <span class="pre">'NaN',</span> <span class="pre">'-NaN',</span> <span class="pre">'nan',</span> <span class="pre">'-nan']</span></code>. </span><span class="yiyi-st" id="yiyi-592">Although a 0-length string <code class="docutils literal"><span class="pre">''</span></code> is not included in the default <code class="docutils literal"><span class="pre">NaN</span></code> values list, it is still treated as a missing value.</span></p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">na_values</span><span class="o">=</span><span class="p">[</span><span class="mi">5</span><span class="p">])</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-593">The default values, in addition to <code class="docutils literal"><span class="pre">5</span></code> and <code class="docutils literal"><span class="pre">5.0</span></code> when interpreted as numbers, are recognized as <code class="docutils literal"><span class="pre">NaN</span></code>.</span></p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">keep_default_na</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">na_values</span><span class="o">=</span><span class="p">[</span><span class="s2">""</span><span class="p">])</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-594">Only an empty field will be <code class="docutils literal"><span class="pre">NaN</span></code>.</span></p>
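<p>The two calls above use a placeholder <code class="docutils literal"><span class="pre">path</span></code>. A runnable sketch of the same two settings, with made-up in-memory data in place of a file, might look like this:</p>

```python
from io import StringIO

import pandas as pd

# na_values=[5]: both '5' and '5.0' become NaN, and the defaults such as 'NA' still apply.
numeric_text = "a,b\n5,x\n5.0,y\n7,NA\n"
with_five = pd.read_csv(StringIO(numeric_text), na_values=[5])
print(with_five)

# keep_default_na=False with na_values=[""]: only the empty field is NaN,
# so the literal text 'NA' survives as a string.
empty_text = "a,b\nNA,1\n,2\n"
only_empty = pd.read_csv(StringIO(empty_text), keep_default_na=False, na_values=[""])
print(only_empty)
```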
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">keep_default_na</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">na_values</span><span class="o">=</span><span class="p">[</span><span class="s2">"NA"</span><span class="p">,</span> <span class="s2">"0"</span><span class="p">])</span>
</pre></div>