@@ -357,6 +357,146 @@ takes a list of columns to sort by.
    tips = tips.sort_values(['sex', 'total_bill'])
    tips.head()
+
+String Processing
+-----------------
+
+Length
+~~~~~~
+
+SAS determines the length of a character string with the
+`LENGTHN <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002284668.htm>`__
+and `LENGTHC <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002283942.htm>`__
+functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailing blanks.
+
+.. code-block:: none
+
+   data _null_;
+   set tips;
+   put(LENGTHN(time));
+   put(LENGTHC(time));
+   run;
+
+Python determines the length of a character string with the ``len`` function.
+``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
+trailing blanks.
+
+.. ipython:: python
+
+   tips['time'].str.len().head()
+   tips['time'].str.rstrip().str.len().head()
+
+
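The difference between the two calls is easier to see on values that actually carry trailing blanks. A minimal sketch, assuming a small hypothetical Series rather than the ``tips`` data (the values and variable names are illustrative only):

```python
import pandas as pd

# Hypothetical values; two of them end in trailing blanks.
s = pd.Series(["Dinner  ", "Lunch", "Lunch "])

# Counts every character, trailing blanks included (roughly LENGTHC).
raw_len = s.str.len()

# Strips trailing blanks first, then counts (roughly LENGTHN).
trimmed_len = s.str.rstrip().str.len()

print(pd.DataFrame({"raw": raw_len, "trimmed": trimmed_len}))
```

The two columns differ only on the rows whose values end in blanks.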
+Find
+~~~~
+
+SAS determines the position of a character in a string with the
+`FINDW <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002978282.htm>`__ function.
+``FINDW`` takes the string defined by the first argument and searches for the first position of the substring
+you supply as the second argument.
+
+.. code-block:: none
+
+   data _null_;
+   set tips;
+   put(FINDW(sex,'ale'));
+   run;
+
+Python determines the position of a substring in a string with the
+``find`` method. ``find`` returns the position of the first occurrence
+of the substring. Keep in mind that Python indexes are zero-based, and
+that ``find`` returns -1 if the substring is not found.
+
+.. ipython:: python
+
+   tips['sex'].str.find("ale").head()
+
+
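Because ``find`` is zero-based and signals a miss with -1, its results are offset by one from a one-based position. A minimal sketch of the conversion, assuming hypothetical values (``"Other"`` is included only to show the not-found case):

```python
import pandas as pd

# Hypothetical values; "Other" does not contain the substring.
sex = pd.Series(["Female", "Male", "Other"])

# Zero-based position of the first match, -1 when there is no match.
pos0 = sex.str.find("ale")

# Adding 1 gives a one-based position, and turns -1 into 0 ("not found").
pos1 = pos0 + 1

print(pd.DataFrame({"value": sex, "zero_based": pos0, "one_based": pos1}))
```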
+Substring
+~~~~~~~~~
+
+SAS extracts a substring from a string based on its position with the
+`SUBSTR <http://www2.sas.com/proceedings/sugi25/25/cc/25p088.pdf>`__ function.
+
+.. code-block:: none
+
+   data _null_;
+   set tips;
+   put(substr(sex,1,1));
+   run;
+
+With pandas you can use ``[]`` notation to extract a substring
+from a string by position. Keep in mind that Python
+indexes are zero-based.
+
+.. ipython:: python
+
+   tips['sex'].str[0:1].head()
+
+
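The mapping from a one-based start position and a length to a Python slice can be written out explicitly. A minimal sketch, where ``substr`` is a hypothetical helper (not a pandas function) and the values are illustrative:

```python
import pandas as pd


def substr(s, position, length):
    """Slice each string using a one-based start position and a length.

    Hypothetical helper for illustration; assumes position >= 1.
    """
    start = position - 1  # shift to Python's zero-based indexing
    return s.str[start:start + length]


sex = pd.Series(["Female", "Male"])  # hypothetical values

first_char = substr(sex, 1, 1)  # same slice as sex.str[0:1]
middle = substr(sex, 3, 2)      # two characters starting at the third
print(first_char.tolist(), middle.tolist())
```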
+Scan
+~~~~
+
+The SAS `SCAN <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm>`__
+function returns the nth word from a string. The first argument is the string you want to parse and the
+second argument specifies which word you want to extract.
+
+.. code-block:: none
+
+   data firstlast;
+   input String $60.;
+   First_Name = scan(string, 1);
+   Last_Name = scan(string, -1);
+   datalines2;
+   John Smith;
+   Jane Cook;
+   ;;;
+   run;
+
+Python can extract a word from a string by splitting it on a
+delimiter with ``split`` or ``rsplit`` and selecting one of the pieces.
+Regular expressions offer more powerful approaches, but this
+shows a simple one.
+
+.. ipython:: python
+
+   firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
+   # First word: split on spaces and take the first piece.
+   firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]
+   # Last word: split once from the right and take the second piece.
+   firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", n=1, expand=True)[1]
+   firstlast
+
+
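Selecting the nth word, counted from either end, can be wrapped in a small function. A minimal sketch, where ``scan`` is a hypothetical helper (not a pandas function) that assumes words are separated by whitespace only, which is narrower than the default delimiter set of the SAS function:

```python
import pandas as pd


def scan(s, n):
    """Return the nth whitespace-separated word of each string.

    Hypothetical helper: n is one-based; a negative n counts from the end.
    """
    if n == 0:
        raise ValueError("n must be nonzero")
    words = s.str.split()             # each row becomes a list of words
    idx = n - 1 if n > 0 else n       # shift positive n to zero-based
    return words.str[idx]             # NaN where a row has too few words


names = pd.Series(["John Smith", "Jane Ann Cook"])  # hypothetical values

first = scan(names, 1)
second = scan(names, 2)
last = scan(names, -1)
print(first.tolist(), second.tolist(), last.tolist())
```

Counting from the end with a negative ``n`` keeps the last-name lookup correct when a string has more than two words.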
469
+ Upcase, Lowcase, and Propcase
470
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
471
+
472
+ The SAS `UPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245965.htm >`__
473
+ `LOWCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245912.htm >`__ and
474
+ `PROPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a002598106.htm >`__
475
+ functions change the case of the argument.
476
+
+.. code-block:: none
+
+   data firstlast;
+   input String $60.;
+   string_up = UPCASE(string);
+   string_low = LOWCASE(string);
+   string_prop = PROPCASE(string);
+   datalines2;
+   John Smith;
+   Jane Cook;
+   ;;;
+   run;
+
+The equivalent Python functions are ``upper``, ``lower``, and ``title``.
+
+.. ipython:: python
+
+   firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
+   firstlast['string_up'] = firstlast['String'].str.upper()
+   firstlast['string_low'] = firstlast['String'].str.lower()
+   firstlast['string_prop'] = firstlast['String'].str.title()
+   firstlast
+
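On input that is already in proper case, ``title`` leaves the text unchanged, so its effect is easier to see on mixed-case values. A minimal sketch, assuming a hypothetical Series:

```python
import pandas as pd

# Hypothetical values with inconsistent casing.
raw = pd.Series(["jane COOK", "JOHN smith"])

upper = raw.str.upper()   # every letter upper case
lower = raw.str.lower()   # every letter lower case
title = raw.str.title()   # first letter of each word upper, rest lower

print(pd.DataFrame({"raw": raw, "upper": upper, "lower": lower, "title": title}))
```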
 Merging
 -------