
[BUG] cudf::split on a String column returns a Column with non-empty nulls #13638

Closed
razajafri opened this issue Jun 29, 2023 · 0 comments · Fixed by #13647
Labels
bug Something isn't working


@razajafri
Contributor

Describe the bug
Creating a strings column that has no non-empty nulls and calling `splitAsViews` on it returns at least one `ColumnView` whose `hasNonEmptyNulls()` reports true.
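To show why only some of the resulting views are affected, here is a hypothetical pure-Java model of how split indices map onto views. The class `SplitModel` and its `split` method are illustrative names, not cuDF APIs. The model assumes, as with `cudf::split`, that n split indices yield n + 1 views which share the parent's buffers instead of copying them. For the 11-row column in the reproducer below, splitting at index 2 gives one view starting at row 0 and one starting at row 2, so the second view has a nonzero starting offset.

```java
import java.util.Arrays;

/**
 * Hypothetical model (not the cuDF API) of how split indices become
 * zero-copy views. Each view is a {start, size} pair into the parent.
 */
public class SplitModel {

    /** Returns one {start, size} pair per view; n indices give n + 1 views. */
    public static int[][] split(int numRows, int... splitIndices) {
        int[][] views = new int[splitIndices.length + 1][];
        int start = 0;
        for (int i = 0; i < splitIndices.length; i++) {
            // Each view covers rows [start, splitIndices[i]) of the parent.
            views[i] = new int[] {start, splitIndices[i] - start};
            start = splitIndices[i];
        }
        // The last view runs from the final split index to the end of the column.
        views[splitIndices.length] = new int[] {start, numRows - start};
        return views;
    }

    public static void main(String[] args) {
        // 11 rows split at index 2, matching the reproducer.
        System.out.println(Arrays.deepToString(split(11, 2))); // [[0, 2], [2, 9]]
    }
}
```

Only views whose start is nonzero can expose a bug that ignores the starting offset; the first view always starts at row 0.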

Steps/Code to reproduce bug

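    // Assumes JUnit 5 (org.junit.jupiter.api.Test, static Assertions.assertFalse) and ai.rapids.cudf.{ColumnVector, ColumnView}.
    @Test
    public void testColumnViewEmptyNulls() {
        try (ColumnVector vector = ColumnVector.fromStrings(null, "èzäCßy", "OeÔÙ%\u0000C*",
                "Z\u0000\u0000Ç\u0000õ", "黼£òÉÀ", null, "", "8Á\u0000.\u0000HbP", "YíÃr®\"E", null,
                "îÎ2\u0000ÿ{\u0000")) {
            // The source column itself has no non-empty nulls.
            assertFalse(vector.hasNonEmptyNulls());
            // splitAsViews(2) yields two views: rows [0, 2) and [2, 11).
            ColumnView[] splits = vector.splitAsViews(2);
            try {
                // Before the fix, at least one of these assertions fails.
                assertFalse(splits[0].hasNonEmptyNulls());
                assertFalse(splits[1].hasNonEmptyNulls());
            } finally {
                // Views hold native handles and must be closed explicitly.
                for (ColumnView view : splits) {
                    view.close();
                }
            }
        }
    }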

Expected behavior
The above test should pass

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: from source

Environment details

 ***git***
 commit 701c84e11bd5e3a0c442e3f602c612a90560a0d7 (HEAD, origin/branch-23.08)
 Author: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
 Date:   Wed Jun 28 21:00:56 2023 +0800
 
 Update submodule cudf to 0a52c5211bedf82b81a37660ae94c998e596d475 (#1239)
 
 Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
 ***git submodules***
 +5c615cc1325372a8041378b83be73f65142568ff cudf (v0.12.0-18098-g5c615cc132)
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=22.04
 DISTRIB_CODENAME=jammy
 DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"
 PRETTY_NAME="Ubuntu 22.04.1 LTS"
 NAME="Ubuntu"
 VERSION_ID="22.04"
 VERSION="22.04.1 LTS (Jammy Jellyfish)"
 VERSION_CODENAME=jammy
 ID=ubuntu
 ID_LIKE=debian
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 UBUNTU_CODENAME=jammy
 Linux raza-linux-1 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Wed Jun 28 18:05:11 2023
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Quadro RTX 6000     On   | 00000000:17:00.0 Off |                  Off |
 | 34%   35C    P8    23W / 260W |      6MiB / 24576MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |    0   N/A  N/A      1129      G   /usr/lib/xorg/Xorg                  4MiB |
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:                    x86_64
 CPU op-mode(s):                  32-bit, 64-bit
 Address sizes:                   46 bits physical, 48 bits virtual
 Byte Order:                      Little Endian
 CPU(s):                          12
 On-line CPU(s) list:             0-11
 Vendor ID:                       GenuineIntel
 Model name:                      Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
 CPU family:                      6
 Model:                           85
 Thread(s) per core:              2
 Core(s) per socket:              6
 Socket(s):                       1
 Stepping:                        4
 CPU max MHz:                     4000.0000
 CPU min MHz:                     1200.0000
 BogoMIPS:                        6999.82
 Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req md_clear flush_l1d arch_capabilities
 Virtualization:                  VT-x
 L1d cache:                       192 KiB (6 instances)
 L1i cache:                       192 KiB (6 instances)
 L2 cache:                        6 MiB (6 instances)
 L3 cache:                        8.3 MiB (1 instance)
 NUMA node(s):                    1
 NUMA node0 CPU(s):               0-11
 Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
 Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
 Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
 Vulnerability Meltdown:          Mitigation; PTI
 Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
 Vulnerability Retbleed:          Mitigation; IBRS
 Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
 Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:        Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
 Vulnerability Srbds:             Not affected
 Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
 
 ***CMake***
 /usr/local/bin/cmake
 cmake version 3.26.4
 
 CMake suite maintained and supported by Kitware (kitware.com/cmake).
 
 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
 Copyright (C) 2021 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 
 ***Python***
 /usr/bin/python
 Python 3.10.6
 
 ***Environment Variables***
 PATH                            : /home/rjafri/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    :
 PYTHON_PATH                     :
 
 conda not found
 ***pip packages***
 /usr/bin/pip
 Package                 Version
 ----------------------- ---------------
 apturl                  0.5.2
 bcrypt                  3.2.0
 blinker                 1.4
 Brlapi                  0.8.3
 certifi                 2020.6.20
 cfgv                    3.3.1
 chardet                 4.0.0
 click                   8.0.3
 colorama                0.4.4
 command-not-found       0.3
 cryptography            3.4.8
 cupshelpers             1.0
 databricks-cli          0.17.7
 dbus-python             1.2.18
 defer                   1.0.6
 distlib                 0.3.4
 distro                  1.7.0
 distro-info             1.1build1
 duplicity               0.8.21
 exceptiongroup          1.1.1
 fasteners               0.14.1
 filelock                3.6.0
 findspark               2.0.1
 future                  0.18.2
 httplib2                0.20.2
 identify                2.4.10
 idna                    3.3
 importlib-metadata      4.6.4
 iniconfig               2.0.0
 jeepney                 0.7.1
 keyring                 23.5.0
 language-selector       0.1
 launchpadlib            1.10.16
 lazr.restfulclient      0.14.4
 lazr.uri                1.0.6
 lockfile                0.12.2
 louis                   3.20.0
 macaroonbakery          1.3.1
 Mako                    1.1.3
 MarkupSafe              2.0.1
 meld                    3.20.4
 monotonic               1.6
 more-itertools          8.10.0
 netifaces               0.11.0
 nodeenv                 0.13.4
 numpy                   1.24.3
 oauthlib                3.2.0
 olefile                 0.46
 packaging               23.1
 pandas                  2.0.2
 paramiko                2.9.3
 pexpect                 4.8.0
 Pillow                  9.0.1
 pip                     22.0.2
 platformdirs            2.5.1
 pluggy                  1.0.0
 pre-commit              3.3.3
 protobuf                3.12.4
 ptyprocess              0.7.0
 pyarrow                 12.0.0
 pycairo                 1.20.1
 pycups                  2.0.1
 PyGObject               3.42.1
 PyJWT                   2.4.0
 pymacaroons             0.13.0
 PyNaCl                  1.5.0
 pyparsing               2.4.7
 pyRFC3339               1.1
 pytest                  7.3.2
 python-apt              2.3.0+ubuntu2.1
 python-dateutil         2.8.2
 python-debian           0.1.43ubuntu1
 pytz                    2022.1
 pyxdg                   0.27
 PyYAML                  5.4.1
 reportlab               3.6.8
 requests                2.25.1
 screen-resolution-extra 0.0.0
 SecretStorage           3.3.1
 setuptools              59.6.0
 six                     1.16.0
 sre-yield               1.2
 ssh-import-id           5.11
 systemd-python          234
 tabulate                0.9.0
 toml                    0.10.2
 tomli                   2.0.1
 trash-cli               0.17.1.14
 tzdata                  2023.3
 ubuntu-advantage-tools  27.9
 ubuntu-drivers-common   0.0.0
 ufw                     0.36.1
 unattended-upgrades     0.1
 urllib3                 1.26.16
 usb-creator             0.3.7
 virtualenv              20.13.0+ds
 wadllib                 1.3.6
 wheel                   0.37.1
 xdg                     5
 xdist                   0.0.2
 xkit                    0.0.0
 zipp                    1.0.0

@razajafri added the bug and Needs Triage labels on Jun 29, 2023
@ttnghia self-assigned this on Jun 29, 2023
rapids-bot pushed a commit that referenced this issue on Jun 29, 2023
This fixes `has_nonempty_nulls`, which always accessed the column's offsets without accounting for the column's starting offset. As a result, it could give a wrong answer when the input column is sliced.

Closes #13638.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Raza Jafri (https://github.com/razajafri)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #13647
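A non-empty null is a null string row whose byte range in the offsets column is not empty, i.e. `offsets[row + 1] - offsets[row] > 0`. The sketch below is a simplified, hypothetical model of the bug described in the commit message above, not cuDF's actual C++ code: `NonEmptyNullsModel`, its methods, and the use of a `boolean[]` in place of a validity bitmask are assumptions for illustration. In this model the buggy variant reads validity relative to the view's starting row but reads string lengths from the start of the parent's offsets, so a sliced view can pair a null row with a neighboring row's length.

```java
/**
 * Hypothetical model of a sliced strings column: the parent's offsets and
 * validity arrays are shared, and a view is described by (offset, size).
 */
public class NonEmptyNullsModel {

    /** Buggy model: lengths are indexed from 0, ignoring the view's starting row. */
    public static boolean hasNonEmptyNullsBuggy(int[] offsets, boolean[] valid, int offset, int size) {
        for (int i = 0; i < size; i++) {
            boolean isNull = !valid[offset + i];      // validity honors the view offset
            int length = offsets[i + 1] - offsets[i]; // BUG: should use offset + i
            if (isNull && length > 0) {
                return true;
            }
        }
        return false;
    }

    /** Fixed model: every row index is shifted by the view's starting row. */
    public static boolean hasNonEmptyNulls(int[] offsets, boolean[] valid, int offset, int size) {
        for (int i = 0; i < size; i++) {
            int row = offset + i;
            int length = offsets[row + 1] - offsets[row];
            if (!valid[row] && length > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Parent rows: "abc", "de", null (zero bytes), "f"
        int[] offsets = {0, 3, 5, 5, 6};
        boolean[] valid = {true, true, false, true};
        // View over parent rows [1, 4): offset 1, size 3.
        System.out.println(hasNonEmptyNullsBuggy(offsets, valid, 1, 3)); // true (wrong)
        System.out.println(hasNonEmptyNulls(offsets, valid, 1, 3));      // false
    }
}
```

For a view starting at row 0 both variants agree, which is consistent with the unsliced column in the reproducer passing `hasNonEmptyNulls()` while a split view did not.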
@github-project-automation moved this from In Progress to Done in cuDF/Dask/Numba/UCX on Jun 29, 2023
@bdice removed the Needs Triage label on Mar 4, 2024
Projects
Archived in project