Broken output for a long BED record by intersectBed #1049

Open
hkawaji opened this issue May 8, 2023 · 0 comments
hkawaji commented May 8, 2023

Hi,

I found that intersectBed prints a broken BED record when the record is very long (probably longer than 1024 bytes).
Here is an example:

# environment
% uname -sro ; bedtools --version
Linux 3.10.0-1160.el7.x86_64 GNU/Linux
bedtools v2.30.0

# ttn.bed12 has a single BED12 formatted transcript record
% cat ttn.bed12 | wc
      1      12    3319

# the BED12 formatted record is truncated to 11 columns
% intersectBed -wa -a ttn.bed12 -b ttn.bed3 | wc
      1      11    1024

I expected the last command to output a 12-column BED record, but the record was truncated. This seems to occur when a record is longer than 1024 bytes. Notably, 'bed12ToBed6' handles the record properly, so I guess this is an issue in how intersectBed prints records rather than a fundamental limitation of the BEDtools suite.
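A quick way to confirm this kind of truncation in any output file is to check each line's column count and compare blockCount (column 10) against the number of entries in blockSizes and blockStarts (columns 11 and 12). The sketch below is only an illustration: the function name `check_bed12` and the toy record are hypothetical, and it assumes tab-delimited input.

```python
def check_bed12(line: str) -> list[str]:
    """Return a list of problems found in one BED12 line (empty if none)."""
    problems = []
    # Assumption: fields are tab-delimited, as in standard BED files.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        problems.append(f"expected 12 columns, found {len(fields)}")
        return problems

    block_count = int(fields[9])
    # BED12 lists may end with a trailing comma, so drop empty items.
    sizes = [s for s in fields[10].split(",") if s]
    starts = [s for s in fields[11].split(",") if s]
    if len(sizes) != block_count:
        problems.append(f"blockCount={block_count} but {len(sizes)} blockSizes")
    if len(starts) != block_count:
        problems.append(f"blockCount={block_count} but {len(starts)} blockStarts")
    return problems


# Toy two-exon record (not the TTN record from this report).
good = "\t".join(
    ["chr1", "100", "200", "tx", "0", "+", "100", "200", "0", "2", "10,20,", "0,80,"]
)
# Simulate the symptom: the 12th column is missing.
truncated = "\t".join(good.split("\t")[:11])

print(check_bed12(good))
print(check_bed12(truncated))
```

Running the same check over the real intersectBed output would flag the 11-column line shown above.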

The full data for this example is below. Note that this is an actual record of a human gene, not an artificial example.

% cat ttn.bed3
chr2    178525988       178807421

% cat ttn.bed12
chr2    178525988       178807421       ENST00000342992.10:TTN  0       -       178527011 178804642        0       312     1319,303,154,692,157,5609,594,306,576,300,306,585,303,303,300,288,594,282,306,306,297,291,306,303,2067,300,288,294,1767,306,303,300,288,297,303,588,297,17106,303,588,297,198,105,588,288,291,288,306,303,297,288,300,303,300,276,303,300,285,321,2967,294,300,282,309,303,300,282,303,303,282,151,149,315,300,297,318,300,130,33,149,309,430,191,309,300,294,285,297,300,303,363,303,300,279,306,197,106,300,300,116,187,297,288,122,178,148,152,285,115,188,303,303,270,267,125,409,279,267,267,169,98,267,124,143,127,140,267,267,267,127,140,264,267,264,127,140,276,279,402,51,90,63,90,75,81,69,111,78,108,84,78,84,84,84,84,84,84,84,84,90,78,81,297,75,69,78,96,75,75,297,84,84,84,78,84,162,78,93,75,78,207,81,84,84,84,84,78,81,114,102,84,84,81,84,84,84,81,87,78,78,63,405,48,72,84,60,27,78,210,261,268,90,184,286,93,288,291,288,288,279,279,279,288,279,282,279,282,288,288,279,279,279,288,279,282,279,282,288,288,279,279,279,288,279,282,279,282,279,288,279,279,279,282,279,288,279,279,279,282,279,288,279,279,279,282,564,279,57,189,126,285,232,166,142,261,261,261,264,261,261,264,273,267,282,1694,169,165,272,245,234,206,143,216,64,259,66,282,123,294,138,138,138,126,138,153,331,245,86,288,204,104,210,   
0,1457,2285,2539,3971,4252,9993,10949,11353,12551,12957,13393,14079,15293,16275,16673,17080,17845,18212,19399,19831,20223,20611,21014,21418,23581,23997,24978,25641,27513,27925,28464,28876,30859,31264,31659,32352,33322,50540,50934,51613,51999,52622,52817,53572,53950,54333,55510,55917,56307,56951,57618,58287,58680,59083,60516,61129,61527,61910,62549,65610,65989,66390,66786,67184,67579,67972,68355,69518,71549,71919,72517,72759,73157,73565,74865,75276,75669,75893,76013,76293,77887,78719,78998,79425,81032,81412,81796,82189,82617,83219,83695,84101,85004,85383,85769,86068,86288,86784,87172,87762,88063,88477,88858,89318,89652,90490,90740,91131,91336,91790,92200,92595,93632,93999,94228,94726,95113,95486,95851,96681,98476,99284,103312,104252,104815,105045,106158,106537,106929,107198,107424,107828,108377,108734,109176,109451,109974,110409,111380,113710,114059,114552,115252,116248,118559,119931,123828,124175,124762,125254,125464,125677,125895,126107,126275,126469,133016,133183,136977,144229,145101,145982,146418,146646,147644,148325,149050,149932,151632,152136,152425,152758,153350,153610,153905,154265,155090,155387,155672,156708,157222,158010,158341,158677,158917,159264,159529,162122,162688,163064,163301,163526,163824,166027,166508,167620,167933,168610,168840,169359,169876,171132,172854,175131,175539,176051,176179,176465,178158,178521,178888,179185,180465,180873,181537,183577,184646,185073,185955,186326,186708,187096,187908,188303,188997,189504,191106,191534,191954,192333,192706,193175,193565,193994,194396,194932,195858,196270,196670,197057,197429,197867,198271,199379,199779,201101,201596,202121,202511,202902,203299,203675,204104,204516,204936,205316,205704,206077,206451,206845,207250,207625,208339,208718,209522,212093,227135,232995,238188,238538,240392,241770,242025,242684,243690,244071,244423,245222,247120,247473,247849,248218,248932,249367,251160,251431,251715,252885,253240,254011,255132,256223,256550,256817,257731,258081,259631,259859,263371,263989,264719,266083,267415,26841
0,268933,273498,273836,274406,276149,278563,281223,

% intersectBed -a ttn.bed12 -b ttn.bed3
chr2    178525988       178807421       ENST00000342992.10:TTN  0       -       178527011 178804642        0       312     1319,303,154,692,157,5609,594,306,576,300,306,585,303,303,300,288,594,282,306,306,297,291,306,303,2067,300,288,294,1767,306,303,300,288,297,303,588,297,17106,303,588,297,198,105,588,288,291,288,306,303,297,288,300,303,300,276,303,300,285,321,2967,294,300,282,309,303,300,282,303,303,282,151,149,315,300,297,318,300,130,33,149,309,430,191,309,300,294,285,297,300,303,363,303,300,279,306,197,106,300,300,116,187,297,288,122,178,148,152,285,115,188,303,303,270,267,125,409,279,267,267,169,98,267,124,143,127,140,267,267,267,127,140,264,267,264,127,140,276,279,402,51,90,63,90,75,81,69,111,78,108,84,78,84,84,84,84,84,84,84,84,90,78,81,297,75,69,78,96,75,75,297,84,84,84,78,84,162,78,93,75,78,207,81,84,84,84,84,78,81,114,102,84,84,81,84,84,84,81,87,78,78,63,405,48,72,84,60,27,78,210,261,268,90,184,286,93,288,291,288,288,279,279,279,288,279,282,279,282,288,288,279,279,279,288,279,282,279,282,288,288,279,279,279,288,279,282,279,282,279,288,279,27
% cat ttn.bed12 |bed12ToBed6 |wc -l
312
hkawaji added a commit to hkawaji/bedtools2 that referenced this issue May 20, 2023
A naive workaround for the issue reported in arq5x#1049.
The size of the printing buffer is increased from 1024 to 8192 bytes to handle very long entries, for example a gene with exceptionally many exons.
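
The effect of a fixed-size print buffer can be modelled with a short sketch. This is not bedtools code: the function `format_into_fixed_buffer` is hypothetical, and the snprintf-style behaviour (one byte reserved for a terminator) is an assumption. It happens to be consistent with the 1024 bytes reported by `wc` above (1023 record bytes plus a newline), but the issue itself does not confirm where the cut happens.

```python
def format_into_fixed_buffer(record: str, capacity: int) -> bytes:
    """Model snprintf-style formatting into a buffer of `capacity` bytes.

    One byte is assumed to be reserved for the terminator, so at most
    capacity - 1 bytes of the record survive. This models the symptom
    only; it is not taken from the bedtools source.
    """
    data = record.encode("ascii")
    return data[: capacity - 1]


# Synthetic long record: a chromosome name plus 800 comma-separated sizes.
record = "chr2\t" + ",".join(["300"] * 800)

for capacity in (1024, 8192):
    kept = format_into_fixed_buffer(record, capacity)
    print(capacity, len(record), len(kept), len(kept) == len(record))
```

Under this model, the larger buffer holds the 3319-byte TTN record, but any record longer than 8191 bytes would still be cut, which is why the commit message calls the change a naive workaround.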
hkawaji added a commit to hkawaji/bedtools2 that referenced this issue May 20, 2023
hkawaji added a commit to hkawaji/bedtools2 that referenced this issue May 20, 2023