Setup problems #1

hbsong-03 opened this issue Aug 8, 2022 · 0 comments

Dear Developer,
I am a biomedical student and new to coding. Thank you for offering this great software.

I ran into several problems in ptm_data_import, related to this line:

human_fasta = fasta.IndexedUniProt('../data/human_fasta/uniprot-filtered-organism__Homo+sapiens+(Human)+[9606]_.fasta')
I could not find this .fasta file anywhere, so I was unable to run [import_ubi_library_data.ipynb] and [import_sugiyama_data.ipynb].

I also got a KeyError in [IDR_benchmark.ipynb]:

In: disordered_data = pd.read_csv('/Users/nc1/StructuremapDEV/structuremap/data/order/disordered_regions.csv',sep=";")
disordered_data = extract_region_boundaries(disordered_data)
disordered_data_annotation = get_disorder_annotation(df=disordered_data)
disordered_data_annotation = disordered_data_annotation.rename(columns={"structure": "disordered"})

KeyError Traceback (most recent call last)
File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\indexes\, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'UniProt boundaries'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Input In [6], in <cell line: 2>()
1 disordered_data = pd.read_csv('/Users/nc1/StructuremapDEV/structuremap/data/order/disordered_regions.csv',sep=";")
----> 2 disordered_data = extract_region_boundaries(disordered_data)
3 print(disordered_data[0:3])
4 disordered_data_annotation = get_disorder_annotation(df=disordered_data)

Input In [4], in extract_region_boundaries(df)
1 def extract_region_boundaries(df: pd.DataFrame) -> pd.DataFrame:
----> 2 start = [x.split('-')[0] for x in df["UniProt boundaries"]]
3 end = [x.split('-')[1] for x in df["UniProt boundaries"]]
4 df["start"] = start

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\indexes\, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)

KeyError: 'UniProt boundaries'
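
A minimal sketch of one way to narrow this down, assuming the cause is that the CSV on my machine does not expose a column literally named `UniProt boundaries` when read with `sep=";"` (for example because the file uses a different separator or different header names). The helper `header_columns` and the sample text are hypothetical and are not part of structuremap or of the real `disordered_regions.csv`:

```python
import csv
import io
from typing import List


def header_columns(text: str, delimiter: str) -> List[str]:
    """Return the column names found in the first row of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


# Hypothetical sample data, NOT the real disordered_regions.csv.
sample = "protein_id;UniProt boundaries\nP12345;10-25\n"

# Try a few separators and report whether the expected column shows up.
for delim in (";", ",", "\t"):
    cols = header_columns(sample, delim)
    print(repr(delim), cols, "UniProt boundaries" in cols)
```

If the expected column only appears with a separator other than `;`, or not at all, the file itself probably differs from the one the notebook was written against.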

I also ran into several errors while running through data_analysis_structuremap.ipynb.

Prepare The Environment:

In: from accessory_functions import *

ModuleNotFoundError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 from accessory_functions import *

ModuleNotFoundError: No module named 'accessory_functions'
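
A minimal sketch of how the import could be checked, assuming `accessory_functions` is a plain `.py` file that simply is not on Python's search path when the notebook starts. I do not know where that file actually lives, so the directory and the demo module name below are hypothetical, and `ensure_importable` is an illustrative helper rather than anything from the repository:

```python
import importlib.util
import os
import sys
import tempfile


def ensure_importable(module_name: str, search_dir: str) -> bool:
    """Add search_dir to sys.path if the module is not found yet,
    then report whether the module can be located."""
    if importlib.util.find_spec(module_name) is None and search_dir not in sys.path:
        sys.path.insert(0, search_dir)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Demo with a throwaway module in a temporary directory (hypothetical names).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_accessory_module.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

found = ensure_importable("demo_accessory_module", demo_dir)
print(found)
```

For the real notebook, the second argument would be whichever folder contains `accessory_functions.py`, if such a file exists in the repository.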

Annotate IDRs >> IDR_benchmark notebook - Visualize pPAE cutoff

In: pPSE_cut =, x='pPSE',y='count', color='cutoff',
color_discrete_map={'high exposure':'rgb(177, 63, 100)',
'low exposure':'grey'},
width=500, height=300)
pPSE_cut = pPSE_cut.update_layout(legend=dict(
config={'toImageButtonOptions': {'format': 'svg', 'filename':'pPAE_cutoff'}}

KeyError Traceback (most recent call last)
Input In [30], in <cell line: 1>()
----> 1 pPSE_cut =, x='pPSE',y='count', color='cutoff',
2 color_discrete_map={'high exposure':'rgb(177, 63, 100)',
3 'low exposure':'grey'},
4 template="simple_white",
5 width=500, height=300)
6 pPSE_cut = pPSE_cut.update_layout(legend=dict(
7 title='',
8 yanchor="top",
11 x=0.99
12 ))
13 config={'toImageButtonOptions': {'format': 'svg', 'filename':'pPAE_cutoff'}}

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\plotly\, in bar(data_frame, x, y, color, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, hover_name, hover_data, custom_data, text, base, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, opacity, orientation, barmode, log_x, log_y, range_x, range_y, title, template, width, height)
306 def bar(
307 data_frame=None,
308 x=None,
344 height=None,
345 ):
346 """
347 In a bar plot, each row of data_frame is represented as a rectangular
348 mark.
349 """
--> 350 return make_figure(
351 args=locals(),
352 constructor=go.Bar,
353 trace_patch=dict(textposition="auto"),
354 layout_patch=dict(barmode=barmode),
355 )

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\plotly\, in make_figure(args, constructor, trace_patch, layout_patch)
1852 prefix = get_label(args, args["facet_row"]) + "="
1853 row_labels = [prefix + str(s) for s in sorted_group_values[m.grouper]]
-> 1854 for val in sorted_group_values[m.grouper]:
1855 if val not in m.val_map:
1856 m.val_map[val] = m.sequence[len(m.val_map) % len(m.sequence)]

KeyError: 'cutoff'

Find short unstructured regions within large folded domains

In: proteins_with_pattern = alphafold_accessibility_smooth_pattern_ext[alphafold_accessibility_smooth_pattern_ext.flexible_pattern==1].protein_id.unique()

textfile = open("data/short_idrs/proteins_with_pattern.txt", "w")
for element in proteins_with_pattern:
    textfile.write(element + "\n")

all_proteins = alphafold_accessibility_smooth_pattern_ext.protein_id.unique()

textfile = open("data/short_idrs/all_proteins.txt", "w")
for element in all_proteins:
    textfile.write(element + "\n")

FileNotFoundError Traceback (most recent call last)
Input In [34], in <cell line: 3>()
1 proteins_with_pattern = alphafold_accessibility_smooth_pattern_ext[alphafold_accessibility_smooth_pattern_ext.flexible_pattern==1].protein_id.unique()
----> 3 textfile = open("data/short_idrs/proteins_with_pattern.txt", "w")
4 for element in proteins_with_pattern:
5 textfile.write(element + "\n")

FileNotFoundError: [Errno 2] No such file or directory: 'data/short_idrs/proteins_with_pattern.txt'
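
A minimal sketch of a possible workaround, assuming the error only means that the folder `data/short_idrs` does not exist in my working directory: `open(..., "w")` creates the file but not its parent folders. The helper `write_lines` is hypothetical, and the demo writes into a temporary directory rather than the real project folder:

```python
import os
import tempfile


def write_lines(path: str, lines) -> None:
    """Write one item per line, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # The with-block closes the file even if a write fails.
    with open(path, "w") as handle:
        for line in lines:
            handle.write(str(line) + "\n")


# Demo in a temporary directory with made-up protein IDs.
base = tempfile.mkdtemp()
demo_path = os.path.join(base, "data", "short_idrs", "proteins_with_pattern.txt")
write_lines(demo_path, ["P12345", "Q67890"])
print(os.path.exists(demo_path))
```

This would not help with the later `pattern_enrichment.txt` error, since that file has to be read rather than written and so must already exist.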

David enrichment analysis of GO MF

In: enrichment_david = pd.read_csv('data/short_idrs/pattern_enrichment.txt', sep='\t')

FileNotFoundError Traceback (most recent call last)
Input In [37], in <cell line: 1>()
----> 1 enrichment_david = pd.read_csv('data/short_idrs/pattern_enrichment.txt', sep='\t')

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
665 kwds_defaults = _refine_defaults_read(
666 dialect,
667 delimiter,
676 defaults={"delimiter": ","},
677 )
678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\, in _read(filepath_or_buffer, kwds)
572 _validate_names(kwds.get("names", None))
574 # Create the parser.
--> 575 parser = TextFileReader(filepath_or_buffer, **kwds)
577 if chunksize or iterator:
578 return parser

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\, in TextFileReader.__init__(self, f, engine, **kwds)
930 self.options["has_index_names"] = kwds["has_index_names"]
932 self.handles: IOHandles | None = None
--> 933 self._engine = self._make_engine(f, self.engine)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\, in TextFileReader._make_engine(self, f, engine)
1213 mode = "rb"
1214 # error: No overload variant of "get_handle" matches argument types
1215 # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
1216 # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
-> 1217 self.handles = get_handle( # type: ignore[call-overload]
1218 f,
1219 mode,
1220 encoding=self.options.get("encoding", None),
1221 compression=self.options.get("compression", None),
1222 memory_map=self.options.get("memory_map", False),
1223 is_text=is_text,
1224 errors=self.options.get("encoding_errors", "strict"),
1225 storage_options=self.options.get("storage_options", None),
1226 )
1227 assert self.handles is not None
1228 f = self.handles.handle

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
784 elif isinstance(handle, str):
785 # Check whether the filename is to be opened in binary mode.
786 # Binary mode does not support 'encoding' and 'newline'.
787 if ioargs.encoding and "b" not in ioargs.mode:
788 # Encoding
--> 789 handle = open(
790 handle,
791 ioargs.mode,
792 encoding=ioargs.encoding,
793 errors=errors,
794 newline="",
795 )
796 else:
797 # Binary mode
798 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'data/short_idrs/pattern_enrichment.txt'

After that, every following cell fails with a NameError.

