Setup problems #1

Open
hbsong-03 opened this issue Aug 8, 2022 · 0 comments

Dear Developer,
I am a biomedical student and new to coding. Thank you for offering this great software.

I ran into several problems in ptm_data_import, related to:

human_fasta = fasta.IndexedUniProt('../data/human_fasta/uniprot-filtered-organism__Homo+sapiens+(Human)+[9606]_.fasta')
I could not find this .fasta file anywhere, so I was unable to run [import_ubi_library_data.ipynb] and [import_sugiyama_data.ipynb].
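
A minimal sketch of a pre-flight check that lists which input files are missing before the notebooks run. The helper name `missing_files` is hypothetical and not part of structuremap; the path is the one quoted above.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Path copied from the notebook cell quoted above.
required = [
    "../data/human_fasta/uniprot-filtered-organism__Homo+sapiens+(Human)+[9606]_.fasta",
]

for path in missing_files(required):
    print(f"missing input: {path}")
```

Running a check like this at the top of a notebook reports every absent input at once instead of failing at the first cell that needs one.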

I hit a KeyError in [IDR_benchmark.ipynb]:

In: disordered_data = pd.read_csv('/Users/nc1/StructuremapDEV/structuremap/data/order/disordered_regions.csv',sep=";")
disordered_data = extract_region_boundaries(disordered_data)
print(disordered_data[0:3])
disordered_data_annotation = get_disorder_annotation(df=disordered_data)
disordered_data_annotation = disordered_data_annotation.rename(columns={"structure": "disordered"})
print(disordered_data_annotation[0:3])

KeyError Traceback (most recent call last)
File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'UniProt boundaries'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Input In [6], in <cell line: 2>()
1 disordered_data = pd.read_csv('/Users/nc1/StructuremapDEV/structuremap/data/order/disordered_regions.csv',sep=";")
----> 2 disordered_data = extract_region_boundaries(disordered_data)
3 print(disordered_data[0:3])
4 disordered_data_annotation = get_disorder_annotation(df=disordered_data)

Input In [4], in extract_region_boundaries(df)
1 def extract_region_boundaries(df: pd.DataFrame) -> pd.DataFrame:
----> 2 start = [x.split('-')[0] for x in df["UniProt boundaries"]]
3 end = [x.split('-')[1] for x in df["UniProt boundaries"]]
4 df["start"] = start

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]

File ~\anaconda3\envs\structuremap\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)

KeyError: 'UniProt boundaries'
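
One way to narrow this down is to check which columns pandas actually parsed. The sketch below uses a tiny in-memory table as a stand-in for disordered_regions.csv; the real file's contents are not shown in this issue, so the sample rows and the helper name `check_columns` are assumptions. It illustrates how a wrong separator collapses the header into a single column, which would be one possible cause of a KeyError like this.

```python
import io

import pandas as pd


def check_columns(df, required):
    """Return the required column names absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Stand-in data; the real disordered_regions.csv may look different.
sample = "protein_id;UniProt boundaries\nP12345;10-25\n"

# Parsed with the separator used in the notebook.
ok = pd.read_csv(io.StringIO(sample), sep=";")
print(list(ok.columns), check_columns(ok, ["UniProt boundaries"]))

# Parsed with a different separator: the header stays one single column.
bad = pd.read_csv(io.StringIO(sample), sep=",")
print(list(bad.columns), check_columns(bad, ["UniProt boundaries"]))
```

Printing `disordered_data.columns` right after `pd.read_csv` in the notebook would show which of these cases applies to the local file.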

I also ran into multiple errors while running data_analysis_structuremap.ipynb.

Prepare The Environment:

In: from accessory_functions import *

ModuleNotFoundError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 from accessory_functions import *

ModuleNotFoundError: No module named 'accessory_functions'
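
Python only finds `accessory_functions` if a file `accessory_functions.py` sits in a directory listed on `sys.path`. A minimal sketch of checking this and adding a directory by hand follows. `importlib.util.find_spec` is from the standard library, while the helper name `ensure_importable` and the directory argument are assumptions, since this issue does not say where the file lives.

```python
import importlib.util
import sys
from pathlib import Path


def ensure_importable(module_name, search_dir):
    """Add search_dir to sys.path if it holds module_name; report importability."""
    folder = str(Path(search_dir))
    candidate = Path(search_dir) / f"{module_name}.py"
    if candidate.is_file() and folder not in sys.path:
        sys.path.insert(0, folder)
    return importlib.util.find_spec(module_name) is not None


# "." assumes the notebook was started from the folder that holds the module.
print(ensure_importable("accessory_functions", "."))
```

If this prints `False`, the module file is not in the given folder and is not installed in the active environment either.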

Annotate IDRs >> IDR_benchmark notebook - Visualize pPAE cutoff

In: pPSE_cut = px.bar(bincount_df, x='pPSE',y='count', color='cutoff',
color_discrete_map={'high exposure':'rgb(177, 63, 100)',
'low exposure':'grey'},
template="simple_white",
width=500, height=300)
pPSE_cut = pPSE_cut.update_layout(legend=dict(
title='',
yanchor="top",
y=0.99,
xanchor="right",
x=0.99
))
config={'toImageButtonOptions': {'format': 'svg', 'filename':'pPAE_cutoff'}}

pPSE_cut.show(config=config)

KeyError Traceback (most recent call last)
Input In [30], in <cell line: 1>()
----> 1 pPSE_cut = px.bar(bincount_df, x='pPSE',y='count', color='cutoff',
2 color_discrete_map={'high exposure':'rgb(177, 63, 100)',
3 'low exposure':'grey'},
4 template="simple_white",
5 width=500, height=300)
6 pPSE_cut = pPSE_cut.update_layout(legend=dict(
7 title='',
8 yanchor="top",
(...)
11 x=0.99
12 ))
13 config={'toImageButtonOptions': {'format': 'svg', 'filename':'pPAE_cutoff'}}

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\plotly\express\_chart_types.py:350, in bar(data_frame, x, y, color, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, hover_name, hover_data, custom_data, text, base, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, opacity, orientation, barmode, log_x, log_y, range_x, range_y, title, template, width, height)
306 def bar(
307 data_frame=None,
308 x=None,
(...)
344 height=None,
345 ):
346 """
347 In a bar plot, each row of data_frame is represented as a rectangular
348 mark.
349 """
--> 350 return make_figure(
351 args=locals(),
352 constructor=go.Bar,
353 trace_patch=dict(textposition="auto"),
354 layout_patch=dict(barmode=barmode),
355 )

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\plotly\express\_core.py:1854, in make_figure(args, constructor, trace_patch, layout_patch)
1852 prefix = get_label(args, args["facet_row"]) + "="
1853 row_labels = [prefix + str(s) for s in sorted_group_values[m.grouper]]
-> 1854 for val in sorted_group_values[m.grouper]:
1855 if val not in m.val_map:
1856 m.val_map[val] = m.sequence[len(m.val_map) % len(m.sequence)]

KeyError: 'cutoff'
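
Since this traceback ends inside plotly's own `make_figure` rather than in the notebook code, the installed package versions may be relevant. A minimal sketch for collecting them with the standard library only; the helper name `installed_versions` is hypothetical, and which versions the notebook expects is not stated here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["pandas", "plotly", "structuremap"]))
```

Running this once in each conda environment (`structuremap` and `structuremap_analysis` both appear in the tracebacks) gives the version details for both.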

Find short unstructured regions within large folded domains

In: proteins_with_pattern = alphafold_accessibility_smooth_pattern_ext[alphafold_accessibility_smooth_pattern_ext.flexible_pattern==1].protein_id.unique()

textfile = open("data/short_idrs/proteins_with_pattern.txt", "w")
for element in proteins_with_pattern:
    textfile.write(element + "\n")

all_proteins = alphafold_accessibility_smooth_pattern_ext.protein_id.unique()

textfile = open("data/short_idrs/all_proteins.txt", "w")
for element in all_proteins:
    textfile.write(element + "\n")
textfile.close()

FileNotFoundError Traceback (most recent call last)
Input In [34], in <cell line: 3>()
1 proteins_with_pattern = alphafold_accessibility_smooth_pattern_ext[alphafold_accessibility_smooth_pattern_ext.flexible_pattern==1].protein_id.unique()
----> 3 textfile = open("data/short_idrs/proteins_with_pattern.txt", "w")
4 for element in proteins_with_pattern:
5 textfile.write(element + "\n")

FileNotFoundError: [Errno 2] No such file or directory: 'data/short_idrs/proteins_with_pattern.txt'
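
`open(..., "w")` creates the file but not its parent folders, so this error is consistent with `data/short_idrs/` not existing in the working directory. A minimal sketch of creating the folder first; the helper name `write_lines` is hypothetical, the protein IDs are placeholders, and a temporary directory stands in for the project folder.

```python
import tempfile
from pathlib import Path


def write_lines(path, lines):
    """Write one item per line, creating missing parent folders first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The with-block closes the file even if writing fails.
    with target.open("w") as handle:
        for line in lines:
            handle.write(f"{line}\n")
    return target


# Demo in a temporary directory so nothing is written into the project.
with tempfile.TemporaryDirectory() as tmp:
    out = write_lines(
        Path(tmp) / "data" / "short_idrs" / "proteins_with_pattern.txt",
        ["PLACEHOLDER_1", "PLACEHOLDER_2"],
    )
    print(out.read_text())
```

In the notebook, calling `Path("data/short_idrs").mkdir(parents=True, exist_ok=True)` before the `open` calls would have the same effect.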

David enrichment analysis of GO MF

In: enrichment_david = pd.read_csv('data/short_idrs/pattern_enrichment.txt', sep='\t')

FileNotFoundError Traceback (most recent call last)
Input In [37], in <cell line: 1>()
----> 1 enrichment_david = pd.read_csv('data/short_idrs/pattern_enrichment.txt', sep='\t')

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
665 kwds_defaults = _refine_defaults_read(
666 dialect,
667 delimiter,
(...)
676 defaults={"delimiter": ","},
677 )
678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\readers.py:575, in _read(filepath_or_buffer, kwds)
572 _validate_names(kwds.get("names", None))
574 # Create the parser.
--> 575 parser = TextFileReader(filepath_or_buffer, **kwds)
577 if chunksize or iterator:
578 return parser

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\readers.py:933, in TextFileReader.__init__(self, f, engine, **kwds)
930 self.options["has_index_names"] = kwds["has_index_names"]
932 self.handles: IOHandles | None = None
--> 933 self._engine = self._make_engine(f, self.engine)

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\parsers\readers.py:1217, in TextFileReader._make_engine(self, f, engine)
1213 mode = "rb"
1214 # error: No overload variant of "get_handle" matches argument types
1215 # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
1216 # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
-> 1217 self.handles = get_handle( # type: ignore[call-overload]
1218 f,
1219 mode,
1220 encoding=self.options.get("encoding", None),
1221 compression=self.options.get("compression", None),
1222 memory_map=self.options.get("memory_map", False),
1223 is_text=is_text,
1224 errors=self.options.get("encoding_errors", "strict"),
1225 storage_options=self.options.get("storage_options", None),
1226 )
1227 assert self.handles is not None
1228 f = self.handles.handle

File ~\anaconda3\envs\structuremap_analysis\lib\site-packages\pandas\io\common.py:789, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
784 elif isinstance(handle, str):
785 # Check whether the filename is to be opened in binary mode.
786 # Binary mode does not support 'encoding' and 'newline'.
787 if ioargs.encoding and "b" not in ioargs.mode:
788 # Encoding
--> 789 handle = open(
790 handle,
791 ioargs.mode,
792 encoding=ioargs.encoding,
793 errors=errors,
794 newline="",
795 )
796 else:
797 # Binary mode
798 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'data/short_idrs/pattern_enrichment.txt'

After that, the following cells raise a series of NameErrors.
