sequencework

Mainly Python scripts related to nucleic acid or protein sequence work.

I sorely need to put an index here with links to better guide you to the appropriate folders. <== TO DO
(For now, look at the titles of the folders to try to discern whether something is of interest.)

Descriptions of the scripts are found within the README.md files in the subfolders.

Several have demonstrations in sessions served by MyBinder.org from my command line sequence-associated repo; however, it is probably best to follow the guide listed with the individual scripts so that you quickly find the right location. If you already know where you are going, you can launch a session via this button:

Related gist by me

snippets for dealing with FASTA - useful_FASTA_handling.py includes splitting
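
As a rough illustration of the kind of FASTA splitting mentioned above, here is a minimal sketch in plain Python. It is not the code from useful_FASTA_handling.py; the function names `read_fasta` and `split_fasta`, the `.fa` extension, and the file-naming rule are all assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(text, out_dir):
    """Write each record to its own file (hypothetical naming: first header word + '.fa')."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for header, seq in read_fasta(text):
        words = header.split()
        name = words[0] if words else "unnamed"
        # Replace characters that are unsafe in file names (e.g. '|' or '/').
        name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
        path = out_dir / f"{name}.fa"
        path.write_text(f">{header}\n{seq}\n")
        written.append(path)
    return written


example = ">seq1 first record\nACGT\nTTAA\n>seq2\nGGCC\n"
with tempfile.TemporaryDirectory() as tmp:
    paths = split_fasta(example, tmp)
    print([p.name for p in paths])
```

Records that share the same first header word would overwrite each other in this sketch, so real data may need a counter or a stricter naming scheme.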

Related 'Binderized' Utilities

Collection of links to launchable Jupyter environments where various sequence analysis tools work WITHOUT ANY NEED FOR ADDITIONAL EFFORT/INSTALLS. Many of my recent scripts are built with use in these environments in mind:

(Many of these include/feature Biopython, too, but I haven't made an all-encompassing one yet for that since I use it a lot as an underlying library.)

patmatch-binder - launchable Jupyter sessions for running command line-based PatMatch in a Jupyter environment provided via Binder (Perl and Python-based).
blast-binder - launchable Jupyter sessions for running command line-based BLAST+ in a Jupyter environment provided via Binder.
Demonstration Jupyter Notebooks for my improved version of Adam Bessa's Fasta2Structure - Fasta2Structure-cli - To make it more convenient to use, I've modified the Fasta2Structure script to allow more ways to run it, producing improved_Fasta2Structure (a.k.a. Fasta2Structure-cli). It runs on the command line if you supply arguments specifying the input files, and it falls back to running on the command line if Tkinter cannot connect to a graphical display. improved_Fasta2Structure (a.k.a. Fasta2Structure-cli) is a user-friendly tool for converting multiple aligned FASTA files to STRUCTURE format that is even more user-friendly because it doesn't need a user to select files in a (Tkinter-based) GUI and can thus run well anywhere, such as on a computer cluster, in Jupyter running remotely, or in conjunction with pipeline software like Snakemake and Nextflow. For those reasons, the improved script is more user-friendly for those familiar with computation and allows scaling up.
InterMine-binder - InterMine Web Services available in a Jupyter environment running via the Binder service. (See the guide to getting started with using InterMine sites and Jupyter using MyBinder-served Jupyter notebooks.)
mcscan-binder - MCscan software available in a launchable Jupyter environment running via the Binder service (Python 2-based), with an example workflow and some other use examples.
mcscan-blast-binder - MCscan and BLAST+ command line software available in a launchable Jupyter environment running via the Binder service (Python 2-based).
synchro-binder - SynChro software available in a launchable Jupyter environment running via the Binder service with Quick start and some other illustrations of its use.
cl_sq_demo-binder - launchable, working Jupyter-based environment that has a collection of demonstrations of useful resources on the command line (or usable in Jupyter sessions) for manipulating sequence files. (Note: THIS WAS STARTED AFTER SEVERAL OTHER DEMO NOTEBOOKS (many meant to be static) WERE MADE FOR SEQUENCE SCRIPTS, and hopefully those will slowly be added here as well to be available in active form.)
clausen_ribonucleotides binder - Analyze ribonucleotide incorporation data from Clausen et al. 2015 using the script plot_5prime_end_from_bedgraph.py.
circos-binder - Circos software available in a launchable Jupyter environment running via the Binder service with tutorials illustrating use (TBD) (Perl and Python-based).
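
The command-line-or-GUI behavior described for Fasta2Structure-cli in the list above follows a common pattern, sketched below. This is not the actual Fasta2Structure-cli code; `gui_available`, `choose_mode`, and the `files` argument are hypothetical names chosen for illustration, and only the general idea (use supplied file arguments when present, otherwise try Tkinter and fall back when no display is reachable) comes from the description.

```python
import argparse


def gui_available():
    """Return True only if a Tk root window can actually be created."""
    try:
        import tkinter
        root = tkinter.Tk()  # raises tkinter.TclError when no display is reachable
        root.destroy()
        return True
    except Exception:  # covers a missing tkinter module and TclError alike
        return False


def choose_mode(argv, gui_probe=gui_available):
    """Decide how to run: returns ('cli' or 'gui', list of input files)."""
    parser = argparse.ArgumentParser(description="Hypothetical converter front end")
    parser.add_argument("files", nargs="*", help="aligned FASTA files to convert")
    args = parser.parse_args(argv)
    if args.files:
        # Files given as arguments: no need for a file-selection dialog.
        return "cli", args.files
    if gui_probe():
        # No arguments, but a display is available: let the user pick files.
        return "gui", []
    # No arguments and no display (cluster, remote Jupyter): stay on the command line.
    return "cli", []


print(choose_mode(["a.fasta", "b.fasta"], gui_probe=lambda: False))
```

Passing the probe in as a parameter keeps the decision logic testable on machines with or without a display.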

Related resources by others

genomepy

"Install and use genomes & gene annotations the easy way!
genomepy is designed to provide a simple and straightforward way to download and use genomic data. This includes (1) searching available data, (2) showing the available metadata, (3) automatically downloading, preprocessing and matching data and (4) generating optional aligner indexes. All with sensible, yet controllable defaults. Currently, genomepy supports UCSC, Ensembl and NCBI." - Includes an S. cerevisiae example.

rna-tools (previously rna-pdb-tools): a toolbox to analyze sequences, structures and simulations of RNA
'seqrequester', a tool for summarizing, extracting, generating and modifying DNA sequences. Perl-based.
SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation by shenwei356 has some excellent utilities for handling FASTA or FASTQ files. See here for subcommands listing. Documentation.
SeqFu - an easy to use toolkit for FASTA and FASTQ manipulation and inspection on the commandline. Available from Bioconda as ‘seqfu’.

"A general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. SeqFu is available for Linux and MacOS. - A compiled program delivering high performance analyses - Supports FASTA/FASTQ files, also Gzip compressed - A growing collection of handy utilities, also for quick inspection of the datasets." - Example uses Biopython to make a Pandas dataframe from FASTA sequences
https://bsky.app/profile/robert.bio/post/3lbxcdesrwc2t

"I made a little swiss army knife tool for quickly inspecting FASTA/FASTQ sequences, including GC content, codon analysis, reverse complement, and file format conversions:
https://42basepairs.com/tools/sequence-analysis"
"I also have calculators for exploring:
➡️ FASTQ base quality: https://42basepairs.com/tools/fastq-base-quality
➡️ SAM flags: https://42basepairs.com/tools/sam-flag "

AdjustFASTA_or_FASTQ
Adjust_Annotation
Adjust_lists
CompareFASTA_or_FASTQ
Compare_biological_seq_strings
ConvertSeq
Extract_Conserved_from_Aligned
Extract_Details_or_Annotation
Extract_from_FASTA
FindSequence
LookUpTaxon
RetrieveSeq
SOLiD
alignment-utilities
annotation-utilities
assess_coding_potential_and_changes
bendit_server-utilities
bendit_standalone-utilities
blast-utilities
circos-utilities
count_triplets
ena-utilities
hhsuite3-utilities
mcscan-utilities
omega-presence
patmatch-utilities
plot_expression_across_chromosomes
plot_nt_imbalance
plot_read_data
plot_sites
README.md

fomightez/sequencework

Folders and files

Latest commit

History

Repository files navigation

sequencework

Related gist by me

Related 'Binderized' Utilities

Related resources by others

See also

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages