Lineage information input in the command line usage of CEFCON #6

Open
LiuCanidk opened this issue Apr 16, 2024 · 7 comments

Comments

@LiuCanidk

Hi, @WPZgithub
I have a question about how to provide lineage information when using CEFCON from the command line. As I'm not familiar with Python, I prefer to use the command-line tool. But when I pass an expression matrix as a csv file via input_expData, I see no way to supply the lineage information, and I found no other argument for it.

I guess the lineage information is normally stored in the single-cell object, such as a Scanpy AnnData object. But since CEFCON also accepts a csv file, how can I provide the lineage information alongside the csv file to construct a lineage-specific GRN?

Many thanks in advance for an early reply!

@LiuCanidk
Author

Besides, I would also like to ask about the differential gene expression input. Can you specify the input format, i.e., what the rows and columns represent? Are there any matching rules between the lineage file, the differential gene expression file, and the expression matrix file?

@LiuCanidk
Author

What's more, what is the expected data format of the expression matrix: raw counts, or data normalized with Seurat or Scanpy? Can I input TPM or log(TP10K+1) data?

@WPZgithub
Owner

I'm sorry for any inconvenience when using CEFCON. I've recently been taking the time to update some of the code, as well as the instructions in the readme file, to improve user-friendliness.

To briefly address your question: if you use only the command-line version, the input expression matrix is assumed to contain cells from a single lineage. Thus, you need to run it separately for each lineage.
Regarding the formats for the expression matrix and differential expression data, please refer to the example files in the 'example_data' folder. The gene order in the differential expression data file does not necessarily need to match the order in the expression matrix.
As for the data format of the expression matrix: as a deep learning-based method, the model can adapt itself to the data. Therefore, the method does not impose limitations on the normalization technique used. However, I recommend using normalized data. In the CEFCON papers, we used log(TPM+1) for all the experiments.

Please let me know if any part requires further clarification or if you have additional queries.
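
Editor's sketch (not from the maintainer): one possible way to prepare per-lineage input files in line with the reply above, using pandas. It assumes a cells-by-genes count matrix and a separate per-cell lineage label, which are both illustrative; the names `log_normalize` and `split_by_lineage`, the toy data, and the output file names are hypothetical. The matrix orientation CEFCON expects should be checked against the files in the 'example_data' folder.

```python
import numpy as np
import pandas as pd


def log_normalize(counts, scale=1e4):
    """Scale each cell (row) to `scale` total counts, then apply log(x + 1)."""
    totals = counts.sum(axis=1)
    totals = totals.where(totals > 0)  # empty cells become NaN instead of dividing by zero
    scaled = counts.div(totals, axis=0) * scale
    return np.log1p(scaled).fillna(0.0)


def split_by_lineage(expr, lineage):
    """Return a dict mapping each lineage label to its cells-by-genes sub-matrix."""
    labels = lineage.reindex(expr.index)  # align labels to the matrix rows
    # Cells without a lineage label (NaN) are dropped by groupby.
    return {name: expr.loc[cells.index] for name, cells in labels.groupby(labels)}


# Toy data (illustrative only): 4 cells, 3 genes, 2 lineages.
counts = pd.DataFrame(
    [[10, 0, 90], [5, 5, 0], [0, 20, 20], [1, 1, 2]],
    index=["cell1", "cell2", "cell3", "cell4"],
    columns=["GeneA", "GeneB", "GeneC"],
)
lineage = pd.Series(["L1", "L1", "L2", "L2"], index=counts.index)

norm = log_normalize(counts)                  # log(TP10K + 1)
per_lineage = split_by_lineage(norm, lineage)

# Each sub-matrix could then be written to its own csv and passed to a separate run:
# for name, df in per_lineage.items():
#     df.to_csv(f"expr_{name}.csv")
```

Each resulting csv would then be used for its own command-line run, one per lineage.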

@LiuCanidk
Author

Thanks for your detailed reply. @WPZgithub
I have another question about the differential gene expression file. How can I obtain it? If I use Seurat, is it just the result of the FindAllMarkers function, from which I extract the genes specific to each cluster (i.e., lineage)?

@WPZgithub
Owner

I have provided the code script for obtaining differential expression information. Please refer to MAST_script.R, which uses the MAST method.
Any other method for obtaining differentially expressed genes is acceptable, as long as you provide scores for the biologically significant genes (I used abs(logFoldChange) in the CEFCON paper). Please note that separate gene differential expression scores must be provided for each lineage.
CEFCON can be run without providing differential expression information for genes, although I do not recommend it.
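
Editor's sketch (not from the maintainer): a minimal illustration of turning a differential expression table for one lineage into per-gene scores using abs(logFoldChange), as described above. The column names `gene` and `logFC`, the helper `de_scores`, and the toy values are assumptions; the actual file layout should be taken from the 'example_data' folder. How genes without a score should be handled is not stated in this thread, so the sketch only reports them.

```python
import pandas as pd


def de_scores(de_table, gene_col="gene", lfc_col="logFC"):
    """Return abs(logFC) per gene, indexed by gene name."""
    return de_table.set_index(gene_col)[lfc_col].abs()


# Toy differential expression result for ONE lineage (illustrative only).
# The gene order deliberately differs from the expression matrix.
de_table = pd.DataFrame({"gene": ["GeneC", "GeneA"], "logFC": [-1.5, 0.4]})
expr_genes = ["GeneA", "GeneB", "GeneC"]  # genes in the expression matrix

scores = de_scores(de_table)

# Genes present in the expression matrix but absent from the score table.
missing = [g for g in expr_genes if g not in scores.index]
```

A separate score table would be built this way for each lineage, matching the per-lineage expression matrices.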

@LiuCanidk
Author

Sorry, I still don't understand what the differential gene expression input means. If I input only one lineage as a csv file, which groups are compared to calculate the fold change for each gene? And if you use abs(logFoldChange), why doesn't the direction (sign) of the fold change matter? Should I calculate pseudotime as shown in the MAST script?

@LiuCanidk
Author

Also, do I need to provide a score (logFC) for all genes, or only for significant genes? Or do the genes in the differential expression file need to be the same as those in the expression matrix (that seems to be the case in the example data)?
