A wrapper function that reads RNA-seq related datasets from TCGA and GTEx.

initialize_RNAseq_data()

Details

Its side effect is the global variable TCGA_GTEX_RNAseq_sampletype, which is merged from two internal data frames:

(1) .TCGA_GTEX_RNAseq: the recomputed RNA-seq data from both TCGA and GTEx, generated by .get_TCGA_GTEX_RNAseq(), which imports the dataset TcgaTargetGtex_RSEM_Hugo_norm_count.

(2) .TCGA_GTEX_sampletype: the annotation of features for each sample from TCGA and GTEx. The data frame is imported from the TcgaTargetGTEX_phenotype.txt dataset and goes through basic data cleaning steps, including removal of duplicates and NAs.

To reduce the data size, only the following four relevant columns of TcgaTargetGTEX_phenotype.txt are selected to construct .TCGA_GTEX_sampletype:

  • sample.type column that annotates malignant or normal tissues

  • primary.disease column that annotates cancer types for each sample

  • primary.site column that annotates the tissue types

  • study column that annotates the cohort “TCGA” or “GTEx”
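
The column selection and cleaning described above can be sketched on a toy data frame. This is only an illustration, not the package's internal code: the `phenotype` object, its `sample` and `detailed_category` columns, and all values are made up, and using the sample ID as row names is an assumption.

```r
# Toy stand-in for TcgaTargetGTEX_phenotype.txt (hypothetical values).
# The `sample` and `detailed_category` columns are assumed for illustration.
phenotype <- data.frame(
  sample = c("S1", "S2", "S2", "S3", "S4"),
  detailed_category = c("a", "b", "b", "c", "d"),
  sample.type = c("Primary Tumor", "Normal Tissue", "Normal Tissue",
                  NA, "Primary Tumor"),
  primary.disease = c("Lung Adenocarcinoma", "Lung", "Lung",
                      "Skin", "Breast Invasive Carcinoma"),
  primary.site = c("Lung", "Lung", "Lung", "Skin", "Breast"),
  study = c("TCGA", "GTEx", "GTEx", "GTEx", "TCGA"),
  stringsAsFactors = FALSE
)

# The four columns named in the documentation
keep <- c("sample.type", "primary.disease", "primary.site", "study")

# Remove duplicated samples, then index rows by sample ID (an assumption)
sampletype <- phenotype[!duplicated(phenotype$sample), ]
rownames(sampletype) <- sampletype$sample

# Keep only the four relevant columns and drop rows containing NAs
sampletype <- sampletype[, keep]
sampletype <- na.omit(sampletype)

print(sampletype)
```

In this toy input, the duplicated S2 row and the S3 row with a missing sample.type are dropped, leaving three annotated samples.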

TCGA_GTEX_RNAseq_sampletype is stored as TCGA_GTEX_RNAseq_sampletype.csv in the ~/Documents/EIF_output/ProcessedData folder.
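
A minimal sketch of the merge-and-store step, under stated assumptions: both data frames are taken to have samples as rows indexed by sample ID, and the join is taken to be an inner join on row names. The toy objects `rnaseq` and `sampletype`, the gene columns, and all values are hypothetical, and the file is written to a temporary directory rather than to ~/Documents/EIF_output/ProcessedData.

```r
# Toy expression table: rows are samples, columns are genes (made-up values)
rnaseq <- data.frame(
  EIF4E = c(10.2, 9.8, 11.1),
  EIF4G1 = c(12.5, 12.9, 13.0),
  row.names = c("S1", "S2", "S5")
)

# Toy sample annotation, indexed by the same kind of sample IDs
sampletype <- data.frame(
  sample.type = c("Primary Tumor", "Normal Tissue", "Primary Tumor"),
  study = c("TCGA", "GTEx", "TCGA"),
  row.names = c("S1", "S2", "S4"),
  stringsAsFactors = FALSE
)

# Inner join on row names: only samples present in both tables are kept
merged <- merge(rnaseq, sampletype, by = "row.names")

# merge() moves the row names into a "Row.names" column; restore them
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL

# Write to a temporary location for this sketch, then read the file back
out_file <- file.path(tempdir(), "TCGA_GTEX_RNAseq_sampletype.csv")
write.csv(merged, out_file)
reloaded <- read.csv(out_file, row.names = 1)

print(merged)
```

Samples that appear in only one of the two tables (S5 and S4 here) are dropped by the inner join, so every row of the merged table carries both expression values and annotation.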

Examples

if (FALSE) {
initialize_RNAseq_data()
}