GenomeProt is a comprehensive proteogenomic analysis tool used to identify:
The workflow consists of four steps:
Users can choose between short-read and long-read options depending on their data type.
The Shiny web server supports database generation using:
Users can also generate a variant-aware proteome database by supplying an optional multi-sample VCF file of variants called from the same samples.
By default, the app loads a demo dataset (BAM file input) to demonstrate processing and output files.
Files used to run the demo dataset can be downloaded below:
proteome_database.fasta: Multi-FASTA protein database
proteome_database_metadata.txt: TSV file containing annotations for candidate protein sequences
proteome_database_transcripts.gtf: GTF file with transcript coordinates used to generate the proteome database

Results from this step can be downloaded within the app. The test results file can be downloaded here.
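As a quick sanity check on the multi-FASTA output, the sketch below reads FASTA text into a dictionary and reports sequence lengths. It is a minimal standard-library parser, not part of GenomeProt; the function name `read_fasta` and the example headers are hypothetical, and the real headers in proteome_database.fasta may look different.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example, not taken from the demo dataset.
example = """>protein_A example
MKTAYIAK
QRQISFVK
>protein_B example
MSLLTEVE
""".splitlines()

proteins = read_fasta(example)
for name, seq in proteins.items():
    print(name, len(seq))
```

To inspect a real database, pass an open file handle instead of the example lines, for instance `read_fasta(open("proteome_database.fasta"))`.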
Note: To generate a database directly from FASTQ files, or BAM files larger than 20 GB, use the command-line version.
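The rule in the note above can be expressed as a small helper that suggests which interface to use for a given input. This is only an illustrative sketch: the function name, the list of FASTQ suffixes, and the reading of "20 GB" as decimal gigabytes are assumptions, not behaviour of the app itself.

```python
# Assumption: "20 GB" is interpreted as 20 * 10^9 bytes.
BAM_SIZE_LIMIT_BYTES = 20 * 10**9

# Assumption: common FASTQ file name suffixes.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def recommend_interface(filename: str, size_bytes: int) -> str:
    """Return 'command-line' or 'shiny' following the note's rule."""
    lower = filename.lower()
    if lower.endswith(FASTQ_SUFFIXES):
        return "command-line"  # FASTQ input is not handled by the web app
    if lower.endswith(".bam") and size_bytes > BAM_SIZE_LIMIT_BYTES:
        return "command-line"  # BAM file is larger than the stated limit
    return "shiny"


print(recommend_interface("sample.bam", 5 * 10**9))
print(recommend_interface("sample.bam", 25 * 10**9))
print(recommend_interface("reads.fastq.gz", 10**9))
```

For a file on disk, `os.path.getsize(path)` gives the size in bytes to pass as `size_bytes`.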
Command-line instructions are available here.
This step must be completed outside the Shiny application. Using the database file (.fasta) generated in Step 1, analyse the proteomics data with your preferred search algorithm to identify proteins and peptides in your dataset.
This step maps peptides identified in Step 2 to spliced transcript coordinates.
proteome_database_metadata.txt and proteome_database_transcripts.gtf
peptide_data.tsv (must follow required format)
proteome_database_metadata.txt, proteome_database_transcripts.gtf, peptide_data.tsv from here.
Output directory contents:
summary_report.html: Summary report of mapped peptides, transcripts, and ORFs
report_images/: Folder containing PDF versions of graphs from the summary report
peptide_info.tsv: Detailed peptide mapping annotations
combined_annotations.gtf: GTF file with mapped peptides and transcript coordinates
peptides.bed12: BED12 file with mapped peptide coordinates for UCSC Genome Browser visualisation
transcripts.bed12: BED12 file with transcripts supported by peptide evidence for UCSC Genome Browser visualisation
ORFs.bed12: BED12 file with ORFs supported by peptide evidence for UCSC Genome Browser visualisation

The demo results files can be downloaded here.
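The BED12 outputs can also be inspected outside the genome browser. The sketch below parses one BED12 line using the standard 12-column layout and converts its blocks to genomic intervals, which is how a peptide spanning a splice junction appears as two or more blocks. The names `Bed12` and `parse_bed12_line` and the example record are hypothetical and are not taken from the demo results.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    start: int  # 0-based start (chromStart)
    end: int    # exclusive end (chromEnd)
    name: str
    strand: str
    blocks: List[Tuple[int, int]]  # genomic (start, end) of each block


def parse_bed12_line(line: str) -> Bed12:
    """Parse one tab-separated BED12 record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")
    chrom_start = int(fields[1])
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    # blockStarts are offsets relative to chromStart.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return Bed12(fields[0], chrom_start, int(fields[2]), fields[3], fields[5], blocks)


# Hypothetical peptide split across two exons.
example = "chr1\t1000\t1500\tPEPTIDEK\t0\t+\t1000\t1500\t0,0,0\t2\t30,20\t0,480"
record = parse_bed12_line(example)
print(record.name, record.blocks)
```

When reading a whole file, skip any lines beginning with `track` or `browser`, which the UCSC Genome Browser accepts as header lines.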
This step visualises peptides on transcript and gene coordinates using IsoVis.
Upload combined_annotations.gtf as the 'transcript data' file (max. 3 GB).
Visualisation output:
The IsoVis visualisation displays separate tracks for peptides, ORFs and transcripts.
Users can limit their analysis to specific peptides, ORFs or transcripts of interest
by hiding irrelevant parts of the visualisation and
using the peptide and ORF stacks to highlight specific features.
Overlapping ORFs are shown using a hatching pattern. Coding regions are represented
as thick dark grey boxes. Peptides uniquely mapped to ORFs, transcripts, and genes
are indicated in orange, cyan, and blue, respectively. Multi-mapping peptides are dark grey.
For additional IsoVis details, refer to the documentation
here.