Insights into the SARS-CoV-2 Genome, Transcriptome, and Epitranscriptome
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first recognized at the beginning of 2020 and is responsible for the COVID-19 pandemic. Continued research into the dynamics of the SARS-CoV-2 life cycle is essential to the design and development of novel diagnostics and antiviral therapies. In a recent investigation highlighted below, results from high-throughput sequencing studies shed light on novel aspects of the SARS-CoV-2 genome, transcriptome, and epitranscriptome and on their roles in the viral life cycle.
The SARS-CoV-2 Genome
SARS-CoV-2 consists of a positive-sense single-stranded RNA genome, 29,903 nucleotides in length, and 4 types of structural proteins: N, S, E, and M. The N (nucleocapsid) protein encapsidates the genome, while the S (spike), E (envelope), and M (membrane) proteins are embedded in the surrounding lipid bilayer envelope. Of particular interest is the S protein, which enables viral infection via ACE2 receptor recognition and membrane fusion, making this structural protein and its host cell receptor attractive targets for therapeutic intervention. The genome of positive-strand RNA viruses like CoVs can act as mRNA and be directly translated into protein within their host cells. CoVs also produce negative-strand RNA intermediates, which serve as templates for two kinds of positive-strand RNA: genomic RNA, which is then packaged by the structural proteins to assemble progeny virions, and subgenomic RNA transcripts (discussed in the next section).
Schematic presentation of the SARS-CoV-2 genome organization, the canonical subgenomic mRNAs, and the virion structure. Taken from Kim et al., 2020.
The “body” sequence of the SARS-CoV-2 genome is flanked by a 72-nucleotide “leader” sequence at the 5′ end and a poly(A) tail at the 3′ end. Several open reading frames (ORFs) have been identified, corresponding to the replicase/transcriptase gene (ORF 1a and 1b), the viral structural proteins (S, E, M, and N), and the accessory genes (ORF 3a, 6, 7a, 7b, 8, and 10). The 5′-most ORF, ORF 1a/1b, is reported in CoVs to encode polymerases for viral RNA synthesis and other nonstructural proteins (nsps). ORF 1a/1b is translated, by way of ribosomal frameshifting, into polypeptide 1a (440-500 kDa) and polypeptide 1ab (740-810 kDa), which are subsequently cleaved into 11 and 16 nsps, respectively. Notable nsps encoded by ORF 1a/1b include: nsp3 and nsp5, viral proteases that mediate cleavage of the two polypeptides; nsp8, which displays adenylyltransferase activity and is involved in poly(A) tailing; and nsp12, a protein with RNA-dependent RNA polymerase activity for viral genome replication and transcription. Whether these nsp translation products and their biological functions are conserved in SARS-CoV-2 has yet to be fully determined.
The SARS-CoV-2 Transcriptome
Immediately adjacent to the 5′ end of each SARS-CoV-2 ORF are short motifs known as transcription-regulating sequences (TRSs). Current evidence suggests that TRSs in CoVs act as signals mediating the discontinuous transcription of subgenomic viral mRNAs (sgRNAs). These sequences may also guide the production of the 9 major SARS-CoV-2 sgRNAs characterized thus far. Each SARS-CoV-2 sgRNA contains a 5′ leader and a 3′ poly(A) tail, and their relative abundance is as follows: N > S > 7a > 3a > 8 > M > E > 6 > 7b (expression of ORF 10 transcripts has not been definitively confirmed). Total RNA harvested from SARS-CoV-2-infected cells in vitro revealed that viral transcripts dominate the transcriptome, suggesting strong suppression of host gene expression.
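As a rough illustration of discontinuous transcription, the sketch below joins a leader to a downstream body at matching TRS motifs in a toy sequence. Several things here are assumptions rather than details from the investigation: the core motif ACGAAC (commonly cited for this group of CoVs, but not given in this article), the made-up toy genome, and the hypothetical helper names find_motif and leader_body_fusion. The real fusion is thought to occur during negative-strand synthesis, so this models only the final positive-sense product.

```python
# Toy model of leader-to-body fusion at transcription-regulating sequences.
# ASSUMPTION: ACGAAC is used as the TRS core motif; it is not stated in the article.
TRS_CORE = "ACGAAC"


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start index of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def leader_body_fusion(genome: str, body_trs_index: int, trs: str = TRS_CORE) -> str:
    """Join the leader (through the first TRS) to the body that follows a later TRS.

    body_trs_index selects which downstream TRS to fuse to (1 = first body TRS).
    """
    sites = find_motif(genome, trs)
    if not 1 <= body_trs_index < len(sites):
        raise ValueError("body_trs_index must point at a TRS downstream of the leader")
    leader_end = sites[0] + len(trs)               # keep leader up to and including its TRS
    body_start = sites[body_trs_index] + len(trs)  # resume just after the body TRS
    return genome[:leader_end] + genome[body_start:]


# Hypothetical toy genome: leader, two body TRS-ORF units, short poly(A) tail.
toy_genome = "GGTT" + TRS_CORE + "CCCCCC" + TRS_CORE + "ATGAAA" + TRS_CORE + "ATGTTT" + "AAAA"

long_sgrna = leader_body_fusion(toy_genome, 1)
short_sgrna = leader_body_fusion(toy_genome, 2)
print(long_sgrna)
print(short_sgrna)
```

In this toy model both transcripts share the same 5′ leader and the same 3′ end, and differ only in where the body begins, which mirrors the nested arrangement of canonical sgRNAs described above.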
In addition to these canonical sgRNAs, fusion transcripts have been detected as a result of noncanonical recombination events, of which there are 3 main types:
TRS-L-dependent noncanonical recombination. The leader TRS (TRS-L) fuses to the body at unexpected 3′ sites within ORFs or untranslated regions.
TRS-L-independent distant recombination. Long-distance (>5,000 nucleotides) fusion between sequences that does not involve the leader.
TRS-L-independent local recombination. Yields smaller deletions, mostly in structural and accessory genes.
For other CoVs, transcripts with partial sequences have been observed to behave like “parasites”, competing for viral proteins. Although the exact effects of noncanonical recombination on the life cycle and evolution of SARS-CoV-2 are still unknown, the resulting transcripts do exhibit protein translation potential. For example, many SARS-CoV-2 transcripts formed by TRS-L-independent distant recombination encode the upstream part of ORF1a, including nsp1, nsp2, and truncated nsp3. The extent of both canonical and noncanonical SARS-CoV-2 sgRNA translation and the bioactivities of the translated products require further examination.
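The three noncanonical classes described above can be expressed as a simple decision rule on the two genome positions joined in a fusion read. The following is a minimal sketch under stated assumptions, not the classification procedure used in the investigation: it treats any junction whose 5′ site falls within the 72-nucleotide leader as TRS-L-dependent, uses the 5,000-nucleotide cutoff from the text to separate distant from local events, and relies on a caller-supplied set of canonical 3′ sites. The Junction type, the function name, and all coordinates in the example are hypothetical.

```python
from dataclasses import dataclass

LEADER_LENGTH = 72       # leader length given in the text
DISTANT_MIN_GAP = 5000   # "long-distance" cutoff given in the text


@dataclass(frozen=True)
class Junction:
    """Hypothetical fusion junction, in 1-based genome coordinates."""
    five_prime_site: int   # last position before the skipped region
    three_prime_site: int  # first position after the skipped region


def classify_junction(junction: Junction, canonical_three_prime_sites: set[int]) -> str:
    """Assign a junction to canonical or one of the three noncanonical classes."""
    gap = junction.three_prime_site - junction.five_prime_site
    if gap <= 0:
        raise ValueError("three_prime_site must lie downstream of five_prime_site")

    # ASSUMPTION: a 5' site inside the leader is taken to mean the leader is involved.
    if junction.five_prime_site <= LEADER_LENGTH:
        if junction.three_prime_site in canonical_three_prime_sites:
            return "canonical"
        return "TRS-L-dependent noncanonical"

    # Leader not involved: separate long-range fusions from smaller deletions.
    if gap > DISTANT_MIN_GAP:
        return "TRS-L-independent distant"
    return "TRS-L-independent local"


# Illustrative coordinates only; these are not measured junction sites.
canonical_sites = {28255}
examples = [
    Junction(65, 28255),
    Junction(65, 27000),
    Junction(1000, 20000),
    Junction(26000, 26200),
]
labels = [classify_junction(j, canonical_sites) for j in examples]
for junction, label in zip(examples, labels):
    print(junction, "->", label)
```

A real analysis would need tolerance around junction coordinates and a curated list of canonical sites; the exact-match lookup here is only meant to keep the rule readable.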
As previously mentioned, both the SARS-CoV-2 genome and its sgRNA transcripts carry poly(A) tails. The genomic RNA has a poly(A) tail with a median length of 47 nucleotides. The sgRNAs display 2 distinct tail populations: a major peak at around 45 nucleotides and a minor peak at approximately 30 nucleotides, the latter of which may represent aged RNAs that have undergone decay. It has been proposed that viral nsps bearing adenylyltransferase activity, like nsp8, might regulate CoV RNA tailing as a countermeasure against host deadenylases. Such regulation is likely critical for viral end replication and could apply to SARS-CoV-2 as well.
The SARS-CoV-2 Epitranscriptome
RNA modifications are known to play important roles in the life cycles of RNA viruses like human CoVs. Modifications such as m6A, m6Am, and 2'-O-methylation, for instance, are reported to affect the viability of specific RNA viruses by modulating viral cap structures, viral replication, innate sensing pathways, and the innate immune response. Furthermore, CoVs encode their own methyltransferases, which methylate viral RNA and promote immune evasion. Thus, the SARS-CoV-2 epitranscriptome may present another suitable target for therapeutic intervention.
At least 41 putative RNA modification sites have been located on SARS-CoV-2 transcripts. Among these sites, the AAGAA sequence is the most commonly observed motif. Long viral transcripts (S, 3a, E, and M) and genomic RNA are more frequently modified than shorter RNAs (6, 7a, 7b, 8, and N), and the modification frequency at certain sites depends on the sgRNA species. Interestingly, the modified RNAs have shorter poly(A) tails than unmodified transcripts. Since the relationship between tailing and RNA turnover is well established, these modifications may affect the stability of SARS-CoV-2 RNAs. Additional research is needed to ascertain the particular kinds of modifications (e.g., m6A, 5-mC), their defined roles, and the associated modifying enzymes (e.g., methyltransferases, demethylases).