Structure and genome of HIV


The genome and proteins of HIV (human immunodeficiency virus) have been the subject of extensive research since the discovery of the virus in 1983.[1][2] The virus was not discovered until two years after the first major cases of AIDS-associated illnesses were reported in 1981.[3][4] Each virion comprises a viral envelope and associated matrix enclosing a capsid, which itself encloses two copies of the single-stranded RNA genome and several enzymes.




Structure

HIV is different in structure from other retroviruses. It is roughly spherical, with a diameter of around 120 nm (around 60 times smaller than that of a red blood cell).
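The size comparison can be checked with a line of arithmetic. The sketch below assumes a red blood cell diameter of about 7 micrometres, a typical textbook figure that does not come from this article; the constant and function names are illustrative only.

```python
# Approximate virion diameter quoted in the text.
HIV_DIAMETER_NM = 120

# ASSUMPTION (not from the article): a typical human red blood cell
# is about 7 micrometres, i.e. 7000 nm, across.
RBC_DIAMETER_NM = 7_000


def diameter_ratio(larger_nm: float, smaller_nm: float) -> float:
    """Return how many times wider the larger object is than the smaller one."""
    return larger_nm / smaller_nm


ratio = diameter_ratio(RBC_DIAMETER_NM, HIV_DIAMETER_NM)
print(f"A red blood cell is roughly {ratio:.0f} times wider than an HIV virion")
```

Under that assumption the ratio comes out close to the "around 60 times" quoted above; a different assumed cell diameter would shift it accordingly.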

Diagram of HIV

HIV-1 is composed of two copies of noncovalently linked, unspliced, positive-sense single-stranded RNA enclosed by a conical capsid composed of the viral protein p24, typical of lentiviruses.[5][6] The RNA component is 9749 nucleotides long[7][8] and bears a 5' cap (Gppp), a 3' poly(A) tail, and many open reading frames (ORFs).[9] Viral structural proteins are encoded by long ORFs, whereas smaller ORFs encode regulators of the viral life cycle: attachment, membrane fusion, replication, and assembly.[9]

Structure of the immature HIV-1 capsid in intact virus particles

The single-stranded RNA is tightly bound to p7 nucleocapsid proteins, the late assembly protein p6, and enzymes essential to the development of the virion, such as reverse transcriptase and integrase. A lysine tRNA serves as the primer for the magnesium-dependent reverse transcriptase.[5] The nucleocapsid associates with the genomic RNA (one molecule per hexamer) and protects the RNA from digestion by nucleases. Also enclosed within the virion particle are Vif, Vpr, Nef, and viral protease. A matrix composed of the viral protein p17 surrounds the capsid, ensuring the integrity of the virion particle. The matrix is in turn surrounded by an envelope of host-cell origin, which is formed when the capsid buds from the host cell, taking some of the host-cell membrane with it. The envelope includes the glycoproteins gp120 and gp41.

Because of its role in virus-cell attachment, the structure of the virus envelope spike, which consists of gp120 and gp41, is of particular importance. Determining the envelope spike's structure will contribute to understanding the HIV replication cycle, and may help in the creation of a cure.[10] The first model of its structure was compiled in 2006 using cryo-electron tomography and suggested that each spike consists of a trimer of gp120–gp41 heterodimers.[11] Shortly afterwards, however, evidence was published for a single-stalk "mushroom" model, with a head consisting of a trimer of gp120s and a gp41 stem that anchors it to the envelope and appears as a compact structure with no obvious separation between the three monomers.[12] There are various possible sources of this difference, as it is unlikely that the viruses imaged by the two groups were structurally different.[13] More recently, further evidence supporting the model based on a trimer of heterodimers has been found.[14]

Genome organization

Structure of the RNA genome of HIV-1

HIV has several major genes coding for structural proteins that are found in all retroviruses, as well as several nonstructural ("accessory") genes unique to HIV. The HIV genome contains three major genes, 5'gag-pol-env-3', encoding major structural proteins as well as essential enzymes.[15] These are synthesized as polyproteins, which yield the proteins of the virion interior (Gag, group-specific antigen), the viral enzymes (Pol, polymerase) and the glycoproteins of the virion envelope (Env).[16] In addition to these, HIV encodes proteins that have certain regulatory and auxiliary functions.[16] HIV-1 has two important regulatory elements, Tat and Rev, and a few important accessory proteins, such as Nef, Vpr, Vif and Vpu, which are not essential for replication in certain tissues.[17] The gag gene provides the basic physical infrastructure of the virus, and pol provides the basic mechanism by which retroviruses reproduce, while the others help HIV to enter the host cell and enhance its reproduction. Though they may be altered by mutation, all of these genes except tev exist in all known variants of HIV; see Genetic variability of HIV.

HIV employs a sophisticated system of differential RNA splicing to obtain nine different gene products from a genome of less than 10 kb.[18] HIV has a 9.2 kb unspliced genomic transcript, which encodes the gag and pol precursors; a singly spliced 4.5 kb mRNA encoding env, Vif, Vpr and Vpu; and a multiply spliced 2 kb mRNA encoding Tat, Rev and Nef.[18]
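The relationship between the three transcript classes and the nine gene products can be written down as a small lookup structure. This is only an illustrative sketch: the dictionary layout, key names and helper functions are hypothetical, and the sizes are the approximate values quoted in the paragraph above.

```python
# Transcript classes as described in the text. Sizes (kb) are the approximate
# figures quoted there; the key names are illustrative, not standard nomenclature.
TRANSCRIPTS = {
    "unspliced": {"size_kb": 9.2, "products": ["Gag", "Pol"]},
    "singly_spliced": {"size_kb": 4.5, "products": ["Env", "Vif", "Vpr", "Vpu"]},
    "multiply_spliced": {"size_kb": 2.0, "products": ["Tat", "Rev", "Nef"]},
}


def all_products() -> list[str]:
    """Flatten the gene products across every transcript class."""
    return [p for info in TRANSCRIPTS.values() for p in info["products"]]


def transcript_for(product: str) -> str:
    """Return the transcript class that encodes the given gene product."""
    for name, info in TRANSCRIPTS.items():
        if product in info["products"]:
            return name
    raise KeyError(f"unknown gene product: {product}")


print(len(all_products()), "gene products from", len(TRANSCRIPTS), "transcript classes")
print("Rev is encoded by the", transcript_for("Rev"), "class")
```

Counting the entries gives the nine products mentioned in the text, spread over three classes of decreasing size.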

Proteins encoded by the HIV genome

Class                          Gene  Primary protein product  Processed protein products
Viral structural proteins      gag   Gag polyprotein          MA, CA, SP1, NC, SP2, p6
                               pol   Pol polyprotein          RT, RNase H, IN, PR
                               env   gp160                    gp120, gp41
Essential regulatory elements  tat   Tat
                               rev   Rev
Accessory regulatory proteins  nef   Nef
                               vpr   Vpr
                               vif   Vif
                               vpu   Vpu
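The table can also be read as a mapping from each gene to its primary product and, where the table lists any, its processed products. The following is a minimal sketch of that reading; the names `GENE_PRODUCTS`, `final_products` and `gene_for` are hypothetical helpers introduced for illustration.

```python
# Gene -> (primary product, processed products), transcribed from the table above.
# An empty list means the table lists no processed products for that gene.
GENE_PRODUCTS = {
    "gag": ("Gag polyprotein", ["MA", "CA", "SP1", "NC", "SP2", "p6"]),
    "pol": ("Pol polyprotein", ["RT", "RNase H", "IN", "PR"]),
    "env": ("gp160", ["gp120", "gp41"]),
    "tat": ("Tat", []),
    "rev": ("Rev", []),
    "nef": ("Nef", []),
    "vpr": ("Vpr", []),
    "vif": ("Vif", []),
    "vpu": ("Vpu", []),
}


def final_products(gene: str) -> list[str]:
    """Return the processed products of a gene, or its primary product if none are listed."""
    primary, processed = GENE_PRODUCTS[gene]
    return list(processed) if processed else [primary]


def gene_for(product: str) -> str:
    """Return the gene whose primary or processed products include `product`."""
    for gene, (primary, processed) in GENE_PRODUCTS.items():
        if product == primary or product in processed:
            return gene
    raise KeyError(f"unknown product: {product}")


print(final_products("env"))
print(gene_for("CA"))
```

Keeping the primary and processed products separate mirrors the two right-hand columns of the table, so only the three structural genes carry a non-empty second list.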

Viral structural proteins

The HIV capsid consists of roughly 200 copies of the p24 protein. The p24 structure is shown in two representations: cartoon (top) and isosurface (bottom)
  • gag (group-specific antigen) codes for the precursor Gag polyprotein, which is processed by the viral protease during maturation into MA (matrix protein, p17); CA (capsid protein, p24); SP1 (spacer peptide 1, p2); NC (nucleocapsid protein, p7); SP2 (spacer peptide 2, p1) and p6 protein.[19]
  • pol codes for the viral enzymes reverse transcriptase (RT) and RNase H, integrase (IN), and HIV protease (PR).[16] HIV protease is required to cleave the precursor Gag polyprotein to produce structural proteins, RT is required to transcribe DNA from an RNA template, and IN is necessary to integrate the double-stranded viral DNA into the host genome.[15]
  • env (for "envelope") codes for gp160, which is cleaved by a host protease, furin, within the Golgi apparatus of the host cell. This post-translational processing produces a surface glycoprotein, gp120 or SU, which attaches to the CD4 receptors present on lymphocytes, and gp41 or TM, which embeds in the viral envelope to enable the virus to attach to and fuse with target cells.[15][19]

Essential regulatory elements

  • tat (HIV trans-activator) plays an important role in regulating the reverse transcription of viral genome RNA, ensuring efficient synthesis of viral mRNAs and regulating the release of virions from infected cells.[16] Tat is expressed both as a 72-amino-acid one-exon form and as an 86–101-amino-acid two-exon form, and plays an important role early in HIV infection. Tat (14–15 kDa) binds to the bulged stem-loop secondary structure in the genomic RNA near the 5' LTR region, known as the trans-activation response element (TAR).[5][16]
  • rev (regulator of expression of virion proteins): The Rev protein binds to the viral genome via an arginine-rich RNA-binding motif that also acts as an NLS (nuclear localization signal), required for the transport of Rev from the cytosol to the nucleus during viral replication.[16] Rev recognizes a complex stem-loop structure in the env mRNA, located in the intron separating the coding exons of Tat and Rev and known as the HIV Rev response element (RRE).[5][16] Rev is important for the synthesis of major viral proteins and is hence essential for viral replication.

Accessory regulatory proteins

  • vpr (lentivirus protein R): Vpr is a virion-associated, nucleocytoplasmic shuttling regulatory protein.[16] It is believed to play an important role in replication of the virus, specifically in nuclear import of the preintegration complex. Vpr also appears to cause its host cells to arrest their cell cycle in the G2 phase. This arrest activates the host DNA repair machinery, which may enable integration of the viral DNA.[5] HIV-2 and SIV encode an additional Vpr-related protein called Vpx, which functions in association with Vpr.[16]
  • vif: Vif is a highly conserved, 23 kDa phosphoprotein that is important for the infectivity of HIV-1 virions, depending on the cell type.[5] HIV-1 has been found to require Vif to synthesize infectious viruses in lymphocytes, macrophages, and certain human cell lines. It does not appear to require Vif for the same process in HeLa cells or COS cells, among others.[16]
  • nef: Nef, negative factor, is an N-terminally myristoylated, membrane-associated phosphoprotein. It is involved in multiple functions during the replication cycle of the virus. It is believed to play an important role in cell apoptosis and in increasing virus infectivity.[16]
  • vpu (virus protein U): Vpu is specific to HIV-1. It is a class I oligomeric integral membrane phosphoprotein with numerous biological functions. Vpu is involved in CD4 degradation via the ubiquitin proteasome pathway as well as in the successful release of virions from infected cells.[5][16]
  • tev: This gene is only present in a few HIV-1 isolates. It is a fusion of parts of the tat, env, and rev genes, and codes for a protein with some of the properties of Tat, but little or none of the properties of Rev.[20]

RNA secondary structure

HIV pol-1 stem loop (figure: predicted secondary structure of the HIV pol-1 stem loop). Symbol: pol. Rfam: RF01418. RNA type: Cis-reg.

Several conserved secondary structure elements have been identified within the HIV RNA genome. The 5' UTR consists of a series of stem-loop structures connected by small linkers.[6] These stem-loops (5' to 3') include the trans-activation response (TAR) element, the 5' polyadenylation signal [poly(A)], the primer binding site (PBS), the dimerization initiation site (DIS), the major splice donor (SD) and the ψ hairpin structure, all located within the 5' end of the genome. The HIV Rev response element (RRE) lies within the env gene.[6][21][22] Another RNA structure that has been identified is gag stem loop 3 (GSL3), thought to be involved in viral packaging.[23][24] RNA secondary structures have been proposed to affect the HIV life cycle by altering the function of HIV protease and reverse transcriptase, although not all of the elements identified have been assigned a function.

An RNA secondary structure located between the HIV protease and reverse transcriptase genes has been shown by SHAPE analysis to contain three stem loops. This cis-regulatory RNA has been shown to be conserved throughout the HIV family and is thought to influence the viral life cycle.[25]

The complete structure of an HIV-1 genome, extracted from infectious virions, has been solved to single-nucleotide resolution.[26]

References

  1. ^ Barré-Sinoussi F, Chermann JC, Rey F, et al. (May 1983). "Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS)". Science 220 (4599): 868–71.  
  2. ^ Gallo RC, Sarin PS, Gelmann EP, et al. (May 1983). "Isolation of human T-cell leukemia virus in acquired immune deficiency syndrome (AIDS)". Science 220 (4599): 865–7.  
  3. ^ Centers for Disease Control and Prevention (1981-06-05). "Pneumocystis Pneumonia – Los Angeles" (PDF). Morbidity and Mortality Weekly Report 30 (21): 250–2.  
  4. ^ Centers for Disease Control and Prevention (1981-07-04). "Kaposi's Sarcoma and Pneumocystis Pneumonia Among Homosexual Men – New York City and California" (PDF). Morbidity and Mortality Weekly Report 30 (25): 305–8.  
  5. ^ a b c d e f g Montagnier, Luc. (1999) Human Immunodeficiency Viruses (Retroviridae). Encyclopedia of Virology (2nd Ed.) 763-774
  6. ^ a b c Lu, K; Heng, X; Summers, MF (2011). "Structural determinants and mechanism of HIV-1 genome packaging". Journal of molecular biology 410 (4): 609–33.  
  7. ^ Wain-Hobson S, Sonigo P, Danos O, et al. (1985). "Nucleotide sequence of the AIDS virus, LAV". Cell 40 (1): 9–17.  
  8. ^ Ratner L, Haseltine W, Patarca R, et al. (1985). "Complete nucleotide sequence of the AIDS virus, HTLV-III". Nature 313 (6000): 277–84.  
  9. ^ a b Castelli, Joann C. and Levy, Jay A. (2002) HIV (Human Immunodeficiency Virus). Encyclopedia of Cancer (2nd Ed.) 2:407–415
  10. ^ "3D structure of HIV is 'revealed'". Health. BBC News. 2006-01-24. Retrieved 2008-08-06. 
  11. ^ Zhu P, Liu J, Bess J Jr, et al. (2006). "Distribution and three-dimensional structure of AIDS virus envelope spikes". Nature 441 (7095): 847–52.  
  12. ^ Zanetti G, Briggs JAG, Grunewald K, et al. (2006). "Cryo-Electron Tomographic Structure of an Immunodeficiency Virus Envelope Complex In Situ". PLOS Pathogens 2 (8): e83.  
  13. ^ Sriram Subramaniam (2006). "The SIV Surface Spike Imaged by Electron Tomography: One Leg or Three?". PLOS Pathogens 2 (8): e91.  
  14. ^ Zhu P, Winkler H, Chertova E, et al. (2008). Farzan M, ed. "Cryoelectron Tomography of HIV-1 Envelope Spikes: Further Evidence for Tripod-Like Legs". PLOS Pathogens 4 (11): e1000203.  
  15. ^ a b c Mushahwar, Isa K. (2007) Human Immunodeficiency Viruses: Molecular Virology, pathogenesis, diagnosis and treatment. Perspectives in Medical Virology. 13:75-87
  16. ^ a b c d e f g h i j k l Votteler, J. and Schubert, U. (2008) Human Immunodeficiency Viruses: Molecular Biology. Encyclopedia of Virology. (3rd ed.) 517-525
  17. ^ Votteler, J. and Schubert, U. (2008) Human Immunodeficiency Viruses: Molecular Biology. Encyclopedia of Virology (3rd Ed) 517-525
  18. ^ a b Feinberg, Mark B and Greene, Warner C. (1992) Molecular Insights into human immunodeficiency virus type1 pathogenesis. Current Opinion in Immunology. 4:466-474.
  19. ^ a b King, Steven R. (1994) HIV: Virology and Mechanisms of disease. Annals of Emergency Medicine. 24:443-449
  20. ^ Benko, DM; Schwartz, S; Pavlakis, GN; Felber, BK (June 1990). "A novel human immunodeficiency virus type 1 protein, tev, shares sequences with tat, env, and rev proteins.". Journal of virology 64 (6): 2505–18.  
  21. ^ Berkhout B (January 1992). "Structural features in TAR RNA of human and simian immunodeficiency viruses: a phylogenetic analysis". Nucleic Acids Res. 20 (1): 27–31.  
  22. ^ Paillart JC, Skripkin E, Ehresmann B, Ehresmann C, Marquet R (February 2002). "In vitro evidence for a long range pseudoknot in the 5'-untranslated and matrix coding regions of HIV-1 genomic RNA". J. Biol. Chem. 277 (8): 5995–6004.  
  23. ^ Damgaard, CK; Andersen ES; Knudsen B; Gorodkin J; Kjems J (2004). "RNA interactions in the 5' region of the HIV-1 genome". J Mol Biol 336 (2): 369–379.  
  24. ^ Rong, L; Russell RS; Hu J; Laughrea M; Wainberg MA; Liang C (2003). "Deletion of stem-loop 3 is compensated by second-site mutations within the Gag protein of human immunodeficiency virus type 1". Virology 314 (1): 221–228.  
  25. ^ Wang Q, Barr I, Guo F, Lee C (December 2008). "Evidence of a novel RNA secondary structure in the coding region of HIV-1 pol gene". RNA 14 (12): 2478–88.  
  26. ^ Watts JM, Dang KK, Gorelick RJ, Leonard CW, Bess JW, Swanstrom R, Burch CL, Weeks KM (2009). "Architecture and Secondary Structure of an Entire HIV-1 RNA Genome". Nature 460 (7256): 711–6.  

External links

  • Hunt R. "HIV and AIDS". Human Immunodeficiency Virus and AIDS. University of South Carolina School of Medicine. Retrieved 2008-08-06. 
  • Rfam entry for HIV pol-1 stem loop
  • 3D model of the complete HIV1 virion
  • 3D visualization of HIV virions by cryoelectron tomography