Volume 27, Number 11—November 2021
Research Letter

Tracing the Origin, Spread, and Molecular Evolution of Zika Virus in Puerto Rico, 2016–2017

Gilberto A. Santiago1, Chaney C. Kalinich1, Fabiola Cruz-López, Glenda L. González, Betzabel Flores, Aaron Hentoff, Keyla N. Charriez, Joseph R. Fauver, Laura E. Adams, Tyler M. Sharp, Allison Black, Trevor Bedford, Esther Ellis, Brett Ellis, Steve H. Waterman, Gabriela Paz-Bailey, Nathan D. Grubaugh2, and Jorge L. Muñoz-Jordán2
Author affiliations: Centers for Disease Control and Prevention, San Juan, Puerto Rico, USA (G.A. Santiago, F. Cruz-López, G.L. González, B. Flores, K.N. Charriez, L.E. Adams, T.M. Sharp, G. Paz-Bailey, J.L. Muñoz-Jordán); Yale School of Public Health, New Haven, Connecticut, USA (C.C. Kalinich, A. Hentoff, J.R. Fauver, N.D. Grubaugh); US Public Health Service, Rockville, Maryland, USA (L.E. Adams, T.M. Sharp); Fred Hutchinson Cancer Research Center, Seattle, Washington, USA (A. Black, T. Bedford); US Virgin Islands Department of Health, Charlotte Amalie, St. Thomas, Virgin Islands, USA (E. Ellis, B. Ellis)



We reconstructed the 2016–2017 Zika virus epidemic in Puerto Rico by using complete genomes to uncover the epidemic’s origin, spread, and evolutionary dynamics. Our study revealed that the epidemic was propelled by multiple introductions that spread across the island, intricate evolutionary patterns, and ≈10 months of cryptic transmission.

Puerto Rico reported the first confirmed case of Zika virus (ZIKV) disease in November 2015 and subsequently experienced epidemic transmission that peaked by mid-August 2016 (1). Despite the large number of confirmed cases detected by traditional surveillance, the origin, spread, and evolutionary dynamics of this epidemic remain undetermined. We sought to reconstruct the epidemic transmission period by using a genomic epidemiology approach and to determine the evolution of the virus on the island.

To investigate the emergence and subsequent epidemic of ZIKV in Puerto Rico, we generated 83 complete genomes (2,3) directly from PCR-positive serum samples (4) (Appendix) collected from the 8 health regions of Puerto Rico during March 2016–January 2017, congruent with a geotemporal representation of the epidemic on the island. We then performed phylogenetic analysis with an additional 233 published genomes from GenBank that represent the emergence and spread of ZIKV in the Americas during 2015–2017. The resulting reconstructed phylogeny was consistent with published tree topologies, nucleotide substitution rate ranges, and divergence patterns observed elsewhere for the entirety of the Americas (Appendix Figure 1, panel A), providing a pragmatic context for the proposed model of spread and divergence of ZIKV in Puerto Rico (5). At least 8 separate foreign-introduction events were captured within the ancestry of the viruses sequenced: 2 that expanded into autochthonous lineages and 6 that were each represented by an individual sequence associated with genomes from the United States, the Caribbean, South America, and Central America, thus suggesting limited spread.

In addition, we analyzed the temporal molecular evolutionary signal in our dataset by reconstructing time-calibrated phylogenies from genomes annotated with the date of sample collection (year, month, and day) for temporal precision. The correlation between date of sample collection and root-to-tip genetic distance supported the heterochronous nature of our dataset. The estimated divergence from the root (i.e., time of most recent common ancestor [tMRCA] of this tree) occurred in February 2013 (because 2013–2014 ZIKV genomes from French Polynesia were used as the root), and the within-epidemic evolutionary rate was 1.09 × 10⁻³ substitutions/site/year (Appendix Figure 1, panel B).
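The root-to-tip analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy data are hypothetical, and the toy distances are generated from an assumed rate and root date rather than taken from the study. Under a simple linear model, the slope of the regression approximates the evolutionary rate (substitutions/site/year) and the x-intercept approximates the date of the root.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r): the slope approximates the rate in
    substitutions/site/year, the x-intercept approximates the root date,
    and r is the Pearson correlation (the strength of temporal signal).
    """
    xs = [decimal_year(d) for d in dates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, -intercept / slope, r


# Toy data only (hypothetical): distances are generated from an assumed
# rate and root date so that the regression has a known answer.
TOY_RATE = 1.0e-3   # assumed substitutions/site/year
TOY_ROOT = 2013.1   # assumed root date as a decimal year
toy_dates = [
    date(2016, 3, 15),
    date(2016, 6, 1),
    date(2016, 8, 20),
    date(2016, 11, 5),
    date(2017, 1, 10),
]
toy_distances = [TOY_RATE * (decimal_year(d) - TOY_ROOT) for d in toy_dates]

rate, root_date, r = root_to_tip_regression(toy_dates, toy_distances)
print(f"rate={rate:.2e} subs/site/year, root={root_date:.2f}, r={r:.3f}")
```

Real root-to-tip distances would come from a rooted phylogeny and would scatter around the fitted line; a clearly positive correlation is what indicates that sequences sampled at different times (heterochronous data) carry enough temporal signal for time-calibrated analysis.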


Figure. Intra-island spread and divergence of Zika virus, Puerto Rico, 2016–2017. Bayesian phylogenetic reconstruction using maximum clade credibility trees shows genomes grouping with 2 separate clusters. PR C1 is associated with genomes from South America and the Caribbean (top); this clade diverged into SC1 and SC2. PR C2 is associated with genomes from Central America (center). Epidemic curve of total Zika cases per week (orange shade) and cases confirmed by reverse transcription PCR per week (blue shade) during 2015–2017 (bottom). All external branches representing Puerto Rico genomes are color-coded according to the 8 health regions of Puerto Rico: region 1, red; region 2, blue; region 3, orange; region 4, green; region 5, purple; region 6, cyan; region 7, brown; and region 8, magenta. C, clade; PR, Puerto Rico; SC, subclade; tMRCA, time of most recent common ancestor.


Bayesian reconstruction of Puerto Rico clade 1 (PR C1) presents the largest autochthonous monophyletic cluster that originated from viruses from South America and the Caribbean, including Brazil, Suriname, French Guiana, the US Virgin Islands, and Dominican Republic (Figure). tMRCA estimates place the divergence of PR C1 in mid-June 2015 (95% highest posterior density [HPD] February 2015–October 2015) and a within-outbreak evolutionary rate of 1.61 × 10⁻³ (95% HPD 1.13–2.10 × 10⁻³) substitutions/site/year. In addition, PR C1 was observed to diverge further into 2 subclades (SC1 and SC2) spreading across the island. The second clade, Puerto Rico clade 2 (PR C2), presents a smaller autochthonous monophyletic cluster that originated from viruses in Central America, including Nicaragua and Honduras (Figure). Our tMRCA estimates placed the emergence of PR C2 in February 2016 (95% HPD October 2015–April 2016), and its evolutionary rate was similar to that of PR C1 at 1.87 × 10⁻³ (95% HPD 1.1–2.64 × 10⁻³). We compared the ZIKV epidemic history of Puerto Rico to the time-calibrated Bayesian phylogenies and observed that the tMRCA of PR C1 precedes the initial confirmation of ZIKV on the island through traditional surveillance methods by 3–10 months and that expansion of all PR lineages coincides with the peak of the epidemic curve (Figure). We assessed phylogenetic clustering patterns for geographic association with each of the health regions and detected none (Appendix Figure 2).
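To give the per-site rates above a more intuitive scale, the following Python sketch converts them to expected substitutions per genome per year and checks whether the two 95% HPD intervals overlap, which is one informal way to read "similar" rates. The genome length of about 10,800 nucleotides is an approximation assumed here, not a value stated in the article, and the helper names are hypothetical.

```python
# Approximate ZIKV genome length in nucleotides (an assumption for
# illustration; the article does not state a length).
GENOME_LENGTH_NT = 10_800

# Rates reported in the text, as (estimate, HPD lower, HPD upper),
# in substitutions/site/year.
clade_rates = {
    "PR C1": (1.61e-3, 1.13e-3, 2.10e-3),
    "PR C2": (1.87e-3, 1.10e-3, 2.64e-3),
}


def substitutions_per_genome_per_year(rate_per_site, genome_length=GENOME_LENGTH_NT):
    """Scale a per-site rate to expected substitutions per genome per year."""
    return rate_per_site * genome_length


def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


for clade, (estimate, lower, upper) in clade_rates.items():
    per_genome = substitutions_per_genome_per_year(estimate)
    print(f"{clade}: about {per_genome:.1f} substitutions/genome/year")

c1 = clade_rates["PR C1"][1:]
c2 = clade_rates["PR C2"][1:]
print("HPD intervals overlap:", intervals_overlap(c1, c2))
```

Overlapping intervals are only a rough indication that the rates are compatible; they are not a formal statistical test of equality.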

We inferred past viral population dynamics by using Bayesian Skygrid plots, which show an increase in genomic diversity that coincides in time with the emergence of ZIKV in the Americas, followed by a series of fluctuations in the effective population size, characteristic of the virus spreading rapidly through the region (Appendix Figure 3). In Puerto Rico, we observed a similar sharp increase upon emergence and subsequent patterns that mirror the trends observed in the Americas.

Our study revealed the origin and epidemic spread of ZIKV on the island after a period of cryptic transmission undetected by traditional surveillance. Similar cryptic transmission was reported in Brazil and Colombia (6–8), where case detection was hindered by the difficulty of capturing asymptomatic or mild cases with clinical manifestations that overlap those of endemic arboviruses and by other laboratory testing limitations particular to ZIKV (9). The dataset we generated in our study presents a relevant contribution to the geotemporal sampling of ZIKV genomes from the region, enabling the study of the evolutionary and epidemic dynamics in the Americas.

The integration of genomic epidemiology into arbovirus surveillance has proven to be central to the ascertainment of disease epidemiology, uncovering information otherwise concealed by the nature of the disease and limitations of surveillance systems. Fundamentally, integrated proactive genomic surveillance may help us to predict virus emergence and more effectively mitigate regional or global expansion.

Dr. Santiago is a lead research microbiologist at the Centers for Disease Control and Prevention in San Juan, Puerto Rico. His research is focused on the development of molecular diagnostic tests and genomic epidemiology of dengue virus and severe acute respiratory syndrome coronavirus 2.



We thank the collaborators from the Ponce Medical School Foundation, Inc. (grant no. U01CK000580), the Puerto Rico Health Department, and members of the Puerto Rico Zika Task Force for the valuable contributions to the enhanced surveillance during the Zika outbreak in 2016.

This project was partially funded by the Centers for Disease Control and Prevention’s Advanced Molecular Detection Program and the Yale University’s School of Public Health start-up package provided to N.D.G. Additional support for coauthors C.K. and A.H. was provided by the Yale University’s Jackson Institute of Global Health Field Experience Award and the Yale Collaborative Action Fellowship.



  1. Sharp TM, Quandelacy TM, Adams LE, Aponte JT, Lozier MJ, Ryff K, et al. Epidemiologic and spatiotemporal trends of Zika Virus disease during the 2016 epidemic in Puerto Rico. PLoS Negl Trop Dis. 2020;14:e0008532.
  2. Quick J, Grubaugh ND, Pullan ST, Claro IM, Smith AD, Gangavarapu K, et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc. 2017;12:1261–76.
  3. Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20:8.
  4. Santiago GA, Vázquez J, Courtney S, Matías KY, Andersen LE, Colón C, et al. Performance of the Trioplex real-time RT-PCR assay for detection of Zika, dengue, and chikungunya viruses. Nat Commun. 2018;9:1391.
  5. Metsky HC, Matranga CB, Wohl S, Schaffner SF, Freije CA, Winnicki SM, et al. Zika virus evolution and spread in the Americas. Nature. 2017;546:411–5.
  6. Faria NR, Quick J, Claro IM, Thézé J, de Jesus JG, Giovanetti M, et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546:406–10.
  7. Black A, Moncla LH, Laiton-Donato K, Potter B, Pardo L, Rico A, et al. Genomic epidemiology supports multiple introductions and cryptic transmission of Zika virus in Colombia. BMC Infect Dis. 2019;19:963.
  8. Grubaugh ND, Saraf S, Gangavarapu K, Watts A, Tan AL, Oidtman RJ, et al.; GeoSentinel Surveillance Network. Travel surveillance and genomics uncover a hidden Zika outbreak during the waning epidemic. Cell. 2019;178:1057–1071.e11.
  9. Peters R, Stevenson M. Zika virus diagnosis: challenges and solutions. Clin Microbiol Infect. 2019;25:142–6.





DOI: 10.3201/eid2711.211575

Original Publication Date: October 08, 2021

1These first authors contributed equally to this article.

2These senior authors contributed equally to this article.


Page created: September 16, 2021
Page updated: October 19, 2021
Page reviewed: October 19, 2021
The conclusions, findings, and opinions expressed by authors contributing to this journal do not necessarily reflect the official position of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions. Use of trade names is for identification only and does not imply endorsement by any of the groups named above.