AgamP3
As described in Holt et al. (2002), plasmid and BAC DNA libraries were constructed with stringently size-selected PEST strain DNA. Two BAC libraries were constructed, one (ND-TAM) using DNA from whole adult male and female mosquitoes and the other (ND-1) using DNA from ovaries of PEST females collected about 24 hours after the blood meal. Plasmid libraries containing inserts of 2.5, 10 and 50 kb were constructed with DNA derived from either 330 male or 430 female mosquitoes. For each sex, several libraries of each insert size class were made, and these were sequenced such that there was approximately equal coverage from male and female mosquitoes in the final data set. Celera Genomics, Genoscope and TIGR contributed sequence data that collectively provided 10.2-fold coverage, assuming a genome size of 278 Mb. The whole-genome data set was assembled with the Celera assembler (MOZ1 assembly), which constituted the basis of the primary genome publication (Holt et al. 2002).
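The coverage figure above depends on the assumed genome size, so it can help to see the arithmetic spelled out. The following is a minimal sketch, not part of the original analysis: it derives the total sequence implied by 10.2-fold coverage of a 278 Mb genome, then shows what the same data would represent against the ~264 Mb AgamP3 assembly size quoted later in this section. The function and constant names are illustrative assumptions.

```python
def total_bases(fold_coverage: float, genome_size_bp: int) -> float:
    """Total sequenced bases implied by a fold-coverage figure."""
    return fold_coverage * genome_size_bp


def coverage(total_bases_sequenced: float, genome_size_bp: int) -> float:
    """Fold coverage for a given amount of sequence and genome size."""
    return total_bases_sequenced / genome_size_bp


# Values taken from the text; the names are hypothetical.
MOZ1_GENOME_SIZE_BP = 278_000_000   # genome size assumed for MOZ1
AGAMP3_SIZE_BP = 264_000_000        # approximate non-redundant AgamP3 size
REPORTED_COVERAGE = 10.2

implied_bases = total_bases(REPORTED_COVERAGE, MOZ1_GENOME_SIZE_BP)
rescaled_coverage = coverage(implied_bases, AGAMP3_SIZE_BP)

print(f"Implied sequence: {implied_bases / 1e9:.2f} Gb")
print(f"Coverage against ~264 Mb: {rescaled_coverage:.1f}x")
```

Under these assumptions the same reads correspond to slightly higher coverage of the smaller assembly, which is why the coverage value is always quoted together with the genome size it assumes.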
The first update to this assembly (MOZ2) resulted from a concerted effort to correct some of the ambiguities in scaffold map locations and orientations, through manual analysis of the archived BAC chromosome hybridization photographs and hybridization of a small number of new BAC clones selected to resolve questions of scaffold orientation. The new AGP file, an early draft of which was first displayed on the A. gambiae genome poster published in the 4 October 2002 issue of Science, formed the basis of a new annotation and gene build displayed on 1 October 2003 (MOZ2) (Mongin et al. 2004). This assembly is also 278 Mb.
In 2006, the major scaffolds were re-ordered into a new golden path file by use of additional physically mapped BAC clones combined with scaffold-to-scaffold sequence comparisons that identified some sequence overlaps. The AgamP3 assembly has a total of 80 scaffolds assigned to and ordered on the chromosome arms X, 2R, 2L, 3R and 3L, 28 of which are newly mapped or oriented. The most significant improvement in this new assembly is the placement of 24 scaffolds (8.64 Mbp) in pericentromeric regions. However, this does not complete the centromeric region of any of the chromosomes. The new GenBank entries, CM000356-CM000360, reflect the revised 2L, 2R, 3L, 3R and X chromosome assemblies. At ~264 Mb of non-redundant sequence, this new assembly (AgamP3) is probably still an overestimate of the true genome size (Sharakhova et al. 2007).
Additional notes compiled on known assembly issues from the initial VectorBase project can be found here.
New in situ Scaffold Mappings
Identification of Overlapping Scaffold Ends
Using Exonerate and Dotter, adjacent scaffolds whose ends contain overlapping regions have been identified through visual inspection. For a stretch of sequence covered by two scaffolds, we have selected one of the overlapping regions for use in our updated A. gambiae Golden Path. The overlapping sequence from the other scaffold is still associated with the same region on the chromosome, but it is listed as a haplotype region instead of part of the golden path.
Using these techniques, 18 major overlaps have been identified between scaffolds mapped to chromosome arms. Based on these overlaps, approximately 3.5 Mb of overlapping sequence has been removed from the current Golden Path and reclassified as haplotype region.
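The bookkeeping behind this reclassification can be illustrated with a small sketch. It is not the procedure actually used for AgamP3: it assumes, purely for illustration, that the overlap lies at the end of the left scaffold and the start of the right scaffold, and that the left scaffold's copy is the one kept in the golden path. The scaffold names, the `Segment` type and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    scaffold: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    role: str    # "golden_path" or "haplotype"


def resolve_overlap(left_name: str, left_len: int,
                    right_name: str, right_len: int,
                    overlap_len: int) -> List[Segment]:
    """Keep the left scaffold whole; mark the overlapping start of the
    right scaffold as haplotype (an assumed convention)."""
    if not 0 <= overlap_len <= min(left_len, right_len):
        raise ValueError("overlap must fit inside both scaffolds")
    segments = [Segment(left_name, 1, left_len, "golden_path")]
    if overlap_len > 0:
        segments.append(Segment(right_name, 1, overlap_len, "haplotype"))
    if overlap_len < right_len:
        segments.append(
            Segment(right_name, overlap_len + 1, right_len, "golden_path"))
    return segments


def length_by_role(segments: List[Segment], role: str) -> int:
    """Total length of all segments with the given role."""
    return sum(s.end - s.start + 1 for s in segments if s.role == role)


# Toy example: two scaffolds sharing 200 bp at their adjoining ends.
segments = resolve_overlap("scaffold_A", 1000, "scaffold_B", 800, 200)
print(length_by_role(segments, "golden_path"))
print(length_by_role(segments, "haplotype"))
```

The overlapping bases are counted once in the golden path, while the second copy keeps its coordinates and is only relabelled, which matches the description above of sequence being reclassified and not discarded.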
Y Chromosome Scaffold Identification
Scaffolds containing Y chromosome-specific satellite DNA families were identified by in silico searches. Initially, each male-only scaffold was screened for the possible presence of satellite DNA using Tandem Repeats Finder software. Subsequently, the consensus sequence of each identified tandem repeat family was used as a query in BLASTN searches against two databases: one made of scaffold sequences derived exclusively from male libraries, and one containing all scaffolds constituting the An. gambiae genome. Satellite DNA queries that returned the same number of hits in both databases were regarded as potentially Y-linked, and in each case their Y-linkage was experimentally confirmed using PCR and Southern blot techniques. All scaffolds harboring the Y-specific satellite sequences were treated as originating from the Y chromosome.
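The hit-count comparison described above can be sketched as follows. This is an illustrative reconstruction, not the original pipeline: it assumes the BLASTN results are available as tab-separated tabular rows with the query identifier in the first column, and it applies no score or identity thresholds, which the text does not specify. The function names and the toy query identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_hits(blast_rows: Iterable[str]) -> Counter:
    """Count hits per query from tab-separated BLAST rows
    (query ID assumed to be in the first column)."""
    counts: Counter = Counter()
    for row in blast_rows:
        line = row.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        counts[line.split("\t")[0]] += 1
    return counts


def candidate_y_linked(male_only_hits: Counter, all_hits: Counter) -> List[str]:
    """Queries with at least one hit and identical hit counts in the
    male-only database and the whole-genome database."""
    return sorted(
        query for query, n in male_only_hits.items()
        if n > 0 and all_hits.get(query, 0) == n
    )


# Toy data: satA hits only male-derived scaffolds, satB also hits others.
male_rows = ["satA\tscaf1", "satA\tscaf2", "satB\tscaf3"]
all_rows = ["satA\tscaf1", "satA\tscaf2",
            "satB\tscaf3", "satB\tscaf9", "satB\tscaf10"]

candidates = candidate_y_linked(count_hits(male_rows), count_hits(all_rows))
print(candidates)
```

A query that passes this filter is only a candidate; as the text notes, Y-linkage was confirmed experimentally in each case.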
One scaffold (AAAB01008227), containing more complex sequences, was serendipitously discovered in a TBLASTN search of the unmapped scaffold set using GenBank-derived sequences of sex-determining or male fertility-related proteins as queries. Although sequence similarity of the scaffold to the query (GenBank accession no. B21124) was limited to a low-complexity (microsatellite) region, implying a lack of homology between the query and the subject, PCR experiments confirmed Y-linkage of that scaffold.
Bacterial Scaffold Identification
A total of 678 unmapped scaffolds in the current A. gambiae assembly have been identified as bacterial contaminants. All unmapped scaffolds were used as queries in a BLAST search against NCBI's nr protein database. Based on the results, a scaffold was classified as a bacterial contaminant if it met the following criteria:
Gene sets
25 Feb 2014
29 Oct 2012
1 Dec 2010
1 Sep 2009
7 Jan 2007