Introduction
The Rice Genome Research Program (RGP) has been sequencing the entire rice genome since 1998 in collaboration with the International Rice Genome Sequencing Project (IRGSP). High-throughput sequence production over the last 4 years has resulted in the accumulation of a large amount of sequence data and annotation information. As of January 2004, we have sequenced and accumulated a total of 234 Mb from the six rice chromosomes assigned to RGP [1]. This includes about 110 Mb of non-overlapping, finished sequence.
As we approach the post-genome-sequencing era, it is essential that the accumulated information be efficiently managed and integrated to facilitate map-based informatics. We have developed the Rice Annotation Database (RAD) to store and concisely view the rice genome sequence with relevant annotation data.
System architecture and main features
The primary concept of RAD is to provide contig-oriented administration of the sequenced PAC/BAC clones. RAD is a relational database that facilitates storage, query and visualization of annotation information such as sequence data, predicted genes and homology analysis for each PAC/BAC clone. This information is managed at the contig or chromosome level to provide a more general view of the data for map-based informatics.
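The contig-oriented organization described above can be illustrated with a minimal sketch. It assumes a three-table layout (contig, clone, gene) in an in-memory SQLite database; the table names, column names and demo records are hypothetical and are not taken from the actual RAD schema. The query converts clone coordinates of predicted genes into contig coordinates.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not RAD's actual design.
SCHEMA = """
CREATE TABLE contig (
    contig_id   INTEGER PRIMARY KEY,
    chromosome  INTEGER NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE clone (
    clone_id      INTEGER PRIMARY KEY,
    contig_id     INTEGER NOT NULL REFERENCES contig(contig_id),
    accession     TEXT NOT NULL,
    contig_offset INTEGER NOT NULL  -- start of the clone within its contig
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    clone_id    INTEGER NOT NULL REFERENCES clone(clone_id),
    start_pos   INTEGER NOT NULL,   -- position in clone coordinates
    end_pos     INTEGER NOT NULL,
    description TEXT
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up demo records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contig VALUES (1, 7, 'demo_contig')")
    conn.executemany(
        "INSERT INTO clone VALUES (?, ?, ?, ?)",
        [(1, 1, "CLONE_A", 0), (2, 1, "CLONE_B", 100000)],
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 500, 2500, "putative kinase"),
            (2, 2, 1000, 4000, "hypothetical protein"),
        ],
    )
    conn.commit()
    return conn


def genes_on_contig(conn, contig_id):
    """Return predicted genes of one contig, in contig coordinates."""
    return conn.execute(
        """
        SELECT g.description, c.accession,
               c.contig_offset + g.start_pos,
               c.contig_offset + g.end_pos
        FROM gene g JOIN clone c ON g.clone_id = c.clone_id
        WHERE c.contig_id = ?
        ORDER BY c.contig_offset + g.start_pos
        """,
        (contig_id,),
    ).fetchall()
```

Storing each clone's offset within its contig is one possible way to let clone-level annotation be viewed at the contig level without duplicating gene records; other layouts would serve equally well.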
The main features of RAD are as follows:
(1) web presentations of rice genome annotation information,
(2) gene search in the entire genome,
(3) statistical analysis of specific features of the sequence,
(4) efficient management of annotation data.
Users can view rice genome annotation information together with sequence data in numerical and graphical web presentations (Figure 1). Predicted genes are classified according to the definitions of the IRGSP consortium, and this information is linked to the public databases. A keyword search function allows a survey of specific genes throughout the genome. Statistical analyses can be performed on various features of the sequence, such as exon and intron length, splice site sequence, base/codon usage, and functional classification of the predicted genes (Figure 2).
Figure 1. Web presentations of genome information at chromosome and contig levels.
Figure 2. Analysis of various features of the genome sequence.
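As an illustration of the kind of sequence statistics mentioned above, the following minimal sketch computes exon and intron lengths from a gene model and counts codon usage in a coding sequence. The function names and the input conventions (1-based inclusive exon coordinates, sorted along one strand) are assumptions made for this example and do not describe RAD's actual implementation.

```python
from collections import Counter


def exon_intron_lengths(exons):
    """Return (exon_lengths, intron_lengths) for one gene model.

    exons: list of (start, end) pairs, 1-based inclusive, sorted by start.
    """
    exon_lengths = [end - start + 1 for start, end in exons]
    # An intron spans the gap between the end of one exon and the
    # start of the next exon.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return exon_lengths, intron_lengths


def codon_usage(cds):
    """Count codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

For example, `exon_intron_lengths([(1, 100), (201, 300)])` yields two exons of 100 bases separated by one intron of 100 bases. Aggregating such values over all predicted genes of a contig or chromosome would give the distributions that a statistical view could plot.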
Current progress and future work
The IRGSP has already completely annotated 4 rice chromosomes (1, 4, 7, 10). Annotations of finished PAC/BAC clones on the other chromosomes have also been manually curated. These data have been incorporated into RAD at the clone, contig and chromosome levels. The entire rice genome is expected to be finished by the end of 2004. All finished clones can be accessed in RAD as they become available.
The RGP's annotation system has recently been improved by incorporating the rice full-length cDNA data [2]. For the chromosome 7 annotation data, RAD can display updated annotations based on similarities with proteins, ESTs, and full-length cDNAs (Figure 1).
The sequences of chromosomes 1 [3], 4 [4], and 10 [5] have also been finished and manually annotated. These data have been submitted to the public databases as flat files. We have integrated the flat files for the completed chromosomes into RAD, thereby providing a manual annotation database for IRGSP sequences.
Furthermore, we are adding supplemental database functions such as gene ontology annotation of the obtained CDSs and homology search using the chromosome-level contigs as subjects. These functions allow the sequence information to be correlated with gene expression and gene ontology pathways, and support genome-wide comparison with other organisms.
Acknowledgment
This research is funded by MAFF grant SY-1102 (Development of the Rice Genome Simulators). The genome sequence and relevant annotation data were obtained from the MAFF project (The Rice Genome Project) GS-1201 and GS-1302, respectively. We are indebted to them for their support.
References
[1] http://rgp.dna.affrc.go.jp/cgi-bin/statusdb/status.pl
[2] The Rice Full-Length cDNA Consortium (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376-379.
[3] The genome sequence and structure of rice chromosome 1. Nature 420: 312-316 (2002).
[4] Sequence and analysis of rice chromosome 4. Nature 420: 316-320 (2002).
[5] The Rice Chromosome 10 Sequencing Consortium (2003) In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300: 1566-1569.
Our members are
(1) Institute of the Society for Techno-innovation of Agriculture, Forestry, and Fisheries (Y.I.Ito, Y.Mukai, N.Namiki, M.Shibata, M.Yamamoto, Y.K.Ito, M.H.Tsugane, S.Hosokawa, M.Hamada)
(2) National Institute of Agrobiological Sciences (B.A.Antonio, T.Matsumoto, T.Sasaki)
(3) Mitsubishi Space Software Co., Ltd. (K.Sakata, Y.Sakai, J.Yokoyama)
(4) Hitachi Science Systems Co., Ltd. (K.Arikawa)