------------------------------------------------------------------------
--README
--This is the README file for ftp://ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene
--Last updated March 28, 2016
------------------------------------------------------------------------

See also:

http://www.ncbi.nlm.nih.gov/refseq/rsg/
    Contains information about the RefSeqGene project and LRG.

http://www.ncbi.nlm.nih.gov/clinvar/docs/submit/
    Instructions about submitting to ClinVar. Identification of
    variants based on RefSeqGene/LRG is encouraged.

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
    A comprehensive list of RefSeq accessions per GeneID, for all taxa.
    Please note that RefSeqGene is not comprehensive; it is a specialized
    product, specific to human genes of medical interest. Please use the
    file in Gene's ftp site if you want comprehensive data.

Contact us:
    e-mail: rsgene@ncbi.nlm.nih.gov

------------------------------------------------------------------------
Summary of updates to this directory (latest at the top)

March 29, 2016
    Added link to the comprehensive file of gene/RefSeq relationships
    provided from Gene's FTP site to the section above.

August 26, 2014
    Renamed the file to README
    Generated an archive section to indicate removal of submission files
    Added hints about submitting to ClinVar.

September 30, 2013
    Added a file reporting the placement of RefSeqGenes to an assembly
    in gff3 format. The file is named by accession and version assigned
    to the assembly.
    For more information about assemblies, please refer to
    http://www.ncbi.nlm.nih.gov/assembly, for example
    http://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001405

March 17, 2011
    Reorganization of reporting of LRG and RefSeqGene data
    Removed RefSeqGene_standards to incorporate into LRG_RefSeqGene

January 24, 2011
    Explanation of the splitting of the .fna and .gbff files

January 17, 2011
    Added two files
        LRG_RefSeqGene
        RefSeqGene_standards

June 29, 2010
    Created this README
    Added files to explain how to submit variation data for the
    LRG/RefSeqGene collaboration

--------------------------------------------------
Contents
--------------------------------------------------
A. RefSeqGene sequence files
----------------------------------------
These files are updated daily.
----------------------------------------
Beginning in January 2011, files in each of these formats are provided
in non-overlapping subsets, differentiated by a numeral, so that large
file size is not a barrier to downloads.

refseqgene[numeral].genomic.gbff.gz
    RefSeqGene records in GenBank format
refseqgene[numeral].genomic.fna.gz
    RefSeqGene records in FASTA format

For example:
    refseqgene1.genomic.fna.gz
    refseqgene1.genomic.gbff.gz
    refseqgene2.genomic.fna.gz
    refseqgene2.genomic.gbff.gz

--------------------------------------------------
B. RefSeqGene/LRG reports
Updated daily
--------------------------------------------------
1. LRG_RefSeqGene
   Tab-delimited file reporting, for each Gene, the accession.version
   of the genomic, RNA, and protein RefSeqs that the RefSeqGene/LRG
   project treats as reference standards.
   The columns are:
      NCBI taxonomy id (all 9606)
      GeneID
      Symbol of the gene (official from HGNC when available)
      Accession.version of the standard RefSeq
      Term describing the RefSeq
         RefSeqGene            genomic sequence
         Ref Std, nucleotide   RNA sequence
         Ref Std, protein      protein sequence
      The LRG equivalent of the RefSeq standard
         LRG:   genomic sequence
         t1     locations for transcript 1
         p1     CDS from transcript 1
         NOTE: t values can be > 1; the integer assigned to t is
               matched by the integer assigned to p

2. Aligned2RefSeqGene
   The RefSeqGene records include alignments of previous versions of
   RefSeq RNAs and other reference standards that may have been used
   for a gene. For example, the RefSeqGene for BRCA1 includes the
   alignment of U14680.1, long used as a reference standard for BRCA1.

3. gene_RefSeqGene
   This is a subset of LRG_RefSeqGene, limited to the RefSeqGene
   accessions for a gene.
   The columns are:
      NCBI taxonomy id (all 9606)
      GeneID
      Symbol of the gene (official from HGNC when available)
      Accession.version of the RefSeqGene

4. GCF_000001405.[version]_refseqgene_alignments.gff3
   This file is updated weekly. It reports information about
   alignments of current RefSeqGene accessions to the top-level
   sequences in the version of the assembly indicated in the file
   name. Only the latest alignments will be shown. When the version or
   patch release of the reference GRC assembly changes, a new file
   will be created because the version of the assembly accession will
   change.

   More about the gff3 standard:
   http://www.sequenceontology.org/gff3.shtml

   Alignments to both GRCh37 and GRCh38 will be maintained indefinitely.

   Please note that a RefSeqGene is reported by its accession.version,
   e.g. Target=NG_008724.1. Alignment data may be reported relative to
   multiple top-level sequences in the assembly, e.g. both an
   assembled chromosome and an alternative locus or patch.

   The first column is the accession and version of the top-level
   sequence.
   If that accession starts with NC_, that sequence is an assembled
   chromosome. [Hint: for human, the RefSeq sequence accessions match
   the chromosome number, i.e. NC_000001 is for chromosome 1,
   NC_000023 for X, and NC_000024 for Y.]

--------------------------------------------------
C. Presentations
----------------------------------------
A directory of past presentations about RefSeqGene. For more citations,
see also the documentation on RefSeqGene's web site.

============================================================
ARCHIVES
============================================================
--------------------------------------------------
submission related
--------------------------------------------------
The RefSeqGene/LRG collaboration generated files named
    submission_template.xls
    SubmittingVariationHelp.pdf
to support submitting information about variation based on the
RefSeqGene/LRG standard. With the implementation of ClinVar, such
submissions should be provided directly to ClinVar.

--------------------------------------------------
Reports of sequence accessions
--------------------------------------------------
RefSeqGene_standards: deprecated, all data now in LRG_RefSeqGene
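
--------------------------------------------------
Example: reading the gff3 alignments file (unofficial sketch)
--------------------------------------------------
The notes in section B.4 about the first column and the Target
attribute can be sketched in Python as shown here. This is a minimal
illustration, not an NCBI-provided tool. The function names
chromosome_from_accession and refseqgene_target are hypothetical, and
the sample line is invented: its coordinates and its pairing of
NG_008724.1 with NC_000023.11 do not describe a real placement. The
sketch assumes the standard nine tab-delimited gff3 columns, with
Target written as "id start end [strand]" in the attributes column.

```python
import re

# RefSeq assembled-chromosome accessions look like NC_000023.11
_NC_PATTERN = re.compile(r"^NC_(\d{6})\.(\d+)$")


def chromosome_from_accession(accession):
    """Return '1'..'22', 'X', or 'Y' for a human NC_ accession, else None."""
    match = _NC_PATTERN.match(accession)
    if match is None:
        return None  # not an assembled chromosome (e.g. alternate locus or patch)
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    # Per the hint in section B.4: 23 is X and 24 is Y; anything else is unmapped.
    return {23: "X", 24: "Y"}.get(number)


def refseqgene_target(attributes):
    """Extract the Target accession.version from a gff3 attributes column."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "Target":
            parts = value.split()
            # The Target value is "id start end [strand]"; keep only the id.
            return parts[0] if parts else None
    return None


# Invented line for illustration only; not taken from the real file.
sample = ("NC_000023.11\tRefSeq\tmatch\t100\t200\t.\t+\t.\t"
          "ID=aln0;Target=NG_008724.1 1 101 +")
columns = sample.split("\t")
print(chromosome_from_accession(columns[0]), refseqgene_target(columns[8]))
```

Accessions that do not start with NC_ return None from
chromosome_from_accession, so callers can treat them separately as
alternative loci or patches.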