| UniProt Knowledgebase Swiss-Prot Protein Knowledgebase TrEMBL Protein Database Forthcoming changes Release 13.6 of 01-Jul-2008 |
Also read about recent changes, and recent and forthcoming changes for the XML version of the UniProt Knowledgebase.
Table of contents
Change of the protein description (DE line)
Changes in the FASTA header line
New OG (OrGanelle) line value: Chromatophore
Change of the comment line (CC) topic INTERACTION
| Change of the protein description (DE line) |
|---|
Not before: 22-Jul-2008
The UniProtKB description (DE) lines list protein names in a computer parsable format, but currently with a minimal amount of structure. In UniProtKB/Swiss-Prot the description starts with the recommended name of the protein and additional alternative names are indicated between parentheses. In UniProtKB/TrEMBL the description is derived directly from the underlying nucleotide entry and its accuracy relies on the information provided by the submitter of the nucleotide entry, unless it has been improved by automatic annotation procedures.
Consistent nomenclature is indispensable for communication, literature searching and entry retrieval. The protein names provided in the description lines of UniProtKB/Swiss-Prot are widely used by life scientists and often propagated during the annotation of new genomic sequences. For these reasons we intend to structure the UniProtKB DE lines more explicitly: We will introduce 3 categories, as well as several subcategories, of protein names:
| Category Field | Subcategory Field | Cardinality | Description |
|---|---|---|---|
| RecName: | | 1 in UniProtKB/Swiss-Prot; 0-1 in UniProtKB/TrEMBL | The name recommended by the UniProt consortium. |
| | Full= | 1 | The full name. |
| | Short= | 0-n | An abbreviation of the full name or an acronym. |
| | EC= | 0-n | An Enzyme Commission number. |
| AltName: | | 0-n | A synonym of the recommended name. |
| | Full= | 0-1 | The full name. |
| | Short= | 0-n | An abbreviation of the full name or an acronym. |
| | EC= | 0-n | An Enzyme Commission number. |
| AltName: | Allergen= | 0-1 | See allergen.txt. |
| AltName: | Biotech= | 0-1 | A name used in a biotechnological context. |
| AltName: | CD_antigen= | 0-n | See cdlist.txt. |
| AltName: | INN= | 0-n | The international nonproprietary name: A generic name for a pharmaceutical substance or active pharmaceutical ingredient that is globally recognized and is a public property. |
| SubName: | | 0 in UniProtKB/Swiss-Prot; 0-n in UniProtKB/TrEMBL | A name provided by the submitter of the underlying nucleotide sequence. |
| | Full= | 1 | The full name. |
| | EC= | 0-n | An Enzyme Commission number. |
Each name is shown on a separate line; lines may therefore exceed 75 characters.
A block of DE lines may further contain multiple Includes: and/or Contains: sections and a separate field Flags: to indicate whether the protein sequence is a precursor or a fragment:
| Field | Cardinality | Value |
|---|---|---|
| Includes: | 0-n | A block of protein names as described in the table above. |
| Contains: | 0-n | A block of protein names as described in the table above. |
| Flags: | 0-1 | Precursor and/or Fragment or Fragments |
Examples:
P09919:
Current format:
DE   Granulocyte colony-stimulating factor precursor (G-CSF) (Pluripoietin)
DE   (Filgrastim) (Lenograstim).
New format:
DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: INN=Filgrastim;
DE   AltName: INN=Lenograstim;
DE   Flags: Precursor;

Q10743:
Current format:
DE   ADAM 10 precursor (EC 3.4.24.81) (A disintegrin and metalloproteinase
DE   domain 10) (Mammalian disintegrin-metalloprotease) (Kuzbanian protein
DE   homolog) (CD156c antigen) (Fragment).
New format:
DE   RecName: Full=A disintegrin and metalloproteinase domain 10;
DE            Short=ADAM 10;
DE            EC=3.4.24.81;
DE   AltName: Full=Mammalian disintegrin-metalloprotease;
DE   AltName: Full=Kuzbanian protein homolog;
DE   AltName: CD_antigen=CD156c;
DE   Flags: Precursor; Fragment;

Q07908:
Current format:
DE   Arginine biosynthesis bifunctional protein argJ [Includes: Glutamate
DE   N-acetyltransferase (EC 2.3.1.35) (Ornithine acetyltransferase)
DE   (Ornithine transacetylase) (OATase); Amino-acid acetyltransferase
DE   (EC 2.3.1.1) (N-acetylglutamate synthase) (AGS)] [Contains: Arginine
DE   biosynthesis bifunctional protein argJ alpha chain; Arginine
DE   biosynthesis bifunctional protein argJ beta chain].
New format:
DE   RecName: Full=Arginine biosynthesis bifunctional protein argJ;
DE   Includes:
DE     RecName: Full=Glutamate N-acetyltransferase;
DE              EC=2.3.1.35;
DE     AltName: Full=Ornithine acetyltransferase;
DE              Short=OATase;
DE     AltName: Full=Ornithine transacetylase;
DE   Includes:
DE     RecName: Full=Amino-acid acetyltransferase;
DE              EC=2.3.1.1;
DE     AltName: Full=N-acetylglutamate synthase;
DE              Short=AGS;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ alpha chain;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ beta chain;
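To illustrate how the more explicit structure could simplify parsing, here is a minimal Python sketch that reads a block of new-format DE lines into a nested dictionary. The function name `parse_de_lines` and the shape of the returned dictionary are assumptions made for this illustration; they are not part of any UniProt tool, and the sketch does no validation of cardinalities.

```python
import re

# Category fields that start a new name record (taken from the table above).
CATEGORY_RE = re.compile(r"(RecName|AltName|SubName):\s*(.*)")


def parse_de_lines(lines):
    """Parse new-format DE lines into a dict (hypothetical helper, illustration only)."""
    entry = {"names": [], "includes": [], "contains": [], "flags": []}
    target = entry["names"]  # list that currently receives name records
    current = None           # name record currently being filled
    for line in lines:
        text = line[2:].strip()  # drop the 'DE' line code and surrounding blanks
        if text in ("Includes:", "Contains:"):
            # Start a new sub-block; following names belong to it.
            section = []
            entry[text[:-1].lower()].append(section)
            target = section
            continue
        if text.startswith("Flags:"):
            flags = text[len("Flags:"):].split(";")
            entry["flags"] = [f.strip() for f in flags if f.strip()]
            continue
        match = CATEGORY_RE.match(text)
        if match:
            current = {"category": match.group(1), "fields": []}
            target.append(current)
            text = match.group(2)
        if current is None:
            raise ValueError("subcategory field before any category: " + line)
        key, _, value = text.partition("=")
        current["fields"].append((key, value.rstrip(";")))
    return entry
```

Applied to the new-format lines of P09919, this yields one RecName record with a Full and a Short field, three AltName records, and the flag list `["Precursor"]`.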
| Changes in the FASTA header line |
|---|
Not before: 22-Jul-2008
The current UniProtKB FASTA headers are unfortunately incompatible with the -o option of the NCBI's program formatdb. We have been working with the NCBI to remedy this and changes are required on both sides. While future versions of formatdb will accept a database code for UniProtKB/TrEMBL, we will also have to modify our UniProtKB FASTA headers. For consistency reasons, we will also change the FASTA headers of the other UniProt databases.
>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName[ GN=GeneName] PE=ProteinExistence SV=SequenceVersion
Where:
ProteinName is taken from the RecName field from release 14.0 on. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used. The 'precursor' attribute is excluded; 'Fragment' is included with the name if applicable.
If an entry has no gene name, OrderedLocusName or ORFname, the GN field is not listed.
Examples:
>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_RALEH Acetoin catabolism protein X OS=Ralstonia eutropha (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus PE=1 SV=1
>tr|A3SA23|A3SA23_9RHOB TonB dependent, hydroxamate-type ferrisiderophore, outer membrane receptor OS=Sulfitobacter sp. EE-36 GN=EE36_08023 PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN CDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens PE=2 SV=1
Alternative isoforms (this only applies to UniProtKB/Swiss-Prot):
>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName[ GN=GeneName]
Where:
IsoformName is taken from the Name field of the UniProtKB entry.
Example:
>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis GN=YWHAB
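As a rough illustration of how the UniProtKB header layout above could be consumed, the following Python sketch splits a header into its fields with a regular expression. The pattern and the function name `parse_uniprotkb_header` are assumptions made for this example, not an official parser; GN, PE and SV are treated as optional so that the isoform headers, which carry no PE or SV field, are accepted as well.

```python
import re

# Hypothetical pattern derived from the header layout described above.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<identifier>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)"
    r"(?:\s+GN=(?P<gene>\S+))?"
    r"(?:\s+PE=(?P<pe>\d+))?"
    r"(?:\s+SV=(?P<sv>\d+))?$"
)


def parse_uniprotkb_header(header):
    """Return the header fields as a dict; absent optional fields are None."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header: " + header)
    return match.groupdict()


fields = parse_uniprotkb_header(
    ">sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, "
    "E-K alpha chain OS=Mus musculus PE=1 SV=1"
)
print(fields["organism"], fields["gene"])
```

The organism group is matched lazily so that organism names containing spaces, parentheses or slashes (as in the Ralstonia eutropha example) stay intact while the trailing GN, PE and SV fields are still recognized.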
>UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember
Example:
>UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU
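The cluster header layout shown above (the one whose example identifier starts with UniRef100_) can be split in a similar way. This is a small Python sketch under the assumption that the n=, Tax= and RepID= fields always appear in this order; the name `parse_cluster_header` is hypothetical.

```python
import re

# Hypothetical pattern for ">UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember".
CLUSTER_RE = re.compile(
    r"^>(?P<identifier>\S+)\s+(?P<cluster_name>.+?)\s+n=(?P<members>\d+)"
    r"\s+Tax=(?P<taxon>.+?)\s+RepID=(?P<representative>\S+)$"
)


def parse_cluster_header(header):
    """Split a cluster-style FASTA header into its fields."""
    match = CLUSTER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a cluster-style FASTA header: " + header)
    fields = match.groupdict()
    fields["members"] = int(fields["members"])  # member count as a number
    return fields
```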
>UniqueIdentifier status=Status
Example:
>UPI0000000005 status=active
>UniqueIdentifier ProteinName OS=OrganismName[ Pep=SourcePeptideIdentifier] SV=SequenceVersion
Example:
>MES00000000005 Putative uncharacterized protein GOS_3018412 (Fragment) OS=marine metagenome Pep=JCVI_PEP_1096688850003 SV=1
>db|UniqueIdentifier archived from Release ReleaseNumber ReleaseDate SV=SequenceVersion
Examples:
"pre-UniProt":
>sp|P05067 archived from Release 18.0 01-MAY-1991 SV=3
>tr|Q55167 archived from Release 17.0 01-JUN-2001 SV=1
"post-UniProt":
>sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3
>tr|A0RTJ8 archived from Release 11.0/36.0 29-MAY-2007 SV=1
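Headers of archived entry versions could be handled along the following lines. This Python sketch assumes that the release date always uses the DD-MON-YYYY form seen in the examples and that a "post-UniProt" release number is two numbers joined by a slash; the function name `parse_archived_header` and the returned field names are hypothetical.

```python
import datetime
import re

ARCHIVED_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<identifier>\S+) archived from Release "
    r"(?P<release>\S+) (?P<date>\d{2}-[A-Z]{3}-\d{4}) SV=(?P<sv>\d+)$"
)
# Month abbreviations mapped by hand to stay independent of the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def parse_archived_header(header):
    """Split an archived-version FASTA header into typed fields."""
    match = ARCHIVED_RE.match(header.strip())
    if match is None:
        raise ValueError("not an archived-version FASTA header: " + header)
    fields = match.groupdict()
    day, month, year = fields["date"].split("-")
    fields["date"] = datetime.date(int(year), MONTHS[month], int(day))
    fields["sv"] = int(fields["sv"])
    # One release number in the "pre-UniProt" form, two in the "post-UniProt" form.
    fields["release"] = fields["release"].split("/")
    return fields
```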
| New OG (OrGanelle) line value: Chromatophore |
|---|
Not before: 22-Jul-2008
We are going to add Chromatophore to the list of valid plastid values in the OG line. The chromatophore is the photosynthetic inclusion found in Paulinella chromatophora, a photosynthetic thecate amoeba. It encodes and houses the machinery necessary for photosynthesis and CO2 fixation; it also has the genetic capacity to synthesize some amino acids, some fatty acids and a few cofactors. It is not yet clear whether the chromatophore derives from the same endosymbiotic event that is thought to have led to all other plastids. The chromatophore genome of P. chromatophora has been sequenced (PubMed:18356055) and been found to be just over 1 Mb, approximately 9 times larger than the average photosynthetic plastid and approximately 1/3 smaller than the smallest cyanobacterial genome.
Example:
OG Plastid; Chromatophore.
| Change of the comment line (CC) topic INTERACTION |
|---|
Not before: 01-Oct-2008
The CC line topic INTERACTION conveys information about binary protein-protein interactions. A description of its current format is available in the UniProtKB User Manual. Up to now, all interaction data has been automatically derived from the IntAct database. In the future we will start to add manually curated binary protein-protein interactions in UniProtKB/Swiss-Prot (these are currently described in the CC line topic SUBUNIT). In order to represent isoform- and chain-specific interactions (e.g. for viral polyproteins), and to add interactor-specific comments (e.g. PTMs and binding regions), we are going to modify the format of the INTERACTION lines. Each binary interaction will be represented by a block of 3 to 4 lines:
The Interact= line gives the status of the interaction (optionally with its source or By similarity, and/or a cross-reference to the database from which the data was derived).
The Protein1= line represents the currently displayed entry, the Protein2= line the other interacting protein. If Protein2 is from a different species than Protein1, its species or taxonomic range is indicated.
Note: Variable values are represented in italics. Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?), may occur 0 or more times (*), or 1 or more times (+). Alternative values are separated by a pipe symbol (|). Special characters are escaped by a backslash (\).
CC   -!- INTERACTION:
(CC       Interact=status( \(source|By similarity\))?;( Xref=xref;)?
(CC       Comment=free_text;)?
CC       Protein1=name [id(:subid)?];( Note=free_text;)?
CC       Protein2=name [id(:subid)?];( Organism=organism;)?( Note=free_text;)?)+
Where:
Yes | No | Uncertain
Yes if there is experimental evidence that the two proteins (or their homologues) interact in a physiological context.
No if there is experimental evidence that the two proteins (or their homologues) do not interact under the experimental conditions described in the cited publication.
Uncertain if the experimental evidence for the interaction between the two proteins (or their homologues) is not considered to reliably reflect an interaction in a physiological context (e.g. results from not further validated yeast-two-hybrid or in-vitro experiments).
PubMed
IntAct
Name= or OrderedLocusNames= or ORFNames=) or a dash '-' if the gene name is unknown.
IsoId= field).
FTId= field).
Protein2. In a host-virus protein interaction this refers to the range of species of Protein2. An entry from a representative virus strain/isolate is displayed in Protein2.
Host: In a virus-host protein interaction this refers to the range of species of Protein2 which corresponds to the OH line of Protein1. An entry from a representative host is displayed in Protein2.
Comment= contains additional information concerning the interaction (like subcellular location).
Note= contains additional information concerning the interacting protein (like PTM status, binding domains).
Examples:
CC   -!- INTERACTION:
CC       Interact=Yes (PubMed:11533489);
CC       Comment=HDAC3 mediates the deacetylation of RELA.
CC       Protein1=RELA [Q04206];
CC       Protein2=HDAC3 [O15379];
Isoform-specific interaction:
CC   -!- INTERACTION:
CC       Interact=Yes (PubMed:10837489);
CC       Protein1=MCL1 [Q07820-1];
CC       Protein2=BAK1 [Q16611];
CC       Interact=Yes (PubMed:15901672, 17097560); Xref=IntAct:EBI-1003422,EBI-519866;
CC       Protein1=MCL1 [Q07820];
CC       Protein2=BAK1 [Q16611];
Negative isoform-specific interaction:
CC   -!- INTERACTION:
CC       Interact=Yes (PubMed:11418237); Xref=IntAct:EBI-375446,EBI-389883;
CC       Protein1=ABI1 [Q8IZP0];
CC       Protein2=NCK1 [P16333]; Note=SH3 1 domain;
CC       Interact=No (PubMed:12681507);
CC       Protein1=ABI1 [Q8IZP0-6];
CC       Protein2=NCK1 [P16333];
CC       Interact=Yes (By similarity);
CC       Protein1=ABI1 [Q8IZP0]; Note=N-terminus;
CC       Protein2=WASF1 [Q92558];
Chain-specific host-virus interaction:
CC   -!- INTERACTION:
CC       Interact=Yes (By similarity);
CC       Protein1=C1QR1 [Q9NPY3];
CC       Protein2=Core protein p21 [P27955:PRO_0000037583]; Organism=Hepatitis C virus [NCBI_TaxID=11103]; Note=See also other virus strains;
Chain-specific virus-host interaction:
CC   -!- INTERACTION:
CC       Interact=Yes (By similarity);
CC       Protein1=Core protein p21 [P27955:PRO_0000037583];
CC       Protein2=C1QR1 [Q9NPY3]; Organism=Host; Note=See also other hosts;
Heterologous interaction between Bos taurus and Homo sapiens proteins:
CC   -!- INTERACTION:
CC       Interact=Yes (PubMed:16470652); Xref=IntAct:EBI-907934,EBI-907894;
CC       Protein1=CNP [P06623];
CC       Protein2=CABP1 [Q9NZU7]; Organism=Homo sapiens [NCBI_TaxID=9606];
Uncertain interaction:
CC   -!- INTERACTION:
CC       Interact=Uncertain (PubMed:15231747);
CC       Protein1=NOB1 [Q9ULX3];
CC       Protein2=UPF2 [Q9HAU5];
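The block structure described above lends itself to a simple line-oriented reader. The following Python sketch groups the lines of one INTERACTION comment into one dictionary per binary interaction. The function name `parse_interactions`, the dictionary keys, and the assumption that field keywords never occur inside free text are all choices made for this illustration; the sketch is not a validating parser for the proposed format.

```python
import re

# Field keywords as they appear in the format description above.
FIELD_RE = re.compile(r"\b(Interact|Xref|Comment|Protein1|Protein2|Organism|Note)=")
STATUS_RE = re.compile(r"(Yes|No|Uncertain)(?:\s*\((.*)\))?$")
PROTEIN_RE = re.compile(r"(.+?)\s*\[([^\]:]+)(?::([^\]]+))?\]$")


def parse_interactions(cc_lines):
    """Return a list with one dict per binary interaction (hypothetical helper)."""
    interactions = []
    current = None  # interaction block being filled
    protein = None  # most recent Protein1/Protein2 record
    for line in cc_lines:
        text = line[2:].strip()      # drop the 'CC' line code
        if text.startswith("-!-"):   # topic header line carries no data
            continue
        fields = list(FIELD_RE.finditer(text))
        for i, field in enumerate(fields):
            end = fields[i + 1].start() if i + 1 < len(fields) else len(text)
            key = field.group(1)
            value = text[field.end():end].strip().rstrip(";").strip()
            if key == "Interact":    # each Interact= starts a new block
                status, source = STATUS_RE.match(value).groups()
                current = {"status": status, "source": source}
                interactions.append(current)
            elif key in ("Protein1", "Protein2"):
                name, identifier, subid = PROTEIN_RE.match(value).groups()
                protein = {"name": name, "id": identifier, "subid": subid}
                current[key] = protein
            elif key in ("Organism", "Note"):
                protein[key] = value  # interactor-specific fields
            else:                     # Xref and Comment belong to the block
                current[key] = value
    return interactions
```

Organism and Note are attached to the preceding protein record because the format places them on the Protein1= or Protein2= line, whereas Xref and Comment describe the interaction as a whole.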