Find your protein
Note: for the next week, the search is limited to UniProtKB (>200M records) while the SQL database is being re-indexed.
For protein sequences, use single-letter IUPAC amino acid symbols (alphabet: 'ACDEFGHIKLMNPQRSTVWXY'; any other letters are removed).
Input longer than 20 letters is treated as an amino acid sequence.
Examples:
MVIHFSNKPAKYTPNTTVAFLALVDGAEVECEISVEALEDHFDAPSMQGVDLVAAFEAHRTQIEAVARVKLPQRLPAGRCLLISDYF
TLYFIFGIWAGMVGTSLSLLIRAELGNPGSLIGNDQIYNTIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSF... (long sequence)
DYKLTYYTPDYVTKDTDILAAFRVTPQPGVPPEEAGAAVAAESSTGTWTTVWTDGLTSLDRYKGRCYNIEPVAGEENQYICYFTK... (even longer sequence)
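The input rules above (the allowed alphabet and the 20-letter threshold) can be sketched in Python to pre-check a query before submitting it. This is only an illustration: the function names are hypothetical, and uppercasing the input and dropping every character outside the alphabet (including digits and whitespace) are assumptions, not a description of the server's actual code.

```python
# Allowed single-letter symbols, copied from the rule stated above.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWXY")

# "Longer than 20 letters" is treated as a sequence.
SEQUENCE_THRESHOLD = 20


def clean_sequence(raw: str) -> str:
    """Keep only symbols from the allowed alphabet.

    Assumption: the input is uppercased first, and every character that is
    not in the alphabet (other letters, digits, whitespace) is dropped.
    """
    return "".join(ch for ch in raw.upper() if ch in ALPHABET)


def looks_like_sequence(query: str) -> bool:
    """Return True when the query has more than 20 letters."""
    letter_count = sum(1 for ch in query if ch.isalpha())
    return letter_count > SEQUENCE_THRESHOLD


if __name__ == "__main__":
    example = "MVIHFSNKPAKYTPNTTVAFLALVDGAEVECEISVEALEDHFDAPSMQGVDLVAAFEAHRTQIEAVARVKLPQRLPAGRCLLISDYF"
    print(looks_like_sequence(example))   # a full sequence
    print(looks_like_sequence("P42694"))  # a short identifier
    print(clean_sequence("mvi hfs\nBZ123"))
```

Short identifiers such as P42694 stay below the threshold, so under these assumptions they would be handled as database identifiers rather than sequences.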
Note that the safest option is to use the plain protein sequence itself as the query, but identifiers from external databases are also supported: UniProt (e.g. P42694, O88093 or A0A4R5KZE2_9BURK), AlphaFold (e.g. A0A4R5KZE2) and ESMatlas (e.g. MGYP000001820099).
For the full list of supported databases and example queries, see the FAQ.
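As a rough client-side check, the identifier styles shown in the examples above can be recognised with regular expressions. The accession pattern follows the format published by UniProt; the entry-name and MGYP patterns are assumptions inferred from the examples on this page, and the function name and returned labels are hypothetical. AlphaFold entries are keyed by UniProt accession, so they cannot be told apart from the string alone.

```python
import re

# UniProt accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Assumption: entry names look like NAME_SPECIES, e.g. A0A4R5KZE2_9BURK.
UNIPROT_ENTRY_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")

# Assumption: ESMatlas/MGnify protein IDs are "MGYP" plus 12 digits,
# as in the example MGYP000001820099.
MGYP_ID = re.compile(r"MGYP[0-9]{12}")


def classify_identifier(query: str) -> str:
    """Return a hypothetical label for the identifier style of `query`."""
    q = query.strip().upper()
    if MGYP_ID.fullmatch(q):
        return "esmatlas"
    if UNIPROT_ACCESSION.fullmatch(q):
        return "uniprot_accession"  # also how AlphaFold entries are keyed
    if UNIPROT_ENTRY_NAME.fullmatch(q):
        return "uniprot_entry_name"
    return "unknown"


if __name__ == "__main__":
    for q in ["P42694", "O88093", "A0A4R5KZE2_9BURK", "A0A4R5KZE2", "MGYP000001820099"]:
        print(q, classify_identifier(q))
```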
Additionally, you can get secondary structure predictions for whole proteomes (e.g. organism_id:9606). For more options, see Search Proteomes.
Secondary structure for selected proteomes (all proteins)
- Human
- Mouse
- Rat
- Rabbit
- Chicken
- Xenopus laevis
- Zebrafish
- Fruit fly
- Caenorhabditis elegans
- Mosquito
- Arabidopsis thaliana
- Soybean
- Rice
- Maize
- Tomato
- Baker's yeast
- Candida albicans
- Bacillus subtilis
- Trypanosoma cruzi
- Escherichia coli
For more proteomes, see the "Search Proteomes" section.
Currently, the database contains annotation for proteins from the following databases:
- UniProt (UniProtKB/Swiss-Prot, UniProtKB/TrEMBL; UniProt/UniParc version 2023_05; 578M protein sequences)
- AlphaFold database (215M protein sequences, >90k proteomes)
- ESMatlas (772M protein sequences)
- NCBI BLAST nr (non-redundant) (596M protein sequences)
- BFD (2.2B protein sequences)
- Uniclust (367M protein sequences)
- PDB70 (112M protein sequences, as used in AlphaFold2)
- MGnify (full version, 3.0B protein sequences, and clustered version, 624M protein sequences, as used in AlphaFold2, are supported)
- KMAP (310M protein sequences)
- FESNov (400M protein sequences)
- GMGC (966M protein sequences)
- JGI_IMG (459M protein sequences)
The prediction of secondary structure is done using the following programs/algorithms:
- DSSP (ss3 and ss8 alphabet secondary structure annotations from 3D protein models from ESMatlas and AlphaFold databases)
- STRIDE (ss3 and ss8 alphabet secondary structure annotations from 3D protein models from ESMatlas and AlphaFold databases)
- SSE-PSSM (ss3 and ss8 sequence-based predictions)
- Bio Embeddings (seqvec model, ss3 and ss8 sequence-based predictions)
- ProtTrans (prot_bert_bfd_ss3 model, ss3 only sequence-based predictions)
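Several of the tools listed above report both ss8 and ss3 alphabets. For readers comparing the two, one widely used convention for reducing the eight DSSP states to three is sketched below. The mapping (H, G, I to helix; E, B to strand; everything else to coil) is an assumption here: the page does not state which reduction pSSdb applies, and other conventions exist.

```python
# One common ss8 -> ss3 reduction (an assumption, not confirmed for pSSdb):
#   helix  H: H (alpha helix), G (3-10 helix), I (pi helix)
#   strand E: E (extended strand), B (isolated beta bridge)
#   coil   C: T (turn), S (bend), and unassigned ("C" or "-")
SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C", "-": "C",
}


def reduce_ss8(ss8: str) -> str:
    """Map an ss8 string to ss3, residue by residue."""
    try:
        return "".join(SS8_TO_SS3[state] for state in ss8)
    except KeyError as err:
        raise ValueError(f"unknown ss8 state: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(reduce_ss8("HHGGEEBTS-"))
```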
Additionally, a consensus-based secondary structure is provided. All predictions also include confidence scores at the individual residue level. For more details, see the FAQ section.
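To illustrate how a consensus could be derived from several per-residue predictions and their confidence scores, here is a minimal sketch of a confidence-weighted vote. The page does not describe how pSSdb computes its consensus, so the voting scheme, the function name and the input layout below are all assumptions for illustration only.

```python
from collections import defaultdict


def consensus_ss(predictions):
    """Confidence-weighted vote per residue (illustrative scheme only).

    `predictions` is a list of (ss_string, confidences) pairs for the same
    protein, one pair per predictor. Returns the consensus string and, per
    residue, the share of total confidence that backed the winning state.
    """
    length = len(predictions[0][0])
    for ss, conf in predictions:
        if len(ss) != length or len(conf) != length:
            raise ValueError("all predictions must cover the same residues")

    states, support = [], []
    for i in range(length):
        votes = defaultdict(float)
        for ss, conf in predictions:
            votes[ss[i]] += conf[i]
        total = sum(votes.values())
        # Ties are broken alphabetically so the result is deterministic.
        winner = max(sorted(votes), key=votes.get)
        states.append(winner)
        support.append(votes[winner] / total if total else 0.0)
    return "".join(states), support


if __name__ == "__main__":
    preds = [
        ("HHC", [0.9, 0.6, 0.5]),
        ("HEC", [0.8, 0.7, 0.5]),
        ("HHE", [0.7, 0.6, 0.4]),
    ]
    print(consensus_ss(preds))
```

A weighted vote lets a single confident predictor outweigh two uncertain ones, which a plain majority vote would not allow; whether that behaviour is desirable depends on how well calibrated each tool's confidence scores are.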
Disclaimer
Any information present in the pSSdb database is a theoretical inference and is provided for research, educational and informational purposes only. For clarity, it is provided 'as-is' without any warranty of any kind, whether expressed or implied. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.