The database of pre-computed secondary structure (ss) annotations for proteins and whole proteomes
7 billion (7,005,354,422) unique (non-redundant) protein sequences included

Find your protein

Note: for the next week, the search is limited to UniProtKB (>200M records) while the SQL database is being re-indexed.

For a protein sequence, use single-letter IUPAC amino acid symbols (alphabet: 'ACDEFGHIKLMNPQRSTVWXY'; any other letters are removed).
Input longer than 20 letters is treated as an amino acid sequence.
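
The two input rules above can be illustrated with a minimal sketch. It is not the server's actual code: the function name `classify_query`, the order of operations (count letters first, then filter) and the handling of lowercase input are assumptions made for illustration only.

```python
from typing import Tuple

# Alphabet quoted in the search help text above.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWXY")
# Inputs with more letters than this are treated as sequences.
MAX_IDENTIFIER_LETTERS = 20


def classify_query(raw: str) -> Tuple[str, str]:
    """Return ("sequence", cleaned_sequence) or ("identifier", stripped_input).

    Hypothetical helper: counting letters before filtering and
    upper-casing the input are assumptions, not documented behaviour.
    """
    text = raw.strip()
    n_letters = sum(ch.isalpha() for ch in text)
    if n_letters > MAX_IDENTIFIER_LETTERS:
        # Keep only symbols from the allowed alphabet; drop everything else.
        cleaned = "".join(ch for ch in text.upper() if ch in ALPHABET)
        return "sequence", cleaned
    return "identifier", text


if __name__ == "__main__":
    print(classify_query("MVIHFSNKPAKYTPNTTVAFLALVDGAEVECE"))
    print(classify_query("P42694"))
```

Under these assumptions, short identifiers such as P42694 or MGYP000001820099 stay untouched because they contain few letters, while a pasted sequence with spaces or stray characters is cleaned before the lookup.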

Examples:
MVIHFSNKPAKYTPNTTVAFLALVDGAEVECEISVEALEDHFDAPSMQGVDLVAAFEAHRTQIEAVARVKLPQRLPAGRCLLISDYF
TLYFIFGIWAGMVGTSLSLLIRAELGNPGSLIGNDQIYNTIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSF... (long sequence)
DYKLTYYTPDYVTKDTDILAAFRVTPQPGVPPEEAGAAVAAESSTGTWTTVWTDGLTSLDRYKGRCYNIEPVAGEENQYICYFTK... (even longer sequence)

The safest option is to use the plain protein sequence itself as a query, but external database identifiers are also supported: UniProt (e.g. P42694, O88093 or A0A4R5KZE2_9BURK), AlphaFold (e.g. A0A4R5KZE2) and ESMatlas (e.g. MGYP000001820099).

For the full list of supported databases and example queries, see the FAQ. Additionally, you can get secondary structure predictions for whole proteomes (e.g. organism_id:9606). For more options, check Search Proteomes.


Secondary structure for selected proteomes (all proteins)

Human
Mouse
Rat
Rabbit
Chicken
Xenopus laevis
Zebrafish
Fruit fly
Caenorhabditis elegans
Mosquito
Arabidopsis thaliana
Soybean
Rice
Maize
Tomatoes
Baker's yeast
Candida albicans
Bacillus subtilis
Trypanosoma cruzi
Escherichia coli
For more proteomes, check the "Search Proteomes" section.

Currently, the database contains annotations for proteins from the following databases:

  1. UniProt (UniProtKB/Swiss-Prot, UniProtKB/TrEMBL; UniProt/UniParc version 2023_05; 578M protein sequences)
  2. AlphaFold database (215M protein sequences, >90k proteomes)
  3. ESMatlas (772M protein sequences)
  4. NCBI BLAST nr (non-redundant) (596M protein sequences)
  5. BFD (2.2B protein sequences)
  6. Uniclust (367M protein sequences)
  7. PDB70 (112M protein sequences, as used in AlphaFold2)
  8. MGnify (full version, 3.0B protein sequences, and clustered version, 624M protein sequences, as used in AlphaFold2, are supported)
  9. KMAP (310M protein sequences)
  10. FESNov (400M protein sequences)
  11. GMGC (966M protein sequences)
  12. JGI_IMG (459M protein sequences)

The prediction of secondary structure is done using the following programs/algorithms:

  1. DSSP (ss3 and ss8 alphabet secondary structure annotations from 3D protein models from ESMatlas and AlphaFold databases)
  2. STRIDE (ss3 and ss8 alphabet secondary structure annotations from 3D protein models from ESMatlas and AlphaFold databases)
  3. SSE-PSSM (ss3 and ss8 sequence-based predictions)
  4. Bio Embeddings (seqvec model, ss3 and ss8 sequence-based predictions)
  5. ProtTrans (prot_bert_bfd_ss3 model, ss3 only sequence-based predictions)
Additionally, a consensus-based secondary structure is provided. All predictions also include confidence scores at the individual residue level. For more details, see the FAQ section.
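
To make the ss3/ss8 alphabets and the idea of a consensus more concrete, here is a minimal sketch. It is an illustration only: the ss8-to-ss3 reduction shown is one widely used convention (H, G, I to helix; E, B to strand; everything else to coil), and the per-residue majority vote with an alphabetical tie-break is an assumed scheme. The page does not state which reduction or consensus method pSSdb actually uses, and the support fraction below is not the database's confidence score. The names `reduce_ss8` and `consensus_ss3` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# One common ss8 -> ss3 reduction (an assumption, not pSSdb's documented mapping).
SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and bridges
    "T": "C", "S": "C", "-": "C", "C": "C",  # turns, bends, coil
}


def reduce_ss8(ss8: str) -> str:
    """Map an 8-state string to 3 states; unknown symbols become coil."""
    return "".join(SS8_TO_SS3.get(state, "C") for state in ss8)


def consensus_ss3(predictions: List[str]) -> Tuple[str, List[float]]:
    """Per-residue majority vote over equal-length ss3 strings.

    Returns the consensus string and, for each residue, the fraction of
    predictors that agree with it. Ties are broken alphabetically (C < E < H),
    which is an arbitrary choice made for determinism.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    states: List[str] = []
    support: List[float] = []
    for column in zip(*predictions):
        counts = Counter(column)
        # Highest vote count first; alphabetical order breaks ties.
        state, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        states.append(state)
        support.append(votes / len(column))
    return "".join(states), support


if __name__ == "__main__":
    print(reduce_ss8("HGIEBTS-"))
    print(consensus_ss3(["HHEC", "HHEC", "HCEE"]))
```

A majority vote is only one possible design; a weighted scheme that uses each predictor's per-residue confidence would be a natural alternative.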

Disclaimer

Any information present in the pSSdb database is a theoretical inference and is provided for research, educational and informational purposes only. For clarity, it is provided 'as-is' without any warranty of any kind, whether expressed or implied. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

The pSSdb database is available under the Creative Commons Attribution-NoDerivs license; for more details see here

Reference: Kozlowski LP. pSSdb: Protein Secondary Structure Database (submitted)
Contact: Lukasz P. Kozlowski