The Contents of a PDB File

You can view PDB files directly on the web. Use the Protein Data Bank's 3DB Browser to search for the file of your choice. Then on the 3DB Atlas page for your file, click on [complete with coordinates] in the Data Retrieval line that reads as follows:

Asymmetric unit, PDB entry: [header only] or [complete with coordinates]

The resulting display shows the file contents, usually with web links to abstracts of papers about the structure.

Alternatively, make a duplicate of your downloaded PDB file, and name it #xxx.txt, where #xxx is the PDB file code. Open #xxx.txt in a word processor of your choice. You may have to reformat the file so that all lines begin with one of the line types listed below (HEADER, COMPND, and so forth). To format it correctly, try selecting the whole file (Edit: Select All) and setting the font to Courier or Monaco and the size to 9 point. Adjust the right margin so that lines begin with a line number and designation of line types (listed below) and end with the four-character file code followed by a line number. Once you have the file in readable format, examine its contents.

The contents of the file, in order of appearance, are

- HEADER lines, containing the file name and date.

- COMPND lines, containing the name of the protein.

- SOURCE lines, giving the organism from which the protein was obtained.

- AUTHOR lines, listing the persons who placed this data in the Protein Data Bank.

- REVDAT lines, listing all revision dates for data on this protein.

- REMARK lines, which contain 1) references to journal articles about the structure of this protein, and 2) general information about the contents of this file.

- SPRSDE lines, which list older coordinate files of this same structure.

- SEQRES lines, which give the amino-acid sequence of the protein.

- FTNOTE lines, which contain footnotes.

- HET and FORMUL lines, which list the cofactors, prosthetic groups, inhibitors or other nonprotein substances present in the structure.

- HELIX, SHEET, and TURN lines, which list the elements of secondary structure in the protein.

- CRYST1, ORIGX, and SCALE lines, which contain some general information about the protein crystals from which this structure was obtained by the technique of x-ray crystallography.

- ATOM and HETATM lines, which contain the atomic coordinate data needed to display the structure of the protein. Notice that no hydrogen atoms are listed in the file. Most protein crystals do not diffract well enough to allow hydrogen atoms to be resolved. Positions of hydrogen atoms must be inferred from the positions of other atoms.

Note that residue numbers sometimes do not begin with #1 (and sometimes end before the known last residue number of the protein). In a crystallographic model, this usually means that some of the terminal residues were not found in the electron-density map that was interpreted to produce the model, perhaps due to disorder in these parts of the molecule.

- CONECT lines, which list bonds involving nonprotein atoms in the file.

- MASTER and END lines, which mark the end of the file.
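As a rough illustration of the list above, the following Python sketch tallies the line types in a file by reading the first six columns of each line, which is where the PDB format places the line type. The function name count_record_types and the sample lines are made up for this example and do not come from a real PDB entry.

```python
from collections import Counter


def count_record_types(lines):
    """Tally line types, taken as the first six columns of each line."""
    counts = Counter()
    for line in lines:
        record = line[:6].strip()
        if record:  # skip blank lines
            counts[record] += 1
    return counts


# A tiny made-up excerpt for illustration only; not a real PDB entry.
sample = [
    "HEADER    EXAMPLE PROTEIN",
    "COMPND    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "HETATM    3  O   HOH A 101",
    "END",
]

counts = count_record_types(sample)
for record, n in counts.items():
    print(record, n)
```

To try it on a real file, pass an open file object instead of the sample list, for example count_record_types(open("#xxx.pdb")) with your own file name. Note that numbered line types such as SCALE1, SCALE2, and SCALE3 are counted separately by this simple approach.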

Remember that this PDB file is the source of all the information that a modeling program uses to produce a display of the protein structure. Most modeling programs can display any protein for which you provide coordinate data in the format of ATOM lines in a PDB file.

Look again at the ATOM lines. Each line lists one atom. In order across the row after the line number and the word ATOM are

a) atom number: atoms are numbered in sequence through the file;

b) atom type: n = amide N, ca = alpha C, c = carbonyl C, o = carbonyl O, cb = beta carbon, and so forth;

c) residue name: three-letter amino acid abbreviation;

d) residue number;

e) x-coordinate of the atom, in angstroms from the unit-cell origin;

f) y-coordinate of the atom;

g) z-coordinate of the atom;

h) occupancy: the fraction of unit cells in which the atom occupies this particular location; usually 1.00, meaning all of them (values below 1.00 can be used to represent alternative conformations of side chains);

i) temperature factor: an indication of uncertainty in this atom's position due to disorder or thermal vibrations (can be used by graphics programs to represent the relative mobility of different parts of a protein);

j) every line ends with the PDB file identification code.
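The items above can be pulled out of an ATOM line with a few lines of Python. This is a minimal sketch, assuming the fixed-column layout of the standard PDB format, in which each item occupies a set range of character positions rather than being separated by a delimiter. The function name parse_atom_line is hypothetical, and the sample values are invented for illustration. The sample also carries a one-letter chain identifier in column 22, which the standard format provides but the list above does not cover.

```python
def parse_atom_line(line):
    """Extract items a) through i) from one ATOM line by column position."""
    return {
        "atom_number": int(line[6:11]),           # columns 7-11
        "atom_type": line[12:16].strip(),         # columns 13-16
        "residue_name": line[17:20].strip(),      # columns 18-20
        "residue_number": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),                  # columns 31-38
        "y": float(line[38:46]),                  # columns 39-46
        "z": float(line[46:54]),                  # columns 47-54
        "occupancy": float(line[54:60]),          # columns 55-60
        "temperature_factor": float(line[60:66]), # columns 61-66
    }


# A made-up ATOM line, assembled piece by piece so the columns are explicit.
sample_line = (
    "ATOM  "      # columns 1-6: line type
    "    1"       # columns 7-11: atom number
    " "
    " N  "        # columns 13-16: atom type
    " "
    "MET"         # columns 18-20: residue name
    " "
    "A"           # column 22: chain identifier (not in the list above)
    "   1"        # columns 23-26: residue number
    "    "
    "  27.340"    # columns 31-38: x
    "  24.430"    # columns 39-46: y
    "   2.614"    # columns 47-54: z
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: temperature factor
)

atom = parse_atom_line(sample_line)
print(atom)
```

Slicing by column is used here instead of splitting on spaces because adjacent items in a real file can run together without a space between them, for example when a coordinate is large and negative.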

Columns e, f, and g locate the atom in three-dimensional space, giving each atom a set of Cartesian coordinates (x, y, z). Molecular modeling programs use this information to produce three-dimensional displays of molecules on the computer screen. During rotations of the image, modeling programs continually recalculate all coordinates to establish new atom positions for all atoms as they move.
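To give a feel for the recalculation involved, here is a small Python sketch that rotates a set of (x, y, z) coordinates about the z-axis by a chosen angle. It illustrates the arithmetic only and is not taken from any particular modeling program; the function name rotate_about_z and the two sample points are invented for this example.

```python
import math


def rotate_about_z(coords, angle_degrees):
    """Return new (x, y, z) tuples after rotating each point about the z-axis."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (x * cos_t - y * sin_t, x * sin_t + y * cos_t, z)
        for x, y, z in coords
    ]


# Two made-up atom positions, in angstroms.
atoms = [(1.0, 0.0, 0.5), (0.0, 2.0, -1.0)]
rotated = rotate_about_z(atoms, 90.0)
for x, y, z in rotated:
    print(f"{x:8.3f}{y:8.3f}{z:8.3f}")
```

A rotation about the z-axis leaves every z-coordinate unchanged and mixes x and y. Rotations about the other axes follow the same pattern with the roles of the coordinates exchanged.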

When you have finished studying the PDB file, you may want to save it with the changes you made in its format. Be sure to keep the name #xxx.txt to distinguish it from the original file #xxx.pdb. Most molecular modeling programs will not open a PDB file that has been modified in a word processor.
