Writing chemistry




Atoms and molecules/chemical compounds

When writing chemistry, it is important to remember that the notation itself carries information. It matters whether a letter is uppercase or lowercase, whether a number is written as a superscript or a subscript, and so on.

There is thus a difference between writing CO and Co. CO is the chemical compound carbon monoxide, whereas Co is the metal cobalt. Likewise, there is a difference between writing 2 H, ²H and H₂. 2 H is two hydrogen atoms that aren't connected, ²H is the hydrogen isotope deuterium, and H₂ is two hydrogen atoms connected by a covalent bond into one molecule.

So, a number written in front, e.g. 2 H, gives the number of separate units, while a number placed behind as a subscript, e.g. H₂, gives the number of atoms or chemical subunits within the same chemical substance. This means that if you see something like 2 H₂, you have 2 hydrogen molecules, each consisting of 2 hydrogen atoms, i.e. a total of 4 hydrogen atoms.
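The rule above (a number in front multiplies whole units, a subscript counts atoms inside one unit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: subscripts are typed as plain digits (H2), a space separates the leading number from the formula, and the helper name count_atoms is hypothetical. Isotopes, charges and brackets are not handled.

```python
import re

def count_atoms(term: str) -> dict[str, int]:
    """Count atoms in a term such as '2 H2': optional leading number, then a simple formula."""
    match = re.fullmatch(r"\s*(\d*)\s*((?:[A-Z][a-z]?\d*)+)\s*", term)
    if match is None:
        raise ValueError(f"cannot parse term: {term!r}")
    coefficient = int(match.group(1) or 1)  # number in front: separate units
    counts: dict[str, int] = {}
    # An element symbol is one uppercase letter, optionally followed by one lowercase letter.
    for symbol, subscript in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
        counts[symbol] = counts.get(symbol, 0) + coefficient * int(subscript or 1)
    return counts

print(count_atoms("2 H"))   # {'H': 2}
print(count_atoms("H2"))    # {'H': 2}
print(count_atoms("2 H2"))  # {'H': 4}
print(count_atoms("CO"))    # {'C': 1, 'O': 1}
print(count_atoms("Co"))    # {'Co': 1}
```

Note how the uppercase/lowercase distinction does the work for CO versus Co: the same two letters give either two elements or one.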

Charges on ions and complexes are written in superscript. Simple ions with a single charge look like this: H⁺ and OH⁻. If the ion or complex has multiple charges, the number is written before the sign, e.g. SO₄²⁻ and Fe³⁺. In old literature you may encounter the charges written individually, e.g. Fe⁺⁺, but in the same way that we don't say elephant elephant but two elephants, today we write Fe²⁺.
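As a small illustration of the charge convention, the sketch below formats a charge with the number before the sign and converts the old repeated-sign style to the modern one. It assumes a plain-text stand-in where ^ marks the start of the superscript, and the function names format_ion and modernize are hypothetical.

```python
import re

def format_ion(formula: str, charge: int) -> str:
    """Write the charge as number-then-sign; a single charge shows only the sign."""
    if charge == 0:
        return formula
    sign = "+" if charge > 0 else "-"
    magnitude = abs(charge)
    number = "" if magnitude == 1 else str(magnitude)
    return f"{formula}^{number}{sign}"

def modernize(old_style: str) -> str:
    """Convert repeated signs (e.g. 'Fe++') to the modern form ('Fe^2+')."""
    match = re.fullmatch(r"([^+-]+)(\++|-+)", old_style)
    if match is None:
        raise ValueError(f"not an old-style ion: {old_style!r}")
    formula, signs = match.groups()
    charge = len(signs) if signs[0] == "+" else -len(signs)
    return format_ion(formula, charge)

print(format_ion("H", 1))     # H^+
print(format_ion("SO4", -2))  # SO4^2-
print(modernize("Fe++"))      # Fe^2+
```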

Out there in the real world, you can encounter alternative ways of writing formulas like CO2 and H2O. This is not the correct way of writing formulas, and it never has been.



Chemical formulas

Formulas are a way of writing ratios for chemical compounds. You have three main types of formulas:


The empirical formula is the ratio between atoms, reduced to the smallest whole numbers. The empirical formula carries no structural information about the chemical compound. If you have the empirical formula CH, you only know that the C:H ratio is 1:1. Whether the molecule is benzene (C₆H₆), acetylene (C₂H₂), or something else entirely, you have no way of telling from the empirical formula.
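Reducing a formula to its empirical form amounts to dividing all atom counts by their greatest common divisor. A minimal sketch, assuming the atom counts are already available as a dictionary (the function name empirical is hypothetical):

```python
from functools import reduce
from math import gcd

def empirical(counts: dict[str, int]) -> dict[str, int]:
    """Divide every atom count by the greatest common divisor of all counts."""
    divisor = reduce(gcd, counts.values())
    return {symbol: n // divisor for symbol, n in counts.items()}

benzene = {"C": 6, "H": 6}
acetylene = {"C": 2, "H": 2}

print(empirical(benzene))    # {'C': 1, 'H': 1}
print(empirical(acetylene))  # {'C': 1, 'H': 1}
```

Both molecules collapse to the same result, which is exactly why the empirical formula cannot tell them apart.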


The molecular formula is what most people associate with chemical formulas, and it contains some measure of structural information about the chemical compound. Writing molecular formulas leaves some room for creativity in fitting in as much structural information as possible, so the way molecular formulas are written is not entirely uniform.

Starting with the inorganic compounds, there are a number of well-defined ions, and there is some degree of consensus about the order in which they are written. The positively charged ions are written first, and polyatomic ions are written as one unit, e.g. the sulfate ion SO₄²⁻ in calcium sulfate is written CaSO₄ instead of just listing the atoms alphabetically as CaO₄S. You can have more than one of the same well-defined ion, as in hydroxyapatite, Ca₅(PO₄)₃OH, which contains three phosphate ions, PO₄³⁻. If there is more than one, you put the polyatomic ion in brackets and write the number of that ion after the bracket. If you have more than one positively or negatively charged ion, e.g. the negatively charged ions in hydroxyapatite, the order in which you write them is a matter of tradition, i.e. this is how it has always been done, and in the absence of tradition you list them alphabetically.

In some parts of the educational system, you learn that the ratios must be natural numbers (integers). However, in the real world, decimal numbers are also used, for example for natural minerals and for doped crystals in high-tech equipment. A mineral like dolomite is CaCO₃ where Ca is partially replaced by Mg in the crystal structure. If you don't know the ratio between Ca and Mg, you usually write this as (Ca·Mg)CO₃ or CaMg(CO₃)₂. The latter could give the impression that the Ca:Mg ratio is 1:1, but in the most common dolomite the formula is around Ca₀.₆₂Mg₀.₃₈CO₃. References can be found to types where the ratio between Ca and Mg is around 1:1, but this is not the norm. So, if you know the ratio for the substitutions in the crystal structure, you write it as a decimal number, as shown for the dolomite.
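A quick sanity check for decimal subscripts is that the atoms sharing one position in the crystal still add up to a whole number. The sketch below uses the dolomite figures quoted above and assumes that Ca and Mg share a single cation position per carbonate unit. The names occupancy, site_total and ratio are hypothetical.

```python
from decimal import Decimal

# Cation subscripts from the dolomite formula above, per one carbonate unit.
# Decimal avoids the rounding noise that binary floats would introduce.
occupancy = {"Ca": Decimal("0.62"), "Mg": Decimal("0.38")}

def site_total(site: dict[str, Decimal]) -> Decimal:
    """Sum of the subscripts for atoms that share one crystal position."""
    return sum(site.values(), Decimal("0"))

def ratio(site: dict[str, Decimal], a: str, b: str) -> Decimal:
    """Ratio between two substituting atoms, e.g. Ca:Mg."""
    return site[a] / site[b]

print(site_total(occupancy))                   # 1.00
print(round(ratio(occupancy, "Ca", "Mg"), 2))  # 1.63
```

So the formula describes one cation per carbonate, with a Ca:Mg ratio of roughly 1.63:1 rather than 1:1.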

Organic molecules quickly become so big that you can't write a molecular formula showing the whole structure, even if you know it. In that case you show as much of the structure as possible. Small molecules are easy, e.g. ethanol: CH₃CH₂OH, but already at propanol you get into trouble. If we just write the molecular formula C₃H₈O, we know the atom ratio and the molecular weight, but we can't tell whether this is propanol or ethyl methyl ether. If we know that it is propanol, but not whether it is 1-propanol or 2-propanol, we write C₃H₇OH. If we know which of the two propanols it is, the molecular formulas are CH₃CH₂CH₂OH and CH₃CH(OH)CH₃, respectively.

Therefore: The molecular formula reflects the knowledge we have about the chemical compound's structure, to the extent that it can be written in practice.
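To see that the different ways of writing a molecular formula differ only in how much structure they show, not in which atoms are present, the formulas can be reduced to plain atom counts. The following is a minimal sketch, assuming subscripts typed as plain digits, whole-number subscripts only, and round brackets for groups. The function name parse_formula is hypothetical.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a formula with bracketed groups, e.g. 'Ca5(PO4)3OH'."""
    tokens = re.findall(r"[A-Z][a-z]?|\(|\)|\d+", formula)
    if "".join(tokens) != formula:
        raise ValueError(f"unexpected characters in {formula!r}")
    stack = [Counter()]  # one counter per open bracket level
    i = 0
    while i < len(tokens):
        token = tokens[i]
        i += 1
        if token == "(":
            stack.append(Counter())
            continue
        if token.isdigit():
            raise ValueError("a number must follow an element or a bracket")
        # A number directly after an element or a closing bracket is its subscript.
        number = 1
        if i < len(tokens) and tokens[i].isdigit():
            number = int(tokens[i])
            i += 1
        if token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets")
            group = stack.pop()
            for symbol, n in group.items():
                stack[-1][symbol] += n * number
        else:
            stack[-1][token] += number
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0]

print(parse_formula("Ca5(PO4)3OH"))
print(parse_formula("CaSO4") == parse_formula("CaO4S"))             # True
print(parse_formula("CH3CH2CH2OH") == parse_formula("CH3CH(OH)CH3"))  # True
```

The two propanols, like C3H8O and ethyl methyl ether, give identical counts. Everything that distinguishes them lies in the order and grouping, which is the information the molecular formula adds.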


The structural formula is the most descriptive with regard to the structure of the chemical compound. If we take a molecule like pentanol, it has the formula C₅H₁₁OH, but from this we don't know the position of the OH group. This we can show with the structural formula (hydrogens on carbon are not shown):

[Figure: 2-pentanol, 2D structural formula]

Now we can see that it is 2-pentanol. If we draw it a bit more carefully, we can also show the spatial structure, where this is relevant for understanding the structural formula:

[Figure: 2-pentanol, 3D structural formula]



Reaction equations

How to write reaction equations depends to some extent on the reaction type. There are some general rules, which we will look at here; beyond that, you can see how it is done specifically on the various pages of the site.


A reaction equation shows reactions, or the lack thereof.

The traditional way of showing a reaction is to have the reactants on the left side of a reaction arrow and the products on the right side. Like this:

A + B → C + D

Because we read from left to right, we also prefer reading and writing reaction equations this way, but there is really nothing wrong with writing right to left, like this:

D + C ← B + A

It may also be that nothing happens. No reaction is just as important as a reaction taking place. This is written like this:

A + B ↛


No charges or atoms suddenly appear or vanish during the reaction.

Reactions must be balanced, so that you have the same number of each kind of atom on both sides of the reaction arrow. Atoms can't vanish or appear, as would be the case if there were only one Ag⁺ on the left side of this equation:

Cu + 2 Ag⁺ → 2 Ag + Cu²⁺

The same goes for charges. The total charge must be the same (+2 in the example shown) on both sides of the reaction arrow, otherwise the reaction hasn't been properly balanced. No atoms or charges appear out of nowhere or vanish.
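The balancing rule can be checked mechanically by adding up atoms and charges on each side. Here is a minimal sketch, assuming each species is entered by hand as a tuple of (ratio number, atom counts, charge). The names totals and is_balanced are hypothetical.

```python
from collections import Counter

def totals(side):
    """Return (atom counts, total charge) for one side of a reaction equation."""
    atoms = Counter()
    charge = 0
    for coefficient, composition, species_charge in side:
        for symbol, n in composition.items():
            atoms[symbol] += coefficient * n
        charge += coefficient * species_charge
    return atoms, charge

def is_balanced(reactants, products) -> bool:
    """Balanced means the same atoms and the same total charge on both sides."""
    return totals(reactants) == totals(products)

# Cu + 2 Ag+ -> 2 Ag + Cu2+
reactants = [(1, {"Cu": 1}, 0), (2, {"Ag": 1}, +1)]
products = [(2, {"Ag": 1}, 0), (1, {"Cu": 1}, +2)]
print(is_balanced(reactants, products))  # True

# With only one Ag+ on the left, both an Ag atom and a charge go missing.
wrong_reactants = [(1, {"Cu": 1}, 0), (1, {"Ag": 1}, +1)]
print(is_balanced(wrong_reactants, products))  # False
```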


The ratio numbers don't have to be natural numbers.

For reaction equations, ratios in natural numbers are preferred, and in some schools this is considered the only correct way to present the result. However, while using natural numbers is considered good practice because it makes the reaction equations easier to read, there is no formal requirement for it in chemistry. So, a reaction balanced like this:

H₂ + ½ O₂ → H₂O

is fully acceptable, as long as the ratios are correct.
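If natural numbers are wanted after all, the fractional ratios can be cleared by multiplying every ratio number by the least common multiple of the denominators. A minimal sketch using Python's fractions module (the function name to_integer_coefficients is hypothetical; math.lcm needs Python 3.9 or newer):

```python
from fractions import Fraction
from math import lcm

def to_integer_coefficients(coefficients):
    """Scale all ratio numbers by the least common multiple of their denominators."""
    exact = [Fraction(c) for c in coefficients]
    common = lcm(*(f.denominator for f in exact))
    return [int(f * common) for f in exact]

# H2 + 1/2 O2 -> H2O, ratio numbers listed in the order written
print(to_integer_coefficients(["1", "1/2", "1"]))  # [2, 1, 2]
```

The result corresponds to 2 H2 + O2 → 2 H2O. The ratios between the species are unchanged, which is why both versions are equally correct.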



Physical states

In classical chemistry for educational purposes, we work with four physical states in reaction equations:

(s) solid
(l) liquid
(g) gas
(aq) aqueous solution

The physical state is written in brackets after the chemical compound, e.g. water in the liquid state, H₂O(l), and as ice, H₂O(s). In a reaction equation it looks like this:

Ag⁺(aq) + Cl⁻(aq) → AgCl(s)

These are NOT the only physical states you can have. Occasionally you may encounter the physical state (cr) used for crystalline precipitates, but for physical states like plasma and supercritical fluids there is no notation. Similarly, a lot of organic chemistry takes place in organic solvents, but unlike aqueous solutions, there is no notation for organic solutions, or for inorganic solutions for that matter, such as the ones we know from hydrogenations, where H₂ is dissolved in a metal catalyst such as palladium.

If a reaction involves physical states not covered by the four states (s), (l), (g) and (aq), this simply needs to be stated in the text accompanying the reaction equation.
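As an illustration of how the state label attaches to a species, the sketch below splits a trailing (s), (l), (g) or (aq) from the formula. It assumes plain-text input, and any other label, or none at all, is reported as None so that it can be explained in the accompanying text instead. The names STATES and split_state are hypothetical.

```python
import re

# The four physical states used in classical reaction equations.
STATES = {"s": "solid", "l": "liquid", "g": "gas", "aq": "aqueous solution"}

def split_state(species: str):
    """Split 'AgCl(s)' into ('AgCl', 's'); return (species, None) if no known state is given."""
    text = species.strip()
    match = re.fullmatch(r"(.+?)\s*\((s|l|g|aq)\)", text)
    if match is None:
        return text, None
    return match.group(1), match.group(2)

for species in ["Ag+(aq)", "Cl-(aq)", "AgCl(s)", "H2O (l)", "Ca5(PO4)3OH"]:
    formula, state = split_state(species)
    print(formula, STATES.get(state, "state not given"))
```

A bracketed group that belongs to the formula itself, such as (PO4), is left alone because it is not one of the four state labels.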