Picture this: you head to the supermarket to buy food for the week. You collect the items you need (if you are like me, this includes several additional “treats”) and then wander over to the checkout. Each item is scanned by a barcode reader, and the computer reads the barcode and records the price. Item after item is scanned and then, suddenly, the production line halts. A barcode cannot be read, and you’re in trouble. The computer cannot tell what the item is, because it is missing the unique code that has been assigned to all the other items. If this is familiar to you, then you already have a basic understanding of DNA barcoding: just replace barcodes with sequences of DNA and items with plants. DNA barcoding is an extremely important field of research that enables us to identify the plants in our environment. After DNA is extracted from a plant, gene regions that act as unique identifiers, known as barcodes, are amplified using Polymerase Chain Reaction (PCR), a process that makes many copies of a chosen stretch of DNA. The amplified DNA is then sent to be sequenced, i.e., the DNA sequence is read and a whole bunch of A’s, G’s, C’s and T’s are recorded. The combination and order of these four characters dictates the unique barcode for each plant species on earth.
Generating large databases of these unique DNA sequences is a challenge, and current databases are severely depauperate, meaning there are lots of plants with missing barcodes. This is a serious problem in the field of environmental DNA, which is the recovery of DNA from environmental samples such as water and soil (mentioned previously). If DNA is recovered from an environmental sample to, say, detect an invasive species, monitor diet or document diversity, then the sequence needs to be matched to a database. Otherwise it is just a series of characters without much use. The recent study I worked on looked at a method to generate these barcodes for a large number of plant species across multiple barcodes. This is where it gets tricky, as plants can have multiple barcodes, not just one. Not all barcodes are equally unique to a species, genus or family, so the ability to uniquely identify a plant may depend on one barcode or on a combination of several. This means we need not only to increase the number of plants that have a barcode sequence generated, but also to increase the number of barcodes for each species to ensure unique identification. Not an easy task, that’s for sure! But recent advances in sequencing and the advent of targeted capture (explained here) are making this more achievable.
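To make the matching step concrete, here is a minimal sketch in Python. It is only an illustration: the species names, the sequences and the choice of `rbcL` and `matK` as barcode regions are placeholders I made up, and it uses exact string matching, whereas real pipelines compare sequences by similarity. It shows how a single barcode can leave two candidate species, while adding a second barcode narrows it down to one.

```python
# Toy reference database: species -> {barcode region -> sequence}.
# Species names, region names and sequences are invented for illustration only.
REFERENCE = {
    "species_A": {"rbcL": "ATGGCA", "matK": "TTGACC"},
    "species_B": {"rbcL": "ATGGCA", "matK": "TTGTCC"},
    "species_C": {"rbcL": "ATGCCA", "matK": "TTGTCC"},
}


def candidates(query):
    """Return the species whose reference matches the query at every barcode supplied.

    Exact matching is a simplification; it stands in for a proper
    similarity search against a reference database.
    """
    return sorted(
        species
        for species, barcodes in REFERENCE.items()
        if all(barcodes.get(region) == seq for region, seq in query.items())
    )


# One barcode: two species share this sequence, so the match is ambiguous.
one_barcode = candidates({"rbcL": "ATGGCA"})

# Two barcodes together: only one species matches both.
two_barcodes = candidates({"rbcL": "ATGGCA", "matK": "TTGACC"})

# A sequence missing from the database matches nothing at all.
unmatched = candidates({"rbcL": "GGGGGG"})

print(one_barcode, two_barcodes, unmatched)
```

The empty result for the last query is the checkout problem from earlier: a sequence with no entry in the database cannot be identified, no matter how well it was read.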
In my recent research published in Ecology and Evolution, I applied targeted capture to generate 20 barcodes across 93 plant species, all of which were coastal temperate plants, given my field of interest. I had a 92% success rate for recovering all targeted barcodes across the 93 samples. I also investigated which of the barcodes may be better for identifying species, in other words, which barcode was unique to the most species in the database. I found that this varied across plant families, and that combining multiple barcodes was the best way to uniquely identify closely related species. This approach to barcode generation means multiple barcodes can be developed for the same cost and time it takes to generate one. So in the race to generate barcodes for all flora on earth, this research leads the charge and is a great step towards barcoding all plants, an initiative that is currently being undertaken for all Australian flora within the Genomics for Australian Plants project and the Tree of Life Initiative.
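One simple way to think about "which barcode is unique to the most species" is to count, for each barcode or combination of barcodes, how many species have a sequence that no other species shares. The sketch below does this on a made-up four-species database. It is not the analysis from the paper: the names, sequences and the `resolved_fraction` helper are hypothetical, and exact matching is again a simplification.

```python
from collections import Counter

# Invented reference data: species -> {barcode region -> sequence}.
DATABASE = {
    "species_A": {"rbcL": "ATGGCA", "matK": "TTGACC"},
    "species_B": {"rbcL": "ATGGCA", "matK": "TTGTCC"},
    "species_C": {"rbcL": "ATGCCA", "matK": "TTGTCC"},
    "species_D": {"rbcL": "ATGTCA", "matK": "TTGTCC"},
}


def resolved_fraction(database, regions):
    """Fraction of species whose sequences at `regions` are shared with no other species."""
    # Build one combined key per species from the chosen barcode regions.
    keys = {
        species: tuple(barcodes[region] for region in regions)
        for species, barcodes in database.items()
    }
    # Count how many species carry each combined key.
    counts = Counter(keys.values())
    unique = sum(1 for key in keys.values() if counts[key] == 1)
    return unique / len(database)


rbcl_only = resolved_fraction(DATABASE, ["rbcL"])
matk_only = resolved_fraction(DATABASE, ["matK"])
combined = resolved_fraction(DATABASE, ["rbcL", "matK"])

print(rbcl_only, matk_only, combined)
```

In this toy data neither barcode separates every species on its own, but the two together do, which mirrors the pattern described above for closely related species.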