“I’ve looked at tens of thousands of living and fossil leaves,” says that paleobotanist, Peter Wilf of Penn State’s College of Earth and Mineral Sciences. “No one can remember what they all look like. It’s impossible: there’s tens of thousands of vein intersections.” There are also patterns in vein spacing, different tooth shapes, and a whole host of other features that distinguish one leaf from the next. Unable to commit all these details to memory, botanists rely instead on a manual method of identification developed in the 1800s. That method, called leaf architecture, hasn’t changed much since. It relies on a fat reference book filled with “an unambiguous and standard set of terms for describing leaf form and venation,” and it’s a painstaking process; Wilf says correctly identifying a single leaf’s taxonomy can take two hours.
That’s why, for the past nine years, Wilf has worked with a computational neuroscientist from Brown University to program computer software to do what the human eye cannot: identify families of leaves, in mere milliseconds. The software, which Wilf and his colleagues describe in detail in a recent issue of Proceedings of the National Academy of Sciences, combines computer vision and machine learning algorithms to identify patterns in leaves, linking them to families of leaves they potentially evolved from with 72 percent accuracy. In doing so, Wilf has designed a user-friendly solution to a once-laborious aspect of paleobotany. The program, he says, “is going to really change how we understand plant evolution.”
The project began in 2007, after Wilf read an article in The Economist titled “Easy on the eyes.” It documented the work of Thomas Serre, the neuroscientist from Brown, on image-recognition software. Serre was at MIT at the time and had taught a computer to distinguish photos with animals from photos without animals, with an 82 percent rate of accuracy. That was better than his (human) students, who only pulled it off 80 percent of the time. “An alarm went off in my head,” says Wilf, who cold-called Serre and asked if this computer program could be taught to recognize patterns in leaves. Serre said yes, and the two scientists cobbled together a preliminary image set of leaves from about five families and started running recognition tests on the computer. They quickly achieved an accuracy rating of 35 percent.
By now, Wilf and Serre have fed the program a database of 7,597 images of leaves that have been chemically bleached and then stained, to make details like vein patterns and toothed edges pop. Small imperfections like bug bites and tears were purposefully included, since those details provide clues to the plant’s origins. Once the software processes these ghost images, it creates a heat map on top of them. Red dots point out the importance of different codebook elements, or tiny images illustrating some of the 50 different leaf characteristics. Together, the red dots highlight areas relevant to the family the leaf may belong to.
This, rather than detecting species, is the broader goal for Wilf. He wants to start feeding the software tens of thousands of images of unidentified, fossilized plants. If you’re trying to identify a fossil, Wilf says, it’s almost always of an extinct species, “so finding the evolutionary family is one of our motivators.” Knowing the leaf’s species isn’t as helpful as knowing where the leaf came from or what living leaves it’s related to, which is invaluable information to a paleobotanist.
It remains to be seen whether this program moves us any closer to a fuller answer to a question raised in Wittgenstein's Philosophical Investigations:
73. When someone defines the names of colours for me by pointing to samples and saying "This colour is called 'blue', this 'green' ..... " this case can be compared in many respects to putting a table in my hands, with the words written under the colour-samples. -- Though this comparison may mislead in many ways. --
One is now inclined to extend the comparison: to have understood the definition means to have in one's mind an idea of the thing defined, and that is a sample or picture. So if I am shewn various different leaves and told "This is called a 'leaf' ", I get an idea of the shape of a leaf, a picture of it in my mind. -- But what does the picture of a leaf look like when it does not shew us any particular shape, but 'what is common to all shapes of leaf'? Which shade is the 'sample in my mind' of the colour green -- the sample of what is common to all shades of green?
"But might there not be such 'general' samples? Say a schematic leaf, or a sample of pure green?"
-- Certainly there might. But for such a schema to be understood as a schema, and not as the shape of a particular leaf, and for a slip of pure green to be understood as a sample of all that is greenish and not as a sample of pure green -- this in turn resides in the way the samples are used.
Ask yourself: what shape must the sample of the colour green be? Should it be rectangular? Or would it then be the sample of a green rectangle? -- So should it be 'irregular' in shape? And what is to prevent us then from regarding it -- that is, from using it -- only as a sample of irregularity of shape?
74. Here also belongs the idea that if you see this leaf as a sample of 'leaf shape in general' you see it differently from someone who regards it as, say, a sample of this particular shape. Now this might well be so -- though it is not so -- for it would only be to say that, as a matter of experience, if you see the leaf in a particular way, you use it in such-and-such a way or according to such-and-such rules.