Using digital information to record facts about the planet, its built environment and the people that live on it is a Big Data exercise par excellence.
To be credibly comprehensive, the database has to be so big that it presents problems of collection as well as analysis.
But that didn’t faze attendees at the Digital Earth Summit in Wellington last week.
John Richards of Australian National University told the conference that there are two ways to tackle the ultimate Big Data challenge – crowdsourcing, using information acquired by amateurs, and entrusting the machine with more of the task of verifying the data.
Data in a comprehensive Digital Earth system will not be confined to the usual spatial geographical data and data on buildings, he says, but will include socially mediated information; facts and alleged facts that only humans can input to the database.
“Physical sensors generate data that’s analysed to produce information,” Richards says. “That information, processed by human analysis, leads to knowledge. Citizens’ input is generally in the form of knowledge already.”
Other speakers at the conference referred to crowdsourced maps, where the local population is a source for input at the “data” level.
With citizen-sourced knowledge, there is clearly a problem with verification, Richards says.
An informant may be mistaken or mischievously spreading misinformation.
Some machine processes will necessarily be involved in the correction process, he says, because we simply have not the resources to run a critical human eye over every piece of input.
A number of techniques are available to solve the problem in a machine-assisted way. Software can check for “convergence” among a group of alleged pieces of knowledge about the same entity. If most inputs agree and there are one or two outliers, these can probably be discounted.
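The convergence idea can be illustrated with a minimal Python sketch. It is not drawn from Richards' talk: the function name `check_convergence`, the `min_agreement` threshold of 0.6 and the sample reports are all assumptions made for illustration. The sketch takes every claim submitted about one entity, accepts the most common value if enough contributors agree on it, and sets aside the rest as outliers.

```python
from collections import Counter


def check_convergence(claims, min_agreement=0.6):
    """Return (accepted_value, outliers) for claims about one entity.

    `claims` is a list of values reported by different contributors.
    `min_agreement` is a hypothetical threshold: the share of
    contributors that must agree before a value is accepted.
    Returns (None, []) when there are no claims or no value
    reaches the threshold.
    """
    if not claims:
        return None, []

    counts = Counter(claims)
    value, votes = counts.most_common(1)[0]

    # Not enough agreement: nothing is accepted, nothing is discounted.
    if votes / len(claims) < min_agreement:
        return None, []

    # Every claim that disagrees with the accepted value is an outlier.
    outliers = [claim for claim in claims if claim != value]
    return value, outliers


# Hypothetical reports about the same bridge from six contributors.
reports = ["bridge open"] * 5 + ["bridge closed"]
accepted, discounted = check_convergence(reports)
```

Here five of the six reports agree, so "bridge open" is accepted and the lone "bridge closed" report is discounted. A real system would probably also weigh each contributor's track record and the age of each report rather than counting every claim equally.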
Expert systems – a mature form of “artificial intelligence” – can be brought to bear, running the rule of experts’ accumulated knowledge of that particular class of information over the amateurs’ input.
A different and powerful technique is to crowdsource correction – the principle on which Wikipedia and rumour on social media operate.
If someone tweets an erroneous piece of information, at least 10 corrections from better-informed contributors can be expected in short order, Richards says.
Multiple vetted sources will contribute to a Digital Earth database and there will, of course, be multiple consumers, taking different views of the information.
Some of these consumers will process the information, adding elements of their own, and will contribute the results back to the database.
A mature Digital Earth representation will settle down to a point somewhere between content generated and vetted fully automatically, and content that still has human input at some point. However, Richards sees this mature line at one edge of a triangle. The opposite corner he labels the “gee-whiz” point, where the discipline started; where researchers, providers and consumers were overawed with the technology.
The current state of Digital Earth work still has elements of “gee-whiz” about it, Richards says.