To identify influences between geographical phenomena, NYU Tandon researchers are developing a mathematical framework that allows small data sets to be as effective as big data in identifying spatial dependencies.
Identifying climate change-driven human migration, the spread of COVID-19, agricultural patterns, and socioeconomic problems in neighboring regions depends on data: the more complicated the model, the more data is required to understand these spatially dispersed phenomena. Reliable data, however, is often costly and difficult to acquire, or too sparse to support reliable predictions.
Maurizio Porfiri, an Institute Professor of Mechanical, Aerospace, Biomedical, and Civil and Urban Engineering and a member of the Center for Urban Science and Progress (CUSP) at NYU Tandon School of Engineering, has developed a new approach based on network and information theory that makes ‘small data’ act big by applying mathematical techniques normally used for time series to spatial processes.
The study, featured on the cover of Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences and entitled ‘An information-theoretical approach to the study of spatial dependencies in small datasets,’ explains how observers can draw robust inferences about influences from a small sample of attributes at a limited number of locations, including interpolations to intervening areas or eve
“Most of the time, the data sets are poor,” Porfiri explains. “That’s why we took a very basic approach and applied information theory to investigate whether influence in the temporal sense could be extended to space, which allows us to work with a very small data set, between 25 and 50 observations. We take a snapshot of the data and draw connections – not based on cause and effect, but on the interaction between points – to see if there is some form of underlying, collective response in the system.”
The method, developed by Porfiri and his collaborator Manuel Ruiz Marín of the Department of Quantitative Methods, Law and Modern Languages at the Technical University of Cartagena, Spain, includes:
Consolidating a given data set into a small range of permissible symbols, similar to how a machine learning system can identify a face from limited pixel data: a chin, cheekbones, forehead, etc.
Applying an information-theoretic principle to build a non-parametric test (one that does not presume an underlying model for the interaction between locations) to draw associations between events and to determine whether knowledge about another location reduces the uncertainty at a given location.
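The two steps above can be illustrated with a minimal sketch. This is not the authors' published test: it is a simplified stand-in that assumes quantile binning for the symbolization step and a permutation test on mutual information for the non-parametric step. The function names (`symbolize`, `mutual_information`, `permutation_test`), the ring of 30 locations, and the synthetic sine-wave data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def symbolize(values, n_symbols=2):
    """Map each value to a symbol 0..n_symbols-1 using rank-based quantile bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    symbols = [0] * len(values)
    for rank, i in enumerate(order):
        symbols[i] = rank * n_symbols // len(values)
    return symbols


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Reduction in uncertainty about x from knowing y: H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def permutation_test(x, y, n_perm=999, seed=0):
    """Non-parametric test: shuffle y to break the pairing and compare MI values."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical small data set: one observation at each of 30 locations on a ring,
# where the "neighbor" of location i is location i + 1 (wrapping around).
n_locations = 30
values = [math.sin(2 * math.pi * i / n_locations) for i in range(n_locations)]
symbols = symbolize(values, n_symbols=2)
neighbor_symbols = [symbols[(i + 1) % n_locations] for i in range(n_locations)]

mi, p_value = permutation_test(symbols, neighbor_symbols)
print(f"mutual information = {mi:.3f} bits, p-value = {p_value:.3f}")
```

A small p-value here indicates that knowing a neighbor's symbol reduces uncertainty about a location's own symbol more than chance pairing would. Because the test only needs a list of paired symbols, the neighbor relation can be swapped for any other pairing rule without changing the rest of the code.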
Porfiri explained that a nonparametric approach assumes no underlying structure for the influences between nodes, which provides flexibility in how nodes can be connected and even in how the concept of a neighbor is defined.
“Since we abstract the definition of a neighbor, we can define it in terms of any property, for example, ideology. Ideologically, California could be a neighbor to New York, even though the two are not geographically adjacent. Yet they may share similar values.”
To gain statistically sound insight into the mechanisms of major socioeconomic problems, the team validated the framework using two case studies: population migrations in Bangladesh due to sea level rise and motor vehicle deaths in the U.S.
“In the first case, we wanted to see if migration between places could be predicted by geographic distance or the severity of flooding in that particular county – whether knowing which county is near another county or knowing the extent of flooding helps predict the magnitude of migration,” says Ruiz Marín.
For the second case, the team analyzed the geographic distribution of alcohol-related automobile accidents in 1980, 1994 and 2009, comparing states with high levels of such crashes to neighboring states and to states with similar legislative ideologies on drinking and driving.
“We also discovered a stronger interaction between states that share borders than between states with similar legislative ideologies on drinking and driving.”