Discovering the Unexpected & Explicable: Scientific Reasoning and Research Design for Spatial Data Analysis

Abstract. Methods for exploratory data analysis and exploratory spatial data analysis (ESDA) are useful for identifying outliers, clusters, skewed distributions and correlations (see Figure 1 for examples implemented in our GeoDa software). Researchers routinely use these methods to find insights.
What motivated this project, however, is that by default it is easier to find insights that confirm the expected. Often, in fields like geography, statistics or computer science, available software and data drive the choice of research questions and the process of how we explore data. Typical insights gained this way are descriptive, like where a cluster is located or whether variables are correlated.
To be sure, expected insights are reassuring, and descriptive insights are important. But instead of stopping here, we want to build on them to also find insights that are new and relevant: we want to discover the unexpected (Anselin, 1998; Kielman & May, 2009). And while we do need to know where clusters and outliers are, as researchers we also want to go further and explain why these patterns exist (Good, 1983). In this project we presume that there are ways of structuring the process of data exploration that make it more likely to discover unexpected and explanatory insights (Platt, 1964).
This presentation summarizes results from a summer 2020 lab where we started experimenting with how to do this using our Center's GeoDa software. The summer lab was directed by Julia Koschinsky of the University of Chicago's Center for Spatial Data Science. Marcos Falcone helped mentor five young University of Chicago and high school students (majoring in statistics, computation, geography, political science, and economics) for 7 to 10 weeks.
Our approach was to draw on philosophy of science and scientific reasoning to understand how the discovery of unexpected and explicable insights can work. We then tried to translate this into research designs for ESDA. Finally, we implemented the designs in replicable prototype examples for teaching and learning spatial research at the undergraduate or high school level.
For instance, in terms of scientific reasoning, classic work on causal explanations (Mill, 1843) augments the typical current focus on correlations by also highlighting the need to assess the plausibility of your own explanation versus alternatives. This requires a mindset and practice of rigorously testing how our explanations might be wrong (Popper, 1959) rather than confirming that they are right (Nuzzo, 2015; Kahneman, 2011). Doing this requires an iterative exchange between data and explanations, referred to as abductive reasoning because it combines inductive and deductive approaches (Peirce, 1878; Heckman and Singer, 2017). We used Sherlock Holmes stories and the famous John Snow cholera case to illustrate the structure of these scientific reasoning concepts for a high school context (Coleman, 2019; cf. Konnikova, 2013; Vinten-Johansen, 2020).
Scientific reasoning goes back hundreds of years. Our challenge this summer and from here on has been to translate this reasoning into research designs that are applicable to modern interactive ESDA tools. Each of us developed four prototype resources for teaching and learning ESDA in GeoDa that we will develop further (Fig. 2): 1) protocols for how this could be done; 2) case examples to apply and revise the protocol; 3) GeoDa demo scripts to make the examples replicable; and 4) cleaned data and documentation. These resources will be released as part of a GeoDa Cookbook in the near future.
Fig. 3 illustrates one of the protocols, which differs from how ESDA is typically navigated. The starting point is the exploration of patterns in the outcome variable of interest. Next is the formulation of alternative explanations whose patterns plausibly match those of the outcome variable. Then we draw on quasi-experimental research designs to structure the testing of this match (Shadish et al., 2002). Finally, data about the hypothesized explanations are analyzed with ESDA and regressions to test or reformulate the hypotheses as part of an abductive process.
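To make the first steps of such a protocol concrete, the following is a minimal Python sketch and not one of the lab's GeoDa demo scripts. It assumes a made-up 4x4 grid of areas with rook contiguity, an invented outcome variable, and two invented candidate explanations; the function names (rook_neighbors, morans_i, pearson) are hypothetical helpers written for this illustration. It first checks whether the outcome is spatially clustered using global Moran's I, then compares how well each candidate explanation's pattern matches the outcome using a simple correlation. A real analysis would add the quasi-experimental design and regression steps described above.

```python
from statistics import mean


def rook_neighbors(n_rows, n_cols):
    """Map each grid cell index to the cells that share an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with binary (not row-standardized) weights."""
    n = len(values)
    m = mean(values)
    dev = [v - m for v in values]
    total_weight = sum(len(adj) for adj in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    return (n / total_weight) * cross / sum(d * d for d in dev)


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return numerator / denominator


# Invented data, listed row by row for a 4x4 grid (assumption, not lab data).
outcome = [9, 8, 2, 1,
           8, 7, 2, 1,
           2, 2, 1, 1,
           1, 1, 1, 1]
candidates = {
    # Explanation A: concentrated in the same corner as the outcome.
    "explanation_a": [5, 5, 1, 1,
                      5, 4, 1, 1,
                      1, 1, 1, 1,
                      1, 1, 1, 1],
    # Explanation B: checkerboard pattern unrelated to the outcome cluster.
    "explanation_b": [1, 0, 1, 0,
                      0, 1, 0, 1,
                      1, 0, 1, 0,
                      0, 1, 0, 1],
}

weights = rook_neighbors(4, 4)

# Step 1: explore the spatial pattern of the outcome variable.
outcome_i = morans_i(outcome, weights)
print(f"Moran's I of outcome: {outcome_i:.3f}")

# Step 2: compare each alternative explanation's pattern with the outcome.
matches = {name: pearson(outcome, vals) for name, vals in candidates.items()}
for name, r in matches.items():
    clustering = morans_i(candidates[name], weights)
    print(f"{name}: Moran's I = {clustering:.3f}, r with outcome = {r:.3f}")
```

In this toy setup the outcome is positively autocorrelated and only the first candidate shares its pattern. Such a match is a reason to test that explanation further against alternatives, not evidence that it is correct.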

