Model, Framework, Data
This article will make much more sense if you read the previous installment first!
The art of modeling
In the previous installment of this article, we followed Priscilla the Professional Problem Solver in her iterative adventures around the problem-solving hermeneutic loop. She started with a framework (personas), “filled it” with data, zoomed out to assess its explanatory or predictive power, decided to reject it and moved on to a new framework (e.g. mindsets).
A similar process is followed every day by millions of problem solvers around the world. However, most people follow it subconsciously, in a haphazard and unstructured way.
To make problem solving a rigorous, teachable skill, it is essential to unpack what is actually going on when we solve problems, and what the best practices are.
Throughout the previous article, we have been using terms like model, modeling, mental model and framework fairly loosely, but it’s worth clarifying what we mean by each of these terms, starting with model.
Models take center stage in the Problem Solving Map (PSM), as they are the expression of the knowledge created in that loop (which, as we said, is subjective, evolving, etc.). More generally, models offer an interesting paradigm for knowledge creation. There are many definitions of model; my working definition is:
A model is a simplified representation of an observed or imagined reality driven by a goal.¹
Here is what this definition implies:
- Selected (not all) elements of reality are represented as variables that stand in a certain relation to each other.
- This selection is goal-driven. The goal-drivenness of any model, which ties in with our previous argument about subjectivity of the hermeneutic loop, can be easily illustrated by taking the example of a map, the model par excellence. As we can see in the picture, the same place can be modeled in radically different ways depending on my goal.
- A model can describe something as it is or as it could be, either in an imaginary world or in a future point in time, and, once interpreted, it could tell us how to get from a current state to a target state.
- Perhaps the most important thing about models is summarized by George Box’s saying “All models are wrong, but some are useful”. A map is useful because it is NOT the territory. Or, to quote Borges’s short story On Exactitude in Science:
“In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. […] In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography”.
A model is by definition imperfect, because a perfect model would be a copy.
So what do models look like? Well, in all kinds of ways:
As we can see in the image above, models can take a huge range of shapes, from maps to formulas, from drawings to physical maquettes, from literary allegories to real-life organisms, from digital simulations to back-of-a-napkin diagrams. There is a sense in which anything can be a model of anything: the relationship between predator and prey within an ecosystem can be modeled by a set of two equations or, at a more defined level, by a set of ten; but it can also be modeled by a graphic simulation, by a metaphor, by a drawing or a play.
On the other hand, some models can be seen as referring to multiple objects in reality: for example, the first chapters of the book of Genesis can either be seen as a model for the creation of the world and the early days of humanity, or for a set of moral principles displaying what happens if you disobey divine commandments. Or of course, at a different level of analysis, it can be seen as not a model at all, but rather as a data point within the larger framework of Ancient Near Eastern mythology.
So the answer to “what is a model of what” is fully dependent on the intention of the modeler, the understanding of the interpreter and the level of analysis of the interpretation. At the same time, some models can of course be better than others: for example, the wind tunnel at MIT is probably a better model for aerodynamic testing than a person blowing on a paper airplane. But this, as we can see, is fully dependent on the goal we set ourselves: blowing on a paper plane can, for example, be a better way to explain to a 5-year-old how an airplane works.
Models as knowledge
Models are epistemologically interesting objects. On the one hand, we rely on them extensively to produce scientific and other kinds of knowledge, not only by manipulating them, but also as we create them. It can be said both that models produce knowledge and that models in themselves are knowledge. In fact, much scientific knowledge is acquired by observing and manipulating models rather than reality directly².
On the other hand, as we said, models have no claim to truth, being simplifications by definition (see Box’s dictum). As the evolving product of hermeneutic interpretive cycles, models represent a form of knowledge that integrates the whole and the part, the theory and the observation, and that is filtered by our previous knowledge and agenda, whether as individuals or as groups. Thus, for example, even a very “solid” scientific model like the Standard Model of physics is in a sense the result of the dialectic between theories and frameworks on the one hand, and findings and results on the other, filtered through the scientific community’s intent (the desire to classify particles and forces), scientific paradigms (relativity, quantum physics, etc.) and practices (open discussion, peer review, etc.).
In science, models are very useful as they solve the error-fragility problem of pure Popperian falsificationism. In other words, whereas a single empirical instance contradicting a theory forces us to reject the theory, this is not the case with models: a model can only be replaced by a better model, that is, one that better fits more aspects of the problem it is trying to describe³.
Thomas Schelling, a political scientist and one of the first people to think about modeling as a method, distinguished two types of model:
A model is a precise and economical statement of a set of relationships that are sufficient to reproduce the phenomenon in question. Or, a model can be an actual biological, mechanical or social system that embodies the relationships in an especially transparent way […] — Thomas Schelling, 1978
Other scholars have categorized models in different ways. Here is my shot at categorizing models in a way that can be useful for problem-solving:
In short, models can either represent a static system or a changing one. We can easily place any model in one of these two buckets by asking ourselves whether time is a variable. Within each of these categories, we can classify models based on the main thing they help us do, or in other words, which part of the model does the explanatory heavy-lifting:
- Structural models help us understand what the variables of a problem are. They typically look like taxonomies or typologies: think of the 3Cs, Porter’s 5 forces, etc.
- Relational models go one step further. Given a set of variables, relational models tell us in what relationship they stand with each other. These models often look like equations in which two or more variables equal a constant, another variable, etc; but there are also more qualitative versions. A great example is the classic microeconomic model of price as a function of demand and supply.
- Dynamics are ways to model change. In practically all models where one of the variables is time, we are looking at dynamics. This category includes everything from a business forecast to a Monte Carlo simulation, all the way to particle, fluid or systems dynamics.
- Finally, computations are models in which what matters is not so much the variables themselves, their relationships or their evolution, but the end result of a calculation. These models are most often algorithms, a great example being the segregation model of the aforementioned Thomas Schelling.
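Since Schelling’s segregation model is the archetypal computational model, here is a minimal, self-contained sketch of it in Python. The grid size, vacancy ratio and happiness threshold below are illustrative choices for a toy run, not Schelling’s original parameters:

```python
import random

def schelling(size=20, empty_ratio=0.2, threshold=0.5, steps=50, seed=42):
    """Toy Schelling segregation model: two agent types on a wrap-around grid.
    An agent is unhappy if fewer than `threshold` of its occupied neighbours
    share its type; unhappy agents relocate to a random empty cell."""
    rng = random.Random(seed)
    n_empty = int(size * size * empty_ratio)
    n_agents = size * size - n_empty
    cells = [0] * n_empty + [1] * (n_agents // 2) + [2] * (n_agents - n_agents // 2)
    rng.shuffle(cells)
    grid = [cells[i * size:(i + 1) * size] for i in range(size)]

    def same_type_share(r, c):
        me = grid[r][c]
        occupied = [grid[(r + dr) % size][(c + dc) % size]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)]
        occupied = [n for n in occupied if n != 0]
        if not occupied:
            return 1.0  # no neighbours, nothing to be unhappy about
        return sum(n == me for n in occupied) / len(occupied)

    for _ in range(steps):
        unhappy = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] != 0 and same_type_share(r, c) < threshold]
        if not unhappy:
            break  # everyone is content: the model has settled
        empties = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] == 0]
        for r, c in unhappy:
            i = rng.randrange(len(empties))
            er, ec = empties[i]
            grid[er][ec], grid[r][c] = grid[r][c], 0
            empties[i] = (r, c)  # the vacated cell becomes the new empty slot
    return grid

def mean_similarity(grid):
    """Average same-type share among occupied neighbours, over all agents."""
    size = len(grid)
    shares = []
    for r in range(size):
        for c in range(size):
            if grid[r][c] == 0:
                continue
            occ = [grid[(r + dr) % size][(c + dc) % size]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            occ = [n for n in occ if n != 0]
            if occ:
                shares.append(sum(n == grid[r][c] for n in occ) / len(occ))
    return sum(shares) / len(shares)
```

The point of the model is exactly the one made above: the individual variables matter little, and what we care about is the computed outcome, namely that even mild individual preferences produce strong aggregate segregation (the mean neighbour similarity ends up well above the agents’ own threshold).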
This categorization gives rise to a set of (non-exhaustive) model archetypes:
Models in problem solving
Why is all of this interesting or useful?
- Models are the key output at the center of the hermeneutic loop: what we do by contextualizing frameworks and integrating data is to create a model of what we are trying to analyze.
- Models are also critical in problem solving because they allow us to reduce the space of possibility. From the endless ocean of all possible data I may want to collect and analyze, I get down to a manageable number of elements that make sense of my problem.
- Models are also the springboard to the next step of problem solving, i.e. prototyping a solution. Although some problem solving projects may end with the model itself, in most cases some intervention in the real world will be required. Once a model successfully integrates a framework with data, we may often want to build a real-life version of what the model prescribes, or in some cases of the model itself, to deploy in the real world.
- Most importantly, models allow us the flexibility to play with frameworks and variables. They are made to be wrong, to be scrapped and replaced. In most situations, infinitely many models could potentially be useful for understanding a certain problem or dataset. The problem solver’s core skill is the ability to build the right one.
Model = Framework + Data + Reason
As we have seen, two key inputs combine in creating models: from the top (i.e. the world of a prioris), frameworks give models form; from the bottom (i.e. the empirical world), data gives them substance. Conscious action, in the form of reason, allows the problem solver to merge the two: picking variables, choosing and fine-tuning frameworks, and making logical inferences.
We’ll now briefly zoom into reason and data to discuss their roles, and a full discussion of frameworks will be the subject of the next article.
Reason and thinking modes
Sometimes Priscilla the Professional Problem Solver and her team will get stuck on some problem. For example, the research team may have found two possible and inconsistent segmentations of the target market for a US pharmacy chain. One may rely on spending power, whereas the other may rely on shopping behavior. In these cases, Priscilla tells her team: “Stop everyone. Which direction should we think in? Should we go up or down?”
What she means by going up is: what happens if we take our datasets as a given and change our high-level framework accordingly? For example, can we somehow “hammer together” and intersect behavior and spending power? By doing so, we would lose segment sizes, as the two segmentations are not MECE (mutually exclusive, collectively exhaustive); on the other hand, it would allow us to put together richer personas. In this way, the team’s thinking goes from the particular to the general, inductively creating a top-level category to fit empirical instances⁴.
What she means by going down is: let’s take our problem statement and framework as a given and find a better way to break them down. Is either of our two datasets a better proxy for what we are trying to observe, explain or predict? Should we just look for another dataset? By doing so, the team is thinking in a deductive way, from the general to the particular.
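Priscilla’s “going up” move, intersecting the two segmentations into one richer scheme, can be sketched in a few lines of Python. The customer records and segment labels below are entirely hypothetical:

```python
from collections import Counter

# Hypothetical customer records, each labelled under the two competing
# segmentations: one by spending power, one by shopping behaviour.
customers = [
    {"id": 1, "spending": "high", "behaviour": "convenience"},
    {"id": 2, "spending": "high", "behaviour": "bargain-hunter"},
    {"id": 3, "spending": "low",  "behaviour": "convenience"},
    {"id": 4, "spending": "low",  "behaviour": "bargain-hunter"},
    {"id": 5, "spending": "high", "behaviour": "convenience"},
]

# "Going up": intersect the two segmentations into richer persona buckets.
# Note what is lost: the intersected buckets no longer carry the clean
# MECE segment sizes of either original segmentation on its own.
personas = Counter((c["spending"], c["behaviour"]) for c in customers)
```

Each key of `personas`, e.g. `("high", "convenience")`, is a candidate for a richer persona that the original single-axis segments could not express.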
Another way of thinking about levels of abstraction and movement between them is parametrization. At lower levels, variables will be tied to specific datasets or even to a specific data point. Abstracting, or going up a level of analysis, means that we are parametrizing the variable, i.e. asking ourselves what would be true if we let our variable take any (possible) value. This is well explained visually in this essay by Bret Victor.
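As a toy illustration of parametrization (all SKU prices and costs below are made up), each step abstracts one level further, letting a fixed quantity become a variable:

```python
# Level 0: a claim about a single data point.
price_of_this_sku = 4.99
margin_of_this_sku = price_of_this_sku - 3.50  # cost is hard-coded too

# Level 1: abstract over the dataset; price becomes a variable
# ranging over the observed values.
prices = [4.99, 2.49, 7.99]
margins = [p - 3.50 for p in prices]

# Level 2: parametrize cost as well; the model now answers
# "what would be true for ANY price and ANY cost?"
def margin(price: float, cost: float) -> float:
    return price - cost
```

At level 0 we have a fact; at level 2 we have a (tiny) model, true of any value the parameters may take.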
Of course, when creating and manipulating models we do much more than just think inductively or deductively. “Lateral thought” may be considered a separate mode of thinking: we are not directly breaking down an existing category or categorizing existing instances, but rather drawing a simile between an existing category and an unrelated one, by way of common characteristics shared by their instances (or, the other way around, a simile between separate sets of instances). And inductive and deductive inferences themselves involve more granular “thinking modes”: an understanding of sameness and difference, the identification of co-occurrence, temporal sequence, cause and consequence, etc.
What makes the top-down vs bottom-up thinking modes interesting is not that they accurately represent how we think or give an exhaustive explanation of the logic of thought; they are simply a useful way to orient ourselves when we get stuck in a project.
Data and variables
Having discussed reason, the first element of modeling, let’s now move to the second, data. As a professional familiar with different methods of research and analysis, Priscilla is used to handling all kinds of data:
- Qualitative data: including things like interview transcripts and market reports, but also simply any sensory input, anything we see, hear, touch, etc.
- Quantitative data: including things like digital analytics, market sizes and SKU databases.
Data can come in different forms: it can be discrete or continuous, structured or unstructured, nominal, ordinal etc. There is no need to discuss each type in depth here.
From a modeling perspective, all data plays the same role: it is the bottom of the hermeneutic loop, the part that has to be integrated into the whole, the “substance” of a model where the framework is the “form” (all ways of saying the same thing). There are just two things worth remarking on in relation to data:
- Data is what we get from observing the world: it’s not the world itself. Leaving aside the debate on the observability of “things in themselves”, the point is that in most cases data is not neutral: by deciding what data to collect and how to slice it, we always reveal an agenda, our biases and our high-level frameworks. Examples like racially-discriminatory AIs and cars developed using only men as a testing sample are by now well known. Additional common biases in data collection and manipulation include confirmation bias, survivorship bias, outlier bias, etc.
- Data is mediated by variables: while data is the set of individual instances, variables are the set of characteristics attributed to each of these instances insofar as they matter to the problem we are solving. For a simple way to visualize it, on an Excel sheet all the rows starting from 2 are the dataset; row 1 indicates the variables.
Variables are therefore the connection point between model and data. They can be approached from the top, i.e. from the model (remember the role of variables in the categorization of different types of models? Structural, relational, etc.); or from the bottom, i.e. from a given dataset. This is what data science calls feature engineering: taking the dataset as a given, we try to twist and push it into variables that are useful to our model. We may consolidate variables, eliminate duplicate or correlated ones, create additional variables out of existing ones, “project” a set of variables into a lower-dimensional space (for example via principal component analysis), etc.
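These feature engineering moves can be sketched with numpy on a made-up customer dataset (the columns, values and the derived `spend_per_visit` variable are all hypothetical, chosen just to show each step):

```python
import numpy as np

# Hypothetical raw dataset: rows are customers (the data), columns are
# variables: visits, spend in dollars, and spend in cents (a duplicate in disguise).
X = np.array([
    [12, 340.0, 34000.0],
    [ 3,  80.0,  8000.0],
    [ 7, 150.0, 15000.0],
    [20, 600.0, 60000.0],
])

# Step 1: create an additional variable out of existing ones.
spend_per_visit = X[:, 1] / X[:, 0]

# Step 2: eliminate the perfectly correlated duplicate (spend in cents).
features = np.column_stack([X[:, 0], X[:, 1], spend_per_visit])

# Step 3: project onto fewer dimensions with a bare-bones PCA
# (centre the data, take eigenvectors of the covariance matrix, keep the top k).
centred = features - features.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # two leading components
projected = centred @ top2
```

The four customers are now described by two engineered coordinates instead of the original three raw columns, with the first coordinate capturing the largest share of variance.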
Moving across the project space
We have been using the metaphor of space a lot in this article, and have talked much of top-down, bottom-up, vertical and lateral thinking. I love to think that, in a metaphorical but useful sense, ideas have volume, dimensions, direction and momentum. The axes along which ideas move are the ones we introduced in the previous article: they go from whole to part, from abstract to concrete, from internal to external, from a priori to a posteriori. This constant, loopy movement is how we acquire knowledge and how we solve problems.
We have seen how these movements can be represented on the Problem Solving Map. A more open-ended and “actionable” way to visualize these flows of ideas is the Problem Solving Canvas: a framework devised to help us orient ourselves in projects and keep track of our iterative loops between frameworks and data in an orderly fashion.
We’ll discuss the details of this canvas later. For now, it will suffice to broadly describe the framework:
- On the y axis, we overlap the whole/part and empirical/abstract axes that we used to plot the Problem Solving Map
- The x axis can represent time, as we display subsequent iterations of the framework-data-framework cycle from left to right
- The bottom-right blocks (models, variables and data) have all been described above, together with their relationships.
- In the next installment, we’ll focus on frameworks and problem statements.
To sum up, in this installment we have looked at models as the core output of the non-empiricist, iterative hermeneutic loop of problem solving. We characterized and categorized models, and delved into two of their main components: reason and data. In the next section, we’ll focus on the third and key component of modeling, frameworks, a.k.a. mental models; as well as exploring problem statements and how to best formulate them.
This is Part 2 of a series on Modeling in Problem Solving. Part 3 here.
¹ Here are some additional definitions from the literature:
- Models are formal structures represented in mathematics and diagrams that help us to understand the world. (Scott E. Page, The Model Thinker)
- Concrete models are physical objects whose physical properties can potentially stand in representational relationships with real-world phenomena. Mathematical models are abstract structures whose properties can potentially stand in relations to mathematical representations of phenomena. Computational models are sets of procedures that can potentially stand in relations to a computational description of the behavior of a system. (Michael Weisberg, Simulation and Similarity)
² See the Stanford Encyclopedia of Philosophy’s entry for “Models in Science”
³ Timothy Williamson, Philosophical Method: A Very Short Introduction, Oxford University Press
⁴ This is not inconsistent with the principle of “starting with theory” (see previous article). While it’s typically preferable to start with theory, as we have seen, knowledge creation is a dialectical, cyclical process. “Bottom-up” thinking is what we do when we integrate data into frameworks.
[14 Nov 2022 edit: added paragraph about abstraction as parametrization]