Learning How to Learn: Abduction as the ‘Missing Link’ in Machine Learning
During the 1956 Dartmouth Summer Research Project on Artificial Intelligence, which some consider to be the birthplace of AI research, the problem of artificial intelligence was posed as an attempt ‘to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.’1 Here the link between symbolic AI and machine learning becomes apparent, although many still claim that the latter refers only to connectionist models.2 In fact, the problem of machine learning, which lies at the heart of artificial intelligence, can be described ‘on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.’3 In the following paper, I would like to return to this initial understanding of machine learning and explore the conceptual underpinnings of learning processes. In particular, I will ask whether Bayesian epistemology, as a form of conjectural reasoning (i.e. hypothesising), is the missing link between symbolic AI, represented by the Dartmouth proposal, and connectionist models, crystallised in today’s neural networks. This is necessary because recent achievements in artificial intelligence, especially around transformer models, have led to a one-dimensional debate in favour of neural networks, i.e. connectionism.4
While symbolic AI typically follows a deductive logic, connectionism embodies an inductive approach. However, following Charles Sanders Peirce, there is a third category of reasoning that may be necessary to get the full picture of machine learning: abduction, which allows for the generation of new concepts and thus a way to learn how to learn. While contemporary debates about machine learning focus primarily on its inductive nature, essentially reducing it to yet another form of statistical inference, highlighting the relationship between deductive, inductive and abductive reasoning will help to broaden the debate and point to the conjectural basis of machine learning. This is not an attempt to simply return to classical AI and deductive reasoning, but to argue that abduction, precisely because it relies on hypotheses and conceptual knowledge, offers a way to mediate between inductive and deductive AI and thus provide a better understanding of machine learning processes. Using the example of Bayesian belief networks, which arguably rely on abductive reasoning, I will further show that machine learning, whether called abductive or not, always involves some form of situatedness. In this way, the discussion of Bayesian learning models can help to make explicit what is usually left implicit in the current debates and techniques of machine learning.
I) Induction vs. Deduction
The question of whether or not computers can learn is at the heart of the quest for AI.5 It dates back to Alan M. Turing’s Computing Machinery and Intelligence (1950), in which he famously challenged Ada Lovelace’s claim that machines could not surprise us.6 For Turing, this belief is based on the false ‘assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it.’7 In other words, for something ‘surprising’ to happen, we assume that a ‘creative mental act’ occurs spontaneously, ignoring all the training and work that goes into it. According to Turing, this is an unfair juxtaposition, particularly given the decades of education that an adult typically receives. The issue of machine learning must therefore be split into two distinct parts: the initial programming of a childlike mind, and the subsequent educational process. This clear emphasis on the educational aspects is often lost in today’s debates about AI. For Turing, however, the teacher-pupil relationship is fundamental. As long as experience can be conceptualised and taught to a machine, Turing sees no reason why such a machine should not ‘compete with men in all purely intellectual fields’8 and, as a consequence, surprise us.9
The quest for AI is usually presented as a competition between two opposing approaches: Symbolic AI, often derisively dubbed Good Old-Fashioned AI (GOFAI) by its critics, is based on the long-held belief that rational truths can be encoded in such a way that a machine can process them automatically. This deductive approach allows data to be processed according to defined rules, transforming inputs into outputs using formal logic.10 Connectionism, on the other hand, takes an inductive approach, going beyond purely symbolic logic. It generates rules based on input and output data, which can then be applied to new, previously unknown data. In the connectionist vision, artificial neural networks generate patterns that are used to classify the world, similar to the human brain, which – at least in this view – also has a model derived from training data for all possible phenomena, such as a cat or abstract mathematical concepts.11
The problem with induction, of course, is that it tempts us to simply use past experience to make predictions about experiences we have not yet had, and to make general claims that go beyond what we have experienced.12 In the case of predictive systems, which account for most contemporary AI applications, this raises the significant issue of bias and, as a consequence, discrimination against marginalised groups.13 For example, an inductively derived prediction that women are less likely to succeed in the labour market than men does not rectify past structural inequalities; rather, it serves to perpetuate them.14 But the problem points to an even older debate about how we experience and perceive the world. While the classical AI approach is rationalist in the sense that it presupposes formalisable rules that need only to be programmed into a machine, the neural network paradigm is empiricist in that it is not about a predetermined set of rules, but rather about an iterative approach to reality. Consequently, certainty is replaced by prediction, the rational by the empirical world.
The empiricist nature of deep neural networks has led to a significant increase in the efficacy of pattern recognition. The combination of vast quantities of training data and the seemingly limitless computing power of large Internet companies has rendered the elegant algorithms of the symbolists obsolete. Take machine translation as an example: the conventional approach has been to define the vocabulary and grammatical rules of at least two natural languages, with the objective of enabling a machine to translate word sequences from one language into the other. The inherent limitation of this approach is that natural language cannot be reduced to its syntactic level, which is why, until recently, machine translations sounded very clumsy and became the subject of countless Internet jokes. This changed abruptly at the end of 2016. After five years of intensive work on and with connectionist models, Google Translate rolled out its new system overnight, silencing the sceptical voices that had previously been raised.15 Its success was pivotal in the development of today’s large language models (LLMs) and the breakthrough of generative AI.
Nevertheless, a look at the material basis of machine learning raises the question of how ‘artificial’ artificial intelligence really is. Not only do AI models still require a huge amount of human labour and actual infrastructure, but they also rely on ‘general social knowledge’ that has been accumulated over centuries, if not millennia.16 The recent surge in automation can thus be seen as another stage in a much longer process. The transition from deductive to inductive models represents a significant shift not only in the field of AI research, but also in the broader context of data processing, marking a move from classical to non-classical approaches.17 This is exemplified by so-called transformer models (e.g. BERT, GPT, T5), which are essentially deep neural networks that establish relationships in sequential data. Following the initial identification of a pattern, such as a word, the model then searches for further patterns, for example a sentence, using previous results to facilitate the process.18 Consequently, data processing is not based on deductive reasoning alone, but on inductive inference, where the necessary rules are implicitly calculated from the data. For proponents of connectionism, this is the central tenet of learning, applicable to both human and machine cognition.19
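To make the mechanism behind these models more tangible, the following minimal sketch implements scaled dot-product self-attention, the operation introduced in the ‘Attention Is All You Need’ paper cited above. It is not the architecture of any particular transformer model; the sequence length, embedding size and weight matrices are arbitrary toy values chosen purely for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal scaled dot-product self-attention over a sequence of token vectors X (n_tokens x d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # project each token into query, key and value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise relevance of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each token's attention distribution
    return weights @ V                              # each output vector mixes information from the whole sequence

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                         # a toy 'sentence' of 5 tokens with 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (5, 8): one contextualised vector per token
```

In an actual transformer, many such attention operations are stacked and their weight matrices are learned inductively from data, which is precisely the point made above: the rules relating the elements of a sequence are not programmed but calculated.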
The learning process enables automated pattern recognition based on inductive reasoning; however, it does not allow for ‘conceptual breakthroughs’ in the sense of an artistic or scientific discovery.20 This limitation has led numerous contemporary AI critics, such as Matteo Pasquinelli, to regard these models as merely an application of ‘statistical inference’.21 As noted above, the issue with induction is that it establishes the past as an implicit rule for the future, which ultimately leads to models reproducing the past over and over again. This makes generative AI rather generic, as the results of contemporary transformer models show. Yet, this criticism, as it stands, only applies to inductive machine learning systems, which have been developed as a contrast to deductive AI models. In light of Turing’s initial formulation of the problem, it is my intention with this paper to broaden the scope of the debate by examining the role of abductive reasoning in machine learning processes to see if a ‘creative mental act’ is indeed possible.
II) Abductive Reasoning
In the final section of Computing Machinery and Intelligence, Turing speculates about the prospective trajectory of AI research. He postulates that the machine could potentially engage in the most abstract forms of cognition, such as playing chess. Alternatively, it could be equipped with sensory organs to facilitate the acquisition of natural language and to learn how to communicate.22 According to Turing, both approaches should be pursued, and it is notable that both are reflected in the comparison of symbolic and connectionist AI outlined above. While connectionism adopts an empiricist stance and draws upon the brain as the source of intelligence, symbolists invoke the mind as the domain of rational reasoning and, consequently, learning. However, this juxtaposition often ignores Turing’s reference to the ‘creative mental act’ and its central place in the learning process, which is key to his understanding of machine learning. Where does it come from? And how can it be conceptualised?
One could argue, as Turing seems to suggest, that any ‘creative mental act’ depends on pre-existing knowledge in the form of concepts that must be acquired in order to perform such an act. Therefore, if a machine can learn how to learn (i.e. make use of conceptual knowledge), it will be able to come up with all sorts of new things. However, this cannot be achieved by induction or deduction alone. Deductive learning is a priori and therefore requires the machine to ‘have a complete system of logical inference “built in”’23 or rather programmed into it beforehand. Induction, on the other hand, allows for learning a posteriori, but only in a very limited way, that is, through inference from already existing experience. Only abduction, a form of explanatory reasoning to generate a hypothesis, entails a real learning effect as an intuitive conception of something new.24 Imagine a machine that, instead of just being told what the problem is and then coming up with a solution based on data, can itself find the problem in the data it is given. Abduction can thus be understood in terms of ‘authentic problem finding’, that is, in the formulation of hypotheses.25
According to Charles Sanders Peirce, abduction ‘is the only logical operation which introduces any new idea’26 and as such it precedes deduction and induction.27 In contrast to abduction, deduction is a process of tracing out the necessary conclusions from an abductively derived idea, whereas induction is a method of testing those conclusions, as well as the idea from which they arise. Consequently, only abduction can account for ‘all the operations by which theories and conceptions are engendered.’28 In this context, it is important to note that Peirce’s conception of the three logical methods, especially their relation to each other, has changed over the years. And yet there are clear tendencies in his theory: Given its a posteriori nature, induction is more closely aligned with abduction than deduction. However, because of its limited character, it is incapable of generating new concepts. Instead, it serves to confirm the conclusions reached through abduction. Peirce writes: ‘[T]he essence of an induction is that it infers from one set of facts to another set of similar facts, whereas hypothesis (i.e. abduction) infers from facts of one kind to facts of another.’29 Hence, abduction can deal with conceptual change, which in turn allows for innovative ideas.30
As KT Fann observes, the abductive approach places Peirce in opposition to the positivists, who ‘propagate a “descriptive” theory of science, according to which the propositions of science should properly describe sense-impressions.’31 For positivists, a hypothesis serves at most to guide an observation, but not to generate new knowledge. This implies that an idea as a ‘creative mental act’ is not possible, since any account of the world is only possible through observation.32 The problem with such an approach is that it cannot do justice to the unperceived elements involved in the creation of knowledge. Indeed, learning itself seems to be based on abductive insights rather than merely observable facts.33 This is crucial, because it runs counter to the currently dominant paradigm in machine learning, which, as we have seen before, is based on inductive (i.e. case-by-case) inference. In fact, machine learning models – supervised or not – rely heavily on ‘hidden layers of knowledge production’ as they constantly infer from social categories and norms.34 For example, when features are learned from data they often correlate with gendered and racialised categories, which has a detrimental influence on the overall learning of machine learning models.35 Despite positivist claims that models learn those features automatically from unlabelled data, the learning process still makes use of conceptual knowledge, which is derived from pre-existing categories.
Any form of learning, whether machinic or not, depends on categories, but these categories cannot be experienced; they are always generalised from specific instances. The generalisation of abstract ideas in the form of categories is a process of abduction rather than induction, because it always relies on some prior knowledge of what those categories should be. In this sense, abduction is indeed closer to induction because it allows for a conceptual leap, but this leap is reversed: with abduction, we do not induce a general category from empirical instances, but we infer that the instances we encounter belong to the same category. Such implicit inference is biased, of course, but it is necessary for all learning processes because it allows us to generalise specific instances into abstract categories.36
It follows that ‘ML systems cannot be anything other than abductive: the contingent biases of a given data set invariably provide the grist for any algorithmic model trained on that data.’37 At first glance, this statement may seem to contradict the previous one, that neural networks are merely inductive, and, as a consequence, cannot learn anything new. Yet despite their inductive character, which allows them to extract ‘objective’ patterns from vast amounts of data, they also rely on ‘subjective’ belief systems to infer (via hidden layers of knowledge production) from individual and socially situated knowledge.38 This shows that abductively derived hypotheses, which are implicitly negotiated in connectionist approaches, need to be made explicit in order to get the full picture of machine learning processes. Now, making prior beliefs – together with their social biases and inequalities – explicit is exactly what Bayesian reasoning as a form of explanatory inference promises to do. In fact, ‘the explanatory considerations (of abduction) may serve as a heuristic to determine, even if only roughly, priors and likelihoods in cases in which we would otherwise be clueless and could do no better than guessing.’39 For Igor Douven, abductive inference extends beyond the scope of Bayesian inference. However, the latter is itself inherently abductive, as it also relies on explanatory considerations.40
Bayesian inference could thus provide the missing link between purely inductive and deductive machine learning models. According to Hanti Lin, the assumption ‘that beliefs can come in different strengths is a central idea behind Bayesian epistemology.’41 One application can be seen in Bayesian belief networks (BBNs), which are powerful graphical models used to visualise and reason about uncertain knowledge.42 They consist of nodes, representing variables, and directed edges, depicting the causal relations between these variables. BBNs leverage the idea that there are ‘degrees of belief’ to update the probability estimates for a hypothesis as new evidence becomes available, allowing for dynamic inference in complex systems.43 Because of their ability to deal with uncertainty and provide explanatory inference, these networks are widely used in fields as diverse as artificial intelligence, medical diagnosis, risk assessment and decision making under uncertainty.44 For example, in medical diagnosis, a BBN can be created to diagnose a disease based on symptoms and risk factors. Using data from medical records, a structure is established on the basis of prior probabilities and assumptions about the likely relationships between the variables. The result is a network with directed edges showing these dependencies. If a patient shows a symptom and has a specific risk factor, the BBN will update the probabilities, allowing it to infer the likelihood of the suspected disease.45
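To make the diagnostic example more concrete, the following sketch hand-codes a tiny belief network of the kind just described. The network structure (risk factor → disease → symptom) and all probabilities are hypothetical illustrations, not values taken from any actual medical model.

```python
# A hypothetical three-node belief network: risk factor -> disease -> symptom.
# Every number below is invented purely for illustration.

P_RISK = 0.3                                          # prior over the root node (only needed if the risk factor is unobserved)
P_DISEASE_GIVEN_RISK = {True: 0.10, False: 0.02}      # P(disease | risk factor)
P_SYMPTOM_GIVEN_DISEASE = {True: 0.85, False: 0.15}   # P(symptom | disease)

def posterior_disease(symptom: bool, risk: bool) -> float:
    """P(disease | symptom, risk): prior times likelihood, normalised over both hypotheses."""
    prior = P_DISEASE_GIVEN_RISK[risk]                # belief in the disease before the symptom is observed
    joint = {}
    for disease in (True, False):
        p_disease = prior if disease else 1.0 - prior
        p_symptom = P_SYMPTOM_GIVEN_DISEASE[disease]
        likelihood = p_symptom if symptom else 1.0 - p_symptom
        joint[disease] = p_disease * likelihood
    return joint[True] / (joint[True] + joint[False])

# A patient who has the risk factor and shows the symptom:
print(round(posterior_disease(symptom=True, risk=True), 3))   # 0.386: belief rises from 0.10 to roughly 0.39
```

Observing the symptom raises the belief in the disease from the prior of 0.10 to roughly 0.39; this is the kind of evidence-driven updating described above, with every assumption about the network visible in the model rather than hidden in learned weights.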
Bayes’ theorem plays a crucial role in BBNs by providing a mathematical framework for updating the probabilities of the nodes based on new evidence. As evidence is observed, the theorem allows the network to compute the posterior probabilities of the nodes by combining prior probabilities with the likelihood of the observed evidence. This process enables the network to dynamically adjust its beliefs about the state of the system, leading to improved inference and decision-making. By systematically incorporating new information, BBNs can effectively manage uncertainty and support reasoning in complex scenarios. In addition, the visualisation of prior beliefs about the variables in question allows BBNs to recognise explanatory considerations more clearly. Factoring in the priors and likelihoods of possible outcomes can thus make BBNs a model for abductive learning processes.46
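Stated compactly, the updating rule described here is Bayes’ theorem, where H stands for a hypothesis (the state of a node, such as the presence of a disease) and E for the observed evidence:

```latex
\underbrace{P(H \mid E)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(E \mid H)}^{\text{likelihood}} \cdot \overbrace{P(H)}^{\text{prior}}}
       {\underbrace{P(E)}_{\text{evidence}}}
```

The prior encodes what is believed before the evidence arrives, the likelihood measures how well each hypothesis explains that evidence, and the posterior is the revised degree of belief; it is this explicit separation of prior belief and evidence that the argument below relies on.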
Such a heuristic distinguishes BBNs from purely deductive or inductive methods. Unlike deductive expert systems, such as those developed in symbolic AI in the 1970s and 1980s, BBNs allow for uncertainty to be dealt with in computational analyses by incorporating conditional probabilities. As non-deterministic, probabilistic models, they therefore appear to be closer to the inductive paradigm of neural networks. However, and this is crucial to the argument here, Bayesian epistemology provides not only a probabilistic framework for making more robust and reliable predictions, as would be the case with Bayesian neural networks (BNNs), but also a means of theorising hypothesis building itself.47 Herein lies the link to abductive reasoning as a way of generating plausible hypotheses based on prior knowledge. What this allows for, at least in theory, is to conceptualise hypothesis building as a form of ‘strong abduction’ in the context of Bayes’ theorem.48
Bayesian epistemology can be defined as having the ‘goal of explaining or justifying a wide range of intuitively good epistemic practices.’49 Although this involves a rather narrow focus on an a posteriori credence change (i.e. the updated belief after considering new data), it also assumes the a priori representation of what is initially believed or expected (i.e. the belief based on prior experience). For example, when you meet someone new and instinctively form a judgment about that person, you are likely to draw on your previous experience (prior belief) and adjust that judgment as you gather more clues (new evidence). What makes this epistemological framework powerful and useful is that, in addition to its a posteriori nature, it allows for prior degrees of belief to be made explicit in explaining a possible outcome. It therefore allows us to understand learning as an intuitive experience of something new, and thus to value intuition as a central element of machine learning processes.50
In contrast to a strictly connectionist view, the epistemological framework offered by Bayesianism may provide a way to open the debate about machine learning to ‘interpretability, explainability, and trustability.’51 As probabilistic models, Bayesian belief networks can serve as a formal mechanism to support and enhance abductive reasoning, providing an approach for generating and evaluating hypotheses. Unlike their connectionist counterparts (i.e. deep neural networks), which learn the relationships between data variables implicitly, BBNs achieve this by making pre-existing knowledge of the data structure explicit. They employ a rational ‘formalism for representing, learning, and reasoning about causal relations.’52 The notion that Bayesian belief networks represent ‘causal relations’ – or, as Eugene Charniak puts it, attempt ‘to model a situation in which causality plays a role but where our understanding of what is actually going on is incomplete’53 – is based on the premise that past experience does not automatically generate predictions beyond what has been experienced; instead, the prediction is intuitively inferred through explanatory factors. In the absence of extensive (training) data, the network and its parameters can be estimated using prior information, such as expert knowledge or everyday assumptions. Bayesian approaches thus explain real-world problems probabilistically, without sweeping explanatory considerations under the carpet of automatic inference.54
The probabilistic nature of Bayes’ theorem offers a potential avenue to rethink the current state of machine learning. By incorporating pre-existing knowledge as a fundamental element of learning processes, Bayesian models represent, in a more general sense, an abductive approach to learning how to learn.55 As mentioned before, such models can be used to derive assumptions in the form of mathematical formulae that allow for the calculation of the conditional probability of an event given that some additional event has occurred, or that some additional knowledge has been acquired. This process of probabilistically generating causality from priors is central to Bayesian learning; however, as a form of ‘weak abduction’ it also highlights the pivotal role of conjectural reasoning in all machine learning systems. When filling in missing information, machine learning models rely heavily on cultural assumptions and biases.56 And as is the case with any other form of prejudiced generalisation or biased belief system, this can create a vicious circle when applied to automatic inference.57
III) Unlearning Machine Learning
In the context of contemporary machine learning models, ranging from medical applications to product recommendation to predictive analytics, there is a tendency to conflate Peirce’s distinction between abductive, deductive and inductive aspects of inference. As Luke Stark has argued, this poses a significant challenge: abductive insights generated by these models are ‘often misinterpreted in contemporary work around AI as reliably reproducible truths.’58 As a result, the pragmatist assertion that a hypothesis obtained by abduction must be subjected to verification is transformed into the positivist view that the data itself yields the truth. Instead of a verifiable narrative, ‘automated conjecture’ via statistical inference simply induces concepts and categories about the world without knowing them.59 Not only does this lead to the abandonment of established practices of critical examination and interpretation, but it also contravenes Peirce’s ‘method of methods’ as the foundation for evidence-based knowledge production and thus true invention.
Those who advocate for the current inductive machine learning paradigm, including deep neural networks and transformer models, claim that artificial intelligence is essentially an empirical endeavour that cannot be explained by logical or theoretical means.60 However, a crucial aspect that is often overlooked in contemporary AI debates is the fact that machine learning algorithms ‘must embody some knowledge or assumptions beyond the data it is given in order to generalize beyond it.’61 A machine learning algorithm cannot directly experience the world. Instead, representations of the data must be created to allow the model to see it. In other words, for the model to learn, features must be selected (often even created) that – in the eyes of the still very human trainer – best represent the data.62 This basic insight contradicts the common idea that we, or the models, only need to look at the data to get the desired result. What is overlooked in this rather naïve view is the fact that the desired outcome is always already inscribed in the process. With each iteration, the model is tweaked further and further towards the desired property values in order to filter the right information out of the data set. Now the problem with such an optimisation process is that it relies heavily on abduction, or rather on its obfuscation. Instead of being made explicit, the abductive assumptions are automated and transformed into a supposedly self-learned truth. In today’s inductive machine learning, basic assumptions about the application domain are automatically translated into models, resulting in the obfuscation of those assumptions.
In feature learning, for example, ‘each training case is [treated] as a vector of desired outputs of a stochastic generative model.’63 As a form of unsupervised learning, the structure is captured implicitly in a set of input vectors (i.e. the mathematical representation of the data), in order to generate correlation in the output vectors (i.e. learned feature representations that capture the underlying patterns and dependencies within the data). But rather than being magical, the whole process depends on someone modelling the data. This involves selecting appropriate architectures, objective functions and learning algorithms to guide the model towards the desired outcomes. Again, the production of knowledge is indeed conjectural, albeit cloaked in empirical objectivity. What a Bayesian approach might allow us to do, then, is to make the learned feature representation explicit, and to provide a more intuitive understanding of how certain features affect the outcomes. The graph structure of BBNs would also make it easier for domain experts to interpret and modify the structure of the networks, while providing a rationale for why these features were chosen over others.64
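As a rough illustration of the modelling decisions hidden in such ‘automatic’ feature learning, the sketch below derives feature representations from a toy set of input vectors using principal component analysis. PCA is used here only as a simple, assumed stand-in for the stochastic generative models discussed above; the synthetic data, the preprocessing and the number of retained components are all choices made by the person doing the modelling rather than by the data itself.

```python
import numpy as np

rng = np.random.default_rng(1)
# A toy data set: 200 'training cases', each a 6-dimensional input vector.
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

# Modelling decision 1: centre the data (a human choice, not something the data demands).
X_centred = X - X.mean(axis=0)

# Modelling decision 2: keep only the k directions of greatest variance.
k = 2
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
components = Vt[:k]                     # the learned 'features': directions in input space

# Each training case is re-described as a k-dimensional feature vector.
features = X_centred @ components.T
print(features.shape)                   # (200, 2): a compressed, human-parameterised representation
```

Nothing in this pipeline is discovered by the data alone: the choice of k, of the decomposition and of the preprocessing all encode assumptions about which structure is worth keeping, which is precisely the conjectural element the paragraph above points to.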
Similar to approaches in the medical professions, knowledge production in AI is not based solely on direct observation. Rather, it presupposes prior knowledge, which is essential for the modelling of learning algorithms. The conjectural, that is, abductive nature of these algorithms allows us to consider their ‘situated knowledge’, not least because ‘abduction is context sensitive in that its use heavily relies on background knowledge, most notably, for making judgements of explanatory goodness.’65 Rather than simply inducing concepts and categories through ‘automated conjecture’, which ultimately serves the positivist view that data itself provides the truth about the problem being analysed, assumptions about the problem, and therefore its analysis, are seen as a central part of the modelling process. This, of course, resonates with critical approaches to epistemology, such as the one by Donna Haraway, who writes that ‘[s]ituated knowledges require that the object of knowledge be pictured as an actor and agent, not as a screen or a ground or a resource, never fully as a slave to the master that closes off the dialectic in his unique agency and his authorship of “objective” knowledge.’66 Against a universalist claim to truth, but still grounded in the larger enterprise of epistemological truth-seeking, Haraway goes against the notion of positivist objectivity, emphasising positioning as a key practice in scientific discovery. As with Peirce, situated knowledge cannot be reduced to observable facts alone, but always relies on the conceptual knowledge in which one is situated. Not only is every inference conceptual, but conceptual knowledge is also subject to the learning process. With each iteration, the concepts necessary to produce knowledge about the application domain (i.e. the algorithm’s belief system) are updated according to the situation the learner is in.
The example of Bayesian belief networks is also interesting in this context. BBNs typically require smaller data sets because they make use of prior knowledge (i.e. concepts) about the structure of the data. This already implies a certain situatedness within the data, which goes against the ‘black box’ mentality of current machine learning approaches. As a result, BBNs are more interpretable and use probabilities to explicitly model uncertainty, allowing for clear, interpretable inference. Furthermore, the use of expert knowledge, including that of those affected by ML-systems, helps to build more trustworthy AI. As Dan McQuillan writes: ‘Participant’s direct knowledge of prevailing conditions helps determine both what data is reliable and what responses are most important.’67 Bayesian epistemology can therefore also provide a way of addressing the alignment problem in AI, which refers to the challenge of ensuring that the goals of ML-systems align with human values, intentions, and ethical standards, especially as they become more powerful and autonomous. Understanding how a system’s decisions are made, and what can be learned from them, is critical to building more trustworthy AI.68 As the demand for explainable AI (XAI) grows, we may thus see increased attention to BBNs and other interpretable models in the near future.
By using BBNs, data scientists and machine learning practitioners can gain insights into the feature extraction process, understand the relationships between variables, and provide more transparent explanations of model predictions. This requires an intelligible standpoint from which to understand a problem. Again, the abductive standpoint, as represented by Peirce, plays a central role here. It implies that all knowledge is essentially partial and thus cannot be purely objective or value-neutral.69 But precisely because explanations are limited by their socio-cultural context, ‘we might also expect new values – especially theoretical ones – to appear over time.’70 As Zachary Wojtowicz and Simon DeDeo further explain, it is the abductive nature of explanatory values that leads us to prefer one explanation over another.71 And as such, they need to be made explicit in machine learning processes, because they are largely responsible for the generalisation of the processed data.
The essence of machine learning is the ability to generalise. It operationalises the notion that a number of discrete cases can ‘get united in a general idea.’72 As such, it is truly abductive, abstracting the world into models that, in turn, influence the world. Beyond immediate sensation or perception, this process of generalisation gives rise to abstract concepts that are fundamental to the learning process. What is important to note in this context is that the principles of abductive learning ‘are not limited by fixed categories, but are essentially determined by the social practices of learning that are embedded in the conceptual infrastructures.’73 Consequently, learning itself is animated by social practices, necessitating a collective process of elaboration. What Turing made clear, but what is almost forgotten in today’s machine learning practice, is the fact that learning does not happen on an individual level alone, but is always embedded in socio-cultural relations. Not only humans, but also machines, need to be educated, whereby education does not simply mean teaching existing concepts, but developing them together. This, in a nutshell, is Lev Vygotsky’s understanding of learning, which, according to the Russian Marxist psychologist, creates socially mediated minds.74
Updating this approach for contemporary AI debates, Tyler Reigeluth and Michael Castelle propose ‘a social theory of machine learning: i.e., a theory that accounts for the interactive underpinnings and dynamics of artificial “learning” processes.’75 Although they argue against an emphatic notion of machine-based concept learning – at least as far as neural networks are concerned – the authors do see value in considering machine learning in relation to human learning. As such, ‘machine learning algorithms are not merely executors or implementors of prior or external social norms or knowledge; instead, their activity reshapes collective activity as much as it is shaped by it.’76 What makes this approach valuable for a theory of abductive machine learning is that it underlines the need to reconsider the current ML-paradigm with its focus on individualistic learning structures. According to a social theory of machine learning, in which the social and the technical are not mutually exclusive, but rather embedded within one another, we are indeed all ‘machine learners’.77 As much as the machine learns from us, we learn from it, not least because both humans and machines are part of the same symbolic realm (i.e. language) as the primary mediator of all cognitive and therefore learning processes.
Unlearning machine learning in its current form requires the recognition of language as the driving force in concept learning. Vygotsky sees language as a social product. By using signs, we develop higher mental processes in the form of concepts, which are not simply the result of development, but must be actively learned.78 What this means for machine learning is that learning is not simply a matter of processing pre-programmed concepts, nor is it the self-generation of mental functions by neural networks. Instead, it is the formation of new ideas through socially meaningful activity. From this perspective, computers are inherently social, as can already be seen in Turing’s notion of the ‘thinking machine’ (evaluated through a conversational setting). They function through human-machine interaction, during programming, usage and training, as well as through the internalisation of socio-cultural knowledge. Human logic shapes their development as much as they shape it.79
Conclusion
Returning to the beginning of this article, the question arises as to whether a machine can think, and therefore learn, abductively. This concerns not only Turing’s question of learning machines – and whether they can surprise us – but also a normative framework within which computational inference can take account of prior knowledge, including its social prejudices and cultural biases. There is much to suggest that current AI models cannot do this, or only to a very limited extent. Because of their inductive nature, artificial neural networks can only repeat what is already there. AI-systems based on deep learning, such as transformer models, promise to extract patterns from data without relying on theoretical models. However, as this article has shown, learning is more than pattern recognition, not least because it involves forming a prior belief about what is being observed. Such a hypothesis is always limited by its subjective nature. And yet it does allow for generalisation, because missing information can be filled in intuitively. Just as children do not need thousands of examples to visualise a chair, a cat or their way to school, machines could – and arguably already do in the form of ‘weak abduction’ – draw on situated knowledge to close the informational gap. In this respect, an abductive understanding of machine learning, as exemplified by Bayesian belief networks, may provide the missing link between deductive and inductive approaches, allowing for a more complete picture of machine learning processes.
The question of whether a strong form of abductive machine learning is possible depends less on how the technologies are currently used (e.g. in the form of artificial neural networks) than on how they could be used.80 So far, machine learning models have mainly been used to solve problems, but not to find them. This may sound banal, but it actually goes to the heart of what Peirce means by abduction: ‘The abductive suggestion comes to us like a flash. It is an act of insight, although of extremely fallible insight. It is true that the different elements of the hypothesis were in our minds before; but it is the idea of putting together what we had never before dreamed of putting together, which flashes the new suggestion before our contemplation.’81 Such a creative mental act gives rise to new ideas, concepts or problems. It allows for the famous ‘aha’ moment, which is reducible neither to deductive inference from pre-programmed rules nor to data-driven inductive learning. As Luciana Parisi points out with regard to machine learning: ‘If abduction has a logical form that is distinct from deduction and induction, it is because, when working computationally, the selective or creative activities of this retroactive thinking (i.e. that starts from consequences to track causes) involves hypothesis generation and not simply an explanation of consequences.’82 Abductive thinking, as seen in science and the arts, breaks through existing patterns to forge novel connections.
Two conclusions can be drawn from what has been said so far: Firstly, Bayesian belief networks, which may allow for the representation of abductive reasoning, could be employed to explicitly integrate conjectural knowledge into the modelling process of machine learning. Highlighting the abductive nature of Bayesian reasoning could help us to better understand the explanatory values and considerations involved in the adoption of specific hypotheses in machine learning. This would, in turn, facilitate the interpretation of predictions made by machine learning models.83 As a form of ‘weak abduction’ (i.e. identifying a probable truth as an approximate truth of the best explanation), this approach can also shed light on the pervasive process of ‘beginning with an end target and abductively working back to adjust the parameters of the model in order to converge on the target.’84 This is crucial because, as Louise Amoore writes, contemporary discussions about political issues are increasingly foreclosed by machine learning practices in which modelling has become an end in itself.
Secondly, Bayesian belief networks point beyond themselves by invoking a form of ‘strong abduction’. Rather than simply processing data retrospectively, their abductive nature may allow them to work speculatively, generating new ideas from pre-existing knowledge. As Parisi states: ‘This learning through hypothetical processing may coincide with the speculative and transcendental elaboration of algorithmic retroduction, whereby consequences (or results) are not only tracked back to their causes (by means of explanation) but are also, importantly, hypothesized beyond the observable’.85 For Parisi, it is evident that this learning process can be effectively carried out by a machine, even suggesting the possibility of a ‘general artificial intelligence’. However, the meta-level abduction that the latter would require cannot be achieved by inductive reasoning alone, as this article has tried to argue. Rather than reducing abduction to a data-driven verification of reproducible truths, it needs to be recognised in its capacity to generate new concepts and thus a way of learning how to learn.86
It remains to be seen how Bayesian epistemology, with its ability to infer a general theory from incomplete information, can contribute to this vision. As a form of abductive concept learning, Bayesian belief networks may indeed be able to generate new insights. However, while Bayes’ theorem can be seen as an abductive process in itself, since it also relies on explanatory considerations, abduction is not Bayesian per se. Indeed, causality or causal inference may be too complex for BBNs to capture, limiting what they can learn. But unlike many common machine learning techniques that use frequentist methods (e.g. linear or logistic regression), Bayesian epistemology at least offers the prospect of a causal rather than a merely correlative framework; and as such, promises to make prior knowledge explicit in AI systems. This is necessary if we want not only to predict, but also to intervene in and reflect upon future knowledge.87
Whatever the actual implementation of such approaches may look like, it will ultimately depend on how these systems can deal with contingency, not just uncertainty. Or, in the words of Lucy Suchman: ‘In the hands of a critical practitioner, encounters with the contingency and partiality of knowing are taken not as a sign of a failure that needs to be hidden but of the irremediable openness of worldly relations. Those relations involve modes of learning that are deep, not in the sense of the multiplication and ingenious manipulation of homogeneous arrays of numbers but through their implication in practices of ongoing and heterogeneous world-making’.88 It is my conviction that this is what Turing ultimately had in mind: to find a way of teaching the machine to learn how to learn, so that it can eventually surprise us.
- John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude Shannon, “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence,” 1956, https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth. In full the text reads: ‘We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.’ ↩
- On the difference between symbolic AI and connectionist models, see Dominique Cardon, Jean-Philippe Cointet, and Antoine Mazières, “Neurons spike back. The invention of inductive machines and the artificial intelligence controversy,” Réseaux 5:211 (2018): 173–220. ↩
- McCarthy et al, “Proposal.” ↩
- Today machine learning usually comes in the form of artificial neural networks (i.e. connectionism); the idea is that underlying structures, which help us (and machines) to learn, are modelled as distributed representations in hidden layers of units that relate input and output data. The problem with these models is that they are implicit summaries of the input-output relation and cannot be made explicit (i.e. in the form of rule-governed representations, as is the case in ‘good old-fashioned AI’ or ‘expert systems’). ↩
- See Nils J. Nilsson, The Quest for Artificial Intelligence. A History of Ideas and Achievements (New York: Cambridge University Press, 2010), in particular Chapter 29. See also Margaret A. Boden, Artificial Intelligence: A Very Short Introduction (Oxford: Oxford University Press, 2018); John Haugeland, Artificial Intelligence: The Very Idea (Cambridge, MA: MIT Press, 1985); Stuart Russell and Peter Norvig, Artificial Intelligence. A Modern Approach (London: Pearson, 2020). ↩
- Alan M. Turing, “Computing Machinery and Intelligence,” in The Essential Turing, edited by B. Jack Copeland (Oxford: Oxford University Press, 2004), 441–464, here 455f. Ada Lovelace was skeptical about the self-learning capacities of machines, because they were built to follow instructions written by the programmer rather than to create anything themselves. ↩
- Ibid., 451. ↩
- Ibid., 460. ↩
- The question of experience is, of course, the crux of the whole problem: what constitutes an experience on the one hand, and how to conceptualise it on the other. The problem revolves around the question of whether there is a general law that encompasses all possible individual cases of this experience (deduction), or whether this experience is derived from individual cases (induction), or whether experience is always already pre-structured by certain categories (abduction). ↩
- This line of thinking (i.e. the formalisation of all human knowledge, which could then be calculated using symbolic logic) goes back at least to the German mathematician and ‘last universal genius’ Gottfried W. Leibniz. See e.g. Gottfried W. Leibniz, Schriften zur Logik und zur philosophischen Grundlegung von Mathematik und Naturwissenschaft (Frankfurt a.M.: Suhrkamp, 1996). ↩
- This view is reflected by some of the leading figures in AI research with their faith in back-propagation as a way to match human cognition. See e.g. Yann LeCun, “Learning world models: The next step towards AI,” keynote lecture, International Joint Conference on Artificial Intelligence (IJCAI) (Stockholm, 2018). ↩
- It was David Hume who first formulated the problem in 1739. David Hume, Treatise of Human Nature (Philosophical Classics) (Garden City: Dover Publications, 2003). ↩
- Clemens Apprich, Wendy Hui Kyong Chun, Florian Cramer, and Hito Steyerl, Pattern Discrimination (Minneapolis: University of Minnesota Press, 2018). ↩
- This was the case with the so-called AMS algorithm, a predictive model developed by the Austrian Public Employment Service (Arbeitsmarktservice or AMS) in 2018 to classify job seekers and allocate support resources. The AMS algorithm has sparked debate due to its use of sensitive personal data and its potential to reinforce societal, particularly gender, biases in employment opportunities. ↩
- See Gideon Lewis-Kraus, “The Great A.I. Awakening”, New York Times, December 14, 2016, https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html. ↩
- Matteo Pasquinelli, The Eye of the Master: A Social History of Artificial Intelligence (London: Verso, 2023), chapter 4. See also Sybille Krämer’s argument that the computer, on which AI models still rely, was developed ‘within us’ long before it was invented as an actual device, in Sybille Krämer, Symbolische Maschinen (Darmstadt: Wissenschaftliche Buchgesellschaft, 1988). ↩
- For some, AI is no longer programmed but grown (see Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World (New York: Basic Books, 2015), Introduction). ↩
- Transformer models use the mathematical concept of ‘attention’ to emphasise certain sequences in the data over others. Ashish Vaswani et al., “Attention Is All You Need,” arXiv (2017), https://doi.org/10.48550/ARXIV.1706.03762. ↩
- See, for example, Geoffrey Hinton, “Where do features come from?”, Cognitive Science 38:6 (2014): 1078–1101. ↩
- See Ruairidh M. Battleday and Samuel J. Gershman, “Artificial intelligence for science: The easy and hard problems,” arXiv (2024), https://arxiv.org/html/2408.14508v1. ↩
- Matteo Pasquinelli, “Machines that Morph Logic: Neural Networks and the Distorted Automation of Intelligence as Statistical Inference,” Glass Bead Journal 1 (2017), https://www.glass-bead.org/article/machines-that-morph-logic. For Pasquinelli, this is a form of weak abduction, although in his view machine learning methods do not allow for strong abduction. I adopt the distinction between ‘weak’ and ‘strong abduction’ in this article, although for me the question of whether machines can be abductive remains open. ↩
- Turing, “Computing Machinery and Intelligence”, 463. ↩
- Ibid., 461. ↩
- Charles S. Peirce, “On the Logic of Drawing History from Ancient Documents, Especially from Testimonies,” in The Essential Peirce, Vol. II, ed. Peirce Edition Project (Bloomington/Indianapolis: Indiana University Press, 1998), 75–114. ↩
- See Mark A. Runco, “AI can only produce artificial creativity,” Journal of Creativity 33 (2023), https://www.sciencedirect.com/science/article/pii/S2713374523000225. Runco remains sceptical about such ‘authentic problem-finding’ by machines. ↩
- Charles S. Peirce, The Collected Papers of Charles Sanders Peirce, Vol. II: Elements of Logic, ed. Charles Hartshorne and Paul Weiss (Cambridge: Harvard University Press, 1931–1935), 216. ↩
- Charles S. Peirce, “On the Logic of Drawing History from Ancient Documents,” 106f. See also Wim Staat, “On Abduction, Deduction, Induction and the Categories,” Transactions of the Charles S. Peirce Society XXXIX (1993): 225–237. In fact, Peirce conceptualised ‘abduction’ quite differently to what it is currently understood to mean. A principal distinction concerns the scope of the term: whereas contemporary logic usually situates abduction within the ‘context of justification’ as the stage of scientific inquiry where theories are evaluated, for Peirce, abduction belongs within the context of discovery, which encompasses the generation of theories that may subsequently be assessed (see Igor Douven, “Abduction,” The Stanford Encyclopedia of Philosophy (2021), https://plato.stanford.edu/entries/abduction/peirce.html). ↩
- Charles S. Peirce, The Collected Papers of Charles Sanders Peirce, Vol. V: Pragmatism and Pragmaticism, ed. Charles Hartshorne and Paul Weiss (Cambridge: Harvard University Press, 1931–1935), 413. ↩
- Peirce, The Collected Papers, Vol. II, 386. ↩
- Conceptual change is arguably what a metaphor does. Within AI-debates, the question therefore remains whether a machine is capable of producing metaphors or not (cf. Umberto Eco, Semiotics and the Philosophy of Language (Bloomington: Indiana University Press, 1986), 127). ↩
- KT Fann, Peirce’s Theory of Abduction (Singapore: Partridge, 2020), 47. ↩
- It is interesting to see how this positivist belief is being revived in the age of big data and machine learning (see, for example, Chris Anderson, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” Wired, June 23, 2008. https://www.wired.com/2008/06/pb-theory). ↩
- See Heidi Kloos and Guy Van Orden, “Abductive Reasoning by Children,” Review of Psychology Frontier 1:2 (2012): 1–9. ↩
- Anja Bechmann and Geoffrey C. Bowker, “Unsupervised by any other name: Hidden layers of knowledge production in artificial intelligence on social media,” in Big Data & Society 6:1 (2019), https://journals.sagepub.com/doi/10.1177/2053951718819569#bibr45-2053951718819569. ↩
- Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification”, Proceedings of the 1st Conference on Fairness, Accountability and Transparency (2018): 77–91. ↩
- This is in line with the broader programme of Bayesianism, as Justin Joque explains: ‘Here we see the revolutionary implications of Bayesian analysis: frequentism sets up the experiment to determine repeatable, population-level abstractions, whereas Bayesianism allows the production of a nearly infinite field of hypotheses that can create an abstraction for each case. Bayesianism favors abstractions that produce a thought that is not universal, but merely momentary (…).’ Justin Joque, Revolutionary Mathematics. Artificial Intelligence, Statistics and the Logic of Capitalism (London: Verso, 2022), 183. ↩
- Luke Stark, “Artificial Intelligence and the conjectural sciences,” in BJHS Themes 8 (2023): 35–49, here 41. Stark bases his study on the work of Carlo Ginzburg, who at the end of the 1970s contrasted the empirical sciences with the conjectural model (Carlo Ginzburg, “Morelli, Freud and Sherlock Holmes: clues and scientific method,” History Workshop Journal 9:1 (1980): 5–36). ↩
- In fact, it can be said: The more objective these systems pretend to be, the more subjective they actually are. See Joque, Revolutionary Mathematics. ↩
- Igor Douven, The Art of Abduction (Cambridge, MA: The MIT Press, 2022), 23. ↩
- The rule goes back to the probability theory of the English pastor and statistician Thomas Bayes (1701–1761) and describes the probability of an event, based on prior knowledge of conditions that might be related to the event (see James Joyce, “Bayes’ Theorem”, The Stanford Encyclopedia of Philosophy (2019), https://plato.stanford.edu/archives/spr2019/entries/bayes-theorem). ↩
- Hanti Lin, “Bayesian Epistemology,” The Stanford Encyclopedia of Philosophy (2024), https://plato.stanford.edu/archives/sum2024/entries/epistemology-bayesian. ↩
- The term was coined by Judea Pearl in 1985. See Judea Pearl, “Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning,” Proceedings of the 7th Conference of the Cognitive Science Society (1985): 329–334. ↩
- Pete Barbrook-Johnson and Alexandra S. Penn, “Bayesian Belief Networks,” in Systems Mapping (Cham: Palgrave Macmillan, 2022), 97–112. ↩
- Zachary Wojtowicz and Simon DeDeo, “From Probability to Consilience: How Explanatory Values Implement Bayesian Reasoning,” Trends in Cognitive Sciences 24 (2020): 981–93. ↩
- For a more schematic overview of Bayesian (belief) networks, see Jan Sprenger and Stephan Hartmann, Bayesian Philosophy of Science (Oxford: Oxford University Press, 2019), 21–24. ↩
- José A. Gámez, “Abductive Inference in Bayesian Networks: A Review,” in Advances in Bayesian Networks. Studies in Fuzziness and Soft Computing, edited by José A. Gámez, Serafín Moral and Antonio Salmerón (Berlin/Heidelberg: Springer, 2004), 101–120. ↩
- Note that for Alan Hájek, conditional probability (i.e. the probability of a variable given another variable), which is central in Bayesian epistemology, is not just a mathematical formula, but the epistemic cornerstone of all probability theory. Alan Hájek, “What Conditional Probability Could Not Be,” Synthese 137 (2003): 273–323. ↩
- In plain English, you start with a prior hypothesis and then update your belief based on new data (the probability of evidence). As more evidence accumulates, your hypothesis becomes more refined. This can be read in the context of Turing’s ‘creative mental act’. ↩
- Lin, “Bayesian Epistemology.” ↩
- This seems to be particularly advantageous in view of the existing ‘explainability crisis’ in machine learning, as Zachary Wojtowicz and Simon DeDeo argue. See Wojtowicz and DeDeo, “From Probability to Consilience,” 991. ↩
- David Guile, “Machine learning – A new kind of cultural tool? A ‘recontextualisation’ perspective on machine learning + interprofessional learning,” Learning, Culture and Social Interaction 42 (2023), https://doi.org/10.1016/j.lcsi.2023.100738. ↩
- Alison Gopnik and Joshua B. Tenenbaum, “Bayesian networks, Bayesian learning and cognitive development,” Developmental Science 10:3 (2007): 281–287, here 282. ↩
- Eugene Charniak, “Bayesian Networks without Tears,” AI Magazine 12:4 (1991): 50–63, here 51. ↩
- There are, of course, overlaps between connectionist models and Bayesian inference, for example in stochastic neural networks (see Michael S. C. Thomas and James L. McClelland, “Connectionist models of cognition,” in The Cambridge handbook of computational psychology, edited by Ron Sun (New York: Cambridge University Press, 2008), 29–79, here 41). However, connectionist models, fundamentally data-driven, are much closer to frequentism, the historical antagonist of Bayesianism. ↩
- With reference to Vygotsky’s concept learning theory, see: Tyler Reigeluth and Michael Castelle, “What kind of learning is machine learning?” in The Cultural Life of Machine Learning: An Incursion into Critical AI Studies, edited by Jonathan Roberge and Michael Castelle (Cham, Switzerland: Palgrave Macmillan, 2020), 79–115. ↩
- Florian Jaton, “Assessing biases, relaxing moralism: On ground-truthing practices in machine learning design and application,” Big Data & Society 8:1 (2021), https://journals.sagepub.com/doi/full/10.1177/20539517211013569. ↩
- See, for example, Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York: NYU Press, 2018); Virginia Eubanks, Automating Inequality (New York: St. Martin’s Press, 2018); Cathy O’Neil, Weapons of Math Destruction (New York: Crown Publishing, 2016). ↩
- Luke Stark, “Artificial intelligence and the conjectural sciences,” BJHS Themes, Volume 8: Histories of Artificial Intelligence: A Genealogy of Power (2023): 35–49, here 36. ↩
- See also Florian Cramer, “Crapularity Hermeneutics: Interpretation as the Blind Spot of Analytics, Artificial Intelligence, and Other Algorithmic Producers of the Postapocalyptic Present,” in Clemens Apprich et al., Pattern Discrimination (Lüneburg: meson press, 2018), 23–58. ↩
- See, for example, Peter Norvig’s claim that with machine learning, computer science has moved from mathematics to natural science, from doing logic to making observations. “Google Machine Learning Crash Course (MLCC),” Google Research, accessed October 28, 2021, https://developers.google.com/machine-learning/crash-course. ↩
- Pedro Domingos, “A Few Useful Things to Know about Machine Learning,” Communications of the ACM 55:10 (2012): 78–87, here 79. ↩
- Again, these features often correspond to existing social categories such as race, gender and class (see above). ↩
- Geoffrey Hinton describes feature learning using a Boltzmann machine as an example. See Hinton, “Where do features come from?”, Cognitive Science 38:6 (2014): 1083. Boltzmann machines and Bayesian belief networks are both graphical models, but they differ markedly: the former are undirected, energy-based networks that learn distributed representations from data, whereas the latter are directed acyclic graphs that encode explicit conditional dependencies between variables. ↩
- Making the feature learning process more explicit with Bayesian learning is explored, for example, in Kirsten Fischer et al., “Critical feature learning in deep neural networks”, arXiv (2024), https://doi.org/10.48550/arXiv.2405.10761. However, most of these attempts remain within the connectionist (i.e. inductive) paradigm. ↩
- Douven, The Art of Abduction, 17f. ↩
- Donna Haraway, “Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective,” Feminist Studies 14:3 (1988): 575–99, here 592. ↩
- Dan McQuillan, Resisting AI. An Anti-Fascist Approach to Artificial Intelligence (Bristol: Bristol University Press, 2022), 109. ↩
- In fact, there are a couple of initiatives that do deploy a Bayesian learning paradigm to build more robust and knowledge-based ML systems, such as The Bayes-Duality Project or the Center for Human-Compatible Artificial Intelligence. ↩
- Marcia K. Moen, “Peirce’s Pragmatism as Resource for Feminism,” Transactions of the Charles S. Peirce Society 27 (1991): 435–450. ↩
- Wojtowicz and DeDeo, “From Probability to Consilience,” 982. ↩
- This also makes explanatory values deeply problematic, as they easily turn into ‘explanatory vices’ that drive phenomena such as conspiracy theories and other pathological beliefs. Ibid. ↩
- Charles S. Peirce, “The Law of Mind,” in The Essential Peirce, Vol. 1, ed. Peirce Edition Project (Bloomington/Indianapolis: Indiana University Press, 1992), 312–333, here 329. ↩
- Luciana Parisi, “Das Lernen lernen oder die algorithmische Entdeckung von Information,” in Machine Learning. Medien, Infrastrukturen und Technologien der Künstlichen Intelligenz, edited by Christoph Engemann and Andreas Sudmann (Bielefeld: Transcript, 2018), 110. Translation by the author. ↩
- According to Vygotsky, the mind or psyche develops through symbolic mediation (i.e. language) and allows for a dialectical relationship between the conscious, the unconscious and the physical. Following Marx’s concept of the fetish, Vygotsky claims that “the mental nature of man represents the totality of social relations internalized and made into functions of the individual and forms of his structure” (Lev S. Vygotsky, The History of the Development of Higher Mental Functions (New York: Plenum, 1997), 10). His project of a dialectical psychology mediating between mental and physical aspects was also directed against Freud’s theory of the unconscious, which, according to Vygotsky, failed to bridge these two sides. ↩
- Reigeluth and Castelle, “What kind of learning is machine learning?” 80. ↩
- Ibid., 81. ↩
- See Adrian Mackenzie’s notion of machine learners: Adrian Mackenzie, Machine Learners: Archaeology of a Data Practice (Cambridge, MA: The MIT Press, 2017). On the social aspect of learning, see also Jean Lave and Etienne Wenger, Situated Learning: Legitimate Peripheral Participation (Cambridge: Cambridge University Press, 1991), in particular 48–49. ↩
- Lev S. Vygotsky, Mind in Society. The Development of Higher Psychological Processes (Cambridge, MA: Harvard University Press, 1978), 84–91. The learning happens in the ‘zone of proximal development’, which is ‘the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers’ (ibid., 86). ↩
- The mutual learning process is particularly evident in today’s machine learning systems, which are not only trained on the data we create but also fine-tuned according to our applications. In addition, the learning process is guided by an extended version of the teacher-pupil relationship, as exemplified by ‘Reinforcement Learning from Human Feedback’ (see Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei, “Deep reinforcement learning from human preferences,” arXiv (2023), https://doi.org/10.48550/arXiv.1706.03741). Here, the exchange between the human annotator and the AI model is essential for the learning success. It is therefore inaccurate to say that machines simply ‘learn by themselves’ – just as it is to claim that humans learn by themselves. ↩
- Alan M. Turing, “Can Digital Computers Think? (1951),” in The Essential Turing, ed. B. Jack Copeland (Oxford: Oxford University Press, 2004), 476–486. ↩
- Charles S. Peirce, The Collected Papers of Charles Sanders Peirce, Vol. V: Pragmatism and Pragmaticism, ed. Charles Hartshorne and Paul Weiss (Cambridge: Harvard University Press, 1931–1935), 181. ↩
- Luciana Parisi, “Critical Computation: Digital Automata and General Artificial Thinking,” Theory, Culture & Society 36:2 (2019): 89–121, here 110. ↩
- This approach, of course, fits into the larger endeavour of so-called ‘explainable AI’, which so far has largely remained in the connectionist paradigm (i.e. neural networks, deep learning systems, transformer models). More recent debates about a ‘general-purpose AI’ that draws on world modelling and probabilistic inference also refer to Bayesian models. See, for example, Karl Friston, Rosalyn J. Moran, Yukie Nagai, Tadahiro Taniguchi, Hiroaki Gomi, and Josh Tenenbaum, “World Model Learning and Inference”, Neural Networks 144 (2021): 573–90. ↩
- Louise Amoore, “Machine learning political orders,” Review of International Studies 49:1 (2023): 20–36, here 28. ↩
- Parisi, “Critical Computation,” 112. ↩
- Even the paper Parisi refers to in terms of ‘automated meta-abductive reasoning’ seems to suggest that abduction is closer to causal than to associative reasoning, favouring (without explicitly mentioning it) a Bayesian epistemology. See Katsumi Inoue, Andrei Doncescu and Hidetomo Nabeshima, “Completing causal networks by meta-level abduction,” Machine Learning 91 (2013): 239–277. ↩
- Such a causal framework would, according to Judea Pearl, also allow machines to ask counterfactual questions, enabling them to gain true (i.e. intuitive) intelligence. See Judea Pearl and Dana Mackenzie, The Book of Why (New York: Basic Books, 2018). ↩
- Ranjodh Singh Dhaliwal, Théo Lepage-Richer, and Lucy Suchman, Neural Networks (Lüneburg/Minneapolis: meson press/University of Minnesota Press, 2024), 107. ↩
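A minimal formal sketch of the Bayesian updating step described in the notes above; the prior and likelihood values used here are hypothetical and serve only as an illustration. By Bayes’ rule,

$$
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}.
$$

Starting, for instance, from a prior $P(H) = 0.5$ and observing evidence for which $P(E \mid H) = 0.8$ and $P(E \mid \neg H) = 0.3$ gives a posterior of $P(H \mid E) = 0.4/0.55 \approx 0.73$; this posterior then serves as the prior for the next piece of evidence.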