Many people like to promote an interpretation of the scientific method - let me call it the "Popperian interpretation" - that I find naive, oversimplified, and incomplete. In this picture, scientists
- make guesses (create hypotheses)
- falsify the wrong ones by observations

Commercial break: Sheldon Glashow's obituary of Sidney Coleman. Via Betsy Devine, thanks!

However, there are serious problems with this simple picture. It doesn't tell you how scientists actually "invent" or choose their guesses and it doesn't tell you that the guesses are almost never abandoned in their entirety. Also, the very idea that the likelihood that the guesses are right can only decrease - but not increase - is logically untenable because if the probability of "A" decreases, the probability of "non A" must inevitably increase. ;-)

Let me present the scientific process as a more complex sequence or combination of the following procedures, which may occur in many different orders and which may influence each other in many different ways:
- Inductive reasoning (induction)
- Deductive reasoning (deduction)
- Comparisons of the deduced results with experience (verification)
Deduction
The second procedure, deductive reasoning, means that you already have a theory or a hypothesis and you deduce (or calculate) what it predicts for a particular situation. You often need to know and develop some mathematical (and numerical) methods. But you might say that in principle, all the wise tricks used in deductive reasoning could be replaced by the brute force of a powerful enough computer. Lattice QCD could be an example that you should have in mind. A computer should tell you what the predictions are and how accurately they have been determined.
Computer-assisted simulation is a kind of deduction. Paradoxically from a linguistic viewpoint, mathematical induction is another good example of deduction ;-) simply because the conclusion for all integers rigorously and certainly follows from the assumptions.
Verification
The third procedure, comparisons of the results with experience, is even more straightforward. You have some numbers that you have deduced from your theory and the experimenters measure the numbers in the real world. If they disagree and if you can trust both the experimenters as well as the deducers, the theory is dead. In reality, such a step usually follows the rules of fuzzy logic. A theory is often "supported" by the evidence or it is "disfavored". Its probability increases or decreases but it doesn't jump all the way up to 100% and it doesn't decrease all the way down to 0% either (although we may often get extremely close to these extreme values).
You might imagine that Bayesian inference is - openly or secretly - used whenever we refine our opinions about the validity of a theory by checking it against the empirical evidence. Our prior probabilities that hypotheses are correct have to be raised if the hypotheses pass some experimental tests better than expected by chance, while they have to be lowered if the agreement is worse than expected by chance.
The best quantitative law is encoded in Bayes' formula but it is fair to say that scientists rarely calculate the probabilities accurately when they are just browsing around and comparing very different hypotheses. It's because the "accurate" probabilities would depend on the priors, which are unknown anyway.
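To make this updating concrete, here is a minimal sketch of a Bayesian update; the prior and the likelihoods are invented illustrative numbers, not values from any real experiment:

```python
# Minimal sketch of a Bayesian update of the probability that a hypothesis H is true.
# All numbers (prior, likelihoods) are illustrative assumptions, not real data.

def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) according to Bayes' formula."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence

p = 0.10                 # prior: we only give the hypothesis a 10% chance
for _ in range(3):       # three successful experimental tests in a row
    # assume the test is passed with 90% probability if H is true, 30% if H is false
    p = bayes_update(p, p_data_given_h=0.9, p_data_given_not_h=0.3)
    print(round(p, 3))   # 0.25, 0.5, 0.75 - the probability climbs but never jumps to 1
```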
It is not hard to guess that most of this article will be dedicated to the first procedure in the list, namely induction. Many discussions - online discussions as well as interactions in real life - have convinced me that most people (including those who claim to be interested in science) completely misunderstand induction: they really fail to grasp what it means to "think" as a scientist.
Induction
I would like to define inductive reasoning as the collection of all "creative" or "not quite mechanical" steps that scientists ever have to make in order to find something new about Nature or at least to get on the right track. Almost everything that has made various theorists "ingenious" or "different from the average people" is hiding in their art of induction.
But experimenters and deducers use induction at various stages of their work, too. When a deducer designs her calculational algorithm or an experimenter constructs her new experimental apparatus, they are using a form of induction to achieve their goals.
Nevertheless, I will try to focus on the theorists' usage of induction during their search for more correct theories of Nature. But don't forget that if you replace the words "theory" or "statement about reality" by a "calculational framework" or a "useful experimental device" in the text below (and you replace "more likely" by "more useful"), most of the assertions will be relevant for the deducers and experimenters, too. But let's return to the world of theorists now.
More technically, induction is any type of reasoning in which a statement about reality is identified and determined to be more likely than random statements that use similar concepts - a process that applies either logic or heuristics to existing facts about reality.
In this case, the probability that the newly constructed statement is correct is even harder to quantify than in the case of verification. Nevertheless, you might still imagine that there exists a form of Bayesian inference that quantifies the probability that the induced hypothesis is correct. Although the numerical value of the probability is usually ill-defined (mostly because of the completely uncertain priors), we may compare these probabilities in many cases and say e.g. that a candidate law that has been extracted from a more detailed set of observations is more likely to be true than a candidate law that is only based on a small number of observations.
The inductive reasoning that has always been necessary to make promising guesses in science (and elsewhere) comes in many kinds and, in my definition, it includes these steps, among others:
- Interpolation and extrapolation
- Generalization of several cases
- Fusion of a group of older laws
- Application of a successful method to approximate laws (and the concepts they use) in order to find more accurate laws (and new concepts)
- Search for general constraints that restrict possible laws of Nature, including general principles
- Linearization or other simplification or parameterization of the "space of hypotheses" within the bounds of viability found in the previous item, in order to adjust and improve the imperfect but promising theories and to simplify the navigation towards the truth
Interpolation and extrapolation
Many or most insights about Nature are described in terms of functions of real variables, among other concepts. Because Nature is known to be continuous in many different contexts, we have learned that it is reasonable to assume that the relevant functions are continuous, too. At least in many cases (especially in classical physics) we have very good reasons to think so, to say the least.
This principle allows us to guess that if experiments show that "y=3.05" for "x=21.5" and "y=3.07" for "x=21.7", it is pretty likely (or reasonable to expect) that "y" will be close to "3.06" for "x=21.6". The reasoning I used here is called interpolation. In the particular example, I interpolated the two measured points by a linear function.
The exact character of the function becomes increasingly unimportant if the measured values of "x" and "y" are sufficiently dense or close to each other and sufficiently "generic" (i.e. not special). The advantage of interpolation is that we don't have to make infinitely many measurements to know (or predict) something about the value of "y" for arbitrary values of "x". Note that if you didn't allow scientists to assume some kind of continuity that allows them to interpolate, you could always argue that they have absolutely no idea what happens for "x=21.6" because they have only measured what happens at two different values of "x".
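A minimal sketch of the linear interpolation just described, using the two "measured" points from the example:

```python
# Linear interpolation between the two "measured" points from the example above.
x0, y0 = 21.5, 3.05
x1, y1 = 21.7, 3.07

def interpolate(x):
    """Linearly interpolate y(x) between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(round(interpolate(21.6), 3))   # 3.06, the guessed value at the intermediate point
```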
In the example of interpolation above, we have made a guess about the behavior at intermediate values of "x". However, it is often necessary to make a reasonable first guess what happens at completely different values of "x". If we already have some candidate function (polynomials, exponentials, and other functions that are "mathematically natural") that describes what happens inside an interval on the "x" axis, we may boldly try to extend this function to other values of "x", too.
This process is called extrapolation. It is clearly less reliable than interpolation but it is still better than knowing nothing. Extrapolation is often needed to have some remotely sane initial idea about an unknown situation. Once other types of induction, to be discussed later, or new pieces of evidence are taken into account, we might actually be able to see that a certain function is much more justified than we would expect at the beginning. If you don't immediately know why one particular function should be the right one outside the measured interval of "x", it doesn't mean that it is forever impossible to know (or almost know) such a reason!
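For contrast, here is a sketch of extrapolation under an assumed functional form; the data points and the quadratic ansatz are invented for illustration, and the warning above applies: outside the measured interval, the guess is only as good as the assumption that the chosen function remains valid.

```python
import numpy as np

# Invented "measurements" inside the interval x in [0, 5].
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.6, 2.7, 4.1, 6.2, 9.3])

# Assume a "mathematically natural" candidate function - here a quadratic polynomial -
# fit it inside the measured interval, then boldly evaluate it elsewhere.
coefficients = np.polyfit(x, y, deg=2)
print(np.polyval(coefficients, 2.5))    # interpolation: relatively safe
print(np.polyval(coefficients, 10.0))   # extrapolation: far less reliable, but better than nothing
```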
Generalization of several cases
You should realize that the previous arguments and methods - interpolation and extrapolation - have to be used all the time because otherwise we couldn't learn anything from our experience. Every new situation or experiment we care about is slightly different than the previous ones. Without some assumption that relevant functions are "mathematically natural" or at least "continuous" and/or "universal", we would only "know" what happens in the exact experimental situations that have been measured.
Every new planet, space shuttle etc. would be a completely new mystery. The law of gravity could always break down. In reality, it is not the case. The assumption of continuity - or even more special conditions about the relevant functions that describe natural phenomena (including e.g. analyticity in quantum field theory) - is a hypothesis that can be tested and that has been tested.
Overwhelming evidence can be found - and has been found - in many or most cases showing, via Bayesian inference if you wish, that this hypothesis of continuity (or even analyticity etc. in quantum field theory) is more likely to be true (or at least more useful) than a random statement about Nature you could make. In fact, it is almost certainly true, at least with some impressive accuracy. ;-)
There exist more general types of this reasoning. I included them under the umbrella of generalization. If you have seen the sunrise 10,000 times and you even have some indirect evidence that other people have seen it millions of times and your continent has witnessed it trillions of times, it is not quite unreasonable to assume that the Sun will rise tomorrow, too.
Now, 10,000 sunrises don't "prove" anything. But they are still an argument. In some sense, it is fair to think that the 10,000 sunrises imply that the probability that the Sun won't rise tomorrow is probably smaller than 1/10,001. Tomorrow is a random day among the 10,001 days that you will have seen by tomorrow and the probability that a special property that only holds for one day - not having any sunrise - is satisfied for "tomorrow" rather than the previous 10,000 days is about 1/10,001, assuming some democracy governing the days. Moreover, even Al Gore will agree that the cataclysm won't come tomorrow but only on the day after tomorrow. ;-)
I didn't use any nuclear physics. The argument that the catastrophe that starts tomorrow is unlikely because we haven't seen it for quite some time is only based on common sense. Well, not quite. It is also based on some simple observations that you (or people) have made 10,000 times. One should realize that simple observations, when combined with some "common sense" arguments, can often lead us to more solid predictions than arguments based on very non-trivial observations combined with excessively contrived theoretical reasoning.
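A sketch of the counting argument in code, together with Laplace's classic rule of succession, which encodes a similar piece of common sense and yields an almost identical number:

```python
n = 10_000   # sunrises observed so far

# The symmetric "democracy among days" argument from the text: if exactly one of the
# n + 1 days lacked a sunrise, the chance that this special day is tomorrow is 1/(n+1).
p_no_sunrise_symmetry = 1 / (n + 1)

# Laplace's rule of succession (a uniform prior on the unknown daily sunrise probability,
# updated by n observed successes) gives essentially the same tiny number.
p_no_sunrise_laplace = 1 / (n + 2)

print(p_no_sunrise_symmetry, p_no_sunrise_laplace)   # roughly 1e-4 in both cases
```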
Now, I would like to explain the relationship between interpolations, extrapolations, and generalizations. Interpolations and extrapolations only differ by the place on the "x" axis where we extend our observations (inside or outside the interval). Generalizations are, in fact, similar. The only difference is that the generalizations we discuss in this section require a new type of "x" that is discrete.
For our example involving the sunrise, "x" is the integer that counts the days in your life. The discrete function "y" is either equal to "sunrise does occur" or "sunrise doesn't occur" on a given day. And we are simply extrapolating from the previous days - values of "x" - to the future ones. It is thus a form of extrapolation. In other cases, the discrete values of "x" are not naturally ordered along an axis. For example if we "induce" Hubble's law of expansion for various galaxies, "x" identifies the particular galaxy and there is no natural way to assign real numbers or integers to galaxies.
Consequently, we can't say whether the generalization of Hubble's law from the observed galaxies to all galaxies is an example of interpolation or extrapolation. But you can still see that this generalization is a procedure that is closely related to interpolation and/or extrapolation.
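A toy sketch of this kind of generalization: fit Hubble's law to a handful of "observed" galaxies (the distances and velocities below are invented for illustration) and then apply the same law to a galaxy that nobody has measured yet.

```python
import numpy as np

# Invented (distance in Mpc, recession velocity in km/s) pairs for a few observed galaxies.
distances = np.array([10.0, 25.0, 40.0, 60.0, 80.0])
velocities = np.array([720.0, 1750.0, 2900.0, 4150.0, 5600.0])

# Induce the one-parameter law v = H0 * d from these few cases (least-squares slope)...
H0 = np.sum(distances * velocities) / np.sum(distances ** 2)

# ...and generalize it to a galaxy that has not been measured.
print(f"H0 ~ {H0:.0f} km/s/Mpc, predicted v at 150 Mpc ~ {H0 * 150:.0f} km/s")
```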
These generalizations have been, much like interpolations and extrapolations, essential to figure out something about Nature. And they are still essential. All cases in which a rule or pattern applies to all members of a set of objects or phenomena belong to this category. Once again, arguments constructed in this way are not rigorous proofs. They depend on the assumption of "universality" much like interpolations depend on "continuity".
But much like many other useful assumptions, universality seems to be (at least effectively) true while it is of course falsifiable in principle. However, overwhelming evidence suggests that it is a good guide (in many contexts - and, by a more conceptual kind of generalization, in classes of contexts).
Fusion of a group of older laws
This new type of reasoning is somewhat different. When we have two partially successful theories that describe two subsystems - or two (groups of) aspects of a physical system - we often need to "merge" these two theories.
In this case, it is obvious that the two theories we started with are incomplete because they only apply to subsystems (or to some aspects of a physical system). We only know pieces of the puzzle. In some cases, the pieces could be viable fragments of an older theory that has been falsified as a bloc but whose portions are promising enough to be recycled. To learn something about the whole puzzle, we clearly need more than the pieces.
The first step is to figure out the relationship between the pieces. We must try to carefully divide the subsystems, or the aspects of one physical system, between the two (or more) older theories, and we must make reasonable qualitative guesses about what happens near the boundary between the two (or more) domains of validity: a kind of interpolation is necessary here.
Because functions in Nature are typically continuous, it is usually not possible to sharply divide the domains of validity. Every theory X tends to employ some functions that also try to tell us something about physics outside the domain of validity of X. For this simple reason, the merger of two theories is often highly non-trivial and the number of ways in which two (or more) theories may be merged is often severely limited: the correct theory must "tame" both older theories and force them to peacefully respect the regions outside their domains of validity.
This compatibility constraint often forces the older theories to "unify" - i.e. to merge into a greater entity in which the original two theories can no longer be strictly divided.
There are also cases in which the merger is straightforward. Sometimes the "merger" simply means that we only add the degrees of freedom and add the Hamiltonians (or actions) of two (or more) older theories. That's a typical example of a situation in which no unification occurs (and no unification is needed).
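A minimal numerical sketch of this trivial kind of merger: two non-interacting quantum subsystems, each described by a small made-up Hamiltonian matrix, are combined by taking the tensor product of their Hilbert spaces and adding the two Hamiltonians.

```python
import numpy as np

# Two made-up Hamiltonians of two small, independent subsystems (two-level systems).
H_A = np.array([[0.0, 1.0],
                [1.0, 0.0]])
H_B = np.array([[2.0, 0.0],
                [0.0, -2.0]])
I = np.eye(2)

# The merged, non-interacting theory lives on the tensor-product Hilbert space:
# H_total = H_A (x) 1 + 1 (x) H_B. No unification occurs and none is needed.
H_total = np.kron(H_A, I) + np.kron(I, H_B)

# The spectrum is simply the set of sums of the two subsystems' energies.
print(np.linalg.eigvalsh(H_total))   # [-3. -1.  1.  3.]
```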
Application of a successful method in a new context
We are getting fancier. What is the method discussed here? Well, it is a scientific reincarnation of an "analogy". How does it work when we're looking for better theories? It is often possible to compare two "qualitatively" different theories that we have considered in the past, locate the difference between them, and present one of the theories (the more complete and/or more accurate one) as the other one that has undergone a certain operation.
Once we isolate this operation ("a method to improve a theory"), we may describe it in terms of general rules that may also be applied to other theories whose "siblings" are not yet known. This method is therefore a new kind of a "merger" that was discussed in the previous section. However, in this case, we are not merging two particular "models" but rather one model with an abstract method. This abstract method can be visualized as the difference between two other theories or models.
Quantization is the most obvious example of the procedure I am talking about. It is possible to compare the quantum theory of one particle with its classical counterpart, describe the difference as a set of replacements of concepts - quantization - and apply these steps to another classical physical system, such as classical electrodynamics, to construct a new theory, in this case quantum electrodynamics.
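As a small illustration of such a "replacement of concepts", here is a sketch that quantizes the classical harmonic oscillator by promoting x and p to (truncated) matrices in the Fock basis; the truncation size and the units (hbar = m = omega = 1) are choices made only for the example.

```python
import numpy as np

N = 40                                         # truncated Hilbert-space dimension
a = np.diag(np.sqrt(np.arange(1, N)), k=1)     # annihilation operator in the Fock basis

# The "replacement of concepts": promote the classical x and p to operators
# (units with hbar = m = omega = 1).
x = (a + a.T) / np.sqrt(2.0)
p = 1j * (a.T - a) / np.sqrt(2.0)

# The classical Hamiltonian H = p^2/2 + x^2/2 with x, p replaced by the operators above.
H = (p @ p + x @ x) / 2.0

# The resulting spectrum E_n = n + 1/2 (only the top of the spectrum is a truncation artifact).
print(np.round(np.linalg.eigvalsh(H)[:5], 3))   # [0.5 1.5 2.5 3.5 4.5]
```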
There are more trivial examples of this sort of reasoning, for example the rewriting of a theory in certain useful variables - variables of a kind that have been successful in other contexts (although this technical step might be more useful for deducers than for creative theorists; new variables usually do not mean new physics). The main point is that physicists have to combine not only models that have already been found; they must also combine models with principles, models with methods, and so forth.
This subcategory of inductive reasoning is perhaps the most "creative" one. It is capable of qualitatively changing the character of our hypotheses and concepts and of increasing their complexity most dramatically. If you compare scientific hypotheses to life forms, the "mergers" of models with principles extracted from other contexts that we discuss here are analogous to the most dramatic types of mutations, mergers of two DNA molecules, a change of the number of chromosomes, etc. Even in biology, many of these "large mutations" are not quite random; they are rather carefully tailored so that the resulting new life form is bound to be more interesting and viable than a random life form of a similar complexity.
Quantization and similar procedures have counterparts in rigorous mathematics, too. In set theory, we often need to construct larger sets or classes - recall Zermelo-Fraenkel set theory - from the bottom up, including the set of all subsets of another set, the set of all functions from one set to another set, and so forth.
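For a finite toy version of one of these constructions, here is a sketch of the power-set operation; the three-element input set is of course arbitrary:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of a finite set - a finite toy analogue of the power-set construction."""
    elements = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(elements, r) for r in range(len(elements) + 1))]

print(power_set({1, 2, 3}))   # 2^3 = 8 subsets, from set() to {1, 2, 3}
```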
In other words, we are talking about all the procedures that often lead to possibly relevant, highly complex mathematical structures. Just like the mammals are "qualitatively" different from bacteria and rather "large mutations" had to take place in the past, cutting-edge quantum theories of gravity (i.e. stringy vacua) are qualitatively different from the classical mechanistic theories popular in the 17th century. Something had to change many times before people became able to propose and consider theories of the modern kind and I mostly include these improvements into this group of "application of successful principles to new situations".
If thousands of philosophers were trying to find the strangest possible theory for thousands of years, they would have never invented quantum mechanics (Sidney Coleman) simply because they would only be searching in the realm of naive, insufficiently abstract, not too complex, and insufficiently crazy ideas from a common sense viewpoint (for example, among different shapes of the classical atoms by Democritus): they could have never made the right guess. Nature had to force us to make our theories more abstract, more complex, and crazier in order to match Her extraordinary charm. Analogies have been the key to increase the conceptual sophistication of our hypotheses.
Search for general principles and constraints
While the previous mental step allows us to expand the "DNA of science" and create increasingly complicated monsters (some of which are clearly necessary to describe reality properly or almost properly), we might often end up with monsters that are too large and too arbitrary - too many huge convoluted animals with overly long DNA molecules, if you wish. Complex animals are fine and necessary. But we still need a counterpart of natural selection here.
Again, it is not true that new experimental tests are always needed to "kill" some of these convoluted animals. We usually have a lot of old experimental tests and most of the completely random new "convoluted animals" are almost certainly guaranteed to fail in these old tests. That's a great method to exterminate - or severely hurt - most of these big animals.
We must create and sharpen our tools to kill them. Because it is inconvenient to carry too many tools, it is a great idea to use a finite, manageable number of these tools. And to remember which of them are the sharp ones. And to use these tools all the time to get rid of wrong theories very quickly, even if they are complicated and even if others might incorrectly think that one needs centuries to kill such dinosaurs. With sharp and efficient tools, you can kill a lot of dinosaurs very quickly.
In the previous paragraph, I was really talking about various experimental or cosmological constraints on "unusual" phenomena. Complicated enough theories, much like complicated enough animals, tend to have some "weak spots" that seem "unusual" and that can simplify the life of the hunters or the killers.
For example, many theories might predict some violations of the Lorentz symmetry. It is therefore useful to understand very well how to quantify the amount of Lorentz violation predicted by a given theory (the thickness of the skin of a big animal at some vulnerable spot or spots) and how to compare it with the experimentally known constraints (the length of our knife that can be enough to kill an animal). The knife is enough whenever the skin is thin.
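A sketch of this comparison in code; both the "predicted" coefficient and the "experimental bound" below are placeholder numbers invented for the illustration, not actual published values.

```python
# Placeholder, purely illustrative numbers - not actual published values.
predicted_lv_coefficient = 3.0e-19   # Lorentz-violating coefficient predicted by some toy theory
experimental_upper_bound = 1.0e-21   # assumed experimental bound on the same coefficient

if predicted_lv_coefficient > experimental_upper_bound:
    print("The knife is long enough: this big animal is excluded (or badly wounded).")
else:
    print("The skin is too thick here: this test alone cannot kill the theory.")
```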
Experimental or cosmological constraints are able to eliminate many sufficiently randomly created theories very quickly and we don't really have to make new experiments in most cases. Quite obviously, promising theories - big animals that have a chance to survive - must have rather special properties. And it is a good idea to search within these bounds only.
So the landscape of DNA codes expands as the animals (theories) become more complex, but we are also sharpening the tools to kill them: the relative fraction of viable theories within this growing landscape keeps shrinking because we keep gaining new killing tools.
The creation of new, complex life forms competes with the creation of new, sophisticated killing tools, and both of these processes are necessary to make progress, much like their counterparts were needed for the evolution of life to end up with relatively intelligent (and otherwise valuable) animals like us. The main difference between life and theoretical physics is that life forms can keep getting more complex indefinitely; theoretical physics is arguably or hopefully converging to a very specific final theory with a particular finite amount of complexity.
I would like to emphasize that the tools used to eliminate theories don't have to be simply understandable, experimental, quantitative constraints such as the limits on CPT or Lorentz symmetry violation. They may also be very abstract and qualitative principles. While the CPT symmetry or the Lorentz symmetry are examples of such principles - ones that force us to look at relativistic theories - and we may view them as "experimental" constraints, they are "theoretical" constraints at the same time.
It was reasonable for Einstein to assume that the Lorentz symmetry had to be exactly valid (at least in a sensible approximation of the complete theory, it must be the case). So he simply required theories to be relativistic. This condition rules out a large fraction (a majority, by most ways of counting) of the big animals that you could otherwise consider.
Also, various types of "consistency rules" must obviously be counted as the tools to kill. Unitarity must be satisfied, gauge anomalies must be absent, and a nearly flat & empty space must be a solution of your theory, among other things. The "large mutations" discussed previously can lead to many mathematical structures that don't satisfy even these basic rules. It takes a second to get rid of these lethally sick candidate theories.
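One of these lightning-fast consistency checks can literally be done in a few lines: the cancellation of the hypercharge anomalies for a single Standard Model generation (written in terms of left-handed Weyl fermions), a test that a random "mutation" of the fermion content would generically fail.

```python
from fractions import Fraction as F

# One Standard Model generation as left-handed Weyl fermions: (multiplicity, hypercharge Y),
# where multiplicity = (number of colors) x (number of SU(2) components).
fermions = [
    (6, F(1, 6)),    # quark doublet Q
    (3, F(-2, 3)),   # u^c
    (3, F(1, 3)),    # d^c
    (2, F(-1, 2)),   # lepton doublet L
    (1, F(1, 1)),    # e^c
]

# The U(1)_Y^3 anomaly and the mixed gravitational-U(1)_Y anomaly must both vanish.
print(sum(n * y ** 3 for n, y in fermions))   # 0
print(sum(n * y for n, y in fermions))        # 0
```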
Linearization and parameterization of the viable theories
Finally, there is one more important method that I included in the list. Because we have an increasing number of constraints - not only direct experimental constraints but also theoretical constraints that are able to instantly eliminate seemingly viable complex theories essentially without any experiment whatsoever - we are learning about the location of the correct theory within the landscape of possible theories increasingly accurately, relatively speaking.
Scientists usually test many hypotheses at the same time and only a very few of them have any chance to succeed - that is the source of Popper's asymmetry, his assumption that the probabilities can only go down (and for most hypotheses, the wrong ones, they indeed go down). But we simultaneously want to keep the pool manageable and only consider limited, finite classes of hypotheses in which the right one doesn't get completely lost. That's why you shouldn't neglect that the probabilities can also go up (there exists a correct theory, after all) and why Popper's asymmetric description is flawed.
The wiggle room - i.e. the differences between promising candidates - is getting rather small. When you try to describe a small vicinity of some place in space, things may simplify. When you know the correct theory almost accurately, assuming a certain collection of basic concepts, it is usually a good idea to parameterize the correct theory by a few (or many) parameters that should be adjusted. Once you replace the promising neighborhood of a submanifold by its tangent space, you are clearly linearizing the problem.
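A sketch of what "linearizing around an almost correct theory" can mean in practice: a few Gauss-Newton steps that replace the full nonlinear fitting problem by its tangent-space, i.e. linear least-squares, approximation. The model and the data below are invented for illustration.

```python
import numpy as np

# Invented data generated by y = a * exp(b * x), with a tiny deterministic "noise" term.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x) + 0.01 * np.sin(37.0 * x)

def model(theta, x):
    a, b = theta
    return a * np.exp(b * x)

# We already know the theory "almost accurately": start near the truth...
theta = np.array([1.9, 1.4])

# ...and repeatedly linearize the model in the parameters around the current point
# (the tangent space), solving a *linear* least-squares problem for the correction.
for _ in range(3):
    a, b = theta
    residual = y - model(theta, x)
    jacobian = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])  # d(model)/d(a, b)
    delta, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
    theta = theta + delta

print(np.round(theta, 3))   # close to the "true" parameters (2.0, 1.5)
```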
Once you know the fields of the Standard Model, the correct theory is described in terms of the masses and couplings in the Lagrangian. For practical purposes, you may also show that the non-renormalizable couplings are irrelevant for easily testable predictions (they are also irrelevant in the technical sense) and you end up with a few dozen parameters that matter.
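As a rough illustration of the counting, here is one conventional bookkeeping of the Standard Model's free parameters; the exact number shifts a bit depending on conventions and on whether neutrino masses and mixings are included.

```python
# One conventional bookkeeping of the Standard Model's free parameters
# (the count shifts slightly with conventions and with the treatment of neutrino masses).
standard_model_parameters = {
    "quark masses": 6,
    "charged lepton masses": 3,
    "CKM mixing angles": 3,
    "CKM CP-violating phase": 1,
    "gauge couplings": 3,
    "Higgs sector (mass and vev, or mass and quartic coupling)": 2,
    "QCD theta angle": 1,
}
print(sum(standard_model_parameters.values()))   # 19, before neutrino masses and mixings
```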
Renormalizable quantum field theories with a few fields typically have a limited number of continuous adjustable dimensionless parameters. At some point, if you're only interested in particle physics below 100 GeV, the only remaining task is to determine their values more accurately. The qualitative part of the physics research of this issue is over. What is left is rather boring: you have to adjust the very fine details and parameters. I would hesitate to count these final steps as induction.
However, the example of biology shows that the viable life forms must often be encoded in discrete data. The human genome has a few billion base pairs. But if you self-confidently decide that humans are the only intelligent animals above some IQ level that the Earth has seen so far, typical (and even less typical) human beings may be encoded in a much smaller number of bytes describing the properties or parameters of various genes. When we are getting very close to the full understanding of reality, it is very important to know the "legitimate playground" and to describe its structure and "geometry" as clearly as possible. New emergent concepts are useful to achieve this goal ("genes" in biology or "renormalizable couplings" in quantum field theory, besides millions of other scientific concepts).
The stringy landscape is of course far from being the only example of a parameterization of viable theories by discrete parameters. Such a parameterization is omnipresent in science. Conceivable perturbative theories of particle physics may be described in terms of gauge groups, representations in which scalars and fermions transform, and the continuous renormalizable couplings and masses. The discrete part of this data plays pretty much the same role as the discrete data that determine the vacuum in string theory.
In some sense, the number of discrete options is analogous and comparable in quantum field theory and in the stringy landscape. Just like quantum field theory itself doesn't predict the number of generations, string theory as we know it today doesn't predict the preferred topology of the Calabi-Yau manifold.
However, string theory completely eliminates the adjustable continuous dimensionless parameters. The only undetermined properties are the discrete parameters, which is gigantic progress in comparison with quantum field theory and its continuous parameters. And if you count the irrelevant ones (that become important at high energies), quantum field theory has infinitely many of these adjustable continuous parameters. Still, even in string theory, one has to go through the discrete candidates and find tools to eliminate the incorrect ones.
It might be the case that such a vacuum selection is governed (completely or partially) by some new, mathematically described laws; if it is so, one could calculate what the right choice is. We don't know what the hypothetical rule exactly is at this moment. However, it might also be that no canonical calculation of the number of families and the representation of fermions etc. exists in string theory. If it doesn't exist, it is clear that we may only determine the right solution by a careful comparison of more detailed properties of the vacua with the experience, i.e. to use the old-fashioned verification to eliminate candidates.
While this "anthropic" scenario is often presented as the ultimate catastrophe because of the perceived "lack of predictivity", it wouldn't be the first time when experimental verification is needed to eliminate wrong models that talk about physics of these experiments. ;-) Our knowledge of low-energy physics could be enough to induce and deduce all detailed physical properties of Nature at the Planck scale, 15 orders of magnitude above the accessible energy scale (which is the regime we want to understand, almost by definition), by "pure thought" but it could also be insufficient. No one knows for sure: there exists partial evidence in both ways. This question doesn't really depend on any characteristic technical features of string theory; attempts to make this general question emotional or even to link it with the fate of string theory are irrational.
Note that there is no difference between string theory and quantum field theory - understood as broader frameworks or toolkits to propose models - in this particular aspect. When people were building the Standard Model, they had to determine how many fermions they had to include and what the representations were. That picked the general form of the Lagrangian from infinitely many discrete choices of a similar kind. In the string language, one makes similar discrete arguments to eliminate most of the vacua - the discrete parameters are just reshuffled and reorganized relative to quantum field theory.
The task in string theory is more constrained because the continuous parameters are absent but it is arguably more complicated than in quantum field theory because of another reason: there may exist many discrete vacua that have the same low-energy limit (the limit we have already observed) and new observations (combined with complicated calculations) might be needed to figure out which of them is the right one. But the number of options we must go through is whatever it is.
The number is determined by Nature and the laws of mathematics that even She must respect, not by someone's wishful thinking. It is analogous to the number of possible human DNA codes. Someone could also demand that Nature together with the biologists only allow a couple of human DNA codes because humans are so special, as argued in the Holy Scripture and elsewhere. However, the number of similar human-like DNA codes is comparable to 4^{3 billion}. You can complain about it or write dumb books for dumb readers but that's the only thing you can do against the facts of Nature. The number of discretely separated, currently alive candidate theories of the Universe, those that include quantum gravity, is much smaller than 4^{3 billion}. But we also know that if our "job competition" only "cuts" the set of candidates by a few general conditions, the "short list" will still contain far more than billions of entries.
Nevertheless, what matters in science is whether a theory is correct or not, and we know that string theory has not yet been falsified while local quantum field theories including gravity have already been falsified as theories of the Planck scale because they are inconsistent in that regime. Consistency is a much sharper tool for eliminating (killing) theories than any kind of wishful thinking about the number of options one should have.
It is plausible that I have forgotten some important types of inductive thinking that are crucial for science. For example, I haven't discussed ...
And that's the memo.
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
Let me present the scientific process as a more complex sequence or combination of the following procedures that may occur in many different orders and that may influence each other in many different ways:
- Inductive reasoning (induction)
- Deductive reasoning (deduction)
- Comparisons of the deduced results with experience (verification)
Deduction
The second procedure, deductive reasoning, means that you already have a theory or a hypothesis and you deduce (or calculate) what it predicts for a particular situation. You often need to know and develop some mathematical (and numerical) methods. But you might say that in principle, all the wise tricks used in deductive reasoning could be replaced by the brute force of a powerful enough computer. Lattice QCD could be an example that you should have in mind. A computer should tell you what the predictions are and how accurately they have been determined.
Computer-assisted simulation is a kind of deduction. Paradoxically from a linguistic viewpoint, mathematical induction is another good example of deduction ;-) simply because the conclusion for all integers rigorously and certainly follows from the assumptions.
Verification
The third procedure, comparisons of the results with experience, is even more straightforward. You have some numbers that you have deduced from your theory and the experimenters measure the numbers in the real world. If they disagree and if you can trust both the experimenters as well as the deducers, the theory is dead. In reality, such a step usually follows the rules of fuzzy logic. A theory is often "supported" by the evidence or it is "disfavored". Its probability increases or decreases but it doesn't jump all the way up to 100% and it doesn't decrease all the way down to 0% either (although we may often get extremely close to these extreme values).
You might imagine that the Bayesian inference is - openly or secretly - used whenever we refine our opinions about the validity of a theory by checking it against the empirical evidence. Our prior probabilities that hypotheses are correct have to be raised if the hypotheses pass some experimental tests in a better way than expected by chance while they have to be lowered if the agreement is worse than expected by chance.
The best quantitative law is encoded in the Bayes formula but it is fair to say that scientists rarely calculate the probabilities accurately when they are just browsing around and comparing very different hypotheses. It's because the "accurate" probabilities would depend on the priors that are unknown, anyway.
It is not hard to guess that most of this article will be dedicated to the first procedure in the list, namely induction. Many discussions - online discussions as well as the interactions in the real life - have convinced me that most people (including those who claim to be interested in science) completely misunderstand induction: they really fail to grasp what it means to "think" as a scientist.
Induction
I would like to define inductive reasoning as the collection of all "creative" or "not quite mechanical" steps that scientists ever have to make in order to find something new about Nature or at least to get on the right track. Almost everything that has made various theorists "ingenious" or "different from the average people" is hiding in their art of induction.
But experimenters and deducers use induction at various places of their work, too. When a deducer designs her calculational algorithm or an experimenter constructs her new experimental apparatus, they are using a form of induction to achieve their goals, too.
Nevertheless, I will try to focus on the theorists' usage of induction during their search for more correct theories of Nature. But don't forget that if you replace the words "theory" or "statement about reality" by a "calculational framework" or a "useful experimental device" in the text below (and you replace "more likely" by "more useful"), most of the assertions will be relevant for the deducers and experimenter, too. But let's return to the world of theorists now.
More technically, induction is any type of reasoning in which a statement about reality is identified and determined to be more likely than random statements that use similar concepts - a reasoning that applies either logic or heuristic reasoning to existing facts about reality.
In this case, the probability that the newly constructed statement is correct is even harder to quantify than in the case of verification. Nevertheless, you might still imagine that there exists a form of Bayesian inference that quantifies the probability that the induced hypothesis is correct. Although the numerical value of the probability is usually ill-defined (mostly because of the completely uncertain priors), we may compare these probabilities in many cases and say e.g. that a candidate law that has been extracted from a more detailed set of observations is more likely to be true than a candidate law that is only based on a small number of observations.
The inductive reasoning that has always been necessary to make promising guesses in science (and elsewhere) has many kinds and in my definition, it includes these steps, among others:
- Interpolation and extrapolation
- Generalization of several cases
- Fusion of a group of older laws
- Application of a successful method to approximate laws (and the concepts they use) in order to find more accurate laws (and new concepts)
- Search for general constraints that restrict possible laws of Nature, including general principles
- Linearization or other simplification or parameterization of the "space of hypotheses" within the mantinels of viability found in the previous item, in order to adjust and improve the imperfect but promising theories and to simplify the navigation towards the truth
Interpolation and extrapolation
Many or most insights about Nature are described in terms of functions of real variables, besides other concepts. Because Nature is known to be continuous in many different contexts, we have learned that it is reasonable to assume that the relevant functions are continuous, too. At least in many cases (especially in classical physics) we have very good reasons to think so, to say the least.
This principle allows us to guess that if experiments show that "y=3.05" for "x=21.5" and "y=3.07" for "x=21.7", it is pretty likely (or reasonable to expect) that "y" will be close to "3.06" for "x=21.6". The reasoning I used here is called interpolation. In the particular example, I interpolated the two measured points by a linear function.
The exact character of the function becomes increasingly unimportant if the measured values of "x" and "y" are sufficiently dense or close to each other and sufficiently "generic" (i.e. not special). The advantage of interpolation is that we don't have to make infinitely many measurements to know (or predict) something about the value of "y" for arbitrary values of "x". Note that if you didn't allow scientists to assume some kind of continuity that allows them to interpolate, you could always argue that they have absolutely no idea what happens for "x=21.6" because they have only measured what happens at two different values of "x".
In the example of interpolation above, we have made a guess about the behavior at intermediate values of "x". However, it is often necessary to make a reasonable first guess what happens at completely different values of "x". If we already have some candidate function (polynomials, exponentials, and other functions that are "mathematically natural") that describes what happens inside an interval on the "x" axis, we may boldly try to extend this function to other values of "x", too.
This process is called extrapolation. It is clearly less reliable than interpolation but it is still better than knowing nothing. Extrapolation is often needed to have some remotely sane initial idea about an unknown situation. Once other types of induction, to be discussed later, or new pieces of evidence are taken into account, we might actually be able to see that a certain function is much more justified than we would expect at the beginning. If you don't immediately know why one particular function should be the right one outside the measured interval of "x", it doesn't mean that it is forever impossible to know (or almost know) such a reason!
Generalization of several cases
You should realize that the previous arguments and methods - interpolation and extrapolation - have to be used all the time because otherwise we couldn't learn anything from our experience. Every new situation or experiment we care about is slightly different than the previous ones. Without some assumption that relevant functions are "mathematically natural" or at least "continuous" and/or "universal", we would only "know" what happens in the exact experimental situations that have been measured.
Every new planet, space shuttle etc. would be a completely new mystery. The law of gravity could always break down. In reality, it is not the case. The assumption of continuity - or even more special conditions about the relevant functions that describe natural phenomena (including e.g. analyticity in quantum field theory) - is a hypothesis that can be tested and that has been tested.
Overwhelming evidence can be found and has been found in many cases or most cases that shows, via Bayesian inference, if you wish, that this hypothesis of continuity (or even analyticity etc. in quantum field theory) is more likely to be true (or at least more useful) than a random statement about Nature you could make. In fact, it is almost certainly true, at least with some impressive accuracy. ;-)
There exist more general types of this reasoning. I included them under the umbrella of generalization. If you have seen the sunrise 10,000 times and you even have some indirect evidence that other people have seen it millions of times and your continent has witnessed it trillions of times, it is not quite unreasonable to assume that the Sun will rise tomorrow, too.
Now, 10,000 sunrises don't "prove" anything. But they are still an argument. In some sense, it is fair to think that the 10,000 sunrises imply that the probability that the Sun won't rise tomorrow is probably smaller than 1/10,001. Tomorrow is a random day among the 10,001 days that you will have seen by tomorrow and the probability that a special property that only holds for one day - not having any sunrise - is satisfied for "tomorrow" rather than the previous 10,000 days is about 1/10,001, assuming some democracy governing the days. Moreover, even Al Gore will agree that the cataclysm won't come tomorrow but only on the day after tomorrow. ;-)
I didn't use any nuclear physics. The argument that the catastrophe that starts tomorrow is unlikely because we haven't seen it for quite some time is only based on common sense. Well, not quite. It is also based on some simple observations that you (or people) have made 10,000 times. One should realize that simple observations, when combined with some "common sense" arguments, can often lead us to more solid predictions than arguments based on very non-trivial observations combined with excessively contrived theoretical reasoning.
Now, I would like to explain the relationship between interpolations, extrapolations, and generalizations. Interpolations and extrapolations only differ by the place on the "x" axis where we extend our observations (inside or outside the interval). Generalizations are, in fact, similar. The only difference is that the generalizations we discuss in this section require a new type of "x" that is discrete.
For our example involving the sunrise, "x" is the integer that counts the days in your life. The discrete function "y" is either equal to "sunrise does occur" or "sunrise doesn't occur" on a given day. And we are simply extrapolating from the previous days - values of "x" - to the future ones. It is thus a form of extrapolation. In other cases, the discrete values of "x" are not naturally ordered along an axis. For example if we "induce" Hubble's law of expansion for various galaxies, "x" identifies the particular galaxy and there is no natural way to assign real numbers or integers to galaxies.
Consequently, we can't say whether the generalization of Hubble's law from the observed galaxies to all galaxies is an example of interpolation or extrapolation. But you can still see that this generalization is a procedure that is closely related to interpolation and/or extrapolation.
These generalizations have been, much like interpolations and extrapolations, essential to figure out something about Nature. And they are still essential. All cases in which a rule or pattern applies to all members of a set of objects or phenomena belong to this category. Once again, arguments constructed in this way are not rigorous proofs. They depend on the assumption of "universality" much like interpolations depend on "continuity".
But much like many other useful assumptions, universality seems to be (at least effectively) true while it is of course falsifiable in principle. However, an overwhelming evidence suggests that it is a good guide (in many contexts - and, by a more conceptual kind of generalization, in classes of contexts).
Fusion of a group of older laws
This new type of reasoning is somewhat different. When we have two partially successful theories that describe two subsystems - or two (groups of) aspects of a physical system - we often need to "merge" these two theories.
In this case, it is obvious that the two theories we started with are incomplete because they only apply to subsystems (or to some aspects of a physical system). We only know pieces of the puzzle. In some cases, the pieces could be viable fragments of an older theory that has been falsified as a bloc but whose portions are promising enough to be recycled. To learn something about the whole puzzle, we clearly need more than the pieces.
The first step is to realize what is the relationship between the pieces. We must try to carefully divide the subsystems or the aspects of one physical system in between the two (or more) older theories and we must make reasonable qualitative guesses what happens near the boundary between the two (or more) domains of validity: a kind of interpolation is necessary here.
Because functions in Nature are typically continuous, it is usually not possible to sharply divide the domains of validity. Every theory X tends to employ some functions that also try to tell us something about physics outside the domain of validity of X. For this simple reason, the merger of two types of theories is often highly non-trivial and the number of possibilities how two (or more) theories may be merged is often severely limited: the correct theory must "tame" both older theories and force them to peacefully respect the regions outside their domains of validity.
This compatibility constraint often forces the older theories to "unify" - i.e. to merge into a greater entity in which the original two theories can no longer be strictly divided.
There are also cases in which the merger is straightforward. Sometimes the "merger" simply means that we only add the degrees of freedom and add the Hamiltonians (or actions) of two (or more) older theories. That's a typical example of a situation in which no unification occurs (and no unification is needed).
Application of a successful method in a new context
We are getting fancier. What is the method discussed here? Well, it is a scientific reincarnation of an "analogy". How does it work when we're looking for better theories? It is often possible to compare two "qualitatively" different theories that we have considered in the past, locate the difference between them, and present one of the theories (the more complete and/or more accurate one) as the other one that has undergone a certain operation.
Once we isolate this operation ("a method to improve a theory"), we may describe it in terms of general rules that may also be applied to other theories whose "siblings" are not yet known. This method is therefore a new kind of a "merger" that was discussed in the previous section. However, in this case, we are not merging two particular "models" but rather one model with an abstract method. This abstract method can be visualized as the difference between two other theories or models.
Quantization is the most obvious example of the procedure I am talking about. It is possible to compare the quantum theory of one particle with its classical counterpart, describe this difference as a set of replacements of concepts - quantization - and use these steps to another classical physical system, such as classical electrodynamics, to construct a new theory, in this case quantum electrodynamics.
There are more trivial examples of this sort of reasoning, for example the rewriting of a theory in certain useful variables - variables of a similar kind that have been successful in other contexts (although this technical step might be more useful for deducers rather than the creative theorists; new variables usually do not mean new physics). The main point is that the physicists have to combine not only models that have already been found but they must also combine models with principles, models with methods, and so forth.
This subcategory of inductive reasoning is perhaps the most "creative" one. It is capable to qualitatively change the character of our hypotheses and concepts and to increase their complexity most dramatically. If you compare scientific hypotheses to life forms, the currently discussed "merger" of models with principles that were extracted from other contexts are analogous to the most dramatic types of mutations, mergers of two DNA molecules, a change of the number of chromosomes, etc. Even in biology, many of these "large mutations" are not quite random but they are rather carefully tailored so that the resulting new life form is bound to be more interesting and viable than a random life form of a similar complexity.
Quantization and similar procedures have counterparts in rigorous mathematics, too. In set theory, we often need to construct larger sets or classes - recall Zermelo-Frenkel's set theory - from the bottom, including the set of all subsets of another set, the set of all functions from one set to another set, and so forth.
In other words, we are talking about all the procedures that often lead to possibly relevant, highly complex mathematical structures. Just like the mammals are "qualitatively" different from bacteria and rather "large mutations" had to take place in the past, cutting-edge quantum theories of gravity (i.e. stringy vacua) are qualitatively different from the classical mechanistic theories popular in the 17th century. Something had to change many times before people became able to propose and consider theories of the modern kind and I mostly include these improvements into this group of "application of successful principles to new situations".
If thousands of philosophers were trying to find the strangest possible theory for thousands of years, they would have never invented quantum mechanics (Sidney Coleman) simply because they would only be searching in the realm of naive, insufficiently abstract, not too complex, and insufficiently crazy ideas from a common sense viewpoint (for example, among different shapes of the classical atoms by Democritus): they could have never made the right guess. Nature had to force us to make our theories more abstract, more complex, and crazier in order to match Her extraordinary charm. Analogies have been the key to increase the conceptual sophistication of our hypotheses.
Search for general principles and constraints
While the previous mental step allows us to expand the "DNA of science" and create increasingly complicated monsters (some of which are clearly necessary to describe the reality properly or almost properly), we might often end up with too large monsters that are too arbitrary - too many huge convoluted animals with too long DNA molecules, if you wish. Complex animals are fine and necessary. But we still need a counterpart of the natural selection here.
Again, it is not true that new experimental tests are always needed to "kill" some of these convoluted animals. We usually have a lot of old experimental tests and most of the completely random new "convoluted animals" are almost certainly guaranteed to fail in these old tests. That's a great method to exterminate - or severely hurt - most of these big animals.
We must create and sharpen our tools to kill them. Because it is inconvenient to carry too many tools, it is a great idea to use a finite, manageable number of these tools. And to remember which of them are sharp ones. And use these tools all the time to get rid of wrong theories very quickly even if they are complicated and if others might incorrectly think that one needs centuries to kill such dinosaurs. With sharp and efficient tools, you can kill a lot dinosaurs very quickly.
In the previous paragraph, I was really talking about various experimental or cosmological constraints on "unusual" phenomena. Complicated enough theories, much like complicated enough animals, tend to have some "weak spots" that seem "unusual" and that can simplify the life of the hunters or the killers.
For example, many theories might predict some violations of the Lorentz symmetry. It is therefore useful to understand very well how to quantify the amount of Lorentz violation predicted by a given theory (the thickness of the skin of a big animal at some vulnerable spot or spots) and how to compare it with the experimentally known constraints (the length of our knife that can be enough to kill an animal). The knife is enough whenever the skin is thin.
Experimental or cosmological constraints are able to eliminate many sufficiently randomly created theories very quickly and we don't really have to make new experiments in most cases. Quite obviously, promising theories - big animals that have a chance to survive - must have rather special properties. And it is a good idea to look within these mantinels only.
So we are expanding the landscape of the DNA codes as the animals (theories) are getting more complex but we are also making the tools to kill them sharper. While the landscape of possible DNA codes is getting larger as the theories get more complex, the relative ratio of the viable theories within this landscape is getting smaller because we are also gaining an increasing number of tools to kill.
The creation of new, complex life forms is competing with the creation of new, sophisticated killing tools and both of these processes are necessary to make progress, much like their counterparts were needed for the evolution of life to end up with relatively intelligent (and otherwise valuable) animals like us. The main difference between life and theoretical physics is that the life forms can be getting increasingly complex indefinitely; theoretical physics is arguably or hopefully converging to a very specific final theory with a particular finite amount of complexity.
I would like to emphasize that the tools used to eliminate theories don't have to be simply understandable, experimental, quantitative constraints such as the limits on CPT or Lorentz symmetry violation. They may also be very abstract and qualitative principles. While the CPT symmetry or the Lorentz symmetry are examples of such principles - that force us to look at relativistic theories - and we may view them as "experimental" constraints, they are "theoretical" constraints at the same moment.
It was reasonable for Einstein to assume that the Lorentz symmetry had to be exactly valid (at least as a property of any sensible approximation of the complete theory). So he simply required theories to be relativistic. This condition rules out a majority (in most ways of counting) of the big animals that you could consider otherwise.
Also, various types of "consistency rules" must obviously be counted among the tools to kill. Unitarity must be satisfied, gauge anomalies must be absent, and a nearly flat and empty space must be a solution of your theory, among other things. The "large mutations" discussed previously can lead to many mathematical structures that don't satisfy even these basic rules. It takes a second to get rid of these lethally sick candidate theories.
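To make the "gauge anomalies must be absent" rule concrete, here is a minimal Python check - assuming the usual hypercharge convention Q = T3 + Y and writing every fermion of one Standard Model generation as a left-handed Weyl field - that the hypercharge anomalies cancel exactly; a random "mutation" of the hypercharges would fail this test immediately:

```python
# A sketch of one consistency tool: exact cancellation of the U(1)_Y gauge
# anomalies for a single Standard Model generation, assuming the convention
# Q = T3 + Y and writing all fermions as left-handed Weyl fields.
from fractions import Fraction as F

# (number of states = colour x SU(2) multiplicity, hypercharge Y)
generation = [
    (6, F(1, 6)),    # quark doublet Q
    (3, F(-2, 3)),   # up antiquark u^c
    (3, F(1, 3)),    # down antiquark d^c
    (2, F(-1, 2)),   # lepton doublet L
    (1, F(1, 1)),    # positron e^c
]

gravity_U1 = sum(n * Y for n, Y in generation)      # mixed gravitational-U(1) anomaly
U1_cubed   = sum(n * Y**3 for n, Y in generation)   # pure U(1)^3 anomaly

print(gravity_U1, U1_cubed)   # 0 0 -- this candidate passes; a random mutation wouldn't
```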
Linearization and parameterization of the viable theories
Finally, there is one more important method that I included in the list. Because we have an increasing number of constraints - not only direct experimental constraints but also theoretical constraints that can instantly eliminate seemingly viable complex theories without any new experiment whatsoever - we are learning the location of the correct theory within the landscape of possible theories more and more accurately, relatively speaking.
Scientists usually test many hypotheses at the same time and only a very few of them have any chance to succeed; that is also the source of Popper's asymmetry, in which he assumes that the probabilities can only go down (for most hypotheses, the wrong ones, they do go down). At the same time, we want to keep the pool manageable and only consider limited, finite classes of hypotheses in which the right one doesn't get completely lost. That's why you shouldn't forget that the probabilities can also go up (there exists a correct theory, after all) and why Popper's asymmetric description is flawed.
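A toy illustration of this two-way flow of probabilities - Bayesian updating over a small, manageable pool of hypotheses, with invented hypothesis names and made-up likelihoods, one of which is declared "correct" purely by assumption:

```python
# A toy Bayesian pool: hypothesis names and pass probabilities are invented;
# "C" is assumed to be the correct theory that Nature secretly uses.
import random

random.seed(0)

prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
pass_prob = {"A": 0.3, "B": 0.5, "C": 0.9, "D": 0.1}   # chance each theory assigns to passing a test

posterior = dict(prior)
for _ in range(20):                                    # twenty experimental tests
    outcome = random.random() < pass_prob["C"]         # Nature decides using the true theory
    for h in posterior:
        likelihood = pass_prob[h] if outcome else 1.0 - pass_prob[h]
        posterior[h] *= likelihood                     # Bayes: multiply by the likelihood...
    norm = sum(posterior.values())
    posterior = {h: p / norm for h, p in posterior.items()}   # ...and renormalize

print(posterior)   # the correct hypothesis climbs toward 1; the others decay but never hit exactly 0
```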
The wiggle room - i.e. the differences between promising candidates - is getting rather small. When you try to describe a small vicinity of some place in space, things may simplify. When you know the correct theory almost exactly, assuming a certain collection of basic concepts, it is usually a good idea to parameterize it by a few (or many) parameters that should be adjusted. Once you replace the promising neighborhood of a submanifold by its tangent space, you are clearly linearizing the problem.
Once you know the fields of the Standard Model, the correct theory is described in terms of the masses and couplings in the Lagrangian. For practical purposes, you may also show that the non-renormalizable couplings are irrelevant for easily testable predictions (they are also irrelevant in the technical sense) and you end up with a few dozen parameters that matter.
Renormalizable quantum field theories with a few fields typically have a limited number of continuous adjustable dimensionless parameters. At some moment, the only task left is to determine their values more accurately - at least if you're only interested in particle physics below 100 GeV. The qualitative part of the physics research of this issue is over. What is left is rather boring: you have to adjust the very fine details and parameters. I would hesitate to call these final steps induction.
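As a schematic picture of this final, "boring" stage, here is a small sketch - with a random placeholder model and invented numbers, not the actual Standard Model fit - of what the linearized problem looks like: observables depend approximately linearly on small parameter shifts, so extracting the parameters reduces to a least-squares fit:

```python
# A schematic linearized fit: near an almost-correct theory, observables depend
# roughly linearly on small parameter shifts, so the remaining work is least squares.
# The "Jacobian" and "data" below are random placeholders, not real physics.
import numpy as np

rng = np.random.default_rng(1)

true_shift = np.array([0.02, -0.01, 0.005])            # unknown corrections to 3 parameters
J = rng.normal(size=(12, 3))                            # sensitivities of 12 observables
data = J @ true_shift + 1e-3 * rng.normal(size=12)      # noisy measurements

fit, *_ = np.linalg.lstsq(J, data, rcond=None)          # linear least-squares solution
print(fit)                                              # close to [0.02, -0.01, 0.005]
```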
However, the example of biology shows that the viable life forms must often be encoded in discrete data. Human DNA has a few billion bases. But if you self-confidently decide that humans are the only intelligent animals above some IQ level that the Earth has seen so far, typical (and even less typical) human beings may be encoded in a much smaller number of bytes describing the properties or parameters of various genes. When we are getting very close to a full understanding of reality, it is very important to know the "legitimate playground" and to describe its structure and "geometry" as clearly as possible. New emergent concepts are useful for achieving this goal ("genes" in biology or "renormalizable couplings" in quantum field theory, besides millions of other scientific concepts).
The stringy landscape is of course far from being the only example of a parameterization of viable theories by discrete parameters. Such a parameterization is omnipresent in science. Conceivable perturbative theories of particle physics may be described in terms of gauge groups, representations in which scalars and fermions transform, and the continuous renormalizable couplings and masses. The discrete part of this data plays pretty much the same role as the discrete data that determine the vacuum in string theory.
In some sense, the number of discrete options is analogous and comparable in quantum field theory and in the stringy landscape. Just like quantum field theory itself doesn't predict the number of generations, string theory as we know it today doesn't predict the preferred topology of the Calabi-Yau manifold.
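A minimal sketch of what "discrete data labelling a model" means in practice - the entries below are illustrative placeholders, not an actual classification of consistent theories or string vacua:

```python
# A placeholder enumeration of "discrete model data"; the entries are invented
# labels, not an actual list of consistent theories.
from itertools import product

gauge_groups = ["SU(3)xSU(2)xU(1)", "SU(5)", "SO(10)"]
generations = [1, 2, 3, 4]
extra_higgs_doublets = [0, 1]

candidates = list(product(gauge_groups, generations, extra_higgs_doublets))
print(len(candidates))   # 24 discrete choices, before any continuous coupling is even mentioned
```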
However, string theory completely eliminates the adjustable continuous dimensionless parameters. The only undetermined properties are the discrete parameters, which is gigantic progress in comparison with quantum field theory and its continuous parameters. And if you count the irrelevant couplings (those that become important at high energies), quantum field theory has infinitely many of these adjustable continuous parameters. Still, even in string theory, one has to go through the discrete candidates and find tools to eliminate the incorrect ones.
It might be the case that such a vacuum selection is governed (completely or partially) by some new, mathematically formulated laws; if so, one could calculate what the right choice is. At this moment, we don't know what the hypothetical rule exactly is. However, it might also be that no canonical calculation of the number of families, the representations of fermions, etc. exists in string theory. If it doesn't exist, it is clear that we may only determine the right solution by a careful comparison of more detailed properties of the vacua with experience, i.e. by using old-fashioned verification to eliminate candidates.
While this "anthropic" scenario is often presented as the ultimate catastrophe because of the perceived "lack of predictivity", it wouldn't be the first time when experimental verification is needed to eliminate wrong models that talk about physics of these experiments. ;-) Our knowledge of low-energy physics could be enough to induce and deduce all detailed physical properties of Nature at the Planck scale, 15 orders of magnitude above the accessible energy scale (which is the regime we want to understand, almost by definition), by "pure thought" but it could also be insufficient. No one knows for sure: there exists partial evidence in both ways. This question doesn't really depend on any characteristic technical features of string theory; attempts to make this general question emotional or even to link it with the fate of string theory are irrational.
Note that there is no difference between string theory and quantum field theory - understood as broader frameworks or toolkits for proposing models - in this particular respect. When people were building the Standard Model, they had to determine how many fermions to include and in which representations they transform. That picked the general form of the Lagrangian from infinitely many discrete choices of a similar kind. In the string language, one makes similar discrete arguments to eliminate most of the vacua - the discrete parameters are just reshuffled and reorganized relative to quantum field theory.
The task in string theory is more constrained because the continuous parameters are absent, but it is arguably more complicated than in quantum field theory for another reason: there may exist many discrete vacua that share the same low-energy limit (the limit we have already observed), and new observations (combined with complicated calculations) might be needed to figure out which of them is the right one. But the number of options we must go through is whatever it is.
That number is determined by Nature and the laws of mathematics that even She must respect, not by someone's wishful thinking. It is analogous to the number of possible human DNA codes. Someone could also demand that Nature, together with the biologists, only allow a couple of human DNA codes because humans are so special, as argued in the Holy Scripture and elsewhere. However, the number of similar human-like DNA codes is comparable to 4^{3 billion}. You can complain about it or write dumb books for dumb readers, but that's the only thing you can do against the facts of Nature. The number of discretely separated, currently alive candidate theories of the Universe - those that include quantum gravity - is much smaller than 4^{3 billion}. But we also know that if our "job competition" only "cuts" the set of candidates by a few general conditions, the "short list" will still contain many more than billions of entries.
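For scale, a one-line back-of-the-envelope computation of how large 4^{3 billion} actually is:

```python
# How big is 4**(3 billion)? Count its decimal digits via a logarithm.
from math import log10
print(3e9 * log10(4))   # about 1.8e9, i.e. a number with roughly 1.8 billion decimal digits
```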
Nevertheless, what matters in science is whether a theory is correct or not, and we know that string theory has not yet been falsified, while local quantum field theories including gravity have already been falsified as theories of the Planck scale because they are inconsistent in that regime. Consistency is a much sharper tool for eliminating (killing) theories than any kind of wishful thinking about the number of options one should have.
It is plausible that I have forgotten some important types of inductive thinking that are crucial for science. For example, I haven't discussed
- methods to determine qualitatively whether two phenomena, objects, issues, or events are causally related (organizing related phenomena as "nearby objects" or as "causes and effects" by evaluating correlations and the time ordering of events),
- methods to decide whether they have a similar origin (by searching for common patterns and transformations between them as well as between their ancestors), and
- Occam's razor (which can remove apparently useless parts of overly convoluted hypotheses that have been constructed in an excessively exuberant fashion).
And that's the memo.
Induction & how scientists think
Reviewed by DAL
on
May 17, 2008
Rating:
No comments: