POVERTY OF STIMULUS ARGUMENTS AND BEHAVIOURISM

PART 1: CHALLENGES TO THE APS

SECTION 1: IMPLICATIONS FOR CHOMSKY AND QUINE

When it comes to the details of how children learn their first language, there is a substantive difference between Chomsky and Quine, and it centres on the role each thinks reinforcement plays. On the Quinean picture, the child uses his innate babbling instinct to mouth various different words; the parent reinforces these emissions positively and negatively until the child’s pattern of verbal behaviour is battered into the external shape of his social environment. As Quine put it in Word and Object:

People growing up in the same language are like different bushes trimmed and trained to take the shape of identical elephants. The anatomical details of twigs and branches will fulfil the elephantine form differently from bush to bush, but the overall outward results are alike (1960, 8)

 

The linguistic nativist Chomsky disagrees with this Quinean picture. He thinks that the outward shape of language results not from the child’s faltering attempts at speech being corrected by his peers, but from the child’s innate universal grammar structuring the data of experience which he contingently encounters.

            The issue between Chomsky and Quine on this point is a purely empirical one, and in the last twenty years much detailed evidence has emerged which can be used to decide between the two theorists. The central idea around which nativism was built is the poverty of stimulus argument. Chomsky has argued that children display knowledge of language, that this knowledge is not provided by the environment, and that therefore it must be innate. Typically Chomsky uses the subject-auxiliary inversion rule to illustrate how poverty of stimulus arguments work. In their paper ‘‘Empirical Assessment of Stimulus Poverty Arguments’’, Geoffrey Pullum and Barbara Scholz call the subject-auxiliary inversion rule the paradigm case which nativists use to illustrate poverty of stimulus arguments. They cite several occasions on which Chomsky uses the example (Chomsky 1965, 55-56; 1968, 51-52; 1971, 29-33; 1972, 30-33; 1975, 153-154; 1986, 7-8; 1988, 41-47), as well as other Chomskian thinkers (including linguists such as Lightfoot, 1991, 2-4; Uriagereka, 1998, 9-10; Carstairs-McCarthy, 1999, 4-5; Smith, 1999, 53-54; Lasnik, 2000, 6-9; and psychologists such as Crain, 1991, 602; Marcus, 1993, 80; Pinker, 1994, 40-42, 233-234) who have endorsed the claim. They argue that this supposed instance of an APS is simply being passed around and repeated over and over again. No surprise, then, that without knowledge of Pullum and Scholz’s article, I chose subject-auxiliary inversion as my paradigm example of an APS; I had fallen into the same pattern of passing around this well-worn example. In this paper, I examine whether the APS (as applied to syntactic knowledge) actually works, and I discuss the implications for Quine’s view of language learning both if the APS is sound and if it turns out to be unsound.

            Pullum and Scholz (2002) show that poverty of stimulus arguments are used in a variety of not always consistent ways in the literature. Having surveyed some of the literature on the APS, they isolate what they believe to be the strongest version of the argument.  The argument they construct is as follows:

(A) Human infants learn their first languages either by data-driven learning or by innately-primed learning.

(B) If human infants acquire their first languages via data-driven learning, then they can never learn anything for which they lack crucial evidence.

(C) But infants do in fact learn things for which they lack crucial evidence.

(D) Thus human infants do not learn their first languages by means of data-driven learning.

(E) Conclusion: Humans learn their first languages by means of innately primed learning.

 

This gloss on the APS is one that Chomsky would accept as an appropriate schematisation, though he believes that there is more evidence supporting a belief in innate domain-specific knowledge than the APS alone[1]. Pullum and Scholz claim that the key to evaluating the soundness of the APS is premise (C), the empirical premise of the argument. So to evaluate the argument, they study the linguistic environment of children. Their aim is to check whether there really is no evidence provided by the environment which the child could use to formulate a hypothesis about a particular rule. For example, will a child be presented with data such as ‘Is the man who is at the shop happy?’, which can help him learn subject-auxiliary inversion? Pullum and Scholz’s research programme involves searching the Wall Street Journal corpus to discover whether constructions which Chomsky claims a person could go much or all of his life without encountering are, in fact, more frequent than he would lead us to believe. However, prior to discussing what the evidence tells us about the frequency of these sentences, I first want to discuss what Quine would make of the APS as reconstructed by Pullum and Scholz.

            The first premise of Pullum’s version of the APS is that a child learns language either by data-driven learning or by innately-primed learning.  Quine maintained throughout his entire philosophical career that our linguistic abilities are not distinct from our overall theory of the world. In fact, he has consistently maintained that learning a language is learning a theory of the world, and, furthermore that learning a scientific language is learning a more explicit regimented form of ordinary language. According to the picture presented in Word and Object, a child begins by babbling various different sounds and has these sounds reinforced in various different ways. Through the process of conditioning and reinforcement, the child eventually learns when it is appropriate to use which sounds. According to Quine, at this stage the child has not learned any concepts. Quine argues that through processes such as analogical reasoning, abstraction etc., children eventually learn to structure some of these sounds into syntactic units. It is only after we have mastered this syntax and can then speak of certain objects as being the same as or different than other objects, that we can be said to have grasped the concept of an object, and learned to speak about objects in the world. The important point is that for Quine, the processes which a child uses to learn a language are the same as the processes he uses to learn about the world. So Quine would not accept that language is learned by innately-primed learning (in the sense of innate domain-specific knowledge). The question of whether a child learns his first language by data-driven learning is a more complicated question on the Quinean picture.

            On some versions of data-driven learning, the child is presented as a passive observer of verbal behaviour. From when they are born (strictly speaking, when in utero as well), children are bombarded with verbal behaviour. So, on one data-driven learning picture, children (unconsciously) observe the various different patterns of verbal behaviour; circumstances of occurrence, order of occurrence, tone used etc. and unconsciously construct a model of the language they are presented with.

            Quine does not deny that the child uses such statistical methods to organise the data of experience; so in this sense he agrees that a child learns by data-driven learning. However, for Quine, the word ‘data’ has a much wider meaning than mere models constructed from observed linguistic regularities. For Quine, an important part of the data is the type of reinforcement that the child receives. The child emits utterances and receives various different types of reinforcement, negative or positive depending on the appropriateness of the utterance. So on the Quinean picture, as the child is learning his first language, he might be reinforced for putting forth a question such as ‘Will Mama feed me?’. Now suppose the child had been constructing questions by moving the first auxiliary of various statements to the front of the sentence, and suppose further that the child had been positively reinforced for this behaviour. Given this state of affairs, the child will continue to emit questioning behaviour like this until he receives negative reinforcement. Now suppose that the Quinean child wants to ask a more complicated question; suppose he wants to discover whether the sentence ‘The man who is tall is sad’ is true. The child will continue along the pattern of previous questions, turn the statement into a question, and ask ‘Is the man who tall is sad?’*. On the Quinean picture, this questioning behaviour will be negatively reinforced. The child will continue to try different constructions based on past experience and reinforcement until eventually his language output is moulded into the shape of his community’s language. So for Quine, the data the child learns from is not merely observation of the statistical patterns of the language he is exposed to, but also includes the ways various constructions are reinforced negatively and positively.
The important point is that in order for Quine to accept premise (A) of Pullum and Scholz’s reconstruction of the APS, he would have to understand data in a wider manner than that obtained by mere passive observation.

            Quine would accept premise (B) as long as data-driven learning is considered in this expanded sense (reinforcement, plus statistical regularities in the environment). Premise (C) is the crucial empirical premise: infants do in fact learn things for which they lack crucial evidence. The linguistic nativists’ ‘parade case’ of children displaying knowledge for which they have not been provided crucial evidence is subject-auxiliary inversion. Quine emphasises induction, analogy, and reinforcement as the primary tools in language learning. He never endorsed the claim that children have knowledge of rules of language for which they have received no data from their linguistic environment. For Quine, any sentence a child utters is either learned inductively from the child’s primary linguistic data (PLD) or is constructed through an analogy with previously heard utterances in the PLD. Through induction, analogical reasoning and reinforcement, the child will eventually arrive at the language of his peers. So Quine would deny the truth of the crucial empirical premise (C). Furthermore, premise (C) is a crucial test of Quine’s theory of language acquisition: if it could be demonstrated that a child has knowledge of a rule of language which was not learned by experience, analogy or reinforcement, then this would demonstrate that Quine’s theory of language acquisition is seriously incomplete.

 By reviewing the Wall Street Journal corpus, Pullum and Scholz have provided evidence that the constructions which Chomsky claims a child will never be exposed to in his lifetime do, in fact, occur. They used the Wall Street Journal because it is easy to obtain and free. People have justly complained that the Wall Street Journal is obviously not representative of the type of data a child will be exposed to. To this they have replied that since Chomsky claimed that the sentences a child needs to be exposed to in order to learn structure-dependent rules are so rare that a child can go much or all of his life without encountering them, even the Wall Street Journal provides evidence that Chomsky is wrong on this point. Geoffrey Sampson, in his book The ‘Language Instinct’ Debate, has provided evidence that the type of constructions which Chomsky claims are vanishingly rare occur in children’s books. Furthermore, he has searched the British National Corpus (including a search of child-parent interaction) and found hundreds of examples of the relevant constructions[2].

            Pullum et al. think that since they have shown that premise (C) is not in fact true, then the overall argument, while valid, is not sound and therefore the argument for linguistic nativism does not go through. Let us assume that Pullum is correct and that Chomsky’s argument is not sound: what then are the implications of this for Quine?

            As we have seen, Quine thinks that children learn language through data-driven learning, in a broad sense. One of the primary objections to the Quinean picture of language learning is that negative reinforcement does not play the role in language learning that Quine thinks it does. A wide variety of experimental evidence has been put forward by psychologists who claim that it shows that children are not corrected when they speak ungrammatical sentences (see, for example, Marcus 1993, 53-85; Gropen, Pinker, et al. 1989, 203-57; Crain and Nakayama 1987, 113-25). At a superficial level, this seems to show that Quine’s picture of language acquisition is incorrect. The empirical evidence seems to indicate that the picture of a child mouthing constructions such as ‘Is the man who tall is sad?’* and receiving negative reinforcement is, in fact, incorrect. Therefore, one could conclude that even if Pullum is correct that the child is exposed to some examples which help him learn the structure-dependent rule, this will not help the Quinean conception of language learning.

            However, it does not automatically follow from the fact that explicit reinforcement is not involved in language learning that a more subtle kind of reinforcement is not used. Whether or not reinforcement is explicitly used in learning complex grammatical utterances, it is unquestionable that children do receive positive reinforcement for speaking. When a child first begins to speak, every utterance is encouraged and rewarded with affection. In Word and Object, Quine notes that any reinforcement that the child receives will be concomitant with a variety of different stimulations. As he writes:

The original utterance of ‘Mama’ will have occurred in the midst of sundry stimulations, certainly; the mother’s face will not have been all. There was simultaneously, we may imagine, a sudden breeze. Also there was the sound of ‘Mama’ itself, heard by the child from its own lips. (1960, 81)

 

So, for Quine, the effect of the reinforcement will be that the child will repeat the word in the presence of Mama’s face, in the presence of a mild breeze, and upon hearing the sound ‘Mama’. However, the child will not receive reinforcement for saying ‘Mama’ in the presence of a sudden breeze, so he will eventually stop emitting this behaviour. The child will, however, receive reinforcement for saying ‘Mama’ in the presence of Mama, and for repeating the word ‘Mama’ upon hearing someone near him speak it. One helpful consequence of this type of reinforcement, according to Quine, is that the child who is being reinforced for repeating ‘Mama’ when someone says ‘Mama’ will, from the parent’s point of view, appear to be engaging in mimicry. If the child can recognise that he receives reinforcement not just for sounds used in certain appropriate contexts but also for mimicking the behaviour of his peers, then he will have had a very useful tool reinforced. The child will have realised that it pays to listen to his peers and to try to imitate their behaviour. To this end, the type of statistical abilities postulated by people such as Lappin and Clark will obviously be useful in helping the child learn his first language. Furthermore, if what is being reinforced is mimicking behaviour, then the fact that certain sentences which Chomsky claims do not occur in the data do, in fact, occur will obviously be of vital importance for Quine’s theory. Obviously, Quine’s mimicking story will only work if the child experiences the constructions of which he displays knowledge. All of this is schematic. While it does not show that Quine’s theory of language acquisition is correct, it shows that recent research which purports to show that Chomsky’s APS arguments in syntax do not work can also play a role in supporting Quine’s theory of language acquisition.

           

                      SECTION 2: RECENT CRITICISMS OF THE APS

The first criticism that I will consider is a logical argument put forth by Geoffrey Sampson. In his The ‘Language Instinct’ Debate, Sampson claims that it does not matter whether there is data which refutes Chomsky’s APS, because the argument is self-refuting. He attributes to Chomsky the following claim: ‘Language has certain properties no evidence of which is available in the data to which we are exposed when learning the language.’ He then asks how Chomsky can possibly know this. The adult’s conscious knowledge of the properties of the language is based on observations of the language, but Chomsky claims that such observations are insufficient to determine the properties of the language. Therefore, if there is a grammatical rule for which a language learner rarely or never encounters evidence in his data, then there seems no reason why a linguist would encounter such evidence either. Both the language learner and the linguist are exposed to the same data, which Chomsky claims will not determine the rules of the language at all.

            In essence, what Sampson claims is that if there is no evidence in the data from which a child can learn the rule, then there is no evidence in the data which justifies the linguist in postulating the rule. Hence, for Sampson, the APS is self-defeating. One possible way for the linguist to overcome this difficulty would be to claim that he uses his innate knowledge of grammar as well as observation to discover that the rule obtains. However, Sampson correctly notes that to argue thus is to beg the question against one’s opponent. So he concludes that the APS is either self-defeating or a mere question-begging stipulation.

            A key aspect of Sampson’s argument is his emphasis on Chomsky’s claim that children will never encounter certain constructions in their experience which could help them learn the relevant rule. He quotes the following statement of Chomsky’s given in a 1980 lecture:

The child could not generally determine by passive observation whether one or the other hypothesis is true, because cases of this kind rarely arise; you can easily live your whole life without ever producing a relevant example…you can go over a vast amount of data of experience without ever finding such a case… (1980, 121)

 

In the above quote Chomsky claims that sentences which confirm the subject-auxiliary inversion rule are virtually never encountered. Sampson then asks rhetorically: if such sentences are never encountered, what reason would we have to say the rule exists?

            The main difficulty with Sampson’s argument is that the data a professional linguist is exposed to obviously far exceed the data a language learner would encounter. A child from a professional background will be exposed to about 30 million word tokens by the time he is three years old[3]. A linguist from a similar background (assuming that he has completed a PhD and is around 27) will have been exposed to about 240 million word tokens[4]. So a typical linguist will have encountered at least eight times the number of words that a typical child has. Obviously, if we accept Chomsky’s claim that a child can go much or all of his life without encountering the relevant constructions, then the data the child is exposed to will be irrelevant. However, when Chomsky and other linguists discuss APS examples, what they typically state is that the data is insufficient for the child to learn the general rule, not that there is no data at all. So, bearing in mind the fact that a linguist is exposed to at the very least eight times the data that a language learner is, it is quite possible that the linguist will be exposed to enough examples to learn of the existence of the auxiliary inversion rule, while the child may not have been exposed to enough data to learn this rule from his PLD. Furthermore, the linguist will have access to other languages with which to compare his data from English. He will have conversational partners to discuss his findings with, and will have access to thousands of books and articles detailing the discoveries of other linguists. This indicates that the linguist will have been exposed to much more than eight times the amount of linguistic data that your average child is. On these grounds, it is clear that Sampson’s argument is inconclusive at best.
To show that Chomsky is making a claim that is self-refuting, Sampson would need to demonstrate that a child and a professional linguist are exposed to the same amount of linguistic data. Such a claim is of course patently absurd.

            While Sampson’s argument does not work as a demonstration that Chomsky’s APS is self-refuting, it does reveal a real weakness in Chomsky’s APS. Chomsky is making claims about the child’s PLD for which he has provided no evidence. So Sampson’s argument at least demonstrates the necessity of Chomsky providing evidence for the controversial claims he is making.

            Pullum and Scholz did the first detailed study of how often sentences relevant to the structure-dependent APS appear in the data a child is exposed to.  As I discussed above, they began by making the logic of the APS explicit by structuring it as a logical argument. They isolated the third premise which claims that data relevant to learning the structure-dependent nature of language do not occur enough in the child’s PLD for him to learn the relevant construction.  They set out to test this claim by checking a corpus of linguistic text; they used the Wall Street Journal as their corpus because it was freely and easily available.

            In order to test how often a construction is encountered by a child learning a language, it is first necessary to establish how much linguistic data a child is exposed to. Pullum and Scholz relied on the work of the psychologists Hart and Risley, whose 1995 Meaningful Differences in the Everyday Experience of Young Children details the amount of linguistic data a child is exposed to. Hart and Risley documented the vocabulary development of forty-two children aged 1-3, noting the children’s production and use of language as well as the language they were exposed to. They also noted that the amount of linguistic data a child is exposed to depends greatly on the socio-economic class to which the child belongs. According to their study, a child from a professional household will have been exposed to about 30 million word tokens, a child from a working-class family to 20 million word tokens, and a child from a family on welfare to 10 million word tokens.

            Pullum and Scholz also report findings from Hart and Risley’s book which indicate that 30% of the speech directed at children is in the form of interrogatives. Hart and Risley also estimate that the mean length of utterances directed to children is four words. Pullum and Scholz then argue that if we take the child from a family on welfare, exposed to 10 million word tokens divided into sentences four words long, we arrive at the conclusion that the child is exposed to 2.5 million sentences every three years. Since 30% of those sentences are interrogatives, the child is exposed to seven hundred and fifty thousand questions every three years, i.e. a quarter of a million questions per year. In their research on the Wall Street Journal they discovered that questions relevant to learning the structure-dependent rule constitute 1% of the interrogatives in the corpus. From this they conclude that a child will typically be exposed to seven thousand five hundred relevant examples in three years, i.e. two thousand five hundred examples per year; therefore on average the child will be exposed to roughly seven relevant questions a day. They conclude their paper by asking whether seven relevant questions a day is enough to learn such a rule. Furthermore, they correctly claim that if nativists think that it is not, they need to explicitly set out a learning theory which shows why it is not.

            The obvious objection to the above argument is that Pullum and Scholz get their data from the Wall Street Journal, and such data is hardly representative of the linguistic experience of the child. Pullum cites some research showing uniformity across linguistic texts as evidence that the Wall Street Journal may, in fact, be representative of the child’s linguistic experience. However, Hart and Risley’s finding that child-directed sentences are typically four words long shows this to be incorrect: the average length of sentences in the WSJ will obviously be much greater than four words. Geoffrey Sampson’s research on the British National Corpus (not available in America at the time Pullum and Scholz were writing) uses samples of speech between child and parent as well as the ordinary speech of adults, so it avoids some of the difficulties of Pullum and Scholz’s research.

            In his (2002) paper ‘‘Exploring the Richness of the Stimulus’’, Sampson largely agrees with Pullum and Scholz’s research; however, he claims that while his research is complementary to theirs it is not subject to the same objections. He sampled the normal conversational speech which people typically have with each other and to which a child is routinely exposed. To this end he used the British National Corpus (henceforth BNC), specifically its demographically sampled speech section, which he claims contains 4.2 million words. This section of the BNC was constructed by giving recording equipment to individuals selected to be representative of the national population with respect to age, social class, and region (2002, 3). By exploring this corpus, Sampson aimed to avoid the criticism directed at Pullum and Scholz that their corpus did not accurately represent the data a child is exposed to when learning a construction.

            Sampson begins his discussion by making a terminological point. Whereas Pullum and Scholz use the term ‘auxiliary verb’ for verbs that can occur as the main and sole verb of a clause, Sampson calls a verb ‘auxiliary’ only if it is followed by another verb. For this reason, while Pullum and Scholz would call the following sentences auxiliary inversions, Sampson would call them ‘verb-fronting questions’.

                                     

 

                       VERB-FRONTED CONSTRUCTIONS

 

Here we will discuss what Pullum and Scholz refer to as ‘auxiliary-initial clauses’. Poverty of stimulus theorists claim that children typically will not hear examples of questions formed by fronting verbs which, in the corresponding declarative statements, are preceded by complex constituents. Sampson aimed to test what people actually say when speaking to each other, in order to assess whether the poverty of stimulus theorists are correct. However, when analysing the data he found an unexpected complication: there are two different types of verb-fronting sentence, both of which Pullum and Scholz include in their WSJ search, and these two types occur in radically different magnitudes in spoken speech.

            The first type of verb fronting is of the following form:

(1) Will those who are coming raise their hands?

(1a) Those who are coming will raise their hands

Sampson reminds us that in the above constructions, the complex constituent is the subject of the fronted verb. So he calls sentences 1 and 1a verb-fronting sentences which involve complex preverbal subjects.

The second type of verb fronting has the following form:

(2) If you do not need this, can I have it?

(2a) If you do not need this, I can have it.

Sampson reminds us that in 2 the main clause is preceded by an adverbial clause. He calls sentences like 2 and 2a ‘‘verb-fronting sentences’’ involving initial adverbial clauses, and he begins by considering questions of this form.

                                    

                          INITIAL ADVERBIAL CLAUSES

Sampson searched for initial adverbial clauses in the BNC-demographic (which contains 4.2 million words). He claimed that his search was not exhaustive, because an exhaustive search would be extremely difficult with this grammatical pattern and the BNC corpus, though he did not offer any reasons why this particular grammatical pattern makes an exhaustive search so difficult. He did claim, however, that such a detailed search was not necessary: since Chomsky had claimed that ‘a person might go through much or all of his life without being exposed to a relevant construction’, finding any examples of the construction would refute Chomsky.

            In attempting to find such examples, Sampson targeted cases where the adverbial clause begins with if. He found twenty-two clear cases of initial adverbial clauses. He furthermore claimed that Wh-questions could also be considered relevant, since they too involve moving an auxiliary of the main clause rather than one in the preceding adverbial clause; if this class is relevant, then he had a further twenty-three cases. However, he realised that counting Wh-questions would be controversial, so he counted only the twenty-two constructions which he found for initial adverbial clauses.

            Sampson uses Hart and Risley’s estimates of how many words a person is exposed to every three years, taking their figure that a working-class person is exposed to twenty million words every three years. This choice is itself controversial; there is no reason to focus on the stimuli that a working-class child is exposed to rather than those encountered by a child from a professional household or a child from a family on welfare. If we accept that children from linguistically deprived backgrounds develop normal linguistic abilities, then the figure of ten million should be used, because children develop such abilities despite being exposed to only this amount of linguistic data. Furthermore, if the relevant constructions do not occur in that data, and children nonetheless display competence with the rules, then this would show that the rules must be innate. However, Sampson would probably reply that this argument relies on the untested assertion that people from linguistically deprived environments have languages as richly structured as those of ordinary members of the linguistic community. Sampson has long argued against the dogma of convergence, the view that all speakers from all societies speak languages which are equally complex. He holds that if we are to establish that children from linguistically deprived environments have language as complex as that of their better-educated peers, then we will need evidence to support this claim. And he holds further that nativists have so far not provided us with any evidence of this kind.

So to avoid begging the question against either nativists or anti-nativists, it is best to start, as Sampson does, with Hart and Risley’s figure of twenty million words every three years. So let us work out the numbers. Sampson found twenty-two constructions in a corpus consisting of 4.2 million words. Using Hart and Risley’s estimate that utterances directed at a child up to three average four words, Sampson’s 4.2 million words amount to roughly 1.05 million sentences. Hart and Risley estimate that a working-class child will be exposed to five million (four-word) sentences in the first three years of his life. So if Sampson finds the relevant data twenty-two times in roughly 1.05 million sentences, then we can expect about one hundred examples in five million sentences. This works out at around thirty-five relevant examples per year, so a child could expect to encounter a relevant construction roughly once every ten days.[5]

            The question which Pullum and Scholz raise in their paper can be fruitfully asked of Sampson’s results: is one example every ten days enough for the child to learn the construction? The nativist who claims that innate domain-specific knowledge is the only explanation for our competence in the relevant construction owes us an answer as to why we cannot learn it from one example every ten days. Typically nativists have not met this challenge; they have merely pointed to the supposed poverty of stimulus as evidence that the construction must be innate. Likewise, however, if anti-nativists claim that the relevant construction can be learned using some kind of data-driven learning, then they owe us a model of how this is done. Assessing whether such constructions can be learned from experience will require mathematical models of how learning from so few examples is possible. Other possible tests may involve developing computer programmes which can learn from this amount of data. Such programmes have already been developed: Clark and Eyraud (2007), Perfors et al. (2006), and Reali and Christiansen (2005) have all developed programmes which can learn from less data than that discovered by Pullum, Scholz, and Sampson. I will review these models at the end of this paper. Ultimately, what we have learned from this data is that Chomsky’s confident assertions that children cannot learn certain constructions from the data they experience have not been justified with sufficient evidence.

            The other type of verb fronting which Sampson discusses is the type of construction where the complex constituent is the subject of the fronted verb.  An example of this type of construction is:

 (1) Those who are coming will raise their hand.

(2) Will those who are coming raise their hand?

Here Sampson found a surprising result: on this point Chomsky was correct. In the 4.2 million word spoken BNC sample, Sampson found no constructions of the relevant kind. However, he did not view this as providing support for poverty of stimulus theorists. He claimed, on the contrary, that the reason the construction did not occur in the BNC is that it is not an idiom of ordinary English speech; it is rather an idiom of written English.

            Sampson’s search of the spoken portion of the corpus showed that the relevant construction never occurred in 4.2 million word tokens. He did not do an exhaustive search of the written-language section of the BNC; instead he merely provided examples from random searches of that section. Here are some of the examples he found:

(14a) Did the fact that he is accompanied by a doctor on the campaign trail help to lose him last week’s TV showdown with Clinton? CAT.00742 (Punch magazine, 1992)

(14b) Did Mr Mortimer, 69, who has an Equity card, enjoy himself? CBC.08606 (Today newspaper, 1992)

(14c) ‘Is the lady who plays Alice a child or a teenager?’ asked my six-year-old. B0300647 (Alton Herald newspaper, Farnham, Surrey, 1992)

(14d) Is a clause which is known to be unenforceable in certain circumstances an unreasonable one? J6T.00908 (R. Christou, Drafting Commercial Agreements, Longman, 1993)

(14e) Will whoever is ripping the pages out of the stony new route book please grow up. CG2.1379 (Climber and Hill Walker magazine, George Outram and Co., Glasgow, 1991)

(2002, 18)

Sampson thinks that these examples show that speakers do not typically form questions using auxiliary fronting when speaking. He argues that constructing questions using auxiliary fronting is largely restricted to written English. However, he does not provide any evidence as to how often such constructions occur in written work. His primary point is that, in order for APS theorists to use the lack of spoken examples of yes/no questions formed by fronting a main-clause verb as evidence for innate knowledge, they have to rule out the possibility that children learn the rule from written language. The fact that people will judge certain constructions as grammatical, despite not encountering them in spoken language, is not that important if they have encountered them in written language. Here, in short, Sampson is shifting the burden of proof onto the APS theorist to show that the child cannot learn such rules from written language. And in the absence of such a proof, he is assuming that the APS does not hold.

            What Sampson and Pullum and Scholz have shown is that the “parade case” of the APS as put forth by Chomsky does not offer clear evidence at all. Obviously much more research is needed on the topic. The important point to note is that this APS has not established that children learn the particular rule in the absence of experience. Hence, this particular APS does not establish that Quine’s conception of language learning is incorrect. The question of the viability of Quine’s story of how the child learns his first language remains open. Nor, of course, can this APS be used to support Chomsky’s claim that we need to postulate innate domain-specific knowledge to explain language acquisition.

                                    SECTION 3: NEGATIVE EVIDENCE

In order to achieve a complete picture of what contemporary evidence tells us about the debate between Chomsky’s and Quine’s pictures of language learning, we now need to evaluate the state of play regarding the issue of negative evidence. Most linguists believe that negative evidence is crucial to understanding language acquisition. The issue centres on the fact that children do not typically encounter ungrammatical sentences which are marked as such. A child will not, for example, hear a sentence such as ‘Is the child who beside the man is happy?’ along with a tag indicating that the sentence is deviant. So the question arises as to how children know that such sentences are ungrammatical. Children are not presented with these sentences and told they are ungrammatical. Nor (so the theory goes) do they produce these ungrammatical sentences only to be systematically corrected by their peers. Hence, it is argued that the only way to explain how a subject tested by a linguist can reliably tag certain sentences as grammatical and others as ungrammatical is to postulate innate domain-specific linguistic knowledge.

A key premise in the above argument is that children are not systematically corrected for their grammatical mistakes by their peers. This claim goes back to the experimental research of Brown and Hanlon (1970). Crain and Nakayama (1987), in particular, test the claim that children try out grammatical hypotheses and weed out the false ones through explicit correction by their peers.

Crain and Nakayama elicited yes/no questions from children between the ages of 3;2 and 5;11 in response to prompts such as Ask Jabba if the boy who is watching Mickey Mouse is happy. They found that (with different frequencies at different ages) children sometimes produced correct forms such as (15a), and they sometimes produced various incorrect forms, one example being (15b). However, they never produced the kind of incorrect form predicted by the ‘structure-independent hypothesis’, such as (15c). They offered this as support for the theory of innate linguistic knowledge[6]. Below are the two kinds of form which children sometimes produced, and the structure-independent form which children never produced:

(15a) Is the boy that is watching Mickey Mouse happy?

(15b) Is the boy who’s watching Mickey Mouse is happy?

(15c) Is the boy who watching Mickey Mouse is happy? (2002, 20)

 

Examples like (15c) are the type of production which one would predict on the basis of the structure-independent hypothesis. However, utterances like these never occurred. Crain and Nakayama claim that this shows that children are innately predisposed to prefer structure-dependent rules when organising the data of experience (Sampson 2002, 20). So the issue here is that children do not try out constructions like (15c) and have them criticised by their peers. They automatically construct questions like (15a) and (15b), which follow structure-dependent rules. That is, independently of poverty of stimulus considerations, Crain and Nakayama claim to have shown that children do not construct structure-independent rules, and so never receive any negative evidence that sentences like (15c) are ungrammatical. If we add to Crain and Nakayama’s claim the fact that children do not hear sentences like (15c) spoken, yet know that they are ungrammatical, we have an argument for innate knowledge based on a lack of negative evidence.

            The argument of Crain and Nakayama is of vital importance to this paper. It offers support to Chomsky’s claim that children are born with an innate language faculty. It also contradicts Quine’s picture of a child learning the rules of syntax through positive and negative reinforcement. Obviously, if children never utter constructions such as (15c), then Quine’s claim that such constructions are shown to be incorrect through negative reinforcement must be false.[7]

            The supposed lack of negative evidence in the case of auxiliary inversion may not be as damning to Quine’s picture of language learning as it appears. The data which Pullum and Scholz gathered from the Wall Street Journal indicate that, in the case of subject-auxiliary inversion, children encounter about seven relevant constructions every day. Using statistical reasoning,[8] a child exposed to this type of experience would, from passive observation alone, have evidence within a few days of birth that the structure-dependent hypothesis was superior to the structure-independent one. A child who was unconsciously analysing the data of experience would then not even try the structure-independent rule of (15c), though he might try structure-dependent rules such as (15a) and (15b).
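To see why even seven examples a day accumulates quickly, here is a toy likelihood-ratio sketch. The model is entirely my own illustration, not Pullum and Scholz’s calculation: it assumes two initially equiprobable hypotheses, and that a structure-independent rule would generate the attested question form in only half of the relevant cases (that 0.5 is an arbitrary assumption made for illustration).

```python
# Toy likelihood-ratio illustration (my own model, not Pullum and
# Scholz's calculation). Two hypotheses start as equally likely. The
# structure-dependent rule always generates the attested question form;
# a structure-independent rule is assumed to do so only half the time
# (the 0.5 is an arbitrary assumption for illustration).
p_attested_given_independent = 0.5
examples_per_day = 7   # Pullum and Scholz's Wall Street Journal estimate

for day in range(1, 6):
    n = examples_per_day * day
    # Posterior odds in favour of structure dependence after n examples,
    # each certain under one hypothesis and probability 0.5 under the other.
    odds = (1.0 / p_attested_given_independent) ** n
    print(f"day {day}: {n} examples, odds {odds:,.0f}:1")
```

On these assumptions, the odds in favour of the structure-dependent hypothesis reach 128:1 after a single day and grow exponentially thereafter, which is the sense in which a few days of passive observation would suffice.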

            However, this argument fails when one takes account of Geoffrey Sampson’s work. His data show that the number of examples in actual speech of verb-fronted sentences (excluding the cases, discussed above, where a subordinate clause precedes the main subject) is zero. So the child cannot learn the rule inductively from speech. If Crain and Nakayama are correct that children never try out the barred form, this lends support to their claim that the rule is innate.

            So it could be argued that Crain and Nakayama’s result, combined with Sampson’s, shows that Quine’s conception of language acquisition is incorrect: together they suggest that induction and positive and negative reinforcement play no role in learning this particular rule. However, given that Crain and Nakayama’s experiment relates only to subject-auxiliary inversion, it is obviously not equipped to rule out stimulus/response learning entirely. There is also a more fundamental reply which could be raised to this experiment. It is Geoffrey Sampson’s, and it uses data about how people actually speak to cast doubt on Crain and Nakayama’s experiment.

            Sampson objects to this experiment on the basis of his discovery that speakers do not typically form questions using auxiliary inversion. He notes that, on the basis of his corpus research, children would not be expected to reply in the manner they do in the experiment. Crain and Nakayama use the fact that children never try out (15c) to support their claim that children do not use the structure-independent hypothesis. However, Sampson points out that in ordinary speech, as revealed by his corpus analysis, people do not use auxiliary inversion to form such questions. According to his analysis, children should instead form the question in one of the following ways:

(16a) Is he happy, the boy who’s watching Mickey Mouse?

(16b) The boy who’s watching Mickey Mouse is happy, isn’t he?

Sampson correctly notes that, since we know from the corpus analysis that speakers do not typically form questions like (15a) and (15b) in speech, it is odd that children would answer in this way in the experiment. He points out that Crain and Nakayama only give figures for children’s ‘correct’ question formation, so it is impossible to tell whether children tried out (16a) and (16b). He further speculates that the children’s use of a form never found in ordinary spoken discourse may indicate that they were primed for the experiment. Sampson’s discussion does not refute Crain and Nakayama’s experiment; it does, however, demonstrate that the experiment is far from conclusive. So Crain and Nakayama’s experiment does not refute Quine’s trial-and-error account of language acquisition. The experiment would need to be replicated across different cultures to be viewed as anything more than a suggestive result. The other study which is typically offered as evidence that children do not learn their language through trial and error is Brown and Hanlon’s paper.

Brown and Hanlon’s 1970 paper purports to show that children do not learn by explicit instruction. However, their paper has negligible impact on Quine’s position, because behaviourism is not committed to reinforcement being explicit. In fact, contemporary research in psycholinguistics supports the view that much language instruction is implicit rather than explicit. In the section below I will review experimental research which bears on this point.

                     SECTION 4: EVIDENCE OF NEGATIVE EVIDENCE

 

In their 2003 paper ‘‘Adult Reformulations of Child Errors as Negative Evidence’’, Chouinard and Clark designed a study to test whether adults were implicitly instructing children in the rules of their language. The study aimed to discover whether adults were using side sequences and embedded corrections to correct children’s utterances. On pages 9 and 10 of their paper they give the following examples of side sequences and embedded corrections.

An example of a side sequence (the indented turns contain the correction):

(1) Roger: now-, um do you and your husband have a j-car.

           Nina: have a car?

           Roger: yeah.

           Nina: no-                                               (Svartvik and Quirk 1980: 8.2a 335)

An example of an embedded correction:

(2) Customer in a hardware store looking for a piece of piping:

Customer: Mm, the wales are wider apart than that.

Salesman: Okay, let me see if I can find one with wider threads.

(looks through stock) how’s this?

Customer: Nope, the threads are even wider than that.

 

They claimed that adults made use of side sequences and embedded corrections to correct children’s errors and to keep track of what the children meant to say. The adult reformulations indicate to children (a) that they have made an error, (b) what the error was, and (c) the form needed to correct the error. 

In their experiment they set out to test the following four claims:

(1) Negative evidence is available in adult reformulations.

(2) Negative evidence is available to children learning different languages, and for different types of errors.

(3) More reformulations are available to younger children.

(4) Children detect and make use of the corrections in reformulations. (2003, 12)

 

                              METHODS USED IN THE EXPERIMENT

The experimenters got their data from five corpora in the CHILDES Archive. Three of the children were acquiring English (Abe from the Kuczaj corpus, Sarah from the Brown corpus, and Naomi from the Sachs corpus) and two were acquiring French (Philippe from the Leveille and Suppes corpus and Gregoire from the Champaud corpus) (ibid., 13). In order to analyse child errors, the experimenters included all spontaneous child utterances in the transcripts, with the exception of utterances containing unintelligible speech and child utterances preceded or followed by unintelligible adult speech. The experimenters first checked whether each child utterance was well formed. If the utterance contained an error, they categorised the sort of error it was, e.g. morphological, syntactic, etc. They then checked whether the next adult utterance was a reformulation: an utterance counted as a reformulation if it repeated, in corrected form, the portion of the child’s utterance which had contained the error. They further coded each correction by noting whether it was a side sequence or an embedded correction. They finally checked whether children took up the change that had been made, rejected it, or tacitly accepted it.[9]

            For the analysis of conventional child utterances, they took a random sample of 200 utterances for every age slice for each child. They identified all the error-free child utterances in the sample and tabulated how many of the constructions were replayed by the adult in the next turn. If the adult just repeated what the child said, they called it a replay (ibid., 14). They got two different researchers to code each transcript and they agreed on their codes 90% of the time. Where the two researchers disagreed, they resolved their disagreement by discussion.

            Once they had coded the transcripts, they coded each of the lines for detailed analysis. For each of the children, they enumerated the total coded lines and the total number of erroneous utterances. They then divided the data into age slices to track developmental trends.

                                        RESULTS

The following is a list of their results which bear on the four hypotheses which they put forward at the beginning of their paper.

(1) NEGATIVE EVIDENCE IS AVAILABLE IN ADULT REFORMULATIONS

They devised a table representing four different age slices for the three children in the English corpora, dividing the child utterances to which adults replied into conventional and erroneous. They found that adults repeated erroneous utterances far more often than conventional ones: on average, erroneous utterances were reformulated more than twice as often as conventional utterances were repeated. More interestingly, the percentage of erroneous utterances that were corrected is extremely high. In the age slice 2;0-2;5, the following pattern was observed for the three English-speaking children: Abe had 67% of his erroneous utterances reformulated, Sarah 65%, and Naomi 48%. In the French corpora, Philippe had 67% of his erroneous utterances reformulated, and Gregoire 60%. So in the age range 2;0-2;5, most of the children had at least 60% of their erroneous utterances reformulated, and even Naomi, who received the fewest reformulations, still had almost 50% of her erroneous utterances corrected. Across the other age slices, the lowest rate of reformulation of incorrect utterances was for the ages 3;6 to 3;11. Here there was not enough data for the French children; the figures for the English speakers were as follows: Abe received reformulations for 28% of his incorrect utterances, Sarah 41%, and Naomi 20%. So it is certain that children do receive reformulations of incorrect utterances: even in the worst case, that of Naomi between the ages of 3;6 and 3;11, 20% of her incorrect utterances were corrected. Below, I discuss whether 20% correction is enough for a child to learn various rules.

Of the corrections given, side sequences, as opposed to embedded corrections, made up the majority of those the children heard. Chouinard and Clark (ibid., 21) report that for the five children, Abe, Sarah, Naomi, Philippe and Gregoire, the proportions of side-sequence corrections were 57%, 70%, 70%, 73%, and 62% respectively. In other words, the majority of reformulations were designed to check what the child had meant.

(2) NEGATIVE EVIDENCE IS AVAILABLE TO CHILDREN LEARNING DIFFERENT LANGUAGES, AND FOR DIFFERENT TYPES OF ERRORS

The study found that negative evidence was available to each of the children, whether they were learning English or French. Furthermore, negative evidence was provided at a comparable rate whether the error was phonological, morphological, lexical or syntactic. And again, reformulations of erroneous utterances occurred at a much higher rate than repetitions of conventional utterances.

(3) MORE REFORMULATIONS ARE AVAILABLE TO YOUNGER CHILDREN

In general, this prediction was borne out. Adults tend to reformulate less as children get older and make fewer mistakes. However, there was one exception to this trend: as Naomi got older, her errors were reformulated more often. So this question needs further investigation.

(4) CHILDREN DETECT AND MAKE USE OF THE CORRECTIONS IN REFORMULATIONS

Obviously, just because adults use reformulations, it does not follow that such reformulations are understood and used by the children. Evidence that children understand and use reformulations can only be gained by noting how the children respond to them. Chouinard and Clark discuss four possible ways that children can respond to an adult reformulation: (1) they can take up the reformulation explicitly by repeating it and, in doing so, correcting at least part of their original utterance; (2) they may overtly reject the adult’s reformulation, thereby signalling that the parent has misinterpreted what they intended, in which case the parent may try a different reformulation which the child then accepts; (3) they may acknowledge the reformulation at the start of their next turn in the conversation; (4) they can simply continue with the conversation without overtly acknowledging the change or taking it up; such continuations can be counted as tacit acceptances of adult reformulations. Overall, the proportions of responses in which children acknowledged a reformulation or repeated new information, together with those in which they either took up or rejected the reformulation, were as follows: Abe 56%-72%, Sarah 25%-38%, Naomi 39%-100%, Philippe 39%-75%, Gregoire 25%. By any standard, this shows that children attend to reformulations a sizeable percentage of the time.

                                          GENERAL DISCUSSION

Four of the children in the study had a college-educated parent; as Chouinard and Clark acknowledge, it is therefore unclear whether the findings will generalise across social classes.[10] Furthermore, only two cultures were represented in the study, so it is unclear whether the findings generalise across cultures. Chouinard and Clark further discuss the oft-cited evidence from Ochs and Schieffelin (1984), who claim that in some cultures negative evidence is not presented to children, because parents do not interact with children until they are competent speakers. Ochs and Schieffelin’s paper is of central importance here because it is usually offered as key evidence in favour of the nativist’s argument for innate domain-specific linguistic knowledge. Typically nativists will point to Brown and Hanlon’s 1970 paper, as well as Crain and Nakayama’s 1987 paper, to demonstrate that children do not receive explicit negative evidence. Anti-nativists reply that, while there is evidence that children do not receive explicit instruction, they do in fact receive implicit instruction. To this nativists respond that, while this may be so in our culture, it is certainly not so in all cultures, and to demonstrate the point they cite Ochs and Schieffelin’s paper. The argument is that since members of all human cultures learn a language, and only some receive negative evidence, negative evidence cannot be a key factor in learning language. This objection is clearly relevant to the work of Chouinard and Clark, since they considered only two cultures. However, Chouinard and Clark consider this objection (ibid., 39) and offer a criticism of it. Ochs and Schieffelin had claimed that in Kaluli and Samoan cultures parents do not converse with children who are not yet competent users of language.
Chouinard and Clark claim that, contrary to what is typically believed, Ochs and Schieffelin’s paper is in fact largely consistent with their own findings. It is true that in the Kaluli and Samoan cultures adults do not converse with children who are not yet competent speakers of the language. However, two points need to be made about this. Firstly, the fact that adults do not engage in conversation with children of this age may not be as important as is sometimes supposed, if older children in the community converse with the younger ones.[11] Secondly, even if parents do not converse with children who are not yet competent users of language, it does not follow that they do not correct their language use. Chouinard and Clark cite a passage from Ochs and Schieffelin which indicates that Kaluli children do indeed receive negative feedback:

Kaluli mothers pay attention to the form of their children’s utterances. Kaluli correct the phonological, morphological, or lexical form of an utterance or its pragmatic or semantic meaning. (1984, 293)

 

Chouinard and Clark claim that in the Kaluli culture feedback takes the form of adults telling children what to say on different occasions. So, for example, the adult will prefix an utterance with the instruction ‘elema’ (meaning ‘say like that’). If the child makes a grammatically incorrect statement when talking to another person, the adult will face that person and say ‘elema’ followed by the grammatically correct utterance. So clearly in this culture explicit instruction is used to teach the child how to speak. Hence, contrary to what is typically reported, Ochs and Schieffelin do not provide evidence against anti-nativist theories of language acquisition.

            The overall conclusion of Part 1 of this paper is that contemporary evidence is still largely consistent with the picture of language acquisition sketched by Quine in the 1960s. The arguments by Chomsky and those influenced by his paradigm have not shown that Quine’s view of language acquisition is incorrect. However, some nativists claim that Pullum, Scholz, and Sampson are guilty of attacking a straw man: the APS they criticise, it is claimed, is not really the APS which Chomsky uses. In Part 2 of this paper I will consider whether the alternative conception of the APS argued for by John Collins avoids the counter-arguments of Pullum et al. I will further analyse how this alternative APS bears on Quine’s picture of language acquisition. I will then discuss Chomsky’s latest attempt to defend the poverty of stimulus argument and will argue that it fares no better than earlier versions of the argument.

                                PART 2: CONCEPTIONS OF THE APS

                                            INTRODUCTION

In Section 1 of this part, I consider an argument by the philosopher and Chomsky scholar John Collins which purports to show that Pullum et al. have misunderstood the nature of Chomsky’s APS. Collins directs his arguments against Fiona Cowie, a philosopher who, in her book What’s Within, uses Pullum’s and Sampson’s data to argue against Chomsky. Since Cowie uses Pullum’s reconstruction of Chomsky’s APS, Collins’s reply to her can be taken as a reply to Sampson, Pullum, and Scholz.

            I will set out precisely what Collins takes the real APS to be. I will then argue that it is Collins, not Pullum et al., who misconstrues the APS. I will therefore argue that, contrary to what some Chomskians claim, Pullum et al. raise serious difficulties for the research programme of generative linguistics. Furthermore, these difficulties with Chomsky’s programme show that, despite Chomsky’s rhetoric, Quine’s alternative conception of language acquisition is still very much a live option.

                            SECTION 1: THE REAL APS?

 

John Collins’s paper ‘‘Cowie on the Poverty of Stimulus’’ is a review of Fiona Cowie’s book What’s Within. Collins focuses on the sections of her book which criticise Chomsky’s poverty of stimulus arguments. He argues that Cowie has seriously misunderstood the aims and methods of Chomsky’s paradigm in linguistics. Many of Cowie’s criticisms of Chomsky draw on data from a 1996 paper by Pullum called ‘‘Learnability, Hyperlearnability and Poverty of Stimulus’’. In this paper, Pullum argued that the data which a child is exposed to when learning his first language are not as impoverished as Chomsky and his followers claimed. While Cowie presents Pullum as claiming that his data refute nativism, Pullum, in a later paper (2002, co-authored with Scholz), chastised Cowie for this, arguing that their data show only that more research is needed into the child’s PLD, not that nativism has been refuted. I include Collins’s criticism of Cowie in this section because his reply to Cowie’s use of Pullum’s data indicates that he thinks both Cowie and Pullum have misconstrued the nature of the APS. Other authors, for example Legate and Yang (2002), try to meet Pullum and Scholz’s challenge by arguing that the data Pullum and Scholz discovered are, in fact, insufficient for learning the structure-dependence rule. Collins, in contrast, argues that Pullum’s construal of the APS is incorrect, and that because of this his data do not cast doubt on the real APS.

            Thus, for example, Collins writes:

Cowie, to be fair, does have Pullum’s reconstruction of the ‘Chomskian argument’ in mind. Pullum presents the argument so as to refute it, but Cowie finds it an ‘irresistible target’, for it is ‘so much more clearly and forcefully stated than the nativists own versions’. The nativists’ versions are not ‘clearly and forcefully stated’, I have suggested, because no-one serious is interested in knock-down arguments; there are certain empirical and theoretical constraints and a substantive proposal to satisfy them.  (2003, 21-22)

 

I turn next to Collins’s criticism of Pullum’s APS.

            COLLINS ON PULLUM AND SCHOLZ’S VERSION OF THE APS

 

Pullum and Scholz’s version of the APS is admirably clear. It isolates the key empirical premise of the argument, and proceeds to analyse the empirical data to check whether the key premise is in fact correct. 

            Collins, however, does not view Pullum and Scholz’s APS as representing Chomsky’s real APS. One of his first criticisms of their construal is that it implies that Chomskians are searching for a knockdown argument to prove that language is innate. On their conception, a key premise of the APS is that the child has knowledge which he could not have learned from his environment; if this key premise is correct, then Chomsky has a knockdown argument for innateness. We saw in Part 1 of this paper that Chomsky does indeed use auxiliary inversion to support his claim that children know a rule of language which they could not have learned from their linguistic environment. Collins does not deny that Chomsky has sometimes argued in this manner. According to Collins, however, when Chomsky argues in this way he is merely using auxiliary inversion as an expository device to indicate the way linguists reason; he did not intend it as a knockdown argument against the empiricist.

            Critics of Chomskian nativism have expressed frustration with this mode of arguing. They argue that every time a candidate rule is presented as evidence for nativism and is then shown to be inadequate, the nativist replies that this example was not the real argument for nativism. Thus Cowie vents her frustration at this perceived dishonesty in nativist argumentation:

The nativist- say, Chomsky- articulates a version of the argument. The empiricist counters it by pointing to its evidential short-falls and/or its failure to do justice to empiricism’s explanatory potential. But no sooner is one rendition of the APS cut down than myriad other variations on the same argumentative theme spring up to take its place. For every non-obvious rule of grammar (and most of them are non-obvious), there is an argument from poverty of stimulus standing by to make a case for nativism. And for every such argument (or at least for the ones I have seen), there are empiricist counter examples of exactly the kinds we have reviewed in this chapter, waiting, swords at the ready, to take it on.  (1999, 203)

 

When Cowie speaks of a line of rules being put forward and refuted by the empiricist, she is clearly thinking of Pullum and Scholz’s version of the APS. Such postulated rules include subject-auxiliary inversion, subject (verb-object) asymmetry, anaphoric ‘one’, etc. Cowie’s frustration is that the subject-auxiliary rule was used by Chomsky on countless occasions, and by many others influenced by him. It is unquestionably used as the paradigm example of an innate rule,[12] so to be told that it is not the real argument is frustrating to say the least.

            However, Cowie massively overstates the strength of her position here. The paradigm example of auxiliary inversion is taken to indicate that a child has an innate preference for structure-dependent rules. Pullum and Scholz have shown that Chomsky overstated his case when he claimed that occurrences which would indicate that the structure-dependent rule is the correct one are vanishingly rare. But from the fact that Chomsky overstated his case, it does not follow that the empiricist has proven him wrong. Pullum and Scholz themselves correctly claim that their data is suggestive at best, and they call for further studies of the PLD. Their research is certainly not a knockdown argument against nativism; it is rather a timely reminder that, polemics aside, the case for nativism has not been proven. In this sense, Legate and Yang (2002) is an attempt to justify the nativist research programme by situating it in a comparative setting. They consider a construction that both sides admit is learned (the use of null subjects) and analyse the PLD empirically to see how many examples of null subjects the child is presented with when learning the rule. They then compare this result with the number of times the child is presented with subject-auxiliary inversion.
Based on this comparison, they claim that the child is presented with less than half as much evidence in the case of auxiliary inversion. So they argue (incorrectly, in my view) that the data the child is presented with is not enough for him to learn the structure-dependent rule of question formation. I do not want to discuss Legate and Yang’s paper here. I merely want to show that Cowie’s claim, that each innate rule postulated by Chomsky et al. has been refuted by attention to the PLD, is false. On the contrary, all that has been shown is that more attention needs to be paid to the PLD by both nativists and empiricists. Legate and Yang’s paper can be seen as a nativist response to this request from empiricists such as Pullum.

            Collins obviously would not deny that more research needs to be done on the PLD. He would presumably welcome Legate and Yang’s attempt to answer the criticisms of Pullum and Scholz. However, he would also argue that Cowie, Pullum, and Scholz have misunderstood the nature of Chomsky’s APS, and that replies such as Legate and Yang’s, while useful, concede too much to the empiricist by accepting their reconstrual of the APS. Using Pullum and Scholz’s neat deductive reconstrual of the APS, Cowie seems to view nativists as merely inserting an empirical premise into the argument, only to have it refuted, and then putting forth a different empirical premise in response, and so on: each time an empirical premise is shown to be false, another one is added just as quickly. Collins argues that this way of viewing the matter badly misconstrues things; on his view, the APS is not the deductively neat argument that Pullum, Cowie et al. think it is.

            Collins’s reconstrual of the APS is less aesthetically pleasing than Pullum’s deductive version. However, it does represent a type of APS which Chomsky has used from time to time. Nonetheless, I will argue that Chomsky primarily uses the APS which Pullum et al. critique.

            Collins begins by noting certain features of our linguistic competence. He notes the obvious fact that all humans (bar congenital defect) acquire a particular language, while no other animal does. Furthermore, if you move children from their birthplace to another country, they will end up speaking a different language, while other animals will speak no language no matter where they are brought up. He acknowledges that these considerations do not militate in favour of nativism; they merely show that the language we do acquire must be acquired as a result of some innate species-specific machinery. This fact is a truism, accepted by both sides of the debate. It in no way shows that innate domain-specific knowledge is required for a child to learn his first language.

            However, the above facts do indicate a problem which any linguistic theory worth its salt must solve. The problem in a nutshell is this: how do we construct a theory of language acquisition which is both descriptively and explanatorily adequate? Collins puts it in the following way:

The descriptive adequacy, therefore, of a general theory of linguistic competence would appear to involve a delineation of the seemingly infinite variety of languages upon which a child may fixate. On the other hand, if our general theory is to be explanatorily adequate, then we need to explain how a child may fixate on any point in this infinity without any such point being favoured prior to the child’s exposure to language. (2003, 30)

 

Any theory of linguistic competence needs to meet these criteria. Collins correctly notes that these constraints do not of themselves tell us (1) what the child’s initial state is, (2) what his final state is, or (3) what data the typical child is exposed to which helps him move from the initial state to the final state. If there were only one language, for example English, we could answer questions (1) and (2) instantly: the child’s initial state would be English and his end state would also be English. According to Collins, question (3) would also be answered, because the child would need no data to decide amongst languages, as there would be only one language that the child could represent.

            Obviously there are more languages than English. In fact, if one considers English in different epochs (the English spoken by Chaucer, by Shakespeare, by Orwell), then one must face the fact that English itself consists of more than one language. Rough estimates put it at 7000 languages today.[13] If one wants to explain a child’s linguistic competence, then one needs to account for the fact that a child born in a different place or time could learn any of the 7000 languages spoken today. Furthermore, the child could learn any of the different languages spoken in other eras, or the various possible languages to be spoken in the future. So a descriptively adequate theory will have to account for the linguistic competence which enables children to acquire the different types of possible or actual human languages. What is the initial state of the child that makes it possible for him to grasp any of the languages he is exposed to? If, in order to learn a language, the child needs the capacity to represent the rules of that language, then the more languages there are which the child can learn, the more inclusive the child’s initial ability to represent grammars must be. The child will of course also need the capacity to represent the rules of possible human languages. So we will need finer and finer data in the particular child’s environment to help the child decide which language he is supposed to learn. This data will also need to be detailed enough to stop the child from keying into the other languages that it is possible for him to learn.

            The difficulty with this approach is the necessity of postulating richer and richer data to explain the child zeroing in on his grammar. The reason this is a problem becomes apparent when we consider the data which a linguist has at his disposal when trying to discover the nature of UG or of a particular I-language such as English. The linguist has as much data on the grammar as he could wish for. He has the ability to reflect on it theoretically. He can compare the language with a variety of other languages. Yet, despite all of this data, the linguist still cannot discover the rules which govern the English language. Collins claims that if we argue that the child learns his language through data-driven learning, we will be claiming that the child who learns English has enough data to figure out what linguists have been unable to figure out over the last 2000 years:

But here’s the rub! The linguist has as much data on the grammar of English, say, as he could wish for, he also has the capacity to reflect on it, theoretically or otherwise, and the advantage of comparing it with data from other languages, but he still cannot figure out the grammar of English – that is inter alia, we have linguistics for! If, then, we content ourselves with the bland remark about nativism, we are led to think of the child who successfully acquires English as having enough data to figure out what self-reflective linguistic inquiry has been banging its head against for the last couple of millennia. Something is wrong. (ibid., 4)

 

He argues that the only explanation for the child achieving what linguists cannot achieve through thousands of years of inquiry is to postulate that children are born with innate apparatus:

What the child’s innate equipment is required to do, it seems, is actively constrain its ‘choices’ as to what is part of the language to be attained. But no child is wired to target any particular language: the child can make the right ‘choices’ about any language with equal ease. This suggests that children must begin with ‘knowledge’ specific to language, i.e., the data to which the child is exposed is ‘understood’ in terms of prior linguistic concepts as opposed to general concepts of pattern frequency, say. If this is so, then we can see how a child may acquire a language even though the data itself is too poor to determine the language: the child needs no evidence for much of the knowledge it brings to the learning situation. In crude terms, children always make the right ‘hypotheses’ as a function of their genetic endowment. Thus, since the child can fixate on any language in the face of a poverty of stimulus about each language, and all languages are acquirable, children all begin with the same universal linguistic knowledge. This is the poverty of stimulus. (ibid., 5)

 

 

                 

                 SECTION 2: THE STRUCTURE OF COLLINS’ APS

So Collins’s reconstrual of the APS is as follows:

P1: Language is either acquired through data-driven learning or innately primed learning.

P2: All human children acquire language.

P3: No non-humans acquire language.

C1: Therefore language is acquired because of a unique property of human children not shared with non-humans.

P4: The range of languages it is possible for human children to acquire is infinite.

P5: No linguist using data-driven learning has discovered a complete grammar of any language.

P6: All human children with less data available have acquired a particular language.

P7: Therefore either human children are smarter than linguists or human children do not acquire language through data-driven learning.

P8: Human children are not smarter than the linguists.

C3: Therefore human children do not acquire language through data-driven learning.

P9: If human children do not acquire language through data driven learning, then the fact that the child acquires a particular language as opposed to other possible languages cannot be explained through data-driven learning.

C4: Therefore human children acquire their particular language through innately primed learning.

 

The first three premises of Collins’s argument are correct, and the conclusion C1 is true as well. However, I have serious difficulties with Premise 4. Nothing in either Chomsky’s or Collins’s argument has proven that the number of languages it is possible for humans to acquire is infinite. A more sensible version of Premise 4 would claim that there is an extremely large number of languages which it is possible for people to learn. Furthermore, it is difficult to see how this claim can be fitted into the overall structure of the argument. Or rather, it is obvious what role the claim is meant to play, but it is difficult to fit this role into our argument schema. The role it plays is this: once we have shown that the child does not learn his language through data-driven learning, it is difficult to see how the child arrives at the particular language he does, as opposed to the countless other possible languages he is capable of learning. The Collins/Chomsky solution is that the child is born with certain universal principles which are subject to parametric variation, and this explains the range of possible languages which humans can learn. So the child is born in the initial state UG, and his experiences trigger various parameters, resulting in the child arriving at his steady state, his I-language: English, French, etc.

            If one took Premise 4 and Premise 8 out of the argument, it would still go through as valid because of Premise 1 and C3. Given that the overall argument could go through without P4 or P8, one may want to ask why these premises are in the argument in the first place. The answer is that without them our theory would not be explanatorily adequate, i.e., it would not explain both the diversity of languages acquired and the mode of acquiring them. So we will want our meta-argument to express that our object argument is designed to meet the criteria of descriptive and explanatory adequacy.
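The claim that the argument would remain valid without P4 and P8 is simply the disjunctive syllogism from P1 and C3. A minimal truth-table check of that propositional point, sketched in Python (the variable names and the encoding are mine, not Collins’s, and the sketch ignores the argument’s quantificational detail):

```python
from itertools import product

# Brute-force truth-table check: from
# P1 (language is acquired through data-driven OR innately primed learning)
# and C3 (NOT data-driven learning),
# C4 (innately primed learning) follows, with no appeal to P4 or P8.
valid = all(
    innate  # C4 must hold in...
    for data_driven, innate in product([True, False], repeat=2)
    if (data_driven or innate) and not data_driven  # ...every row where P1 and C3 hold.
)
print(valid)  # True: the disjunctive syllogism goes through
```

The check confirms only the narrow point made in the text: the inference from P1 and C3 to C4 is formally valid, which is why P4 and P8 must be doing explanatory rather than deductive work.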

METACRITERION: An argument for an innate language faculty must meet the criteria of descriptive and explanatory adequacy.

With our object argument and our meta-criterion in place, we have Collins’s APS ready to evaluate. The argument has nine labelled premises, some uncontroversial, like P2, and some more controversial, like P4. Some of the premises are disjunctive, and may seem controversial because they leave out alternatives and assume that certain processes can occur in only one of two ways.

            Premise 9 and C4 are key aspects of the argument. Premise 9 states that if the child does not acquire language through data-driven learning, then without innate domain-specific knowledge we cannot explain how the child arrives at the particular language he does, as opposed to the countless other languages it is possible for him to acquire. This is certainly true. If the child does not learn from the PLD, then there is no reason, bar innate constraints, that he would target the correct language. C4 states that the child does not learn his language from the PLD and so must key in on the correct language through innate domain-specific knowledge. C4 is derived from P5-P8, which state that linguists using data-driven learning for over fifty years have not converged on the correct grammar of English. Given that each child acquires English in a few years with far less data available to him, it follows that unless children are smarter than linguists, they did not learn their language from the PLD.

            Overall, the argument as set out by Collins is not very convincing. It does not amount to a deductive proof or a knockdown argument, as Collins acknowledges. The argument aims to set out the facts of language acquisition which we need to build our theory around. For example, P5-P8 do seem to indicate that unless children are smarter than the thousands of linguists working on generative grammar over the past fifty years, they cannot be learning the language from the PLD. However, given that children born in different linguistic environments arrive at different languages, the PLD is obviously a factor in how children learn their language. The tension between these facts of language acquisition is what a linguist needs to accommodate. To do this, one needs to set out what the structure of each I-language is, what the I-languages have in common, and where they differ. The theory of Principles and Parameters, which states that children are born with a UG consisting of fixed principles, some of which are subject to parametric variation, aims to accommodate these facts. On this theory, the different parameters are set by experience, while the universal principles are innate.

            Hence, in Collins’s view, the APS does not depend on every child lacking this or that datum. As Collins construes the APS, it depends on the fact that linguists have access to much more PLD than children do. Yet children quickly arrive at the grammar of their language, while linguists over generations have failed to isolate the correct grammar of any language. Nonetheless, Collins is not claiming that facts about the PLD are unimportant for the generative grammarian. Rather, he is claiming that we can only sensibly interpret the importance of each particular datum in the light of facts about UG and the particular I-language of a particular speaker. As set out by Collins, the APS is overcome by the postulation of a UG. This UG consists of invariant principles that the child is born with, which are subject to parametric variation depending on the child’s experiences. Thus one principle of UG would be that all phrases have a head and a complement. The child is born knowing this; it is his experience with the PLD which determines whether phrases are head-first (English) or head-last (Japanese).

            So the order of explanation would be the following. First, discover the structure of particular languages and try to ascertain what principles are shared by the different languages of the world.  Second, discover the ways these languages differ from one another, and construct a theory in terms of parametric variations that can explain these differences. When one has the bones of the principles and parameters theory set up, one is then in a position to explain how this or that datum results in a particular construction being acquired. As presented by Collins, the APS is a set of considerations which leads one to postulate a UG subject to parametric variation. The particular details are to be formulated within linguistic theory as the various different principles and parameters are discovered. Each discovery will either tell for or against the solution to the APS put forth by Chomsky et al.

            Collins’s version of the APS does not work. The disparity between the child’s ability to learn from the PLD and the linguist’s ability to construct an explicit grammar of the language need not be explained in terms of innate domain-specific knowledge. We do not need to claim that a two-year-old child is smarter than teams of linguists researching grammar over thousands of years. Nor do we have to claim that the child has more data available to him than the linguist. The difficulty with Collins’s argument is that it equates an organism’s ability to acquire a competence in x with an organism’s ability to form an explicit theory of that competence. It does not follow from the fact that an organism has difficulty forming an explicit theory of a competence x by extensively studying data Y that competence in x cannot be acquired from data Y. Collins’s argument therefore fails because it unjustifiably equates forming an explicit theory of a competence with the implicit ability to acquire that competence. It is possible, for example, that children use unconscious statistical abilities which help them learn the rules of their language from the PLD. These statistical abilities may not be accessible to consciousness. Our ability to unconsciously detect patterns in our environment may outstrip our ability to construct explicit theories about those patterns. This bare possibility could turn out to be empirically false; but whether it is or not is an empirical question. Collins’s argument, as he states it, gives us no reason whatsoever to hold that innate domain-specific knowledge must be wired into the child.
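To make the bare possibility concrete, here is a minimal sketch, in Python, of the kind of unconscious statistical ability at issue: a learner that merely tracks bigram (adjacent word-pair) frequencies can come to prefer attested word transitions over unattested ones without representing any explicit rule of grammar. The toy corpus and the scoring function are my own illustrative inventions, not a model that anyone in this debate has proposed:

```python
from collections import Counter

# A toy corpus standing in for a fragment of a child's PLD
# (hypothetical data, for illustration only).
corpus = [
    "the dog can run",
    "the dog that can run can jump",
    "can the dog run",
    "the cat can jump",
]

# Record bigram (adjacent word-pair) frequencies; no grammar rule is stored.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    for pair in zip(words, words[1:]):
        bigrams[pair] += 1

def score(sentence):
    """Return the proportion of a sentence's bigrams attested in the corpus."""
    words = sentence.split()
    pairs = list(zip(words, words[1:]))
    return sum(1 for p in pairs if bigrams[p] > 0) / len(pairs)

# A string built from attested transitions scores higher than a scrambled one,
# even though the learner has formed no explicit theory of the grammar.
print(score("the cat can run"))   # every bigram attested -> 1.0
print(score("run dog the can"))   # no bigram attested -> 0.0
```

The point of the sketch is only that implicit frequency-tracking can discriminate well-formed from ill-formed strings without any explicit grammatical theory, which is all the argument against Collins requires; whether children actually exploit such mechanisms remains, as the text says, an empirical question.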

                   

                SECTION 3: WHICH APS DOES CHOMSKY WORK WITH?

             Having argued that Collins’s version of the APS is not a particularly strong argument, I now want to consider whether Collins’s or Pullum et al.’s version of the APS correctly characterises Chomsky’s APS. While Chomsky does seem to argue from the same general considerations which Collins has outlined above, he also joins these arguments with APS’s of the kind that Pullum and Cowie consider. Any intellectually satisfying characterisation of Chomsky’s APS must explain why he felt it necessary not only to argue from general considerations like the ones Collins points to, but also to use APS’s like the ones Pullum et al. critiqued.

            In his paper ‘‘Linguistic Nativism’’, Collins uses the Principles and Parameters model of language acquisition which Chomsky developed in the 1980s. When discussing the APS in this era, Chomsky constantly made unsubstantiated claims about the PLD. These claims need to be noted and outlined if we are really to understand Chomsky’s APS. Pullum and Scholz (2002) refer to auxiliary inversion as the experimental crux of the APS. They cite numerous places where Chomsky and his followers have used auxiliary inversion as an example of the APS. Collins admits that Chomsky does indeed argue like this in various places. However, he claims that when Chomsky argues like this he is not making a claim about the PLD, but is rather setting up a challenge to the empiricist. Collins claims that Chomsky wants to ask the empiricist what it is about the child’s PLD that helps him converge on the correct grammar, and why this data is insufficient for the linguist to learn the same grammar. In this section I will examine Chomsky’s actual writing to see if this interpretation of his APS is correct.

            Since the 1980s, Chomsky has labelled the problem of language acquisition Plato’s Problem. He characterises this problem by quoting Bertrand Russell’s question ‘How comes it that human beings, whose contacts with the world are so brief and personal and limited come to know as much as they do’ (1986a, xxv). Chomsky argues that this question arises in the particular sphere of language acquisition in the same way that it does in general epistemology. He claims that when it comes to language acquisition, the solution to the problem is to postulate innate knowledge. He calls the APS Plato’s Problem because he takes Plato’s discussion of the slave boy displaying knowledge of geometry which he has not previously been taught to be a good instance of an APS. It is interesting how he characterises the situation of the slave boy:

This experiment raises a problem that is still with us: How was the slave boy able to find truths of geometry without instruction or information? (1986b, 4)

 

The key words here are ‘without instruction or information’; Chomsky repeatedly claims that children know various principles and rules for which they received no instruction or information. He further claims that this is the APS and it is overcome by the postulation of innate knowledge. In this context, whether a particular construction is in the PLD or not is extremely important. Likewise, it is important to determine whether the child gains instruction explicitly or implicitly through positive and negative reinforcement. On this construal, the APS does not appear to be merely a challenge to the empiricist but rather to be an explicit claim about the PLD which is either true or false. I recognise that Chomsky’s vague sketch of what he thinks the APS is need not correlate with how he uses the APS in his linguistics. So in order to evaluate how Chomsky uses the APS as opposed to what he states the APS is, we will need to situate the APS within the context of him describing the rules of a language.

            In his Language and Problems of Knowledge (henceforth LPK), Chomsky considers both English and Spanish in detail. He tries to distinguish the rules they share from those that exist in only one language and hence are presumably learned from the PLD. Throughout his discussion, he makes claims about the nature of the PLD the child has available and the order in which the child will learn from the data. One of the first examples Chomsky offers of Plato’s Problem in LPK concerns a-phrases and reflexive pronouns in Spanish. He begins his discussion by illustrating a particular rule of natural language. He then considers how a child using analogical reasoning would apply this rule to other constructions. He claims that a child reasoning by analogy would create constructions which are incorrect by the lights of native speakers of the language. So here Chomsky’s arguments bear directly on Quine’s model of language learning. Quine had claimed that analogy, along with induction and reinforcement, plays a key role in language learning. However, Chomsky is here claiming that when the details of language are looked at closely, we see that a learning model based on analogy will make incorrect predictions about the type of sentences ordinary speakers will find grammatical. He claims, moreover, that the child will not try out the false constructions derived by analogical reasoning only to receive negative reinforcement, and that the child receives no data from his environment which helps him learn the correct rule. He concludes that the rule must be innate. This claim again runs contrary to Quine’s views on how a child learns his language.

            In LPK (pp. 12-20), Chomsky illustrates what he believes to be a clear case of Plato’s Problem. He begins by discussing simple sentences of Spanish, giving their direct translation into English, and a paraphrase of the translation in ordinary English. The first sentences he discusses are:

(1) Juan arregla el carro.

‘Juan fixes the car.’

(2) Juan afeita a Pedro.

     Juan shaves to Pedro.

     ‘Juan shaves Pedro.’

 

Chomsky notes that sentences (1) and (2) illustrate an interesting fact about a language such as Spanish. He points out that in Spanish, while an animate object of a sentence is preceded by the preposition ‘a’ (to), an inanimate object such as ‘el carro’ does not need a preposition before it. He claims that this feature is not shared by similar Romance languages such as Italian. He then goes on to consider more complex sentences involving causative constructions, which also feature the verbs ‘afeitar’ and ‘arreglar’.

(3) Juan hizo (arreglar el carro).

Juan made (fix the car).

‘Juan had someone fix the car.’

(4) Juan hizo (afeitar a Pedro).

Juan made (shave to Pedro).

‘Juan had someone shave Pedro.’

 

It should be noted that in the above sentences the subject of the complement clause is unexpressed, and so is interpreted as someone unspecified. However, Chomsky notes, the subject may be explicitly expressed, as in (5):

(5) Juan hizo (arreglar el carro a Maria).

Juan made (fix the car to Maria).

‘Juan had Maria fix the car.’

We can see the difference between the English and the Spanish versions of the proposition in (5). In Spanish the subject of the embedded clause is an adjoined prepositional phrase (a Maria), whereas in the English sentence Maria appears before the verb. Chomsky asks us to try to construct an analogue to (5) using the phrase afeitar a Pedro instead of arreglar el carro. Doing this we get (6):

(6) Juan hizo (afeitar a Pedro a Maria).*

Juan made (shave to Pedro to Maria).

‘Juan had Maria shave Pedro.’

 

Here Chomsky notes that sentence (6), constructed on analogy with sentence (5), is unacceptable. So a child using the Quinean process of analogical synthesis would in this case construct a grammatically deviant sentence. However, this fact in isolation tells us nothing about how a child learns this fact of Spanish. An analysis of children’s PLD and a study of children’s linguistic performance would be needed before we rule out a Quinean conception of language learning. Whether Spanish children try out successive a-phrases in speech, only to receive negative reinforcement, is a question which can only be answered by studying actual performance data or through constructing experiments. Until then, whether a child tries to construct a sentence like (6) on analogy with (5), only to receive negative reinforcement, remains an open question. Chomsky claims that sentence (6) is unacceptable because in Spanish there is a rule which bars two a-phrases from appearing together. He then sums up what he thinks we have learned so far from this brief analysis:

Summarizing, we have general principles, such as the principle for forming causative and other embedded constructions and the principle of barring successive a-phrases; principles that admit some variation in interpretation, such as the embedded clause property; and low-level rules differentiating very similar languages, such as the rule that requires insertion of a in Spanish before an animate object. Of course, these levels are not exhaustive. The interaction of such rules and principles determines the form and interpretation of the expressions of language. (1988b, 15)

 

Having given this brief outline of some simple rules of Spanish, Chomsky discusses how the child acquires these rules. He claims that there are three factors to consider when trying to understand how a child acquires the rules of language: (1) the genetically determined principles of the language faculty; (2) the genetically determined general learning mechanisms; (3) the linguistic experience of the child growing up in a speech community. In relation to the rules discussed above, Chomsky speculates that the rule of a-insertion before animate objects is an idiosyncratic rule of Spanish which is learned from experience. Given that a-insertion before animate objects is not a feature of other closely related Romance languages, he holds that it must be learned from the PLD through processes which we do not as yet understand. He speculates further that the rule which makes (6) unacceptable has its source either entirely in the language faculty, or in a combination of the language faculty and experience. He claims that the embedded clause property must be a parameter which needs some experience to be learned, because it does not occur in all languages. He speculates further that the embedded clausal complements which do occur are not learned but result from general principles of the language faculty. At no point in his analysis does he offer evidence in favour of this interpretation. He merely asserts a series of propositions which are presumably meant to be taken on faith until he justifies them later in the text.

            He then goes on to consider further examples. He asks us to change (2), ‘Juan afeita a Pedro’, by replacing ‘Pedro’ with a reflexive element. There are, he claims, two choices for a reflexive: se or sí mismo. He asks us to consider here just the first of these, and to replace Pedro with se.

(7) Juan afeita a se.*

Juan shaves to himself.

However, (7) is not a proper sentence. Chomsky notes that the element se is what is technically called a clitic, a form that cannot stand alone but must attach to some verb. According to Chomsky, there is a rule of Spanish that moves se from its ordinary position as direct object of afeitar, attaching it to the verb and yielding:

(8) Juan se afeita.

      Juan self-shaves.

     ‘Juan shaves himself.’

 

So the reflexive form corresponding to (2) would then be (8) rather than (7). Note that on a Quinean account of language acquisition, the child would probably try (7) on analogy with (2), receive negative reinforcement, and somehow have to work out that (8) is the correct form. Chomsky then asks us to combine the causative and reflexive constructions, replacing Pedro in (4) with the clitic se, yielding:

(9) Juan hizo (afeitar a se).

Juan made (shave to self).

Chomsky notes that since se is a clitic it must attach to a verb, and that there are two ways this could be done: se could attach to ‘shave’ or to ‘made’. He notes that in all dialects of Spanish it is normal to attach it to ‘made’, though only in some is it allowable to attach it to ‘shave’. He sticks to the more common case, where se attaches to ‘made’; this is a simplifying assumption, though nothing of importance turns on it for our present purposes. He goes on to note that (10) is the correct transformation of (9):

(10) Juan se hizo (afeitar).

       Juan self-made (shave).

      ‘Juan had someone shave him (Juan).’

He notes that in (10) the embedded complement of the causative verb is subjectless, as in (3) and (4). But of course the subject of the complement can be explicit, appearing as an a-phrase. However, he argues that if the subject of the complement is, say, los muchachos (the boys), we would expect to derive:

(11) Juan se hizo (afeitar a los muchachos).*

       Juan self-made (shave to the boys).

      ‘Juan had the boys shave him (Juan).’

Unfortunately, while (10) is an acceptable sentence, (11) is not. So a child trying to derive (11) on analogy with (10) will construct an unacceptable sentence; once again, a child trying to learn the rules of language by analogy will end up producing a deviant sentence such as (11).

            So what conclusion does Chomsky draw from these facts about Spanish? He notes first that the examples give rise to Plato’s Problem. He also claims that such facts show the hopelessness of claiming that language is acquired by analogy. He goes on to make the following empirical claim about how a Spanish child acquires these facts:

The question then is how speakers come to know these facts. Surely it is not the result of some specific course of training or instruction; nothing of the sort occurs in the course of normal language acquisition. Nor does the child erroneously produce or interpret the sentences (11) or (12) ‘by analogy’ to (10) and (5), leading to correction of this error by the parent or other teacher; it is doubtful that anyone has undergone this experience and it is certain that not everyone that knows the facts has done so.  (ibid., 21)

 

Here we have in Chomsky’s own words an example of Plato’s Problem.  The question we need to ask ourselves is whether this version of Plato’s Problem should be understood the way Pullum and Scholz claim or whether it is merely, as Collins argues, a challenge to the empiricist. It should be noted that when discussing Plato’s Problem in this context, Chomsky makes three unsupported empirical claims: (1) children do not construct new sentences like (11) using analogy and induction; (2) children do not incorrectly utter sentences with the structure of (11); (3) children are not corrected by their peers for constructing such utterances. If there is evidence to support these claims, Chomsky does not produce any. He merely argues that such things never happen, and we are presumably meant to take him at his word. If he wanted to establish that sentences like (11) are never produced, he would need an extensive corpus analysis to justify such a claim.  He has never provided such an analysis. Short of corpus analysis we have no justification for the claim that children never utter such constructions. Chomsky would probably argue that we know that children do not erroneously produce examples like (11) and receive negative reinforcement. To justify this claim, he could cite Brown and Hanlon (1970) who have shown that correction for bad grammar is rarely provided, and when it is provided it rarely has any effect.  However, recent studies on implicit instruction undermine this claim. So Chomsky has given us no reason to assume that this particular APS works.

            It is impossible to read this particular APS as a challenge to the empiricist. On the contrary, it is more accurately read as a supposed refutation of empiricism. If one did want to view it as a challenge, it seems to be a very strange challenge. The challenge could be construed as follows: Chomsky makes arbitrary unsupported claims about the PLD. The challenge is that the empiricist has to find evidence to refute Chomsky’s arbitrary unsupported claims, and until such evidence is provided, we are to assume that Chomsky is correct. Such a challenge is clearly absurd. The burden of proof is obviously on Chomsky to provide evidence to support his claims, not to merely point out that empiricists haven’t yet refuted his unsupported claims.

            Here it could be argued that I am being unfair to Chomsky. He does, after all, offer some evidence to support his claim: for example, the evidence from Crain and Nakayama (1987) and Brown and Hanlon (1970). However, such a defence of Chomsky is historically inaccurate. In his 1968 paper ‘‘Linguistic Contributions: Present’’[14] Chomsky discusses how auxiliary inversion illustrates a particular instance of the universal rule that all sentences are structure-dependent. There Chomsky makes the following claims:

 There is no a priori reason why human language should make use exclusively of structure-dependent operations, such as English interrogation, instead of structure-independent operations, such as O1, O2, and O3. One can hardly argue that the latter are more ‘complex’ in some absolute sense; nor can they be shown to be more productive of ambiguity or more harmful to communicative efficiency. Yet no human language contains structure-independent operations among (or replacing) the structure-dependent grammatical transformations. The language-learner knows that the operation that gives 71 is a possible candidate for a grammar, whereas, O1, O2, and O3, and any operations like them, need not be considered as tentative hypotheses…Careful consideration of such problems as those sketched here indicates that to account for the normal use of language we must attribute to the speaker-hearer an intricate system of rules that involve mental operations of a very abstract nature, applying to representations that are quite remote from the physical signal. We observe, furthermore, that knowledge of language is acquired on the basis of degenerate and restricted data and that it is to a large extent independent of intelligence and of wide variations in individual experience. (1972a, 54-56)

 

Here Chomsky is making untested claims about the child’s PLD; he is also making unsupported assertions about the structure of all human languages. Chomsky claims the child knows that only 71 is a possible grammar, whereas O1, O2, and O3 are not. Here he is implicitly making a claim about the child’s linguistic performance: the only possible evidence that a child does not consider O1, O2, and O3 is that a child never mouths sentences structured according to rules like O1. Characteristically, Chomsky offers no such evidence. It is important to note that he made these claims two years before Brown and Hanlon published their paper, and nineteen years before Crain and Nakayama’s paper appeared. So here Chomsky is making claims for which he has provided absolutely no evidence. If such claims are interpreted as a challenge to the empiricist, they are a poor challenge indeed.

            Chomsky’s APS arguments typically rely on claims that the child does not have access to this or that datum. It is claimed that if the child were learning a particular construction by analogy with previously heard constructions, he would produce barred sentences such as x or y. Chomsky then argues that children never produce sentences such as x or y, and that therefore negative or positive reinforcement cannot play any role in learning the construction.[15] However, he does not offer any performance data to indicate how children actually speak in particular circumstances. So his claim that children do not produce certain deviant sentences cannot be substantiated until the relevant research is done.

 Chomsky does sometimes argue from general considerations in the way Collins does. However, when doing his linguistics, Chomsky typically makes claims about lack of reinforcement, limited fragmentary data, and the insufficiency of analogy and induction for learning certain constructions. Overall, neither version of the APS casts much doubt on empiricist models of language learning. The APSs which Pullum et al. consider do not tell us either way whether nativism is true. What the research done by Pullum et al. shows is that much more empirical data is needed if we are to discover how children learn their first language. Furthermore, it shows that Chomsky’s lack of interest in performance data cannot be justified. If we are to construct an accurate theory of language acquisition, we will need to consider actual linguistic behaviour and the circumstances in which it occurs. Collins’s version of the APS offers no compelling reason to accept nativism. So linguistic nativism has not been justified by any of the poverty of stimulus arguments I have seen so far.

Before leaving this topic, I discuss a recent defence of the poverty of stimulus argument which Chomsky has mounted. Chomsky has co-authored a paper with Berwick, Pietroski, and Yankama called ‘‘Poverty of Stimulus Revisited’’ (2011). This paper does not address the primary criticisms raised against the APS in this thesis; indeed, its content would lead one to believe that Chouinard and Clark, Pullum and Scholz, and Geoffrey Sampson do not exist.[16] However, since this is Chomsky’s most up-to-date defence of the APS, I will consider it in detail, and use it to demonstrate that it does not in any way meet the concerns raised in this chapter.

                 

                PART 3: CHOMSKY’S LATEST DEFENCE OF THE APS

In their (2011) paper ‘‘Poverty of Stimulus Revisited’’, Chomsky, Berwick, Pietroski, and Yankama offer a defence of the APS against recent criticisms. The paper is divided into five sections:

(1) An introduction

(2) A discussion of empirical foundations

(3) A discussion of their minimalist solution to the empirical issues

(4) A consideration of three alternatives to their approach

(5) A conclusion

 

In Section 2 of their paper, they set out what they take to be the central theory-neutral facts which need to be explained. They claim that the facts about auxiliary inversion in polar interrogatives, which reveal the structure dependence of linguistic rules, generalise to other rules of natural language. They discuss facts such as constrained ambiguity; consider the following four sentences:

(6) Darcy is easy to please.

(7) Darcy is eager to please.

(8) Bingley is ready to please.

(9) The goose is ready to eat.

 

They claim that children intuitively know that (6) and (7) are unambiguous. In (6), ‘Darcy’ is the object of the sentence, and the sentence means that it is easy for others to please Darcy. In contrast, in (7) ‘Darcy’ is the subject, and the sentence means that Darcy is eager to please others. Sentences (8) and (9) are ambiguous. ‘Bingley’ can be taken as the subject or the object of the sentence: it can mean that Bingley is ready to please someone else, or that Bingley is ready to be pleased. Likewise ‘the goose’ can be taken as the subject or the object; thus (9) can mean that the goose is ready to eat something else, or that the goose is ready to be eaten. Further examples of constrained ambiguity are sentences such as:

(10) The boy saw the man with binoculars.

(11) The senator from Texas called the donor.

 

These sentences are two-ways ambiguous rather than three-ways ambiguous. Chomsky et al. argue that such examples reveal the structure dependence of language in the same way polar interrogatives do. They also point to the fact that some sentences have zero readings but are not mere word salad, while some word-salad declaratives can be turned into word-salad interrogatives using auxiliary inversion. Having discussed these various cases of constrained ambiguity, they note that they are concerned (at this point) with the knowledge that people have acquired, not with how such knowledge is acquired.

            It should be noted that there are problems with their claim that they are only concerned with the knowledge acquired. First, their belief that certain sentences have zero, one, or two interpretations is derived from tests of people’s intuitions of grammaticality. However, for such tests to count as an accurate sample of people’s competence, we need statistics to support them. I agree with their interpretations of the facts; however, neither my intuitions nor the intuitions of a few linguists can, on their own, form the foundation of a linguistic theory. Such intuitions need statistical support: we need evidence that people of different ages and different socio-economic backgrounds share the acceptability judgements on the sentences which are used to support the claim that constrained ambiguity is a fact of natural language. These statistics need to make explicit any gradience of acceptability which occurs across different socio-economic environments and age groups. With this statistical background in place, they would be in a position to say whether constrained ambiguity is something that all speakers accept. Until this is done, the supposedly theory-neutral facts which they claim any linguistic theory must explain have not been shown to be actual facts of natural language. Furthermore, appeals to people’s intuitions of grammaticality need to be tested against performance data: corpus analyses of child-adult interaction, adult-adult interaction, and the linguistic interaction of people from different socio-economic backgrounds. If we are to say that people have intuitions that x is the case, we need to demonstrate that they perform as though they have such intuitions. And if performance data and competence data are at odds, performance data trumps people’s beliefs about how they perform.

            This discussion demonstrates that Chomsky et al. are not merely pointing out facts about language that any theory must explain; rather, they are making claims about what people know which they have not justified by appeal to empirical evidence. The fact that they do not provide statistical tests to determine whether people of different ages and socio-economic backgrounds have the same linguistic competence shows that from the outset they are presuming that the intuitions of a few linguists are shared across the board. So far from giving a theory-neutral description of the facts of natural language, they are in fact presupposing a particular model of the nature of language from the outset.

            They go on to discuss the following sentences:

(21) hiker, lost, kept, walking, circles.

(22) The hiker who was lost kept walking in circles.

(23) The hiker who lost was kept walking in circles.

 

They note that, given (21), we would expect (22) to be the declarative, as opposed to (23). However, they also ask us to consider the following case:

(24) Was the hiker who lost kept walking in circles?

They note that even if we focus on (21) we still read (24) as the interrogative version of (23). From this fact they conclude that, one way or the other, the auxiliary verb was is associated with the matrix verb kept, not with lost. They claim that considerations of coherence alone should lead one to construct a sentence like:

(25) Was ((the hiker who- lost) (kept walking in circles))

as opposed to

(24) Was ((the hiker who lost) (-kept walking in circles))

They note that this shows that the relevant constraint trumps considerations of coherence. However, here again they are not merely stating theory-neutral facts. It is true that I share their intuition that (24) is the interrogative form of (23); however, my intuitions are obviously going to be contaminated by my research into various APSs. For Chomsky et al. to draw the conclusion they want, they need statistics to back up the claim that people of all ages and all socio-economic backgrounds share the intuition; they provide no such statistical evidence.

The next phenomenon which they consider is constrained homophony. They discuss the following sentence, along with two candidate bracketings of it:

(25) Can eagles that fly eat?

(26) (Can (eagles that – fly) eat))

(27) (Can ((eagles that fly) (-eat)))

 

 They hold that (27), not (26), reveals the correct structure of (25). Since children cannot see the bracketing of (25), Chomsky et al. ask how children can know that (27), rather than (26), reveals the correct structure. To elaborate the point, they consider do replacing the auxiliary verb can, since do bears morphological tense (did) but is otherwise semantically null. They indicate the actual position of interpretation with dv, and the logically coherent but incorrect position with dv*, using this notation to indicate constraints on ambiguity/homophony.

(28) (do (eagles that dv* fly) dv eat)

They claim that this notation is entirely descriptive and reveals that (28) is unambiguous. This raises a poverty of stimulus consideration: how do children know that dv* is a barred interpretation while dv is not? However, yet again Chomsky et al. present a supposed fact as a theory-neutral description without presenting any evidence to support the claim. They have not provided any statistical evidence of people’s acceptability judgements across ages and socio-economic backgrounds; nor do they present any performance data. So the whole of Chomsky et al.’s claims about the theory-neutral empirical facts which any theory must deal with stands on an extremely weak foundation.

            They go on to claim that other languages such as German and Irish respect the same constraints. Yet again they provide no empirical evidence to support this claim. They then consider further examples of these constraints and claim again (unconvincingly) that they are merely producing theory neutral facts which any theory must explain. They cite four constraints which must be met by any theorist wanting to explain these supposedly theory-neutral facts:

(1) Yield the correct pairings, for unboundedly many examples of the sort described.

(2) Yield the correct structures, for the purposes of interpretation, for those examples.

(3) Yield the correct language-universal patterning of possible/impossible pairings.

(4) Distinguish v-pairings from w-pairings, in part, while also accounting for their shared constraints.

They argue that if one cannot meet constraints (1)-(4), then one does not have an accurate linguistic theory which can explain the relevant linguistic data.

They then proceed to outline their own account of how these linguistic facts are best explained. They explain these facts in terms of their minimalist programme. Once they outline their minimalist alternative, they discuss three contemporary attempts to explain the above facts using domain-general knowledge. They outline the three rival theories and claim to demonstrate weaknesses in all three. Having satisfied themselves that they have refuted their rivals, they proclaim that their minimalist explanation is the best explanation of the above-mentioned facts. They conclude by arguing that after fifty years their poverty of stimulus argument still stands.

            The three empiricist alternative theories which they evaluate are:

 

(1) STRING-SUBSTITUTION FOR ACQUISITION (CLARK AND EYRAUD)

In brief, Clark and Eyraud, following Zellig Harris, postulate ‘discovery procedures’ for grammars. Their inference algorithm, when given examples like (37a) and (37b), will correctly generate examples like (37c) while excluding examples like (37d).

(37a) Men are happy.

(37b) Are men happy?

(37c) Are men who are tall happy?

(37d) *Are men who tall are happy?

 

The method works by weakening the standard definition of syntactic congruence, positing that if two items u and v can be substituted for each other in a single sentence context, then they can be substituted for each other in all sentence contexts. Clark and Eyraud call this notion weak substitutability.
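To make the idea concrete, here is a minimal, word-level sketch of this inference. Clark and Eyraud's actual algorithm operates over substrings and induces a grammar; the function name and toy corpus below are my own illustrative assumptions.

```python
from collections import defaultdict

def weak_substitution_classes(corpus):
    """Toy, word-level sketch of weak substitutability: if two words ever
    occur in the same sentence context, put them in the same substitution
    class (computed here with a simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    seen = {}  # context (prefix, suffix) -> a word already seen there
    for sent in corpus:
        words = sent.split()
        for i, w in enumerate(words):
            ctx = (tuple(words[:i]), tuple(words[i + 1:]))
            if ctx in seen:
                union(w, seen[ctx])  # shared context: merge the classes
            else:
                seen[ctx] = w
            find(w)  # register every word
    classes = defaultdict(set)
    for w in parent:
        classes[find(w)].add(w)
    return list(classes.values())
```

Given the toy corpus `["the man walks", "the dog walks"]`, the sketch merges `man` and `dog` into one class, since they share the context `the _ walks`; over-generalisation of exactly this kind is what makes the method fail for English, as Chomsky et al. go on to argue.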

Chomsky et al. claim that this method fails for two different reasons:

(A) It fails for English even when restricted only to the strings that a language generates.

(B) It does not address the original APS, which depends on which structures a language generates.

(2) BAYESIAN MODEL SELECTION OF GRAMMARS (PERFORS ET AL.)

In their 2011 paper, Perfors et al. (henceforth PTR) aimed to address the question of domain-general versus domain-specific theories of how natural-language grammar is acquired. They considered Chomsky’s famous example of auxiliary inversion, which has been used as a paradigm example of an APS since the sixties. PTR argued that, using Bayesian model selection of grammars, they could demonstrate that the structure dependence of language revealed by auxiliary inversion could be learned through a Bayesian model. The Bayesian model is a domain-general theory of acquisition, so the fact that it can learn the structure dependence of grammar purports to show that Chomsky’s APS does not work. It supposedly reveals that language acquisition does not require domain-specific knowledge.

            PTR argue for a notion of a ‘‘Bayes learnable’’ grammar. Their model specifies a hypothesis space consisting of three different grammar types and uses Bayesian probability to decide amongst them on the basis of a sample from the corpus[17]. The three grammar types they propose are: (1) flat grammars, which generate the strings of a corpus directly from a single non-terminal symbol S; (2) probabilistic (right) regular grammars (PRGs); and (3) probabilistic context-free grammars ((1) and (2) are structure-independent grammars, while (3) is a structure-dependent grammar) (ibid., 43). They construct a grammar of each type to generate the sentences of the corpus, and score each grammar by its Bayesian posterior probability. They use the CHILDES corpus as data for training and evaluating the grammars (ibid., 43). From their tests, they discovered that probabilistic context-free grammars are better able to predict the corpus with a smaller grammar than their rivals, and are better at handling new constructions not contained in the corpus (ibid., 50). The only learning prior which PTR use is a preference for a shorter, more compressed hypothesis, and Clark and Lappin correctly note that this prior is clearly domain-general. So, given that PTR’s model prefers the structure-dependent hypothesis over the structure-independent ones, we have evidence against Chomsky’s original APS. Contrary to what Chomsky claimed, a learner using a domain-general procedure can indeed learn the structure-dependent rule for natural language.
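The scoring logic PTR rely on can be sketched in a few lines. This is not their actual model; the penalty constant and the grammar-size/likelihood figures below are hypothetical, but the shape of the trade-off is the same: a simplicity prior penalises larger grammars, and the posterior combines that prior with how well each grammar predicts the corpus.

```python
def log_posterior(grammar_size, log_likelihood, alpha=0.5):
    """Score a grammar by log posterior = log prior + log likelihood.

    The prior penalises larger grammars (a simplicity/compression bias),
    which is the only learning bias PTR build in; alpha, the penalty per
    grammar symbol, is a hypothetical illustrative value.
    """
    log_prior = -alpha * grammar_size
    return log_prior + log_likelihood

# Hypothetical scores: a compact context-free grammar that fits the corpus
# well can beat a larger, worse-fitting regular grammar on the posterior.
cfg_score = log_posterior(grammar_size=40, log_likelihood=-1000.0)
reg_score = log_posterior(grammar_size=120, log_likelihood=-1100.0)
```

On these illustrative numbers the context-free grammar wins, mirroring PTR's finding that the hierarchical grammar is preferred because it is both smaller and a better predictor of the corpus.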

            Chomsky et al. reply to this argument as follows:

But even if a Bayesian learner can acquire grammars that generate structured expressions, given child-directed speech but no additional language-specific knowledge, this does not yet make it plausible that such a learner can acquire grammars that exhibit constrained ambiguity of the sort illustrated in Section 2.  In particular children acquire grammars that generate expressions in accord with specific structure-dependent rules that govern interpretations…The question is whether learners can induce that expressions are generated in these human ways.(2011, 19)

 

Here Chomsky et al. are claiming the issue is not whether domain-general procedures can key in on structure-dependent rules, but rather whether the domain-general procedures used by PTR can capture more complicated phenomena such as constrained ambiguity.  Chomsky et al. claim that PTR’s model does not include or suggest any hypothesis about how expressions are generated according to the language-specific constraints which they discussed above. They argue that if one wants to address the real poverty of stimulus argument, then one needs to address the full range of examples which they discussed in Section 2 of their paper, not merely the simple examples of polar interrogatives.

It could be argued that Chomsky et al. are being unfair to PTR here. Throughout his career, Chomsky has claimed that a particular datum cannot be learned from experience and must therefore be explained in terms of innate domain-specific knowledge. Then, when PTR construct a model which can generate the expressions without domain-specific knowledge, Chomsky et al. argue that this fact is irrelevant because there are further facts about language which the model cannot capture. Here, again, we are back to Cowie’s criticism: every time a nativist has an APS refuted by an empiricist, the nativist can simply point out some other non-obvious fact which he claims cannot be explained in terms of domain-general learning, and when this claim is refuted, another example is manufactured on the spot. The real difficulty with this approach is that it shifts the burden of proof onto the empiricist: any non-obvious fact of language is automatically assumed to illustrate an APS, and the empiricist must refute this claim. However, the burden of proof should not be shifted this way. It lies with both the nativist and the empiricist. It should not be assumed that some complicated fact of language can automatically be explained in either a nativist or an empiricist manner. Such issues are entirely empirical and should be judged on the ability of either side to construct accurate models that explain the relevant data.

(3) LEARNING FROM BIGRAMS, TRIGRAMS, AND NEURAL NETWORKS (REALI AND CHRISTIANSEN)

Reali and Christiansen, in their 2005 paper ‘‘Uncovering the Richness of the Stimulus’’, constructed models which aimed to test whether yes/no questions could be learned by domain-general procedures. They used three models: (1) a bigram statistical model, (2) a trigram statistical model, and (3) a simple recurrent network model. They used child-directed speech as training data for all three models.

                         BIGRAM STATISTICAL MODEL

Reali and Christiansen computed the frequency of word bigrams and then the overall sentence likelihood for any word sequence, even previously unseen ones (ibid., 25). This sentence likelihood was then used to select between opposing test-sentence pairs similar to Are men who are tall happy?/Are men who tall are happy?, the idea being that sentences with the correct auxiliary fronting would have greater likelihood than those with incorrect auxiliary fronting. Chomsky et al. note that in RC’s experiment, run on 100 test pairs, the bigram likelihood calculation successfully chose the correct grammatical form 96% of the time. However, Chomsky et al. cite the work of Kam, Stoyneshka, Tornyova, Fodor and Sakas (2008), which shows that this strong success is a result of contingent facts of English and not of the bigram model itself. They note that the model exploits the fact that who and that are homographs, ambiguous between pronouns and relativisers (ibid., 26). When Kam et al. correct for this bias, the performance of the models decreases significantly.
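As a rough sketch of how such a bigram comparison works: train bigram probabilities on a corpus, score each test sentence by the sum of its log bigram probabilities, and pick the higher-scoring member of the pair. The tiny corpus, the add-0.01 smoothing, and the function names below are my own illustrative assumptions, not RC's actual training setup.

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """Estimate smoothed bigram probabilities P(w2 | w1) from a toy corpus."""
    pair_counts, word_counts = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.lower().split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            pair_counts[(w1, w2)] += 1
            word_counts[w1] += 1

    def prob(w1, w2):
        # small additive floor so unseen bigrams get non-zero probability
        return (pair_counts[(w1, w2)] + 0.01) / (word_counts[w1] + 0.01 * len(word_counts))

    return prob

def sentence_log_likelihood(prob, sentence):
    """Sum of log bigram probabilities over the sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(prob(w1, w2)) for w1, w2 in zip(words, words[1:]))
```

On a toy corpus containing strings like "men who are tall are happy", the grammatical Are men who are tall happy? outscores the ungrammatical Are men who tall are happy? simply because its bigrams are better attested; Kam et al.'s point is precisely that this success rides on such contingent distributional facts of English rather than on any grasp of structure.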

                              RC’S TRIGRAM MODEL

The trigram model uses a test similar to that used for the bigram model, and it exhibits a similar level of success. However, Chomsky et al. argue that, like the bigram model, the trigram model achieves its success because of contingent facts of English and not because of the model itself. They constructed experiments to test this claim, and these show that the model’s performance drops significantly once the English-specific bias is accounted for.

              LEARNING FROM SIMPLE RECURRENT NETWORKS

Reali and Christiansen constructed a further experiment using a simple recurrent network (SRN) to try to learn a particular construction. These networks contain a hidden context layer. Reali and Christiansen trained 10 different networks on the Bernstein corpus, and then tested whether they could discriminate between grammatical and ungrammatical minimal pairs (2011, 28). So, for example, they tested whether the networks could correctly discriminate between:

(1) Is the boy who is hungry nearby?

(2) Is the boy who hungry is nearby?

Reali and Christiansen recoded the actual words into one of 14 possible part-of-speech categories, e.g. Det (the), N (boy), PRON (who), V (is), ADJ (hungry), PREP (nearby), etc. (ibid., 28). Chomsky et al. note that simple recurrent networks output a distribution over possible predicted outputs after processing each word. Reali and Christiansen tested their networks by providing the part-of-speech prefix of test sentences up to the point at which the grammatical and ungrammatical versions diverge. They then checked the predicted output of the trained networks over all word categories to see whether greater activation was assigned to the grammatical continuation than to the ungrammatical one. Reali and Christiansen confirmed that the activation for the grammatical continuation was an order of magnitude higher than for the ungrammatical one. In other words, the network was able to predict the correct (1) as opposed to the incorrect (2). Reali and Christiansen take this as evidence that their simple recurrent networks can learn the rule for structure dependence using this connectionist model.

            However, Chomsky et al. (following on from the work of Kam et al.) suggest that Reali and Christiansen’s results may be due to simple brute statistical facts:

In other words, one might wonder whether the success of the SRNs in selecting, for example, V as opposed to ADJ as above might also be attributable simply to ‘‘brute statistical facts’’. Kam et al’s and our findings above suggest that bigram information alone could account for most of the statistical regularity that the networks extract. (ibid, 29)

 

Chomsky et al. tested this claim by analysing the Bernstein corpus to see whether there was a difference between the number of times PRON-V occurs and the number of times PRON-ADJ occurs. They found that PRON-V occurs 2,504 times in the corpus, while PRON-ADJ occurs 250 times. So they claim that using a bigram statistical model one can easily predict the next occurrence of the grammatical sentence. They make the following point:
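The arithmetic behind this "brute statistical" point is simple enough to set out directly (the helper name is my own; the counts are those Chomsky et al. report):

```python
def continuation_preference(count_a, count_b):
    """Maximum-likelihood preference for continuation A over B after the
    same left context, estimated from raw bigram counts."""
    return count_a / (count_a + count_b)

# Counts reported for the Bernstein corpus:
# PRON-V occurs 2,504 times; PRON-ADJ occurs 250 times.
p_verb_after_pron = continuation_preference(2504, 250)
```

On these counts a bigram model predicts V after PRON over 90% of the time, which is why Chomsky et al. think the networks' success in choosing V over ADJ need not reflect anything beyond sequential co-occurrence statistics.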

Since SRNs are explicitly designed to extract sequentially organised statistical patterns, and given that the is-is question types can be so easily modelled by sequential two-word patterns, this is not at all surprising. Indeed, it is difficult to see how SRNs could possibly fail in such a statistically circumscribed domain. (Ibid., 32)

 

They then go on to note that it remains to be seen how simple recurrent networks will deal with a more complex interrogative such as:

(1)  Is the boy who was holding his plate crying?

This example is of the sort discussed by Crain and Nakayama. Here the matrix verb 'is' differs from the relative-clause auxiliary 'was'. Chomsky et al. argue that until Reali and Christiansen can construct a model which handles cases of this kind, we should conclude that the Simple Recurrent Network results are far from compelling. However, this reply is dubious because, as I now discuss, Reali and Christiansen have solved the original APS which Chomsky raised in his 1975 book Reflections on Language.

            In that book Chomsky writes:

We gain insight into UG, hence LT (H,L), whenever we find properties of language that can reasonably be supposed not to have been learned. (1975, 30)

 

The examples which Chomsky uses to illustrate this point are:

(1) The man is tall – is the man tall?

(2) The book is on the table – is the book on the table?

Chomsky notes that a scientist will observe that children form questions in the ways indicated by (1) and (2). The scientist may form the following hypothesis to explain the above fact:

Hypothesis 1: The child processes the declarative sentence from its first word (i.e., from 'left to right'), continuing until he reaches the first occurrence of the word 'is' (or others like it: 'may', 'will', etc.); he then preposes the occurrence of 'is', producing the corresponding question (with some concomitant modifications of the form which need not concern us) (ibid, 31).

 

He then argues that this procedure will lead the scientist into making incorrect predictions when it comes to more complicated sentences. He asks us to consider the following sentences:

(3) The man who is tall is in the room – is the man who is tall in the room?

(4) The man who is tall is in the room – is the man who tall is in the room?*

Obviously a scientist using hypothesis (1) would generate the incorrect sentence (4). Chomsky claims that a scientist will note that children never make mistakes like (4) and will therefore conclude that hypothesis (1) is incorrect. A reasonable scientist, he notes, will therefore try out a different hypothesis. Hypothesis (2), according to Chomsky, will be a structure-dependent hypothesis which analyses sentences into phrases. This differs from the structure-independent hypothesis, which treats the sentence merely as a linear sequence of words and picks out the earliest auxiliary in that sequence.
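The failure of the structure-independent rule is easy to simulate. Below is a minimal sketch of Hypothesis 1 (scan left to right, prepose the first auxiliary), assuming simple whitespace tokenization and ignoring the "concomitant modifications" Chomsky sets aside; the auxiliary list is illustrative, not exhaustive.

```python
AUXILIARIES = {"is", "may", "will", "can"}

def hypothesis_1(declarative):
    """Structure-independent rule: scan the sentence left to right,
    find the first auxiliary, and prepose it to form the question."""
    words = declarative.split()
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return " ".join([word] + words[:i] + words[i + 1:]) + "?"
    return None

# Works on simple sentences ...
print(hypothesis_1("the man is tall"))
# -> is the man tall?

# ... but on a sentence containing a relative clause it preposes the
# wrong auxiliary, producing the ungrammatical (4):
print(hypothesis_1("the man who is tall is in the room"))
# -> is the man who tall is in the room?
```

Because the rule sees only a word sequence, not phrase structure, it cannot distinguish the relative-clause auxiliary from the matrix auxiliary.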

            He further argues that, by any reasonable standards, hypothesis (1) is simpler than hypothesis (2). Yet children supposedly unerringly use the structure-dependent hypothesis as opposed to the structure-independent one. Chomsky makes four points: (1) the child will never experience constructions which are relevant to helping him learn the correct rule; (2) the child never makes mistakes like sentence (4); (3) the child is not trained to learn the correct rule; (4) the correct rule is more complex than the incorrect one. Based on these considerations, Chomsky argues that the structure-dependence rule must be innate. I do not want to repeat the various arguments against this view that I have already voiced earlier in the thesis. However, in light of Chomsky et al.'s new paper, some new points need to be clarified. Reali and Christiansen managed to construct a Simple Recurrent Network which could learn the is-is construction through training. This demonstrates that their model has solved the original poverty of stimulus problem raised by Chomsky in 1975. Chomsky et al. ask whether the model can handle the types of construction discussed in Crain and Nakayama's 1987 paper. This is an interesting question. However, the fact that Reali and Christiansen have not answered it does not take away from their achievement. They have demonstrated that it is possible for a domain-general learner to grasp a linguistic rule which Chomsky in 1975 claimed could only be acquired through domain-specific learning priors. This fact is important. It illustrates Cowie's point about how easy it is for a nativist to manufacture a new APS in the face of a refutation of the original claim.

            In this paper, I have evaluated whether Chomsky's APS refutes Quine's conception of language acquisition. Having reviewed the best arguments put forth by Chomsky in defence of the APS, I conclude that it does not. However, the evidence I have reviewed does not indicate which conception of language acquisition is the correct one. Much research remains to be done before we can decide between the nativist and the empiricist conceptions of language acquisition.


[1] Independent of his APS argument, Chomsky cites various other strands of evidence to support his belief in a language faculty. He points to the supposed universals which exist in all the languages of the world, to the fact that the language faculty grows at a fixed rate, and to the fact that general intelligence is not affected by the loss of the ability to speak and understand language. These points have all been contested in the literature. However, I will not discuss them here, as my main concern is with the evidence which Chomsky uses to support his APS claims.

[2] I will deal later in the paper with the frequency of the constructions and whether there are enough of them for a child to learn the rules. For now I want to focus, in a schematic way, on what these findings mean for a Quinean picture of language learning.

[3] See Hart and Risley’s book Meaningful Differences in the Everyday Experiences of Young Children 1995 for a discussion of the linguistic input an average child is exposed to.

[4] Hart and Risley’s data refers to children hearing spoken language around them. An educated adult who is an avid reader would have access to much more data per year than the data spoken to him by his peers. So a twenty-seven-year-old would have at least nine times the heard linguistic data of a three-year-old learning his language. However, if you count the data the adult receives from reading, the data the adult is exposed to would be much more than nine times that of a three-year-old child.

[5] Like Pullum and Scholz, Sampson interprets his corpus data using Hart and Risley’s figures. One difficulty with this is that Hart and Risley estimate that the sentences a child encounters are 4 words long, whereas the few examples Sampson published from his corpus research contain sentences which are 10 words long. So doubt could be cast on whether his corpus is representative of the child’s PLD. Obviously much further research is needed to clarify this matter; however, like Pullum and Scholz, Sampson is to be applauded for beginning this research into the PLD instead of ignoring it as Chomsky does.

[6] In this section I am following Sampson’s (2002, 20) reconstruction of Crain and Nakayama.

[7] Quine does not anywhere discuss auxiliary inversion; however, his constant emphasis on the fact that language is a social art in which people’s utterances are beaten into shape through reinforcement from their peers shows that he thinks that all our linguistic rules are structured through positive and negative reinforcement.

[8] Later in the paper, I will discuss Reali and Christiansen’s mathematical models, which demonstrate that it is possible for a child to learn structure-dependent rules with even less data than Pullum and Scholz discovered.

[9] The study used only adults who were the children’s parents.

[10] See Pinker 1994 on communities in which adults do not speak directly to their children.

[11] Whether the older children of such a community do, in fact, engage with the younger children in a manner in which they can use to learn the rules of their language is an empirical question. The point is that Ochs and Schieffelin’s paper does not rule out this possibility.

[12] The innate rule is obviously structure dependence not the auxiliary inversion rule, which is of course not a rule of all languages.

[13] When I say that there are an estimated 7000 languages known today, I am speaking of E-languages. Given the difficulties of individuating E-languages which Chomsky has repeatedly discussed, it is unclear how to calculate how many different languages are known at a given time, how many have been known, or how many are possible to know. Internal to Chomsky’s theories, the question should be rephrased as: how many I-languages can be derived from UG based on permitted parametric variation? Answering such a question is obviously impossible until we have a definitively worked-out conception of UG.

[14] This paper is in Chomsky’s book Language and Mind, pp. 21-56.

[15] In Chomsky’s Knowledge of Language he repeatedly makes APS claims whose structures are similar to those outlined by Pullum et al. In this book too he offers no empirical evidence to support his claims. See Knowledge of Language, pp. 55, 62, 78, 90, 145-149.

[16] While they do mention Pullum and Scholz’s paper, they do not consider its impact on Chomsky’s particular APS.

[17] Here I am following Clark and Lappin’s description of PTR on pages 43-45 of their Linguistic Nativism and the Poverty of the Stimulus.
