When Central Limit Theorem breaks down

      Let's say I have the following numbers:

      4,3,5,6,5,3,4,2,5,4,3,6,5

      I sample some of them, say 5, and calculate the sum of those 5 values.
      Then I repeat that over and over to get many sums and plot them in a histogram, which looks Gaussian, as the Central Limit Theorem suggests.



      But suppose I replace one of the 4s with a very large number, so that the list becomes:

      4,3,5,6,5,3,10000000,2,5,4,3,6,5

      The histogram of sums of 5 samples from this list never becomes Gaussian; instead it splits into what looks like two Gaussians.
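The experiment described above can be sketched in R roughly as follows. This is only a sketch under assumptions the question does not spell out: each sample of 5 is drawn without replacement, 10000 repetitions are used, and the names `pop1`, `pop2`, and `sample_sums` are hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

pop1 <- c(4, 3, 5, 6, 5, 3, 4, 2, 5, 4, 3, 6, 5)
pop2 <- c(4, 3, 5, 6, 5, 3, 10000000, 2, 5, 4, 3, 6, 5)

# Draw `reps` samples of size `n` and return the sum of each sample.
sample_sums <- function(pop, n = 5, reps = 10000, replace = FALSE) {
  replicate(reps, sum(sample(pop, size = n, replace = replace)))
}

sums1 <- sample_sums(pop1)
sums2 <- sample_sums(pop2)

# With pop2 the sums fall into two well-separated groups,
# depending on whether 10000000 was drawn.
table(sums2 > 1e6)

par(mfrow = c(1, 2))
hist(sums1, main = "Original list")
hist(sums2, main = "One value replaced by 10000000")
```

Passing `replace = TRUE` to `sample_sums` switches to sampling with replacement, in which case the large value can appear more than once in a sample.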



      Is there any paper or research that discusses this?
      Thank you.







      central-limit-theorem
















      asked 3 hours ago by JimSD




















          2 Answers
          Let's recall, precisely, what the central limit theorem says.




          If $X_1, X_2, \cdots, X_k$ are independent and identically distributed random variables, then the sample mean $\frac{X_1 + X_2 + \cdots + X_k}{k}$, suitably centered and scaled, converges in distribution to a normal distribution (*).




          When we have a static list of numbers like



          4,3,5,6,5,3,10000000,2,5,4,3,6,5


          and we sample by taking a number at random from this list, then to apply the central limit theorem we need to be sure that our sampling scheme satisfies both conditions: independence and identical distribution.



          • Identically distributed is no problem: each number in the list is equally likely to be chosen.

          • Independent is more subtle, and depends on our sampling scheme. If we are sampling without replacement, then we violate independence. It is only when we sample with replacement that the central limit theorem as stated above is applicable.

          So, if we sample with replacement in your scheme, we should be able to apply the central limit theorem. At the same time, you are right: if our sample is of size 5, we are going to see very different behaviour depending on whether the very large number is chosen in our sample.
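One way to make the "chosen or not chosen" remark concrete: with replacement, the number of times the large value appears in a sample of size 5 is Binomial(5, 1/13), since 1 of the 13 list entries is the large one. A short R sketch (the variable names `n`, `p`, `k`, and `probs` are illustrative):

```r
n <- 5        # sample size
p <- 1 / 13   # chance that a single draw is the large value

# P(large value appears exactly k times), for k = 0, ..., n
k <- 0:n
probs <- dbinom(k, size = n, prob = p)
round(setNames(probs, k), 4)

# P(large value appears at least once); about 0.33
p_at_least_once <- 1 - (1 - p)^n
p_at_least_once
```

Each sum is then roughly the number of times the large value was drawn, multiplied by 10000000, plus a small remainder, which is why the histogram shows separate clusters rather than a single bell shape.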



          So what's the rub? Well, the rate of convergence to a normal distribution depends heavily on the shape of the population we are sampling from. In particular, if our population is very skewed, we expect it to take a long time to converge to the normal. This is the case in our example, so we should not expect a sample of size 5 to be sufficient to show the normal structure.



          [Figure: Three Normal Distributions]



          Above I repeated your experiment (sampling with replacement) for samples of size 5, 100, and 1000. You can see that the normal structure emerges for very large samples.
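The code behind that figure is not shown in the answer. A minimal sketch of a comparable experiment, assuming 5000 repetitions per sample size and using the sample mean (the helper `sample_means` and the names `reps`, `sizes`, and `means_by_size` are hypothetical), might look like this:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

pop <- c(4, 3, 5, 6, 5, 3, 10000000, 2, 5, 4, 3, 6, 5)
reps <- 5000  # repeated samples per sample size (an assumption)

# Sample means of `reps` with-replacement samples of size n.
sample_means <- function(pop, n, reps) {
  draws <- matrix(sample(pop, size = reps * n, replace = TRUE), nrow = reps)
  rowMeans(draws)
}

sizes <- c(5, 100, 1000)
means_by_size <- lapply(sizes, function(n) sample_means(pop, n, reps))

# One histogram per sample size, side by side.
par(mfrow = c(1, 3))
for (i in seq_along(sizes)) {
  hist(means_by_size[[i]], breaks = 50, freq = FALSE,
       main = paste("n =", sizes[i]), xlab = "sample mean")
}
```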



          (*) Note that some technical conditions are needed here, such as finite mean and variance. They are easily verified in our example of sampling from a list.















          answered 3 hours ago by Matthew Drury (edited 3 hours ago)











          • Thank you for a very quick and perfect answer. The idea of the CLT, replacement, and the need for more samples when the data distribution is skewed are all very clear now. My original question was about the case you mentioned: one large number is included, sampling is without replacement, and the sample size is fixed. It behaves very differently, so it seems we need to consider a "conditional" CLT, separately for the case where the large number is sampled and the case where it is not. I wonder if there is any research or prior work on that. But thank you anyway.
            – JimSD, 8 mins ago
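The "conditional" view in this comment can be illustrated directly. With a fixed sample size of 5 and no replacement, the list is small enough to enumerate every possible sample, so the distribution of the sum can be split exactly into the samples that contain the large value and those that do not. This is only an illustrative sketch, not a pointer to prior work, and the names `big`, `idx`, `sums`, and `has_big` are hypothetical:

```r
pop <- c(4, 3, 5, 6, 5, 3, 10000000, 2, 5, 4, 3, 6, 5)
big <- which.max(pop)  # position of the large value

# All choose(13, 5) = 1287 subsets of 5 positions, one per column.
idx <- combn(length(pop), 5)
sums <- colSums(matrix(pop[idx], nrow = 5))
has_big <- colSums(idx == big) > 0

# Weight of each component: 5/13 of the subsets contain the large value.
mean(has_big)

# Conditional distributions of the sum within each component.
summary(sums[!has_big])
summary(sums[has_big] - max(pop))  # remainder: sum of the other 4 values
```

The overall distribution of the sum is then a two-component mixture with weights 8/13 and 5/13, and each component can be examined on its own.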
















          First of all, the size of each sample should be more than $5$ for the CLT approximation to be good. A common rule of thumb is a sample of size $30$ or more. With the population of your first example, $30$ is in fact OK.

              pop <- c(4,3,5,6,5,3,4,2,5,4,3,6,5)
              N <- 10^5   # number of simulated samples
              n <- 30     # size of each sample
              # Each row of x is one sample of size n, drawn with replacement.
              x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
              x_bar <- rowMeans(x)
              hist(x_bar, freq = FALSE, col = "cyan")
              # Standard deviation of a single draw from pop (divides by length(pop), unlike sd()).
              sigma <- sqrt(mean((pop - mean(pop))^2))
              # Normal density suggested by the CLT for the sample mean.
              f <- function(t) dnorm(t, mean = mean(pop), sd = sigma/sqrt(n))
              curve(f, add = TRUE, lwd = 2, col = "red")

          [Figure: histogram of x_bar for the first population, n = 30, with the normal density curve overlaid]

          In your second example, because of the shape of the population distribution (it is too heavily skewed; see guy's comment below), samples of size $30$ won't give you a good CLT approximation to the distribution of the sample mean.

              pop <- c(4,3,5,6,5,3,10000000,2,5,4,3,6,5)
              N <- 10^5   # number of simulated samples
              n <- 30     # size of each sample
              x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
              x_bar <- rowMeans(x)
              hist(x_bar, freq = FALSE, col = "cyan")
              # Standard deviation of a single draw from pop.
              sigma <- sqrt(mean((pop - mean(pop))^2))
              f <- function(t) dnorm(t, mean = mean(pop), sd = sigma/sqrt(n))
              curve(f, add = TRUE, lwd = 2, col = "red")

          [Figure: histogram of x_bar for the second population, n = 30, with the normal density curve overlaid]

          But with this second population, samples of size $100$, say, are fine.

              pop <- c(4,3,5,6,5,3,10000000,2,5,4,3,6,5)
              N <- 10^5   # number of simulated samples
              n <- 100    # size of each sample
              x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
              x_bar <- rowMeans(x)
              hist(x_bar, freq = FALSE, col = "cyan")
              # Standard deviation of a single draw from pop.
              sigma <- sqrt(mean((pop - mean(pop))^2))
              f <- function(t) dnorm(t, mean = mean(pop), sd = sigma/sqrt(n))
              curve(f, add = TRUE, lwd = 2, col = "red")

          [Figure: histogram of x_bar for the second population, n = 100, with the normal density curve overlaid]
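One rough way to see why the rule of thumb of $30$ works for the first population but not the second is to look at skewness. For independent draws, the skewness of the sample mean equals the population skewness divided by $\sqrt{n}$. The sketch below computes it for both lists, treating every entry as equally likely; the helpers `pop_skewness` and `mean_skewness` are hypothetical names, not part of the answer above.

```r
pop1 <- c(4, 3, 5, 6, 5, 3, 4, 2, 5, 4, 3, 6, 5)
pop2 <- c(4, 3, 5, 6, 5, 3, 10000000, 2, 5, 4, 3, 6, 5)

# Skewness of a finite population, every entry equally likely.
pop_skewness <- function(pop) {
  d <- pop - mean(pop)
  mean(d^3) / mean(d^2)^1.5
}

# For i.i.d. draws, skewness of the sample mean = population skewness / sqrt(n).
mean_skewness <- function(pop, n) pop_skewness(pop) / sqrt(n)

sizes <- c(30, 100, 1000)
skew1 <- sapply(sizes, function(n) mean_skewness(pop1, n))
skew2 <- sapply(sizes, function(n) mean_skewness(pop2, n))
data.frame(n = sizes, first_population = skew1, second_population = skew2)
```

A normal distribution has skewness 0, so the closer these values are to 0, the less lopsided the histogram of sample means should look. Skewness is only one summary of shape, so this is a heuristic check rather than a guarantee.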








          • It's not the variance that is the problem. One way of getting rigorous control is to use the ratio of the third absolute central moment to the standard deviation cubed, as in the Berry-Esseen theorem.
            – guy, 3 hours ago
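The ratio mentioned in this comment can be computed for the second list. The Berry-Esseen theorem bounds the largest gap between the distribution function of the standardized sample mean and the standard normal one by $C \rho / (\sigma^3 \sqrt{n})$, where $\rho$ is the third absolute central moment and $C$ is an absolute constant. The sketch below assumes sampling with replacement and uses $0.4748$ for the constant, which is an assumed value chosen for illustration; `ratio`, `c_be`, and `berry_esseen_bound` are hypothetical names.

```r
pop <- c(4, 3, 5, 6, 5, 3, 10000000, 2, 5, 4, 3, 6, 5)

mu    <- mean(pop)
sigma <- sqrt(mean((pop - mu)^2))   # standard deviation of a single draw
rho   <- mean(abs(pop - mu)^3)      # third absolute central moment

ratio <- rho / sigma^3

# Berry-Esseen: sup_x |F_n(x) - Phi(x)| <= c_be * rho / (sigma^3 * sqrt(n)).
# c_be is an absolute constant; 0.4748 is an assumed value used for illustration.
c_be <- 0.4748
berry_esseen_bound <- function(n) c_be * ratio / sqrt(n)

n <- c(5, 30, 100, 1000)
data.frame(n = n, bound = berry_esseen_bound(n))
```

The bound shrinks like $1/\sqrt{n}$, so a large ratio has to be offset by a correspondingly large sample size before the normal approximation is guaranteed to be close.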











          • Perfect. Added. Thanks.
            – Paulo C. Marques F., 3 hours ago










          • Thank you for a quick, visual, and perfect answer with code. I was very surprised how quick it was! I was not aware of the appropriate sample size. I was thinking of the case where the sample size is fixed.
            – JimSD, 13 mins ago















          2












          $begingroup$

          First of all, the size of each sample should be more than $5$ for the CLT approximation to be good. A rule of thumb is a sample of size $30$ or more. With the population of your first example, $30$ is in fact OK.



          pop <- c(4,3,5,6,5,3,4,2,5,4,3,6,5)
          N <- 10^5
          n <- 30
          x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
          x_bar <- rowMeans(x)
          hist(x_bar, freq = FALSE, col = "cyan")
          f <- function(t) dnorm(t, mean = mean(pop), sd = sd(pop)/sqrt(n))
          curve(f, add = TRUE, lwd = 2, col = "red")


          enter image description here



          In your second example, because of the shape of the population distribution (it's too much skewed; see guy's comment bellow), samples of size $30$ won't give you a good approximation for the distribution of the sample mean using the CLT.



          pop <- c(4,3,5,6,5,3,10000000,2,5,4,3,6,5)
          N <- 10^5
          n <- 30
          x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
          x_bar <- rowMeans(x)
          hist(x_bar, freq = FALSE, col = "cyan")
          f <- function(t) dnorm(t, mean = mean(pop), sd = sd(pop)/sqrt(n))
          curve(f, add = TRUE, lwd = 2, col = "red")


          enter image description here



          But, with this second population, samples of, say, size $100$ are fine.



          pop <- c(4,3,5,6,5,3,10000000,2,5,4,3,6,5)
          N <- 10^5
          n <- 100
          x <- matrix(sample(pop, size = N*n, replace = TRUE), nrow = N)
          x_bar <- rowMeans(x)
          hist(x_bar, freq = FALSE, col = "cyan")
          f <- function(t) dnorm(t, mean = mean(pop), sd = sd(pop)/sqrt(n))
          curve(f, add = TRUE, lwd = 2, col = "red")


          enter image description here






          share|cite|improve this answer











          $endgroup$








          • 1




            $begingroup$
            It’s not the variance that is problem. One way of getting rigorous control is using the ratio of the third central moment to the standard deviation cubed, as in the Berry-Esseen theorem.
            $endgroup$
            – guy
            3 hours ago











          • $begingroup$
            Perfect. Added. Thanks.
            $endgroup$
            – Paulo C. Marques F.
            3 hours ago










          • $begingroup$
            Thank you for a quick, visual, and perfect answer with code. I was very surprised how quick it was! I was not aware of what an appropriate sample size is. I was thinking of the case where the sample size is fixed.
            $endgroup$
            – JimSD
            13 mins ago
























          edited 3 hours ago

























          answered 3 hours ago









          Paulo C. Marques F.
























