Saran

Summary

The webpage provides an overview of the essential statistical concepts for aspiring data scientists, covering the definition of statistics, types of statistics, population versus sample, errors in statistics, and various sampling methods.

Abstract

The article "Learn Statistics for data science" outlines the fundamental statistical knowledge required for a data science career. It defines statistics as the discipline of collecting, organizing, and analyzing data to extract meaningful insights. The text distinguishes between descriptive and inferential statistics, emphasizing the importance of understanding both population and sample data. Parameters are described as characteristics of the entire population, while statistics are derived from samples. The article also discusses the challenges of sampling errors and non-sampling errors, the use of a sampling frame, and different sampling techniques such as simple random sampling, stratified sampling, systematic sampling, and convenience sampling. Each method is explained with examples to facilitate comprehension, and the article concludes with a call to action for readers to explore further resources on data science and an AI service recommendation.

Opinions

  • The author suggests that understanding statistics is crucial for data science, implying that it is a foundational skill for anyone entering the field.
  • The article implies that while descriptive statistics are useful for summarizing data, inferential statistics are essential for making predictions or inferences about a population based on a sample.
  • There is an emphasis on the practical difficulties of working with entire populations, suggesting that samples are often more feasible for statistical analysis.
  • The author indicates that both sampling errors and non-sampling errors are inherent risks in statistical analysis, highlighting the importance of careful sampling design and accurate measurement.
  • The recommendation of the AI service ZAI.chat at the end of the article suggests that the author believes in the value of cost-effective AI tools for enhancing performance in data science tasks.

Learn Statistics for data science

Fundamentals of statistics you need to know to start your data science career

What is Statistics? 🤔

Statistics is the discipline of collecting, organizing, and analyzing data (pieces of information). In other words, it means analyzing numerical information and extracting insight from data.

A statistic is a measure that describes only a sample of the population. Eg: the average score from a quiz given to a group of random (voluntary) people.

Descriptive Statistics And Inferential Statistics:

Descriptive statistics: methods for summarizing and presenting information observed from a sample or a population.

Inferential statistics: methods for using information from a sample to draw conclusions about the population.
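To make the difference concrete, here is a minimal Python sketch. The quiz scores are made-up values, and the "mean plus or minus one standard error" estimate is only one simple way to illustrate inference, not a full procedure.

```python
import math
import statistics

# Hypothetical quiz scores from a sample of 8 volunteers (made-up data).
sample_scores = [6, 7, 7, 8, 5, 9, 6, 8]

# Descriptive statistics: summarize the data we actually observed.
sample_mean = statistics.mean(sample_scores)
sample_median = statistics.median(sample_scores)
sample_sd = statistics.stdev(sample_scores)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to say something about the population.
# The standard error estimates how far the sample mean may sit from the population mean.
standard_error = sample_sd / math.sqrt(len(sample_scores))

print(f"descriptive: mean={sample_mean}, median={sample_median}, sd={sample_sd:.2f}")
print(f"inferential: population mean is roughly {sample_mean} +/- {standard_error:.2f}")
```

The first print only describes the eight scores we have; the second makes a (rough) claim about people who never took the quiz.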

  • Statistics deals with population and sample.

Parameter:

A parameter (P) describes the entire population. Eg: the average score if the quiz were given to every member of the population, not just to a sample of volunteers.

Population Vs sample:

Population :

  • A population is the complete group of people or objects under study; each individual is a member of the population. The size of the population is denoted by N.
  • Measuring the whole population is usually extremely difficult: it takes more time, costs more, and some members may be hard to observe.

Sample :

A sample is a portion of the population, and its size is denoted by n. To perform “inferential statistics” we take a sample from the population. In other words, a sample is a subset of a population.
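The relationship between a parameter (computed from the whole population) and a statistic (computed from a sample) can be sketched in Python. The population below is simulated, and the sizes N = 1000 and n = 50 are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: quiz scores (0-10) for N = 1000 people.
population = [random.randint(0, 10) for _ in range(1000)]
N = len(population)

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample (a subset) of size n.
n = 50
sample = random.sample(population, n)
sample_mean = statistics.mean(sample)

print(f"N={N}: parameter (population mean) = {population_mean:.2f}")
print(f"n={n}: statistic (sample mean)     = {sample_mean:.2f}")
```

In real work the population is rarely available like this, which is exactly why the sample mean is used as a stand-in for the population mean.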

Error in Statistics:

We can face two different types of errors while performing statistics:

  • Sampling error
  • Non-sampling error

Sampling error: the difference between a sample statistic (such as the sample mean) and the corresponding population parameter (such as the population mean). It is not a mistake made by the statistician (analyst); it arises naturally because a sample contains only part of the population.

Non-sampling error: this error is caused by poor sample design or inaccurate measurement. It is the kind of error we aim to avoid.
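A small simulation can show sampling error in action. This is only a sketch: the population of heights is simulated, and the helper name `sampling_error` is made up for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sampling_error(n):
    """Return (sample mean - population mean) for one random sample of size n."""
    sample = random.sample(population, n)
    return statistics.fmean(sample) - population_mean

# Repeat the experiment many times per sample size and average the absolute error.
for n in (10, 100, 1000):
    errors = [abs(sampling_error(n)) for _ in range(200)]
    print(f"n={n:5d}: average absolute sampling error = {statistics.fmean(errors):.3f}")
```

Sampling error is present even though nothing was measured wrongly, and it tends to shrink as the sample grows. Non-sampling error does not behave this way: a bad design or a faulty measurement is not fixed by taking a larger sample.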

Sampling Frame:

The sampling frame is the list of individuals from which the sample is selected. A list of students enrolled in a college is an example. The list may be either physical or theoretical.

Types of Sampling

Simple Random Sampling

Stratified Sampling

Systematic Sampling

Convenience Sampling

i. Simple Random Sampling (SRS):

An SRS is a randomly selected subset of a population in which “every sample of a given size has an equal chance of being selected”. A practical way to draw one is to assign a number to each member of the population and then pick numbers at random; the numbers must be unique, with no two members sharing the same one.

“Student ID cards” are a good example of such numbering: every student has a different ID card number, so drawing ID numbers at random gives a simple random sample of students.
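As a rough sketch, simple random sampling over unique ID numbers could look like this in Python. The ID range and the helper name `simple_random_sample` are made up for illustration; `random.sample` from the standard library draws without replacement.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: unique student ID numbers 1001..1100.
student_ids = list(range(1001, 1101))

def simple_random_sample(frame, n):
    """Draw n distinct members so that every subset of size n is equally likely."""
    return random.sample(frame, n)

sample = simple_random_sample(student_ids, 10)
print(sorted(sample))
```

Because the IDs are unique, no student can be picked twice and no two students can be confused with each other.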

ii. Stratified Sampling:

The population (N) is split into “non-overlapping, well-separated groups” (strata); simple random sampling is then done within each group to form the sample (n).
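A minimal Python sketch of this idea follows. The population, the strata ("year1" to "year4"), and the choice to take the same fraction from every stratum are all assumptions made for the example; other allocation rules exist.

```python
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: (student_id, year) pairs; "year" defines the strata.
population = [(i, f"year{1 + i % 4}") for i in range(200)]

def stratified_sample(units, stratum_of, fraction):
    """Group units into strata, then take a simple random sample inside each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, stratum_of=lambda u: u[1], fraction=0.1)
print(len(sample), "students sampled")
```

Each year group holds 50 students here, so a 10% fraction takes 5 from each, and no group can be left out by chance.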

iii. Systematic Sampling :

  • Every kth individual (for a fixed interval k) from the population (N) is placed in the sample (n).
  • Example: take every 7th value from the data set. In a supermarket, if a staff member gives a reward only to the 7th customer, the 14th customer, the 21st customer, and so on, the staff member is performing systematic sampling.
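The supermarket example can be sketched in a few lines of Python. The list of 30 customers and the helper name `systematic_sample` are made up; `k` is the fixed interval (7 in the example).

```python
def systematic_sample(items, k):
    """Return every k-th item: the k-th, 2k-th, 3k-th, and so on."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return items[k - 1::k]

# Hypothetical queue: customers numbered 1..30 in order of arrival.
customers = list(range(1, 31))

rewarded = systematic_sample(customers, 7)
print(rewarded)  # [7, 14, 21, 28]
```

This sketch always starts counting from the first customer, as in the example. In practice the starting point is often chosen at random within the first k individuals.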

iv. Convenience Sampling :

Convenience sampling is the easiest method of all: you pick whichever individuals are easiest to reach. In other words, “easily obtained individuals from the population are placed in the sample (n)”.

  • Voluntary response sampling, where individuals choose for themselves whether to take part, is a closely related method and is sometimes grouped with convenience sampling.
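The sketch below shows why convenience samples need care. The data are made up, and the assumption that the easiest shoppers to reach (the first to arrive) spend less than the others is purely illustrative.

```python
import statistics

# Hypothetical data: spending of 100 shoppers, listed in order of arrival.
# Assumption for illustration only: early arrivals happen to spend less.
spending = [10 + i for i in range(100)]  # 10, 11, ..., 109

def convenience_sample(items, n):
    """Take the n most easily reached individuals: here, simply the first n."""
    return items[:n]

sample = convenience_sample(spending, 10)

print("convenience sample mean:", statistics.mean(sample))   # 14.5
print("population mean:        ", statistics.mean(spending))  # 59.5
```

The sample is trivial to collect, but when the easy-to-reach individuals differ from the rest, the sample mean can land far from the population mean.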

Thank you..

Read more articles here:
