Learn Statistics for data science

Summary

The webpage provides an overview of the essential statistical concepts for aspiring data scientists, covering the definition of statistics, types of statistics, population versus sample, errors in statistics, and various sampling methods.

Abstract

The article "Learn Statistics for data science" outlines the fundamental statistical knowledge required for a data science career. It defines statistics as the discipline of collecting, organizing, and analyzing data to extract meaningful insights. The text distinguishes between descriptive and inferential statistics, emphasizing the importance of understanding both population and sample data. Parameters are described as characteristics of the entire population, while statistics are derived from samples. The article also discusses the challenges of sampling errors and non-sampling errors, the use of a sampling frame, and different sampling techniques such as simple random sampling, stratified sampling, systematic sampling, and convenience sampling. Each method is explained with examples to facilitate comprehension, and the article concludes with a call to action for readers to explore further resources on data science and an AI service recommendation.

Opinions

The author suggests that understanding statistics is crucial for data science, implying that it is a foundational skill for anyone entering the field.
The article implies that while descriptive statistics are useful for summarizing data, inferential statistics are essential for making predictions or inferences about a population based on a sample.
There is an emphasis on the practical difficulties of working with entire populations, suggesting that samples are often more feasible for statistical analysis.
The author indicates that both sampling errors and non-sampling errors are inherent risks in statistical analysis, highlighting the importance of careful sampling design and accurate measurement.
The recommendation of the AI service ZAI.chat at the end of the article suggests that the author believes in the value of cost-effective AI tools for enhancing performance in data science tasks.

What is Statistics? 🤔

Statistics is the discipline of collecting, organizing, and analyzing data (pieces of information). In other words, it means analyzing numerical information and extracting meaning from it.

A statistic is a measure that describes only a sample of the population. For example, the result of a quiz given to randomly chosen volunteers describes only the people who took it.

Population Vs sample:

Population :

A population is the entire group of people or objects under study, and every member of that group belongs to the population. The population is the collection of all items, and its size is denoted by N.

Performing calculations on an entire population is extremely difficult: it takes more time, costs more, and the whole population is hard to observe.

Sample :

A sample is a subset of the population. To perform “inferential statistics”, we take a sample from the population and use it to draw conclusions about the whole.
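The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the exam scores are made-up data, and the sizes (N = 10,000, n = 200) are assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical population: exam scores for N = 10,000 students (made-up data).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(10_000)]

N = len(population)
population_mean = statistics.mean(population)  # a parameter: describes the whole population

# A sample is a subset of the population; here n = 200 members drawn without replacement.
sample = rng.sample(population, 200)
n = len(sample)
sample_mean = statistics.mean(sample)  # a statistic: describes only the sample

print(f"N = {N}, population mean = {population_mean:.2f}")
print(f"n = {n}, sample mean = {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why a statistic computed from a sample is treated as an estimate of the population value.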

Error in Statistics:

We can run into two types of error when working with statistics:

Sampling error

Non-sampling error

Sampling error: occurs when a sample statistic differs from the corresponding population value, for example when the sample mean differs from the population mean. It arises because we observe only part of the population, not because the analyst made a mistake.

Non-sampling error: caused by poor sample design or inaccurate measurement. This is the kind of error we want to avoid.
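Sampling error can be seen directly by comparing a sample mean with the population mean. The sketch below uses a made-up population of uniform random values, and `sampling_error` is a hypothetical helper written for this example. It does not model non-sampling error, which comes from design or measurement problems rather than from chance.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 50,000 values (made-up data).
population = [rng.uniform(0, 100) for _ in range(50_000)]
population_mean = statistics.mean(population)


def sampling_error(sample_size):
    """Return (sample mean - population mean) for one random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - population_mean


for size in (10, 100, 1_000, 10_000):
    print(f"n = {size:>6}: sampling error = {sampling_error(size):+.3f}")

# If the "sample" is the entire population, no sampling error remains.
full_error = sampling_error(len(population))
print(f"n = {len(population):>6}: sampling error = {full_error:+.3f}")
```

The error varies from one sample to the next, but it tends to shrink as the sample size grows.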

Types of Sampling

i. Simple Random Sample [SRS]

A simple random sample (SRS) is a randomly selected subset of a population in which “every sample has an equal chance of being selected”, so every member is equally likely to be picked. A practical way to draw one is to assign a number to each member of the population and then select numbers at random. The numbers must be unique, with no duplicates.

Student ID numbers are a good basis for a simple random sample, because every student has a different ID number.
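A minimal sketch of drawing a simple random sample from unique ID numbers, using Python's standard `random.sample`. The ID format, the population of 500 students, and the sample size of 20 are assumptions made for illustration.

```python
import random

# Hypothetical student ID numbers S0001 .. S0500; each ID is unique.
student_ids = [f"S{number:04d}" for number in range(1, 501)]

rng = random.Random(7)

# random.sample draws without replacement, so no student is picked twice
# and every possible group of 20 students is equally likely.
srs = rng.sample(student_ids, k=20)
print(srs)
```

Because the IDs are unique, each selected ID maps back to exactly one student.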

iii. Systematic Sampling :

Every kth individual from the population (N) is placed in the sample (n), where k is a fixed interval.

Example: Suppose we take every 7th value from a data set. In a supermarket, if a staff member gives a reward only to the 7th customer, the 14th customer, the 21st customer, and so on, that is systematic sampling.
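The supermarket example can be sketched with list slicing. The queue of 50 customers is made-up data, and the variable name `step` is an assumption used here for the fixed interval.

```python
# Hypothetical queue of 50 customers, numbered 1..50 in arrival order.
customers = list(range(1, 51))
step = 7  # reward every 7th customer

# Start at index step - 1 (the 7th customer) and move ahead by step each time.
rewarded = customers[step - 1::step]
print(rewarded)  # [7, 14, 21, 28, 35, 42, 49]
```

In practice the starting point is often chosen at random within the first interval, so that the first selected member is not always the same position. The example above starts at the 7th customer to match the text.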

