Okay, I need to write a professional Wikipedia-style document about the topic of "잔차곱합" (Sum of Squared Residuals) under the categories of Science, Statistics, and Regression Analysis. Let me start by understanding the user's requirements.
First, the document should have a clear H1 title, an introduction section, several sections using H2 and H3 headings, and possibly a references or related documents section. The content must be accurate, objective, and have sufficient depth while remaining easy to understand. Technical terms should be explained, and the markdown format needs proper structure with lists, tables, and code blocks if necessary. The length should be over 800 characters, which I think translates to about a few paragraphs.
Starting with the title: "잔차 제곱합" is correct. Then the introduction should define what it is. Maybe start by explaining residuals in regression analysis. Residuals are the differences between observed and predicted values. The sum of squared residuals (SSR) is a measure of the discrepancy between the data and the regression model. It's important in evaluating the model's performance.
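In symbols (the same notation the formula section will use later), the residual for observation i and its squared sum are:

$$ e_i = y_i - \hat{y}_i, \qquad \text{SSR} = \sum_{i=1}^{n} e_i^2 $$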
Next, structure the sections. The overview should give a basic definition. Then, maybe a section on its mathematical expression. Then, importance in regression analysis, how it's used in model evaluation, its relationship with other statistical measures like R-squared, and maybe examples or applications. Also, potential issues, like overfitting, and related concepts like MSE or adjusted R-squared.
I need to ensure that each section flows logically. Use H2 for main sections and H3 for subsections. For example, under "수학적 표현", I can have H3 for formula and example. In "회귀 분석에서의 중요성", maybe break into model evaluation and parameter estimation.
I should explain technical terms. For instance, when mentioning OLS (Ordinary Least Squares), clarify that it's a method whose goal is to minimize SSR. Also, mention how SSR relates to the total sum of squares (SST) and the explained (regression) sum of squares, since R-squared is derived from these.
Including an example would help. Maybe a simple linear regression case where observed values are given, predicted values calculated, residuals squared, and summed. A small table could illustrate this.
Potential pitfalls: Overfitting can lead to low SSR but poor generalization. Mention that SSR alone isn't sufficient for model selection and should be used with other metrics like AIC, BIC, or cross-validation.
Also, connect SSR to other loss functions and metrics like MSE (Mean Squared Error), which is SSR divided by the number of observations. This helps in understanding scale differences.
Need to check accuracy: SSR is the sum of squared residuals, which is the same as the residual sum of squares (RSS) in some contexts. Make sure there's no confusion with other terms like SSE (sometimes used for Error Sum of Squares). In ANOVA, SSE might be the sum of squared errors, which is similar. Clarify if necessary.
Make sure to mention that minimizing SSR is the basis for OLS estimation. When estimating regression coefficients, the goal is to find the line (or hyperplane) that minimizes SSR.
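To keep the OLS connection concrete, a rough numpy sketch could go near this point; the data and variable names below are made up purely for illustration, and the point is only that the least-squares coefficients yield a smaller SSR than perturbed ones:

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 5.8, 8.2, 9.9])

# Design matrix with an intercept column; lstsq solves the least-squares
# problem, i.e. it finds the coefficients that minimize the SSR
X = np.column_stack([np.ones_like(x), x])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

def ssr(b):
    """Sum of squared residuals for coefficient vector b."""
    residuals = y - X @ b
    return float(np.sum(residuals ** 2))

print("OLS coefficients:", beta)
print("SSR at the OLS solution:", ssr(beta))
print("SSR after perturbing the slope:", ssr(beta + np.array([0.0, 0.1])))
```

The perturbed SSR always comes out larger, which is exactly the sense in which OLS "minimizes SSR".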
Also, in the related concepts section, link SSR to other evaluation metrics. For instance, R-squared = 1 - (SSR/SST), so explaining SST (total sum of squares) and the explained (regression) sum of squares would be helpful.
I should structure the document as follows:
- 제목 (H1 title)
- 개요 (Introduction)
- 수학적 표현 (Mathematical Expression)
  - 공식 (Formula)
  - 예시 (Example, with a table)
- 회귀 분석에서의 중요성 (Importance in Regression Analysis)
  - 모델 평가 (Model Evaluation)
  - 매개변수 추정 (Parameter Estimation)
- 다른 통계 지표와의 관계 (Relationship with Other Metrics)
  - R-제곱 (R-squared)
  - MSE
- 한계 및 주의사항 (Limitations and Caveats)
  - 과적합 (Overfitting)
  - 다른 지표와의 비교 (Comparison with Other Metrics)
- 참고 자료 및 관련 문서 (References and Related Documents)
Wait, the user mentioned "잔차 제곱합" which is the sum of squared residuals. But sometimes in statistics, the residual sum of squares (RSS) or sum of squared errors (SSE) might be used. Need to clarify that SSR here refers to the same as RSS or SSE in some literature, but in ANOVA, SSE can be different. So maybe in the mathematical section, clarify the different terms.
Also, ensure that the example is correct. Let's say we have observed y and predicted ŷ values. Compute residuals, square them, sum up.
For the markdown format, use tables for examples, code blocks for formulas if needed, but since it's mathematical expressions, using LaTeX in markdown is appropriate. For example, the formula for SSR would be written as:
$$ \text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Since the surrounding text will be in Korean, the formula itself can stay in LaTeX notation, which works regardless of the language of the prose.
Also, in the related documents section, link to regression analysis, R-squared, OLS, ANOVA if applicable.
Now, checking for 800 characters. The example section with a table might help reach the length requirement. Also, detailed explanations in each section.
Potential mistakes to avoid: Confusing SSR with SSE in different contexts. In some references, SSR is the regression sum of squares (explained), and SSE is the error. Need to confirm. For example, in ANOVA and regression, sometimes:
Total Sum of Squares (SST) = Regression Sum of Squares (SSR) + Error Sum of Squares (SSE)
But in other contexts, SSR might stand for residual sum of squares. So this is a possible confusion point. The user's topic is "잔차 제곱합", which translates to Sum of Squared Residuals, so it's the error part. So in the document, clarify that there can be different notations in literature to avoid confusion.
Therefore, in the mathematical expression section, mention that SSR is sometimes called Residual Sum of Squares (RSS) or Error Sum of Squares (SSE) in different sources. But in this document, we'll use SSR as Sum of Squared Residuals.
This is important for accuracy. So the user might be following a specific convention where SSR is the residual sum, but others might use RSS or SSE. Clarifying this helps the reader not get confused when they encounter different terms elsewhere.
Also, for the relationship with R-squared: SST (total) = SSR (regression) + SSE (error). Wait, if SSR here is the regression sum of squares, then R-squared is SSR/SST. But if SSR is the residual sum, then it's 1 - (SSR/SST). So the terminology is critical.
This is a key point. Therefore, in the document, I need to make sure that the definition aligns with the common usage in regression. Typically:
- Total Sum of Squares (SST): $\sum_{i=1}^{n} (y_i - \bar{y})^2$
- Regression Sum of Squares (SSR): $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
- Error Sum of Squares (SSE): $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
And $\text{SST} = \text{SSR} + \text{SSE}$
Then $R^2 = \text{SSR}/\text{SST} = 1 - \text{SSE}/\text{SST}$
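As a numerical sanity check for this decomposition and for the two equivalent R-squared expressions, a short numpy sketch could go here (again with made-up data; the decomposition requires a model fitted with an intercept):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 5.8, 8.2, 9.9])

# Simple linear fit (np.polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
ss_err = np.sum((y - y_hat) ** 2)          # residual sum of squares (the topic here)

print(np.isclose(sst, ss_reg + ss_err))    # True: SST = SSR + SSE with an intercept
print(ss_reg / sst, 1 - ss_err / sst)      # both expressions give the same R-squared
```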
But some sources use different notations. For example, some might call SSE as the sum of squared residuals. So the confusion comes from whether SSR is the regression sum or the residual sum.
Since the topic here is "잔차 제곱합", which is the sum of squared residuals, that corresponds to SSE in the above notation. But if the user refers to it as SSR, then there's inconsistency. Therefore, in the document, I must clarify this discrepancy.
Alternatively, maybe in some sources, the terminology is different. For example, in Wikipedia, the "sum of squared residuals" can be called residual sum of squares (RSS) or sum of squared errors (SSE). The Ordinary Least Squares method minimizes RSS/SSE.
Therefore, in the document, I should define SSR (잔차 제곱합) as the sum of squared residuals, which is equivalent to SSE in some contexts. Also, explain that notation can vary, but the concept is the same.
This is crucial to prevent confusion. So in the mathematical expression section, clarify that the term may be called differently (RSS, SSE) in various references, but here it's referred to as SSR.
So the formula would be:
$$ \text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
And note that this is sometimes called residual sum of squares (RSS) or sum of squared errors (SSE) in other literature.
Then, in the relationship with R-squared, explain that R-squared = 1 - (SSR/SST), which aligns with SSR being the residual sum.
Yes, this makes sense. So the document should make it clear that terminology can vary, but the focus is on the sum of squared residuals as SSR here.
Including this clarification will make the document more accurate and helpful, preventing potential misunderstandings.
Now, putting it all together in Korean. Ensure that the explanations are clear, even for those new to the topic. Use examples and tables where necessary.
For the example, create a small dataset with observed y, predicted ŷ, residuals, squared residuals, then sum them up. A table would look like:
| 관측치 | y_i | ŷ_i | 잔차 (e_i = y_i - ŷ_i) | 잔차 제곱 (e_i²) |
|---|---|---|---|---|
| 1 | 2 | 2.5 | -0.5 | 0.25 |
| 2 | 4 | 3.8 | 0.2 | 0.04 |
| 3 | 6 | 6.2 | -0.2 | 0.04 |
| 4 | 8 | 7.5 | 0.5 | 0.25 |
| 합계 | | | | 0.58 |
This shows that SSR is 0.58 in this case.
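To double-check the table arithmetic, a couple of lines of Python that simply redo the sum from the table's values would be:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])      # observed values from the example table
y_hat = np.array([2.5, 3.8, 6.2, 7.5])  # predicted values from the example table

residuals = y - y_hat                    # [-0.5, 0.2, -0.2, 0.5]
ssr = np.sum(residuals ** 2)
print(ssr)                               # ~0.58 (up to floating-point rounding)
```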
Also, mention that lower SSR indicates a better fit, but caution against overfitting.
In the limitations section, discuss how SSR can be misleading if the model is too complex, leading to very low SSR on training data but poor performance on new data. Therefore, other metrics like adjusted R-squared or cross-validation are needed.
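If adjusted R-squared comes up there, it's probably worth stating its usual form (with n observations and p predictors, not counting the intercept):

$$ \bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} $$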
Finally, in related concepts, link to MSE (Mean Squared Error), which divides SSR by n (or by n - p, with n observations and p estimated parameters, when used as the unbiased residual mean square in regression output), and to RMSE (the square root of MSE).
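The MSE/RMSE relationship is simple enough to write out; depending on the convention, the denominator is either n or n - p:

$$ \text{MSE} = \frac{\text{SSR}}{n} \ \text{or} \ \frac{\text{SSR}}{n - p}, \qquad \text{RMSE} = \sqrt{\text{MSE}} $$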
Make sure all these points are covered in the document with proper structure and markdown formatting.