"벤포드의 법칙"의 두 판 사이의 차이

Latest revision as of 09:40, 12 February 2021 (Fri)

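Overview

In many data sets made up of numbers, the digit in the leading position is not uniformly distributed.
The phenomenon can be explained under the following assumptions.
- scale invariance
- base invariance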

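Discovery

The American mathematician and astronomer Simon Newcomb noticed that, in a book of logarithm tables he shared with others, the front part of the book was far more worn than the rest.
Logarithm tables are arranged in increasing order of the numbers. The wear pattern therefore indicates that, in actual calculations, numbers with a small leading digit are used more often than numbers with a large leading digit.
In ordinary calculations, one would expect numbers of every size to be used evenly as the volume of computation grows, so why is the leading significant digit of these numbers not evenly distributed?
Newcomb arrived at the following empirical rule.
- The proportion of numbers whose first significant digit is \(d\) is (in base 10) not 1/9 but \(\log_{10}(1 + 1/d)\)
He published this briefly in the American Journal of Mathematics (1881), but because it came without a mathematical analysis it attracted little attention.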

\(d\)	Intuitive probability	Empirical probability
\(1\)	\(0.111\cdots\)	\(0.30103\)
\(2\)	\(0.111\cdots\)	\(0.17609\)
\(3\)	\(0.111\cdots\)	\(0.12494\)
\(4\)	\(0.111\cdots\)	\(0.09691\)
\(5\)	\(0.111\cdots\)	\(0.07918\)
\(6\)	\(0.111\cdots\)	\(0.06695\)
\(7\)	\(0.111\cdots\)	\(0.05799\)
\(8\)	\(0.111\cdots\)	\(0.05115\)
\(9\)	\(0.111\cdots\)	\(0.04576\)
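The empirical column can be reproduced directly from the formula \(\log_{10}(1 + 1/d)\). The following is a minimal sketch; the function name `benford_probability` is chosen here for illustration and does not come from the original article.

```python
import math

def benford_probability(d):
    """Probability that the first significant digit equals d (base 10)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Print each digit next to its predicted probability, rounded to 5 places.
for d in range(1, 10):
    print(d, round(benford_probability(d), 5))
```

The nine values sum to 1, because the sum of the logarithms telescopes to \(\log_{10}10\).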

[2]

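In 1938 Frank Benford, a physicist at GE in the United States, rediscovered exactly the pattern that Newcomb had found: the first significant digit is distributed as \(\log_{10}(1 + 1/d)\).

To test this empirically, Benford analysed some 20,000 numbers taken from entirely unrelated sources such as the surface areas of rivers, death rates, and baseball statistics. The results supported the empirical law. (citation needed)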

[3]

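Not every collection of numbers follows Benford's law. Collections that are extremely arbitrary, or that follow a normal or a uniform distribution, do not.

Data apparently need the right kind of structure in order to follow Benford's law.

In 1996 Hill showed that if distributions are selected at random and data are then sampled at random from those distributions, the combined data follow Benford's law even when the individual distributions do not. (citation needed)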

[4]

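Scale invariance implies Benford's law.

Scale invariance is the property that the distribution of the data is preserved when every value is multiplied by an arbitrary conversion factor \(k\).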

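Observation

Consider numbers chosen uniformly from \([1, 10)\), each multiplied by 2. The leading digit of the product depends on the interval that contains the original number.

Interval of the original number	Leading digit after doubling
[1, 1.5)	2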
[1.5, 2)	3
[2, 2.5)	4
[2.5, 3)	5
[3, 3.5)	6
[3.5, 4)	7
[4, 4.5)	8
[4.5, 5)	9
[5, 10)	1

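As the table shows, the distribution of the first significant digit is not uniform. For numbers chosen uniformly from \([1, 10)\), the leading digit after doubling is 1 with probability 5/9, which is larger than the combined probability 4/9 of the digits 2 through 9.

From this we see that the distribution of uniformly chosen numbers is not scale invariant.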
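The observation can be checked with exact arithmetic. The sketch below assumes, as the interval table does, that the original numbers are uniform on \([1, 10)\); the function name is illustrative only.

```python
from fractions import Fraction

def doubled_leading_digit_probabilities():
    """Exact leading-digit probabilities of 2*X for X uniform on [1, 10)."""
    total_length = Fraction(9)  # length of the interval [1, 10)
    probs = {}
    # X in [5, 10) doubles into [10, 20), so the leading digit is 1.
    probs[1] = Fraction(5) / total_length
    # X in [d/2, (d+1)/2) doubles into [d, d+1) for d = 2..9.
    for d in range(2, 10):
        probs[d] = Fraction(1, 2) / total_length
    return probs

probs = doubled_leading_digit_probabilities()
for d in sorted(probs):
    print(d, probs[d])
```

Before doubling every digit has probability 1/9, so multiplying by 2 changes the first-digit distribution, which is what failing scale invariance means here.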

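Scale invariance

Scale invariance means that the distribution does not change when the unit of measurement changes.
We say that a random variable \(X\) is scale invariant if multiplying it by a conversion factor to change the unit leaves unchanged the probability of lying in the correspondingly rescaled interval.

Theorem

The logarithm of a scale-invariant variable has a constant probability density function.

(Proof)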

For a random variable \(X\), define the probability density function \(\phi (x) \) by \(P(a \le X \le b) = \int_{a}^{b}\phi(x)dx\), and the cumulative distribution function \(\Phi(x)\) by \(\Phi(x) = P(X \le x) = \int^{x}\phi(t)dt\).

If \(X\) is scale invariant, it satisfies \(P( a < X < x) = P(ka < X < kx)\). Here \(a\) is a fixed constant, \(x\) is a variable, and \(k\) is the conversion factor.

Hence \(\Phi(kx) - \Phi(ka) = \Phi(x) - \Phi(a)\), and differentiating with respect to \(x\) gives \(k\phi(kx) = \phi(x)\).

Now define a second random variable \(Y = \log_{b}X\), and define \(\psi (y)\) and \(\Psi(y)\) for \(Y\) in the same way as for \(X\) above.

Writing \(x = b^y\), we have \(\Psi(y) = P(Y \le y) = P(\log_b X \le y) = P(X \le b^y ) = \Phi(b^y) = \Phi(x)\), so \(\Psi(y) = \Phi(x)\). From this we obtain

\(\psi(y) = \frac{d}{dy}\Phi(x) = \frac{dx}{dy}\phi(x)\), and since \(\frac{dx}{dy} = x \ln{b}\), this becomes \(\psi( \log_b x) = x\phi(x) \ln{b}\).

Setting \(x = 1\) in \(k\phi(kx) = \phi(x)\) gives \(k\phi(k) = \phi(1)\) for every \(k\), that is, \(x \phi(x) = \phi(1)\). Hence \(\psi\) is a constant function. ■
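As a quick consistency check (an illustration added here, with \(c\) an arbitrary positive constant that is not part of the original proof), the density \(\phi(x) = c/x\) satisfies the functional equation, and its logarithm has a constant density:

\[ \begin{aligned} k\phi(kx) & = k \cdot \frac{c}{kx} = \frac{c}{x} = \phi(x) \\ \psi(\log_b x) & = x\phi(x)\ln{b} = x \cdot \frac{c}{x} \ln{b} = c\ln{b} \end{aligned} \]

Restricted to \(1 \le x < b\), requiring total probability 1 gives \(c \ln b = 1\), so the density of \(\log_b x\) on \([0, 1)\) is the constant 1.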

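Deriving Benford's law from scale invariance

Benford's law can be derived from this result.

Let \(d\) be the leftmost digit of a number \(x\). Multiplying \(x\) by a power of 10 does not change \(d\), so \(d\) depends only on \(\log_{10}x\) modulo 1, and we may assume \(1 \le x < 10\).

Taking the base of the logarithm to be \(b = 10\), the variable \(y = \log_{10}x\) has the constant probability density 1 on \(0 \le y < 1\).

Therefore, assuming scale invariance, for \(n = 1, 2, \cdots, 9\) we obtain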

\[ \begin{aligned} P(d = n) & = P(n \le x < n+1 ) \\ & = P(\log_{10} n \le \log_{10}x < \log_{10}(n+1)\ ) \\ & = P(\log_{10}n \le y < \log_{10}(n+1) ) \\ & = \log_{10}(n+1) - \log_{10}{n} \\ & = \log_{10}(1 + \frac{1}{n}) \end{aligned} \] which is exactly Benford's law.

From this we see that data with scale invariance satisfy Benford's law.
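The derivation can be illustrated numerically. The sketch below takes values whose base-10 logarithm is spread evenly over \([0, 1)\), multiplies them by several conversion factors, and compares the first-digit frequencies with Benford's law. The function names, the grid size, and the particular factors are all assumptions made for this illustration.

```python
import math

def leading_digit(x):
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_frequencies(k, samples=100_000):
    """Leading-digit frequencies of k * 10**y for y on an even grid in [0, 1)."""
    counts = [0] * 10
    for i in range(samples):
        y = (i + 0.5) / samples          # log10 of the value is evenly spread
        counts[leading_digit(k * 10 ** y)] += 1
    return [c / samples for c in counts[1:]]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for k in (1.0, 2.0, 3.7):
    freqs = digit_frequencies(k)
    # Largest deviation from Benford's law for this conversion factor.
    print(k, max(abs(f - b) for f, b in zip(freqs, benford)))
```

The deviation stays small for each factor, which is consistent with a log-uniform variable being scale invariant.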

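A more general significant digit law

The same approach can be applied to the second significant digit. The digit that appears most often as the second significant digit is 0, with a frequency of about 11.97%.
- See "Base-invariance implies Benford's law" in the reference material.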
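The second-digit frequencies follow from the first-two-digit form of the law: a number starts with the two digits \(d_1 d_2\) with probability \(\log_{10}(1 + 1/(10 d_1 + d_2))\), and summing over \(d_1\) gives the second-digit distribution. A minimal sketch, with an illustrative function name:

```python
import math

def second_digit_probability(k):
    """Probability that the second significant digit equals k (base 10)."""
    if not 0 <= k <= 9:
        raise ValueError("k must be between 0 and 9")
    # Sum over every possible first digit d1 = 1..9.
    return sum(math.log10(1 + 1 / (10 * d1 + k)) for d1 in range(1, 10))

for k in range(10):
    print(k, round(second_digit_probability(k), 4))
```

The second digit is much closer to uniform than the first, which is why the most frequent value, 0, only reaches about 12%.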

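Applications of Benford's law

Many accounting data sets also follow Benford's law closely. Building on this, methods have been proposed that analyse digit patterns against Benford's law in order to detect manipulated figures, fraud, errors, and biases inherent in the data.
- See the papers by Nigrini.
The half-lives of alpha decay have been confirmed to follow Benford's law in both theory and observation.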
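One simple way to compare a data set with Benford's law is a chi-square goodness-of-fit statistic on the first-digit counts. The sketch below is a generic illustration and is not claimed to be the procedure used in Nigrini's papers; the function names are hypothetical.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_counts(values):
    """Count first significant digits 1..9 of the nonzero values."""
    counts = [0] * 9
    for v in values:
        v = abs(float(v))
        if v == 0:
            continue                      # zero has no significant digit
        counts[int(f"{v:.15e}"[0]) - 1] += 1
    return counts

def chi_square_vs_benford(counts):
    """Chi-square statistic of observed digit counts against Benford's law."""
    total = sum(counts)
    return sum((obs - total * p) ** 2 / (total * p)
               for obs, p in zip(counts, BENFORD))

benford_like = [2 ** n for n in range(1, 1001)]   # powers of 2
uniform_like = list(range(100, 1000))             # every first digit equally often
print(chi_square_vs_benford(first_digit_counts(benford_like)))
print(chi_square_vs_benford(first_digit_counts(uniform_like)))
```

With nine digit classes the statistic has 8 degrees of freedom, for which the commonly tabulated 5% critical value is about 15.51. A value far above that suggests the data deserve a closer look, not that fraud has been shown.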

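Powers and Benford's law

Among the 100000 numbers \(2^1, 2^2, 2^3, \ldots, 2^{100000}\), the numbers whose first digit is {1,2,3,4,5,6,7,8,9} occur {30103, 17610, 12493, 9691, 7919, 6695, 5797, 5116, 4576} times respectively (verification needed).
This holds not only for powers of 2 but in most cases (cases such as the powers of 10 are exceptions).
Here \(\log\) denotes the common logarithm.
For example, the six-digit powers of 2 whose first digit is 1 are found by solving the following inequality for a natural number \(n\):\[100000\leq 2^n < 200000\]\[\log 100000 \leq n \log 2 < \log 2 + \log 100000\]\[\frac{5}{\log 2} \leq n < \frac{\log 2}{\log 2} + \frac{5}{\log 2} \]
In the same way, the six-digit powers of 2 whose first digit is \(p\) are found by solving the following inequality.\[\frac{\log p}{\log 2}+\frac{5}{\log 2} \leq n < \frac{\log (p+1)}{\log 2} + \frac{5}{\log 2} \]
The number of six-digit powers of 2 whose first digit is \(p\) can therefore be thought of as the number of natural numbers in an interval of length \(\frac{\log(p+1)-\log p}{\log 2}=\frac{\log (\frac {p+1}{p})}{\log2}\).
All six-digit powers of 2 together correspond to an interval of length \(\frac{1}{\log 2}\), so the proportion of six-digit powers of 2 whose first digit is \(p\) is

\[\log (\frac {p+1}{p})\]

The same proportion is obtained for first digit \(p\) not only among six-digit numbers but for any number of digits.
Hence the first digits of the powers of 2 follow Benford's law.
More generally, the powers of a number \(\alpha\) follow Benford's law whenever \(\log \alpha\) is irrational.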
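The count can be checked on a smaller range with exact integer arithmetic. The sketch below uses exponents up to 1000 rather than 100000, partly because recent Python versions limit integer-to-string conversion to 4300 digits by default; the function name is illustrative.

```python
def first_digit_counts_of_powers(base, max_exponent):
    """Count the first digits of base**1 .. base**max_exponent."""
    counts = {d: 0 for d in range(1, 10)}
    value = 1
    for _ in range(max_exponent):
        value *= base                      # exact integer power
        counts[int(str(value)[0])] += 1
    return counts

counts = first_digit_counts_of_powers(2, 1000)
for d in range(1, 10):
    print(d, counts[d])
```

For powers of 2 the count for first digit 1 can be predicted exactly: a power of 2 starts with 1 precisely when it has one more digit than the previous power, and \(2^{1000}\) has 302 digits, so 301 of the first 1000 powers start with 1. Calling the same function with base 10 shows the exception mentioned above, since every power of 10 starts with 1.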

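The Fibonacci sequence and Benford's law

The Fibonacci sequence also follows Benford's law.
Its general term is given as follows (see the page on properties of the Fibonacci sequence) \[F(n) = {{\varphi^n-(1-\varphi)^n} \over {\sqrt 5}}\]\[\varphi=\frac{1+\sqrt5}{2}=1.61803398874989\cdots\]
The n-th Fibonacci number is therefore approximately \({{\varphi^n} \over {\sqrt 5}}\), so the distribution of its first digit is explained in the same way as for a geometric sequence.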
- http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/fibmaths.html#msds
- http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/fibCalcX.html
Initial digit frequencies of fib(i) for i from 1 to 100000 (100000 values):
Digit	Frequency	Percent
1	30103	30
2	17610	18
3	12494	12
4	9690	10
5	7918	8
6	6695	7
7	5798	6
8	5117	5
9	4575	5
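A similar count is easy to reproduce for a shorter prefix of the sequence. The sketch below counts the first digits of the first 1000 Fibonacci numbers with exact integers; it starts from F(1) = F(2) = 1, and the function name is an assumption made for this example.

```python
def fibonacci_first_digit_counts(n_terms):
    """Count the first digits of F(1) .. F(n_terms), with F(1) = F(2) = 1."""
    counts = {d: 0 for d in range(1, 10)}
    a, b = 1, 1
    for _ in range(n_terms):
        counts[int(str(a)[0])] += 1
        a, b = b, a + b
    return counts

counts = fibonacci_first_digit_counts(1000)
for d in range(1, 10):
    print(d, counts[d], round(counts[d] / 1000, 3))
```

The printed proportions can be compared with \(\log_{10}(1 + 1/d)\); with only 1000 terms they agree to within a few counts per digit.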

Benford's Law for Fibonacci and Lucas Numbers
- L. C. Washington
- The Fibonacci Quarterly vol. 19, 1981, pages 175-177

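Fun facts

Benford's law appears in episode 15 of the second season of the US TV series Numb3rs.
An explanation by Dr. Mark J. Nigrini that is not mathematically rigorous but is easier to grasp: Consider the stock market. If we take the Dow Jones average to be 1,000, the first digit is 1. For the first digit to become 2, the average has to reach 2,000, which is a 100% increase. Even at a growth of 20% a year, that takes five years. But if the first digit is 5, a 20% increase brings it to 6, which takes only one year. At 9,000, an increase of just 11% makes the first digit 1 again. Going from 10,000 to 20,000 then takes another five years. So 1 is the digit that appears most of the time.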
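Nigrini's growth argument can be tried out with a small simulation. The sketch below assumes an index that starts at 1,000 and grows by 20% every year with compounding, and counts how many yearly values start with each digit; the function name and the 500-year horizon are arbitrary choices for illustration.

```python
def years_per_leading_digit(start=1000.0, rate=0.20, years=500):
    """Count how many yearly values of a compounding index start with each digit."""
    counts = {d: 0 for d in range(1, 10)}
    value = start
    for _ in range(years):
        counts[int(f"{value:.15e}"[0])] += 1   # first significant digit
        value *= 1 + rate                      # one year of growth
    return counts

counts = years_per_leading_digit()
for d in range(1, 10):
    print(d, counts[d])
```

With compounding, the index doubles in a little under four years, somewhat faster than the five years in the quoted simple estimate, but the conclusion is the same: the index spends more years with a leading 1 than with any other digit.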

History

1881: Simon Newcomb
1938: Frank Benford
Timeline of the history of mathematics

Memo

Kronecker theorem on ergodicity

Mathematica files and computational resources

Review papers, essays, lecture notes

BENFORD’S LAW FROM 1881 TO 2006: A BIBLIOGRAPHY

Blogs

Benford’s law, Zipf’s law, and the Pareto distribution
- Terence Tao, 2009-7-3

Notes

Corpus

Benford’s law (also called the first digit law) states that the leading digits in a collection of data sets are probably going to be small.^[1]
An extension of Benford's law predicts the distribution of first digits in other bases besides decimal; in fact, any base b ≥ 2.^[2]
Black dots indicate the distribution predicted by Benford's law.^[2]
As a rule of thumb, the more orders of magnitude that the data evenly covers, the more accurately Benford's law applies.^[2]
For instance, one can expect that Benford's law would apply to a list of numbers representing the populations of UK settlements.^[2]
Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability of about 30%, much greater than the expected 11.1% (i.e., one digit out of 9).^[3]
Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881).^[3]
While Benford's law unquestionably applies to many situations in the real world, a satisfactory explanation has been given only recently through the work of Hill (1998).^[3]
Benford's law was used by the character Charlie Eppes as an analogy to help solve a series of high burglaries in the Season 2 "The Running Man" episode (2006) of the television crime drama NUMB3RS.^[3]
Benford's law is an observation about the leading digits of the numbers found in real-world data sets.^[4]
The "BL prediction" column is the percentage that Benford's law predicts for each digit.^[4]
So Benford's law appears to predict the data in both examples quite well.^[4]
Benford's law of anomalous numbers states that generally, in naturally occurring collections of numbers, the leading digit is likely to be small.^[5]
Forensic accountants, fraud examiners, accountants, and auditors use Benford's law to detect anomalies that require investigation.^[5]
Benford's law can help avoid the effort of baseline-derived anomaly detection.^[5]
If the network traffic conforms to the assumptions of Benford's law, any traffic data deviating from the Benford curve can be considered an anomaly.^[5]
The contributors describe how Benford's law has been successfully used to expose fraud in elections, medical tests, tax filings, and financial reports.^[6]
Significance The detection of frauds is one of the most prominent applications of the Newcomb–Benford law for significant digits.^[7]
We develop statistical tools for the detection of frauds in customs declarations that rely on the Newcomb–Benford law for significant digits.^[7]
We also provide approximations to the distribution of test statistics when the Newcomb–Benford law does not hold.^[7]
In this work we consider fraud detection through the Newcomb–Benford law (NBL).^[7]
Your view now shows the distribution of first digits, and the size of the bars (decreasing from left to right) suggests that the data in this case conforms to Benford's law.^[8]
Benford's law had been proposed in the past as a way to modelize the probability distribution of the first digit in a set of natural numbers.^[9]
experimental and predicted, to assess their following of Benford's law as seen in many natural phenomena.^[10]
Benford's law states that the leading digits of many data sets are not uniformly distributed from one through nine, but rather exhibit a profound bias.^[11]
In this video Norman Wildberger describes Benford's Law: a curious observation about the unequal distribution of first digits in random numbers.^[12]

Sources

Metadata

Wikidata

ID : Q817168

Spacy pattern list

[{'LOWER': 'benford'}, {'LOWER': "'s"}, {'LEMMA': 'law'}]
[{'LOWER': 'newcomb'}, {'OP': '*'}, {'LOWER': 'benford'}, {'LOWER': "'s"}, {'LEMMA': 'law'}]
[{'LOWER': 'newcomb'}, {'OP': '*'}, {'LOWER': 'benford'}, {'LEMMA': 'law'}]
[{'LOWER': 'law'}, {'LOWER': 'of'}, {'LOWER': 'anomalous'}, {'LEMMA': 'number'}]
[{'LOWER': 'first'}, {'OP': '*'}, {'LOWER': 'digit'}, {'LEMMA': 'law'}]

[ref_4c10b410-1] Benford’s Law (The First Digit Law): Simple Definition, Examples

[ref_382d70e4-2] Benford's law

[ref_7a284011-3] Benford's Law -- from Wolfram MathWorld

[ref_e6afdb81-4] Brilliant Math & Science Wiki

[ref_52edad7a-5] Benford's Law: Potential Applications for Insider Threat Detection

[ref_3429ddfb-6] Benford's Law: Theory and Applications on JSTOR

[ref_64062c74-7] Newcomb–Benford law and the detection of frauds in international trade

[ref_ffabecfc-8] Visualise Benford's Law

[ref_589c06e5-9] Images and Benford's Law

[ref_5f8f52a7-10] Benford's law in medicinal chemistry: Implications for drug design

[ref_069ba123-11] Benford's Law

[ref_179e17dc-12] Benford’s law


@@ Line 182: / Line 182: @@
 Therefore, assuming scale invariance, for <math>n = 1, 2, \cdots, 9</math> we obtain
-:<math>\begin{tabular}{ll} <math> P(d = n) </math>&<math> = P(n \le x < n+1 )</math> \\  & <math>= P(\log_{10} n \le \log_{10}x  < \log_{10}(n+1)\ )</math>\\  & <math>=P(\log_{10}n \le y < \log_{10}(n+1) )</math> \\  & =\log_{10}(n+1) - \log_{10}{n} = \log_{10}(1 + \frac{1}{n}) \end{tabular}</math>
+:<math>
+\begin{aligned}
+P(d = n) & = P(n \le x < n+1 ) \\
+& = P(\log_{10} n \le \log_{10}x  < \log_{10}(n+1)\ ) \\
+& = P(\log_{10}n \le y < \log_{10}(n+1) ) \\
+& = \log_{10}(n+1) - \log_{10}{n} \\
+& = \log_{10}(1 + \frac{1}{n})
+\end{aligned}
+</math>
 which is exactly Benford's law.
@@ Line 189: / Line 198: @@
 From this we see that data with scale invariance satisfy Benford's law.
 ==A more general significant digit law==
@@ Line 392: / Line 397: @@
 ** Terence Tao, 2009-7-3
 [[분류:교양수학]]
-== Metadata ==
-===Wikidata===
-* ID :  [https://www.wikidata.org/wiki/Q817168 Q817168]
 == Notes ==