Hypothesis Testing on the Marriage Rate

We often hear that the marriage rate has been decreasing in recent years. Is it true?

Because I live in Taoyuan City, I’ll use the data from my hometown to check whether the decline is real.

First, we need to get the data. It can be downloaded from the government’s open data platform.

https://data.gov.tw/dataset/27396

After downloading it, let’s take a look at it first.

In its raw form, it is not well suited to visualization and analysis.

Let’s clean it up a little bit.

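import pandas as pd

def read_all_data(file_name):
    # Each CSV holds one year; "月份區域別" is the "month / district" header column
    data = pd.read_csv('Marriage/' + file_name, encoding='utf-8')
    # Reshape from wide (one column per month) to long (one row per district and month)
    data = data.melt(id_vars=["月份區域別"], value_name="count").rename(columns={'variable': 'month'})
    # The file name without its extension is the year, kept as a string (e.g. '102')
    data['year'] = file_name.replace('.csv', '')
    return data

Okay, it looks better now.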
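
To make the reshaping step concrete, here is a minimal sketch of what melt does on a tiny table. The district names, month labels, and counts below are made up for illustration and are not taken from the real dataset; only the "月份區域別" column name comes from the code above.

```python
import pandas as pd

# Toy wide table (made-up numbers): one row per district, one column per month.
wide = pd.DataFrame({
    "月份區域別": ["District A", "District B"],
    "1月": [10, 20],
    "2月": [30, 40],
})

# melt() turns every month column into rows, giving a long table
# with one row per (district, month) pair.
long_data = wide.melt(id_vars=["月份區域別"], value_name="count")
long_data = long_data.rename(columns={"variable": "month"})

print(long_data)
```

The long layout is what Tableau and the statistical test both expect: a single count column, with the district and month as ordinary columns beside it.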

Let’s put it into Tableau for a simple visualization.

From the chart alone, we can’t tell whether years 102 and 103 differ from years 108 and 109. (The years follow the ROC calendar, so 102 and 103 are 2013 and 2014, and 108 and 109 are 2019 and 2020.) Let’s do hypothesis testing.

We put the data from years 102-103 in one group and the data from years 108-109 in another.

old_data = total_data[(total_data.year=='102') | (total_data.year=='103')]
new_data = total_data[(total_data.year=='108') | (total_data.year=='109')]
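
The post does not show how total_data is built, so the following is only a sketch of one plausible way: concatenate the per-year tables and then split them by year. The small frames and their counts are made-up stand-ins for what read_all_data would return, and isin is used here as an equivalent, slightly shorter way to write the two filters above.

```python
import pandas as pd

# Stand-ins for the per-year tables returned by read_all_data (made-up counts).
frames = [
    pd.DataFrame({"year": [year, year], "count": [count, count + 1]})
    for year, count in [("102", 50), ("103", 48), ("108", 45), ("109", 40)]
]

# Stack the yearly tables into one long table.
total_data = pd.concat(frames, ignore_index=True)

# The year column holds strings, so we compare against strings.
old_data = total_data[total_data["year"].isin(["102", "103"])]
new_data = total_data[total_data["year"].isin(["108", "109"])]

print(len(old_data), len(new_data))
```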

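We use a Wilcoxon signed-rank test to check whether the 102-103 counts differ from the 108-109 counts. It is a paired test, so both groups must contain the same number of observations.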

from scipy.stats import wilcoxon

old_102_103 = old_data['count'].values
new_108_109 = new_data['count'].values
# Paired test: the two arrays are compared element by element
stat, p = wilcoxon(old_102_103, new_108_109)
if p > 0.05:
    print('Same')
else:
    print('Different')

The p-value is 0.307, which is greater than 0.05, so we fail to reject the null hypothesis: the data show no statistically significant difference between 102-103 and 108-109.
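
To show how the decision rule behaves, here is a small sketch on made-up paired samples (not the Taoyuan data). The describe_result helper and the alpha parameter are hypothetical names introduced only for this example. Every value in the second sample is lower than its partner in the first, so the test should report a difference.

```python
import numpy as np
from scipy.stats import wilcoxon

# Made-up paired samples: new[i] is the same unit as old[i], measured later.
old = np.array([52, 48, 60, 55, 47, 63, 58, 50, 49, 61])
new = old - np.arange(1, 11)  # each pair drops by a different amount (1..10)

def describe_result(p_value, alpha=0.05):
    """Hypothetical helper: turn a p-value into a plain-language decision."""
    if p_value > alpha:
        return "Fail to reject the null hypothesis (no significant difference)"
    return "Reject the null hypothesis (the groups differ)"

# The signed-rank test works on the pairwise differences old[i] - new[i].
stat, p = wilcoxon(old, new)
decision = describe_result(p)

print(f"statistic={stat}, p={p:.4f}: {decision}")
```

Note that a large p-value only means the data give no evidence of a difference; it does not prove the two periods are identical.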

The script and data are on my GitHub. Feel free to take a look. Thanks!

https://github.com/ValiantChiu/OpenData/tree/master/OpenData

Reference: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/