We often hear that the marriage rate has been falling in recent years. Is that true?
I live in Taoyuan City, so I'll use data from my hometown to check whether the decline is real.
First, we need the data. It can be downloaded from the government's open data platform.
After downloading it, let’s take a look at it first.
Let's clean it up a little.
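import pandas as pd

def read_all_data(file_name):
    # Read one yearly CSV from the Marriage/ folder
    data = pd.read_csv('Marriage/' + file_name, encoding='utf-8')
    # Reshape from wide (one column per month) to long format
    data = data.melt(id_vars=["月份區域別"], value_name="count").rename(columns={'variable': 'month'})
    # The file name without its extension is the year, e.g. '102.csv' -> '102'
    data['year'] = file_name.replace('.csv', '')
    return data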
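The grouping step further down works on a `total_data` frame that holds every year at once. Here is a minimal sketch of one way to build it, assuming each year is stored as its own CSV (for example `102.csv`) in a single folder. The helper names `read_year_file` and `build_total_data` and the `folder` parameter are hypothetical and may differ from the script in the repository.

```python
import glob
import os

import pandas as pd


def read_year_file(path):
    """Read one yearly CSV and reshape it to long format (one row per region and month)."""
    data = pd.read_csv(path, encoding='utf-8')
    data = data.melt(id_vars=["月份區域別"], var_name="month", value_name="count")
    # Assumption: the file name without its extension is the year, e.g. '102.csv' -> '102'.
    data['year'] = os.path.splitext(os.path.basename(path))[0]
    return data


def build_total_data(folder):
    """Stack every yearly CSV found in `folder` into a single DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, '*.csv')))
    frames = [read_year_file(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

With the files in a `Marriage` folder, this would be called as `total_data = build_total_data('Marriage')`. The `year` column holds strings, which matches the string comparisons used when filtering by year.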
Let’s put it into Tableau for simple visualization.
We split the data into two groups: years 102-103 and years 108-109. These are ROC calendar years, which correspond to 2013-2014 and 2019-2020.
old_data = total_data[(total_data.year=='102') | (total_data.year=='103')]
new_data = total_data[(total_data.year=='108') | (total_data.year=='109')]
Use the Wilcoxon signed-rank test to check whether the counts in 102-103 differ from those in 108-109.
from scipy.stats import wilcoxon

old_102_103 = old_data['count'].values
new_108_109 = new_data['count'].values
# Paired test: both arrays must have the same length and order
stat, p = wilcoxon(old_102_103, new_108_109)
if p > 0.05:
    print('Same')
else:
    print('Different')
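The p-value is 0.307, which is greater than 0.05, so we fail to reject the null hypothesis: the data show no statistically significant difference in marriage counts between 102-103 and 108-109.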
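The Wilcoxon signed-rank test is a paired test, so each value in the first sample has to line up with its counterpart in the second. Here is a minimal sketch that makes the pairing explicit instead of relying on row order. It assumes the intended pair is the same region and month in an early year and a later year (102 with 108, 103 with 109). The function names `paired_counts` and `compare_years` are hypothetical and not part of the original script.

```python
import pandas as pd
from scipy.stats import wilcoxon


def paired_counts(total_data, year_pairs, keys=("月份區域別", "month")):
    """Return two aligned arrays of counts, one pair per region and month."""
    keys = list(keys)
    pieces = []
    for old_year, new_year in year_pairs:
        old = total_data[total_data.year == old_year]
        new = total_data[total_data.year == new_year]
        # An inner merge keeps only region/month combinations present in both years.
        merged = old.merge(new, on=keys, suffixes=("_old", "_new"))
        pieces.append(merged[["count_old", "count_new"]])
    paired = pd.concat(pieces, ignore_index=True)
    return paired["count_old"].to_numpy(), paired["count_new"].to_numpy()


def compare_years(total_data, year_pairs):
    """Run the Wilcoxon signed-rank test on the explicitly paired counts."""
    old_counts, new_counts = paired_counts(total_data, year_pairs)
    stat, p = wilcoxon(old_counts, new_counts)
    return stat, p
```

It could be called as `stat, p = compare_years(total_data, [('102', '108'), ('103', '109')])`. Merging on region and month means a missing or reordered row in one year drops out of the comparison rather than silently shifting every later pair.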
The script and data are on my GitHub; feel free to take a look. Thanks!
https://github.com/ValiantChiu/OpenData/tree/master/OpenData
Reference: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/