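osu!mania Top 1000 Player Data Analysis (2024)

Over the past year or so, many new faces have appeared and many new pp maps have been ranked, while quite a few players have quit. Because the rankings of the active top players have shifted considerably, I think it is worth scraping the data again.

Collecting the data

ppy has changed a few things, so the code has been adjusted slightly. This time I also scraped many new fields, which should allow for some more interesting analysis.

Heads-up: this post only scrapes players in the top 1000 of the leaderboard; retired players who no longer appear on the leaderboard cannot be captured. The method used here to tell 4k players from 7k players is also quite crude and may well be inaccurate. Treat the results as a rough reference, just for fun.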

import csv
import json
import logging
import random
import statistics
import time

import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S')


def random_sleep():
    """Pause for 1-3 seconds between requests to avoid hammering the server."""
    seconds = random.randint(1, 3)
    logging.info(f"Sleep for {seconds}s")
    time.sleep(seconds)


mode_name = ["mania"]  # add other modes here if necessary
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
start_page = 1  # 1 page = 50 players on leaderboard
end_page = 20   # pages 1-20 cover the top 1000

fieldnames = ['uid', 'rank', 'rank_highest', 'rank_avg_90d', 'username', 'country_code', 'country_name', 'accuracy', 'play_count', 'main_keymode', 'performance', 'pp_4k', 'pp_7k', 'SS_count', 'SSH_count', 'S_count','SH_count', 'A_count', 'count_300', 'count_100', 'count_50','count_miss', 'is_supporter', 'has_supported', 'support_level', 'is_active', 'last_visit', 'name_change_count', 'post_count', 'comments_count', 'kudosu_available', 'kudosu_total', 'friend_count', 'follower_count', 'badges_count', 'level', 'level_progress', 'ranked_score', 'play_time', 'total_score', 'total_hits', 'maximum_combo', 'replays_count', 'user_achievements_count', 'avatar_url', 'cover_url', 'title', 'join_date', 'is_admin', 'is_bng', 'is_full_bn', 'is_gmt', 'is_limited_bn', 'is_moderator', 'is_nat', 'is_restricted', 'is_silenced']

# Write the header once; rows are appended page by page below.
with open('data.csv', 'w', encoding="utf-8", newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

for mode in mode_name:
    for pg in range(start_page, end_page + 1):
        logging.info(f"Getting page {pg} for mode {mode}")

        data_lst = []

        # Fetch one leaderboard page and collect the user ids on it.
        url = f"https://osu.ppy.sh/rankings/{mode}/performance"
        payload = {"page": f"{pg}"}
        r = requests.get(url=url, params=payload, headers=headers)
        if r.status_code != 200:
            raise requests.exceptions.HTTPError("The response code is not 200. Something's wrong!")
        webdata = r.text
        soup = BeautifulSoup(webdata, "lxml")
        uid_list = soup.find_all("a", class_="ranking-page-table__user-link-text js-usercard")
        stat_list = soup.find_all("td", class_="ranking-page-table__column")
        for idx in range(len(uid_list)):
            uid = uid_list[idx]['data-user-id']

            # The profile page embeds the full user record as JSON in a data attribute.
            info_url = f"https://osu.ppy.sh/users/{uid}/{mode}"
            info_resp = requests.get(url=info_url, headers=headers)
            if info_resp.status_code != 200:
                raise requests.exceptions.HTTPError("The response code is not 200. Something's wrong!")
            info_text = info_resp.text
            info_soup = BeautifulSoup(info_text, "lxml")
            info_raw = info_soup.find("div", class_="js-react--profile-page osu-layout osu-layout--full")['data-initial-data']
            info_data = json.loads(info_raw)

            country_code = info_data["user"]["country"]["code"]
            country_name = info_data["user"]["country"]["name"]

            is_supporter = info_data["user"]["is_supporter"]
            has_supported = info_data["user"]["has_supported"]
            support_level = info_data["user"]["support_level"]

            avatar_url = info_data["user"]["avatar_url"]
            cover_url = info_data["user"]["cover_url"]

            is_active = info_data["user"]["is_active"]
            last_visit = info_data["user"]["last_visit"]

            title = info_data["user"]["title"]

            join_date = info_data["user"]["join_date"]

            is_admin = info_data["user"]["is_admin"]
            is_bng = info_data["user"]["is_bng"]
            is_full_bn = info_data["user"]["is_full_bn"]
            is_gmt = info_data["user"]["is_gmt"]
            is_limited_bn = info_data["user"]["is_limited_bn"]
            is_moderator = info_data["user"]["is_moderator"]
            is_nat = info_data["user"]["is_nat"]
            is_restricted = info_data["user"]["is_restricted"]
            is_silenced = info_data["user"]["is_silenced"]

            name_change_count = len(info_data["user"]["previous_usernames"])
            post_count = info_data["user"]["post_count"]
            comments_count = info_data["user"]["comments_count"]
            kudosu_available = info_data["user"]["kudosu"]["available"]
            kudosu_total = info_data["user"]["kudosu"]["total"]
            # "friend_count" is the number shown on the profile, i.e. how many users follow this player
            friend_count = info_data["user"]["follower_count"]
            # "follower_count" here holds the mapping subscriber count
            follower_count = info_data["user"]["mapping_follower_count"]
            badges_count = len(info_data["user"]["badges"])

            # statistics
            rank = info_data["user"]["statistics"]["global_rank"]
            rank_highest = info_data["user"]["rank_highest"]["rank"]
            rank_avg_90d = statistics.fmean(info_data["user"]["rank_history"]["data"])
            username = info_data["user"]["username"]
            accuracy = info_data["user"]["statistics"]["hit_accuracy"]
            play_count = info_data["user"]["statistics"]["play_count"]
            performance = info_data["user"]["statistics"]["pp"]
            SS_count = info_data["user"]["statistics"]["grade_counts"]["ss"]
            SSH_count = info_data["user"]["statistics"]["grade_counts"]["ssh"]
            S_count = info_data["user"]["statistics"]["grade_counts"]["s"]
            SH_count = info_data["user"]["statistics"]["grade_counts"]["sh"]
            A_count = info_data["user"]["statistics"]["grade_counts"]["a"]

            count_300 = info_data["user"]["statistics"]["count_300"]
            count_100 = info_data["user"]["statistics"]["count_100"]
            count_50 = info_data["user"]["statistics"]["count_50"]
            count_miss = info_data["user"]["statistics"]["count_miss"]

            level = info_data["user"]["statistics"]["level"]["current"]
            level_progress = info_data["user"]["statistics"]["level"]["progress"]
            ranked_score = info_data["user"]["statistics"]["ranked_score"]
            play_time = info_data["user"]["statistics"]["play_time"]
            total_score = info_data["user"]["statistics"]["total_score"]
            total_hits = info_data["user"]["statistics"]["total_hits"]
            maximum_combo = info_data["user"]["statistics"]["maximum_combo"]
            replays_count = info_data["user"]["statistics"]["replays_watched_by_others"]
            user_achievements_count = len(info_data["user"]["user_achievements"])

            # mania only: the first variant is treated as 4k and the second as 7k
            try:
                pp_4k = int(info_data["user"]["statistics"]["variants"][0]["pp"])
                pp_7k = int(info_data["user"]["statistics"]["variants"][1]["pp"])

                if pp_4k > pp_7k:
                    main_keymode = "4k"
                else:
                    main_keymode = "7k"
            except (KeyError, IndexError, TypeError, ValueError):
                pp_4k = pp_7k = main_keymode = "N/A"

            # Collect the local variables named in fieldnames into one CSV row.
            player_pkg = {}

            for item in fieldnames:
                player_pkg[item] = eval(item)

            data_lst.append(player_pkg)

            logging.info(f"#{rank} {username} {uid} done!")
            random_sleep()

        # Append this page's rows so progress is kept if a later page fails.
        with open('data.csv', 'a', encoding="utf-8", newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

            for data in data_lst:
                writer.writerow(data)
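
The scraper above takes the first variant as 4k and the second as 7k by position. As a minimal sketch of a slightly sturdier alternative, the helper below looks each variant up by name instead. The function name classify_keymode is hypothetical, and it assumes each entry in the variants list carries a "variant" key holding "4k" or "7k" next to "pp"; that key is an assumption about the payload and is not confirmed by the code above.

```python
def classify_keymode(variants):
    """Return (pp_4k, pp_7k, main_keymode) from a list of variant dicts.

    Assumes each dict has a "variant" key ("4k" or "7k") and a "pp" key;
    this layout is an assumption, not something the scraper verifies.
    """
    pp_by_variant = {}
    for entry in variants or []:
        name = entry.get("variant")
        pp = entry.get("pp")
        if name in ("4k", "7k") and pp is not None:
            pp_by_variant[name] = int(pp)

    # Mirror the scraper's fallback when either keymode is missing.
    if len(pp_by_variant) < 2:
        return "N/A", "N/A", "N/A"

    pp_4k, pp_7k = pp_by_variant["4k"], pp_by_variant["7k"]
    # Same tie-breaking as the scraper: 4k only wins when strictly higher.
    main_keymode = "4k" if pp_4k > pp_7k else "7k"
    return pp_4k, pp_7k, main_keymode
```

Like the original, this still labels a player purely by which keymode has more pp, so the caveat about the classification being crude applies unchanged.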

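With the data in hand, let's use pandas and matplotlib for some visual analysis.

The code is mostly the same as before, with a few changes. To make it easy for others to reproduce, I'll still include all of it.

0x00 - Countries and regions

Let's take a quick look at where the top players come from.

First import the necessary libraries and load the data.
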
import random
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import pandas as pd
import numpy as np
from cycler import cycler

df = pd.read_csv('data.csv')
df

First, the regional breakdown. As before, every country or region with fewer than 20 players is lumped into Other.

df_draw = df.groupby('country_code').size().to_frame(name='count')
df_draw = df_draw.sort_values('count', ascending=False).reset_index()
df_draw
country_code count
KR 237
CN 116
US 87
JP 57
ID 45
PH 44
PE 24
TH 24
CA 23
CL 21
Other 322

The pie chart code is as follows.

# A demonstration of what Stack Overflow-driven development looks like
# https://python.tutorialink.com/function-to-move-specific-row-to-top-or-bottom-of-pandas-dataframe/
def shift_row_to_bottom(df, index_to_shift):
    """Shift row, given by index_to_shift, to bottom of df."""
    idx = df.index.tolist()
    idx.remove(index_to_shift)
    df = df.reindex(idx + [index_to_shift])
    return df

# https://stackoverflow.com/questions/34035427/conditional-removal-of-labels-in-pie-chart
# autopct accepts either a format string or a function; a lambda would work here too
def my_autopct(pct):
    return ('%1.1f%%' % pct) if pct > 4 else ''

# https://stackoverflow.com/questions/69839373/group-small-values-in-a-pie-chart
# Lump every country or region with fewer than 20 players into Other
df_draw.loc[df_draw['count'] < 20, 'country_code'] = 'Other'
df_draw = df_draw.groupby('country_code')['count'].sum().reset_index()
df_draw = df_draw.sort_values('count', ascending=False, ignore_index=True)  # ignore_index is available since pandas 1.0.0, which is handy
# Look up the Other row by label rather than by a fixed position, because its place in the sort order depends on the data
other_idx = df_draw.index[df_draw['country_code'] == 'Other'][0]
df_draw = shift_row_to_bottom(df_draw, other_idx)

# https://stackoverflow.com/questions/65347771/how-can-i-create-more-colors-for-my-plot
# The default color cycle is ugly and has few colors, so use another built-in palette (Set3)
# More palettes can be previewed here: https://matplotlib.org/2.0.2/examples/color/colormaps_reference.html
cm = plt.get_cmap('Set3')
matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

# Draw the chart
plt.pie(df_draw['count'], labels=df_draw['country_code'], autopct=my_autopct, startangle=140)
plt.title("osu!mania top #1000 country code (2024)")
plt.show()

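China, Japan, South Korea and the United States together still account for about half of the list (497 of 1000), followed by Indonesia, the Philippines, Peru and Thailand. The number of top Korean players has shrunk considerably, while the Other category has grown significantly. The ordering of the remaining countries is largely unchanged, but Argentina (AR) and the United Kingdom (GB) have been replaced by Canada (CA) and Peru (PE). Malaysia (MY), a long-standing powerhouse, has also dropped off the list.

0x01 - The keymode battle

In 2024 the pp ceiling for 7k is still much higher than for 4k, but a lot of high-star 4k pp maps were ranked a while ago, so I expected this part to change a lot.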

main_keymode count_2024 count_2022
7k 694 631
4k 306 369

As it turns out, despite all those newly ranked 4k pp maps, there are even fewer 4k players (-63). As Yi Ge (一哥) puts it, the main reason is that one 7k pp map is worth several 4k ones.

A simple donut chart:

# Count players per main keymode from the full dataset
df_draw = df.groupby('main_keymode').size().to_frame(name='count').reset_index()
df_draw = df_draw.sort_values('count', ascending=False, ignore_index=True)

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

explode = [0.05, 0.05]

plt.pie(df_draw['count'], labels=df_draw['main_keymode'], autopct='%1.1f%%', startangle=140, explode=explode, pctdistance=0.85)
plt.title("osu!mania top #1000 main keymode (2024)")

# draw circle
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()

# Adding Circle in Pie chart
fig.gca().add_artist(centre_circle)
plt.show()

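The trend chart for the three-digit rank threshold is available in my other project, oscarcx123/osu-minimum-pp.

The three-digit threshold has now reached 12100pp. Pure 4k really is hard to farm, so no wonder 7k pp is about to be nerfed again.

0x02 - Supporting peppy

Let me see how many cheapskates there are 👀

Each player has two boolean values, has_supported and is_supporter. A simple groupby gives the number of players in each of three states: never, formerly, and currently a supporter.
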
df_draw = df.groupby(['has_supported', 'is_supporter']).size().to_frame(name='count')
status = ['never', 'was_supporter', 'is_supporter']
df_draw['status'] = status

cm = plt.get_cmap('Set3')
matplotlib.rcParams["axes.prop_cycle"] = cycler(
color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

explode = [0.1, 0, 0]

plt.pie(df_draw['count'], labels=df_draw['status'], autopct='%1.1f%%', startangle=140, explode=explode, pctdistance=0.85)
plt.title("osu!mania top #1000 supporter (2024)")

centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.show()
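
The groupby above labels the groups by position, which only works if exactly three combinations of the two flags occur and they come out in the expected order. As a minimal sketch, the mapping from the two booleans to a status can also be written out explicitly. The helper names supporter_status and count_statuses are made up for this illustration, and the sketch assumes that a current supporter always counts as "is_supporter" regardless of has_supported.

```python
from collections import Counter


def supporter_status(has_supported, is_supporter):
    """Map the two profile flags to one of three status labels."""
    if is_supporter:
        return "is_supporter"   # currently a supporter
    if has_supported:
        return "was_supporter"  # supported in the past, but not now
    return "never"              # never had supporter


def count_statuses(players):
    """Count statuses for an iterable of (has_supported, is_supporter) pairs."""
    return Counter(supporter_status(h, s) for h, s in players)
```

With pandas, the same function could be applied row by row and the result grouped by the new label, so the labels no longer depend on group order.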

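So it looks like the vast majority of top players have bought osu!supporter at least once. But wait, is that really so? This time I also scraped support_level. Its value ranges from 0 to 3 and corresponds to the supporter heart on the profile page; presumably it only increases for players who paid for supporter themselves.

My guess at what each support_level value means:

0 = never bought
1 = bought for less than 1 year
2 = bought for 1 - 5 years
3 = bought for more than 5 years

Let's run the code and see.
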
df_draw = df.groupby('support_level').size().to_frame(name='count')
df_draw = df_draw.sort_values('support_level', ascending=True).reset_index()
df_draw['support_level'] = 'L' + df_draw['support_level'].astype(str)
df_draw

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

plt.pie(df_draw['count'], labels=df_draw['support_level'], autopct='%1.1f%%', startangle=140, pctdistance=0.85)
plt.title("osu!mania top #1000 support level (2024)")

# draw circle
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()

# Adding Circle in Pie chart
fig.gca().add_artist(centre_circle)
plt.show()

Looking at this, it seems a lot of people's supporter was gifted by someone else??

Let's exclude those who have never been a supporter and look again.

df_draw = df[df['has_supported'] == 1]
df_draw = df_draw.groupby('support_level').size().to_frame(name='count')
df_draw = df_draw.sort_values('support_level', ascending=True).reset_index()
df_draw['support_level'] = 'L' + df_draw['support_level'].astype(str)
print(df_draw)
support_level count
L0 584
L1 119
L2 121
L3 30

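If I understand this correctly, a considerable number of top players never bought supporter themselves and got it some other way...

As usual, let's look at the chief cheapskates 👀

I tidied up the code here: the previous version used two .loc calls, but a single one is enough for the filtering.
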
df.loc[(df['has_supported'] == False), ['rank', 'username', 'country_code']]
rank username country_code
60 LR2MAG KR
61 RaffCo ID
72 karcice KR
98 Dius KR
114 7keyEgoist JP

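0x03 - Rename tycoons

Unlike other games, osu does not let you change your ID freely: each change costs more than the previous one, as shown in the price table below. If you have bought supporter, the first name change is free.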

Changes Price
1 US$4
2 US$8
3 US$16
4 US$32
5 US$64
6+ US$100
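
To make the price table concrete, here is a minimal sketch that turns it into a cumulative cost. The names rename_price and cumulative_rename_cost are invented for this example, and the first_free flag is my own simplification of the supporter rule: it only waives the first charge and assumes later changes keep their listed price.

```python
PRICES = [4, 8, 16, 32, 64]  # US$ for the 1st to 5th name change
PRICE_CAP = 100              # US$ for the 6th change and every one after it


def rename_price(n):
    """Price in US$ of the n-th name change (1-based)."""
    return PRICES[n - 1] if n <= len(PRICES) else PRICE_CAP


def cumulative_rename_cost(changes, first_free=False):
    """Total US$ spent after `changes` name changes.

    first_free=True models the free first change for supporters
    (assumption: later changes are still charged at the listed price).
    """
    total = sum(rename_price(n) for n in range(1, changes + 1))
    if first_free and changes >= 1:
        total -= rename_price(1)
    return total
```

For example, a player with three changes has paid 4 + 8 + 16 = 28 US$ if none of them was free.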

So let's see how many times everyone has changed their ID.

df_draw = df.groupby('name_change_count').size().to_frame(name='count').reset_index()
fig, ax = plt.subplots()
bars = ax.bar(df_draw['name_change_count'], df_draw['count'])

ax.bar_label(bars)
ax.set_title("osu!mania top #1000 player name change (2024)")
ax.set_xlabel('# Name Change')
ax.set_ylabel('Player Count')

plt.show()

Not much seems to have changed.

name_chg_times 2022 2024 chg
0 419 400 -19
1 390 405 15
2 136 126 -10
3 42 46 4
4 12 20 8
5 0 2 2
6 1 1 0

Let's see who the rename maniacs are 👀

df.loc[(df['name_change_count'] == 5) | (df['name_change_count'] == 6), ['rank', 'username', 'country_code', 'name_change_count']]
rank username country_code name_change_count
18 ZoyFangirl KR 5
533 Lovelyn FI 6
972 [KC]CruB US 5

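ppy is making money in his sleep here. Let's see how much the top players have contributed.

df_draw = df.groupby('name_change_count').size().to_frame(name='count').reset_index()
# Cumulative cost in US$ of 0 to 6 name changes according to the price table (4, 4+8, 4+8+16, ...).
# The free first change for supporters is ignored, so the total is an upper bound.
df_draw['cost'] = [0, 4, 12, 28, 60, 124, 224]
df_draw['ppy_laugh'] = df_draw['cost'] * df_draw['count']
df_draw['ppy_laugh'].sum() # -> 6092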

0x04 - The grinders

Keyboard destroyers

The osu profile page shows Total Hits, which is effectively the number of key presses.

df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'total_hits']].head(10)
rank username country_code main_keymode total_hits
7 bojii PH 7k 140716567
9 Stellium KR 7k 122258223
809 Min- KR 7k 120880661
346 JDS20 CO 7k 119917450
334 masaya NO 7k 118370621
322 X_Devil RU 7k 111932394
203 palmEuEi TH 7k 111731128
23 Arona PH 7k 110712990
61 [ M Y S T I C ] KR 7k 109373225
141 Mafuyu87Fanboy CN 7k 102640755

No Chinese player made this list in 2022; this year our resident grinder Xuegao (雪糕) finally squeezed into the global top 10 🎉

Let's draw a boxplot to look at the distribution.

# https://stackoverflow.com/a/41997865
def box_plot(data, edge_color, fill_color):
    # Draws on the module-level `ax`, which must exist before this is called
    bp = ax.boxplot(data, patch_artist=True, vert=False, widths=0.4)
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edge_color)
    for patch in bp['boxes']:
        patch.set(facecolor=fill_color)

    return bp

# https://stackoverflow.com/a/41717533
plt.rcParams["figure.figsize"] = (6,2)
# plt.rcParams["figure.figsize"] = plt.rcParamsDefault["figure.figsize"] # reset back to default

df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)
fig, ax = plt.subplots()
box_plot(df_draw['total_hits'], 'blue', 'cyan')
plt.tick_params(left = False, labelleft = False) # remove y-axis tick and label
# ax.grid(True) # this will show both x and y grids
plt.gca().xaxis.grid(True) # show x-axis grid
ax.set_title("osu!mania top #1000 total hits")
ax.set_xlabel('Total Hits (millions)')
plt.ticklabel_format(style='sci', axis='x', scilimits=(6,6)) # 1e6 instead of the default 1e8
plt.show()

As expected, there are only a handful of outliers. Now let's look at the median.

df_draw['total_hits'].median() # -> 33234205.5

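The median in 2022 was only 26 million; this year it has reached 33 million.

Next, the top ten by hit count among Chinese players (the chg column was entered by hand).
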
df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'total_hits']].head(10)
rank username main_keymode total_hits chg
141 Mafuyu87Fanboy 7k 102640755 24963372
376 Carpihat 7k 89420880 26150589
119 ExNeko 7k 78289749 17419308
63 Stink God 7k 76219518 7320228
221 Mito Van 7k 68875972 17053893
324 [GB]King Fish 7k 68470057 16767038
196 Chenut BS 7k 65536382 13833363
347 idqoos123 7k 63099663 new entry
45 HxcQ777 7k 61966696 new entry
38 [Crz]Reimu 7k 58388849 11944925

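As you can see, there are no 4k players left on this list; it is all 7k players.

Masters of time

Besides key presses there is another metric, Total Play Time, which seems to count only the time spent actually playing maps.
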
df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
df_draw = df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'play_time']].head(10)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: str(round(x / 3600 / 24, 2)) + ' days')
df_draw
rank username country_code main_keymode play_time
203 palmEuEi TH 7k 97.75 days
141 Mafuyu87Fanboy CN 7k 96.2 days
7 bojii PH 7k 91.79 days
334 masaya NO 7k 88.69 days
606 -Willow- AU 7k 85.59 days
346 JDS20 CO 7k 84.67 days
322 X_Devil RU 7k 83.89 days
272 hisaella EE 7k 78.75 days
893 Axfaerie PH 4k 77.39 days
444 -Lalito898 PE 4k 74.55 days

Xuegao was second in 2022 and is still second now, haha.

The plotting code is almost identical to the one above. Just remember to preprocess play_time first: the scraped values are in seconds, and days are more intuitive.

# https://stackoverflow.com/a/41997865
def box_plot(data, edge_color, fill_color):
    # Draws on the module-level `ax`, which must exist before this is called
    bp = ax.boxplot(data, patch_artist=True, vert=False, widths=0.4)
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edge_color)
    for patch in bp['boxes']:
        patch.set(facecolor=fill_color)

    return bp

# https://stackoverflow.com/a/41717533
plt.rcParams["figure.figsize"] = (6,2)
# plt.rcParams["figure.figsize"] = plt.rcParamsDefault["figure.figsize"] # reset back to default


df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: round(x / 3600 / 24, 2)) # seconds -> days
fig, ax = plt.subplots()
box_plot(df_draw['play_time'], 'blue', 'cyan')
plt.tick_params(left = False, labelleft = False) # remove y-axis tick and label
# ax.grid(True) # this will show both x and y grids
plt.gca().xaxis.grid(True) # show x-axis grid
ax.set_title("osu!mania top #1000 total play time (2024)")
ax.set_xlabel('Total Play Time (days)')
plt.ticklabel_format(axis='x')
plt.show()

The median is 25.15 days, 5 days more than in 2022.

Next, the Chinese leaderboard 👀

df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: round(x / 3600 / 24, 2))
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'play_time']].head(10)
Rank Username Main Keymode Play Time
141 Mafuyu87Fanboy 7k 96.2
376 Carpihat 7k 72.05
119 ExNeko 7k 60.5
221 Mito Van 7k 57.66
63 Stink God 7k 56.9
324 [GB]King Fish 7k 46.66
345 fishbone2445 7k 45.94
896 Ranm 4k 45.9
224 hisa_knowledge 7k 45.62
763 Myukee 7k 42.22

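The lower fence of both boxplots is close to 0 (the same as in the 2022 data). These are probably experts who came over from other rhythm games, though they could also be cheaters. Let me see who they are.
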
df_draw = df.sort_values('total_hits', ascending=True, ignore_index=True)
df_draw = df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'play_count', 'total_hits', 'play_time']].head(10)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: str(round(x / 3600 / 24, 2)) + ' days')
df_draw
rank username country_code main_keymode play_count total_hits play_time
567 My Angel Rei SG 7k 226 563303 0.37 days
950 Twilightprncss EE 7k 321 1016530 0.69 days
935 Sujin97 KR 7k 475 1044196 0.64 days
259 efewfa KR 7k 351 1427974 0.8 days
364 InsCharteux ID 7k 976 1521600 1.11 days
437 141truth US 7k 596 1556900 0.83 days
457 UngDiKing KR 7k 766 1740292 0.98 days
348 -K i r e i- JP 7k 1150 1873711 1.36 days
500 LoveHanu KR 7k 711 1906283 1.16 days
138 Li Ji Xian KR 7k 1049 2017684 1.13 days

Score farmers

Besides these metrics, we can also look at osu level and total_score.

score(n) = 5,000 / 3 * (4n^3 - 3n^2 - n) + 1.25 * 1.8^(n - 60) if n <= 100
score(n) = 26,931,190,827 + 99,999,999,999 * (n - 100) if n > 100

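According to the official wiki, the two are really the same thing. The formula also shows that later levels are very hard to gain: past level 100, each level takes 100 billion points. To simplify later processing, I'll just look at total_score.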
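
As a small illustration, the two-part formula quoted above can be written as a function. This is only a sketch that transcribes the formula as given here; the name score_for_level is mine, and the result is a float for levels up to 100 because of the exponential term.

```python
def score_for_level(n):
    """Total score needed to reach level n, per the formula quoted above."""
    if n <= 100:
        # Cubic term plus an exponential term that dominates near level 100
        return 5000 / 3 * (4 * n**3 - 3 * n**2 - n) + 1.25 * 1.8 ** (n - 60)
    # Past level 100 every level costs a flat 99,999,999,999 points
    return 26_931_190_827 + 99_999_999_999 * (n - 100)
```

Evaluating score_for_level(100) - score_for_level(99) gives the gap between levels 99 and 100, and score_for_level(101) - score_for_level(100) shows the jump to roughly 100 billion points per level afterwards.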

df['level_real'] = round(df['level'] + df['level_progress'] / 100, 2)
df['total_score_m'] = df['total_score'].astype(str).str[:-6] + "M" # drop the last 6 digits to show millions
df_draw = df.sort_values('total_score', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'total_score_m', 'level_real']].head(10)
Rank Username Country Code Main Keymode Total Score (M) Real Level
661 Anemia US 7k 52084M 100.25
141 Mafuyu87Fanboy CN 7k 46424M 100.19
159 lxLucasxl AR 7k 45382M 100.18
203 palmEuEi TH 7k 43432M 100.16
606 -Willow- AU 7k 41302M 100.14
7 bojii PH 7k 41281M 100.14
334 masaya NO 7k 39518M 100.12
322 X_Devil RU 7k 39480M 100.12
660 araragigun KR 7k 38072M 100.11
588 109 JP 7k 37345M 100.10

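Once again it's our perennial runner-up Xuegao. Push a bit harder this year and knock them all off!

The median is 10723M, i.e. a total score of 10.7 billion. Luckily I have 13.8 billion 😋

Now the Chinese leaderboard 👀
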
df_draw = df.sort_values('total_score', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'] == 'CN') | (df_draw['country_code'] == 'MO') | (df_draw['country_code'] == 'HK'), ['rank', 'username', 'country_code', 'main_keymode', 'total_score_m', 'level_real']].head(10)
Rank Username Main Keymode Total Score (M) Real Level
141 Mafuyu87Fanboy 7k 46424M 100.19
376 Carpihat 7k 30691M 100.03
63 Stink God 7k 29267M 100.02
119 ExNeko 7k 27984M 100.01
347 idqoos123 7k 21411M 99.40
221 Mito Van 7k 21034M 99.36
345 fishbone2445 7k 20412M 99.29
896 Ranm 4k 19645M 99.21
339 Quotient 4k 18804M 99.11
196 Chenut BS 7k 17828M 99.01

Come to think of it, levels 99 and 100 are also roughly 10 billion points apart...

0x05 - All-rounders

Some players play both 4k and 7k, so let's see who is the 7k player with the strongest 4k.

df_draw = df.copy()
df_draw = df_draw.loc[df_draw['main_keymode'] == '7k']
df_draw = df_draw.sort_values('pp_4k', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'pp_4k', 'pp_7k']].head(10)
rank username country_code pp_4k pp_7k
7 bojii PH 17349 25173
55 yeonho7028 KR 15593 20232
28 SillyFangirl BR 15525 22046
79 instal TH 15207 19439
211 gaesol KR 14989 15247
89 grillroasted CZ 14814 18700
9 Stellium KR 14602 24963
182 [LS]bambi fnf CL 14538 15420
49 NkeyZoyDkqehKal KR 14372 20881
5 Kalkai KR 14239 25761

And the 4k players with the strongest 7k (the code is the same, so it is omitted):

rank username country_code pp_4k pp_7k
183 Orost BR 15227 14694
254 xxxxxx2800 MY 14395 14358
267 jhleetgirl JP 14517 14094
236 Poca KR 15080 13708
219 imyeeyee KR 15571 13513
489 Minwoo3098 US 12707 11970
509 EstaticStatisIO ID 12653 11907
541 Focoo AR 12516 11616
498 pboo2424 TH 13028 11567
550 Rei_Insana300 EC 12421 11474

0x06 - In the public eye

Finally, Xuegao's favorite part: the replay count showdown!

df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'replays_count']].head(10)
rank username country_code main_keymode replays_count
99 OutLast KR 7k 382049
246 Majesty KR 7k 293656
88 myucchii CL 4k 291103
7 bojii PH 7k 288776
299 gosy777 KR 7k 201678
753 Lothus BR 7k 185352
151 Estonians KR 7k 158569
28 SillyFangirl BR 7k 135702
19 ideu- KR 7k 131539
458 Cobo- KR 7k 109118

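Many legends of old (inteliser, 0133, etc.) have quit and so cannot be shown here, which is a pity. Even so, the players on this list are seriously impressive.

Let's draw a boxplot of the distribution (I applied ln to the data here; otherwise everything is squashed to the left and the plot is unreadable).
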
df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
df_draw['replays_count'] = df_draw['replays_count'].apply(lambda x: np.log(x))
fig, ax = plt.subplots()
box_plot(df_draw['replays_count'], 'blue', 'cyan')
plt.tick_params(left = False, labelleft = False) # remove y-axis tick and label
plt.gca().xaxis.grid(True) # show x-axis grid
ax.set_title("osu!mania top #1000 replay count (2024)")
ax.set_xlabel('Replay Count (e^x)')
plt.show()
df_draw['replays_count'].median() # -> 4.7999057951649995, e^x = 121.5
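
One detail about reading a median off log-transformed data: ln is monotonic, so for an odd number of values exp(median(ln x)) equals the plain median, but for an even number (such as 1000 players) it is the geometric mean of the two middle values rather than their arithmetic mean. The sketch below uses only the standard library to show the difference; median_via_log is a name made up for this example.

```python
import math
import statistics


def median_via_log(values):
    """exp(median(ln x)): the median read back from log-transformed data.

    Requires strictly positive values, since math.log(0) raises ValueError.
    """
    return math.exp(statistics.median(math.log(v) for v in values))


# Odd count: both approaches pick the same middle value.
odd = [1, 10, 100]
# Even count: the plain median averages 10 and 100 arithmetically,
# while the log route averages them geometrically.
even = [1, 10, 100, 1000]
plain_even = statistics.median(even)
log_even = median_via_log(even)
```

When the two middle values are close together, as they usually are in a sample of 1000, the two numbers are nearly identical; taking the median of the untransformed column avoids the question entirely.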

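The median is 121.5. Sigh, after all this time my replay count hasn't grown at all; it is still only 14 😭

Let's see whether Xuegao is on the Chinese leaderboard 👀
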
df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'replays_count']].head(10)
Rank Username Main Keymode Replays Count
486 DawnX 4k 46686
63 Stink God 7k 43308
66 LiangIaiajan 7k 14711
34 [Crz]Satori 7k 6785
172 Krn_ 7k 5283
339 Quotient 4k 4670
12 tyrcs 7k 4621
43 VanWilder 7k 3593
369 [Crz]Nickname 4k 3447
43 QingJiDing 7k 3145

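Haha, nope. Xuegao only has 2000-odd replays, not enough to make the list 🤡

0x07 - Friends from afar, too worn out to come

Friends on osu are one-way follows (green); a mutual follow turns pink. What is counted here is the friend count shown on the profile page, i.e. how many people follow you.
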
df_draw = df.sort_values('friend_count', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'friend_count']].head(10)
rank username country_code main_keymode friend_count
28 SillyFangirl BR 7k 9589
88 myucchii CL 4k 5145
7 bojii PH 7k 4703
758 Andere CL 7k 4012
1 dressurf KR 7k 3166
479 Eliminate GB 4k 2565
41 Motion KR 7k 2505
317 arcwinolivirus PH 7k 2351
395 jkzu123 DE 4k 2262
474 CrewK JP 4k 2076

A boxplot of the distribution (the data was log-transformed here as well; otherwise the plot would be squashed to the left):

df_draw = df.sort_values('friend_count', ascending=False, ignore_index=True)
df_draw['friend_count'] = df_draw['friend_count'].apply(lambda x: np.log(x))
fig, ax = plt.subplots()
box_plot(df_draw['friend_count'], 'blue', 'cyan')
plt.tick_params(left = False, labelleft = False) # remove y-axis tick and label
plt.gca().xaxis.grid(True) # show x-axis grid
ax.set_title("osu!mania top #1000 friend count (2024)")
ax.set_xlabel('Friend Count (e^x)')
plt.show()
df_draw['friend_count'].median() # -> 4.770684624465665, e^x = 118

The median friend count is 118; looks like I'm dragging the numbers down.

A look at the Chinese leaderboard 👀

Rank Username Main Keymode Friend Count
12 tyrcs 7k 878
119 ExNeko 7k 834
369 [Crz]Nickname 4k 700
486 DawnX 4k 631
135 AWMRone 7k 630
66 LiangIaiajan 7k 573
579 lovely_hyahya 7k 572
470 [Crz]Rachel 7k 505
923 [Crz]Alleyne 4k 488
376 Carpihat 7k 487

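I didn't expect the e-pet Big Bad Cat (大坏猫) to be such a social butterfly.

0x08 - Since ancient times

Let's see when the currently active top players registered their accounts 👀

# Derive the registration year from the scraped join_date timestamp
df['join_year'] = pd.to_datetime(df['join_date']).dt.year
df_draw = df.groupby('join_year').size().to_frame(name='count').reset_index()
df_draw = df_draw.sort_values('join_year', ascending=True)
df_draw['join_year'] = df_draw['join_year'].astype(str).str[2:] # keep the last two digits, e.g. 2009 -> 09

fig, ax = plt.subplots()
bars = ax.bar(df_draw['join_year'], df_draw['count'])

ax.bar_label(bars)
ax.set_title("osu!mania top #1000 player join year (2024)")
ax.set_xlabel('Year')
ax.set_ylabel('Player Count')

plt.show()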

I'm a bit curious who the 4 players are who registered in 2009 and are still active today.

Rank Username Country Code Main Keymode Performance
19 ideu- KR 7k 23290.3
43 VanWilder CN 7k 21367.5
396 turtlewing KR 7k 14670.3
451 inuyashasama KR 7k 14359.6

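Fanshen (翻身)!!! Come to think of it, Fanshen turns 35 this year and is still representing China in MWC 7K 2024. Still going strong!

0x09 - Beware the accuracy freaks

Speaking of accuracy freaks, the SS count comes to mind. Since only the currently active top 1000 players were scraped, the ones hiding outside that range can't be caught here.

Xuegao spams nursery-rhyme maps in the group chat every day to grind accuracy. Let's see whether Xuegao is on the global list 👀
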
df['SS_total'] = df['SS_count'] + df['SSH_count']
df_draw = df.sort_values('SS_total', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'SS_total']].head(10)
rank username country_code main_keymode SS_total
99 OutLast KR 7k 6923
104 lnote_ KR 7k 6746
159 lxLucasxl AR 7k 5899
606 -Willow- AU 7k 4968
661 Anemia US 7k 4490
320 robby250 RO 7k 4109
7 bojii PH 7k 3714
593 Miku Meru BR 4k 2857
798 Exlude EE 7k 2798
299 gosy777 KR 7k 2789

Haha, what happened? Where did that man's smile go 🤡

Surely not on the Chinese leaderboard?

df_draw = df.sort_values('SS_total', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'SS_total']].head(10)
Rank Username Main Keymode SS Total
141 Mafuyu87Fanboy 7k 2699
63 Stink God 7k 2203
781 Mihyo_San 7k 1798
369 [Crz]Nickname 4k 922
339 Quotient 4k 916
119 ExNeko 7k 859
763 Myukee 7k 757
896 Ranm 4k 714
376 Carpihat 7k 625
535 lucky icons 7k 545

He really is on the Chinese leaderboard, leading Stink God by nearly 500 SS 😨

So who are the accuracy freaks with the highest acc?

df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'accuracy']].head(10)
Rank Username Country Code Main Keymode Accuracy
641 [LS]Tenshi PH 7k 99.6653
976 diviza PE 7k 99.4974
765 Luna I guess US 4k 99.1412
434 lyvet PH 4k 99.0993
935 Sujin97 KR 7k 99.0888
737 Hualow ID 4k 98.9990
554 [GB]SuddenDeath KR 4k 98.9655
389 [Albert] ID 4k 98.8328
625 Fieri ID 4k 98.8294
407 Hello_Son US 4k 98.8161

Next, as usual, the homegrown accuracy freaks.

df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'accuracy']].head(10)
rank username main_keymode accuracy
339 Quotient 4k 98.8076
414 [GB]nyasun 4k 98.7232
708 racksack 4k 98.5903
611 [Paw]Just_MLN 4k 98.5578
592 [GB]ParasolTree 4k 98.5175
877 StarTemplar 4k 98.4896
913 Squis1037 4k 98.4798
718 ATP Koshepen 4k 98.3490
815 [Crz]Caicium 4k 98.3145
424 neeko the rock 4k 98.2423

Finally, the homegrown 7k accuracy freaks to watch out for.

df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['main_keymode'] == '7k') & (df_draw['pp_4k'] < 9000) & (df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'accuracy']].head(20)
Rank Username Accuracy
34 [Crz]Satori 98.0678
43 QingJiDing 97.8572
43 VanWilder 97.8104
172 Krn_ 97.7941
202 U1d 97.7313
141 Mafuyu87Fanboy 97.6430
750 tangjinxi 97.6200
153 - Minato Aqua - 97.6176
22 af- 97.6103
66 LiangIaiajan 97.5192
119 ExNeko 97.4939
196 Chenut BS 97.3203
260 10086kfry 97.3195
763 Myukee 97.3040
781 Mihyo_San 97.2728
853 jhlee0I33 97.2546
347 idqoos123 97.2320
201 Mi-a 97.2275
171 Longe 97.2046
180 RiskyMonster272 97.1972

0x10 - I am a giant sieve

This time I noticed that ppy also provides count_miss, presumably the career miss count in mania. Let me see who the biggest sieves are.

df_draw = df.sort_values('count_miss', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'count_miss']].head(10)
rank username country_code main_keymode count_miss
325 AdamYuan CN 7k 2749684
346 JDS20 CO 7k 2524503
809 Min- KR 7k 2449145
9 Stellium KR 7k 2400575
756 StevenS EC 7k 2341577
301 do you fart NZ 7k 2256312
376 Carpihat CN 7k 2253024
23 Arona PH 7k 2237571
344 Stoom DK 7k 2102566
133 invadey US 4k 2012134

This ranking has a problem, though: the miss count is positively correlated with total_hits, so the miss ratio is probably a better measure.

df['miss_ratio'] = round(df['count_miss'] / df['total_hits'], 4)
df_draw = df.sort_values('miss_ratio', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'miss_ratio']].head(10)
rank username country_code main_keymode miss_ratio
325 AdamYuan CN 7k 0.0824
357 Cattlea JP 7k 0.0545
231 SoftC418 CN 7k 0.0369
866 THIS A PERSON US 7k 0.0364
301 do you fart NZ 7k 0.0364
524 imstupidfor7k SG 7k 0.0344
258 Shepped CL 7k 0.0343
917 WoodKliz PA 7k 0.0337
495 aceqwer370 KR 7k 0.0333
636 DannyXLee CN 7k 0.0324

Uh, China's own AdamYuan is in a league of his own as the gold-medal sieve, far ahead of everyone else.

Next, as usual, the homegrown sieves.

df_draw = df.sort_values('count_miss', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'count_miss']].head(10)
rank username main_keymode count_miss
325 AdamYuan 7k 2749684
376 Carpihat 7k 2253024
24 SoftC418 7k 1680528
58 Mafuyu87Fanboy 7k 1357278
62 Mito Van 7k 1310800
79 [GB]King Fish 7k 1242739
92 Yozomi 7k 1213444
113 shiyu1213 7k 1128582
118 [GB]Burger King 7k 1093815
133 kanasshi 7k 1060956
df_draw = df.sort_values('miss_ratio', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'miss_ratio']].head(10)
rank username main_keymode miss_ratio
325 AdamYuan 7k 0.0824
231 SoftC418 7k 0.0369
636 DannyXLee 7k 0.0324
934 kanasshi 7k 0.0316
746 Tat3 7k 0.0315
624 [Crz]Zetsfy 7k 0.0285
547 shiyu1213 7k 0.0277
585 ToukiM 7k 0.0273
626 Croatian songs 4k 0.0269
814 _Reimu 7k 0.0266

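0x11 - Beware the combo freaks

Having covered the sieves, we naturally have to talk about the combo freaks too. In the MCNC 7K 2024 Semifinals match between Zhang Fan (张帆) and af, Fanshen somehow managed to full-combo several nasty maps. As long as I don't drop, the opponent will drop on their own 🥵

Note, however, that the current version still uses ScoreV1, so maps packed with long notes (especially lower LN dan courses or release-heavy maps) can yield extremely high combo counts.
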
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'maximum_combo']].head(10)
rank username country_code main_keymode maximum_combo
893 Axfaerie PH 4k 55637
425 Plana_ PH 4k 55554
858 BossPlays AR 4k 55538
538 ERA Dev US 4k 55509
208 Plutes MX 7k 55506
846 Loslic KR 4k 55502
856 nayeonie bunny BR 4k 55501
39 lupesco MX 7k 55495
375 SnowScent KR 7k 55486
88 myucchii CL 4k 55457

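With a combo above 50000, it is almost certainly this map: Between the Buried and Me - The Parallax II: Future Sequence. This loved map is 1 hour 12 minutes long; all I can say is that these people are hardcore...

Next, the homegrown combo freaks to watch out for.
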
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'maximum_combo']].head(10)
Rank Username Main Keymode Maximum Combo
369 [Crz]Nickname 4k 36319
376 Carpihat 7k 30957
93 [Paw]FIood 7k 27078
592 [GB]ParasolTree 4k 23349
224 hisa_knowledge 7k 22391
66 LiangIaiajan 7k 20552
339 Quotient 4k 19014
611 [Paw]Just_MLN 4k 18452
909 [Crz]Xinyi2016 4k 17801
424 neeko the rock 4k 17635

As everyone knows, it is easier to drop notes in 7k, so let's see which homegrown 7k combo freaks to watch out for.

# SilentParleHorn is special-cased as a combo freak here, because he really is one
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
df_draw.loc[((df_draw['main_keymode'] == '7k') & (df_draw['pp_4k'] < 9000) & (df_draw['country_code'].isin(['CN', 'MO', 'HK']))) | (df_draw['username'] == 'SilentParleHorn'), ['rank', 'username', 'maximum_combo']].head(30)
rank username maximum_combo
66 LiangIaiajan 20552
345 fishbone2445 16334
34 [Crz]Satori 16308
141 Mafuyu87Fanboy 15990
171 Longe 14455
196 Chenut BS 14249
172 Krn_ 13754
201 SilentParleHorn 13640
744 O2jam Ultima 13556
347 idqoos123 13290
43 VanWilder 12920
221 Mito Van 12773
153 - Minato Aqua - 11891
481 quailty 10991
135 _Yiiiii 10742
275 pipisugar 9915
490 ICDYO 9650
763 Myukee 9195
142 Watch01 9103
717 Yozomi 8636
22 af- 8566
63 Stink God 8414
119 ExNeko 8369
260 10086kfry 8081
989 jackyuanchen 8066
202 U1d 7979
43 QingJiDing 7859
274 KafuuChino 7735
470 [Crz]Rachel 7723
198 [GB]hej_067 7635

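Why is the combo-freak list so long? Because you have to watch out for combo freaks in tournaments!!

0x12 - Afterword

There isn't much more to write. If anyone needs the dataset, just ask me for it. The analysis is for reference only; if you have doubts, well, then you have doubts. I only wrote this for fun anyway.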