Password Strengths Evaluation 1 (Entropy)
##1
密码应该说是最普遍的身份认证的方式,对其的攻击也有很多种,比如:暴力破解、基于字典、甚至可以通过用户信息猜测、撞库,近年来又出现了基于机器学习的方法猜测密码的黑科技。
所以就有一系列问题,哪种密码对上述攻击的抵抗性更高?如何评估密码的强度?如何引导用户选择高强度而且容易记忆的密码。
关于这个几个问题,最近也是在边学边做,所以顺便做一下笔记,这篇就谈一下目前使用最广泛的密码强度评估方法——Entropy。
##2
关于Entropy,Claude Shannon是这样解释的:
The entropy is a statistical parameter which measures in a certain sense, how much information is produced on the average for each letter of a text in the language. If the language is translated into binary digits (0 or 1) in the most efficient way, the entropy H is the average number of binary digits required per letter of the original language.
形式化为:
将自然语言中的entropy的概念迁移到密码中来,该值表征的就是密码的不确定程度,不确定程度越高,攻击者“猜”到密码的可能性越低。
首先假设密码是随机生成的,长度固定为l,可选字符集大小为b,那么此时
entropy的值是最高的,也就是说该密码库理论上是最安全的,但是此时也就没有可用性了,因为用户根本记不住随机密码(如果没有辅助手段)。
现实中的密码库远远不是随机的,Mark Burnett曾经搜集了6百万个密码,他发现其中有大量的重复密码,出现概率最高的1000个密码占总量的91%左右,而且很多密码符合一定的模式,比如qwerty、123456,或者再进行简单的l33t替换。
##3
对于现实中的密码不能仅仅用l和b来计算entropy(当然这两个值对entropy贡献最大),那么如何基于entropy计算未知密码的强度呢?
dropbox开源了一个基于entropy的密码强度评估的类库——zxcvbn,我们看一下她是如何实现的。
首先用一下:
➜ zxcvbn git:(master) ✗ node
> var zxcvbn = require('./dist/zxcvbn.js')
undefined
> zxcvbn('Yu@n1989shiChao')
{ password: 'Yu@n1989shiChao',
entropy: 41.911,
match_sequence:
[ { pattern: 'dictionary',
i: 0,
j: 3,
token: 'Yu@n',
matched_word: 'yuan',
rank: 1237,
dictionary_name: 'passwords',
reversed: false,
l33t: true,
sub: [Object],
sub_display: '@ -> a',
base_entropy: 10.27262978497637,
uppercase_entropy: 1,
reversed_entropy: 0,
l33t_entropy: 1,
entropy: 12.27262978497637 },
{ pattern: 'regex',
token: '1989',
i: 4,
j: 7,
regex_name: 'recent_year',
regex_match: [Object],
entropy: 4.321928094887363 },
{ pattern: 'regex',
token: 'shi',
i: 8,
j: 10,
regex_name: 'alpha_lower',
regex_match: [Object],
entropy: 14.101319154423278 },
{ pattern: 'dictionary',
i: 11,
j: 14,
token: 'Chao',
matched_word: 'chao',
rank: 1189,
dictionary_name: 'passwords',
reversed: false,
base_entropy: 10.215532999745657,
uppercase_entropy: 1,
reversed_entropy: 0,
l33t_entropy: 0,
entropy: 11.215532999745657 } ],
crack_time: 206805262.144,
crack_time_display: '6 years',
score: 4,
calc_time: 13 }
>
返回结果的字段的含义:
password 密码
entropy
match_sequence 密码匹配到的模式
crack_time 估算的破解时间,根据entropy计算
crack_time_display
score 根据crack_time映射的分值
calc_time 计算耗时
实际上zxcvbn累加了匹配到的模式的entropy值来评估密码强度,因为攻击者可以通过构建的字典和一些模式减小“猜”的范围。
这种从攻击者的角度评估密码强度的思路是很可取的。
抛开细枝末节看一下主流程:
# 对输入的密码进行模式匹配,也就是结果中的match_sequence
matches = matching.omnimatch password
# 计算各种entropy
result = scoring.minimum_entropy_match_sequence password, matches
zxcvbn考虑到的模式还是挺多的:
@dictionary_match #基于字典,也就是查看密码的子串是否出现在字典中,以及在字典中的位置
passwords: build_ranked_dict frequency_lists.passwords #常见的密码,概率降序
english: build_ranked_dict frequency_lists.english
surnames: build_ranked_dict frequency_lists.surnames
male_names: build_ranked_dict frequency_lists.male_names
female_names: build_ranked_dict frequency_lists.female_names
@reverse_dictionary_match #翻转密码字符串
@l33t_match #l33t逆回去之后再进行dictionary_match
@spatial_match #qwerty这类键盘连续的字符串
@repeat_match #aaa这类连续的串
@sequence_match #abcd这类连续的串
@regex_match #匹配连续的符号、数字、大写、小写 (3个以上)
@date_match #匹配日期
得出一个匹配列表,然后按照i和j排个序,在列表中,ij区间是有重叠的,所以首先算出一个不重叠的子列表,并且要求entropy最小,这个过程可以理解为找最容易攻击的一组模式,所以就可以用这个最小的entropy做为密码强度的评估值。
minimum_entropy_match_sequence: (password, matches) ->
# 计算暴力破解的基数,也就是计算entropy公式中的(b**l)
bruteforce_cardinality = @calc_bruteforce_cardinality password # e.g. 26 for lowercase
# 保存到位置k的最小entropy,就是个动态规划嘛
up_to_k = [] # minimum entropy up to k.
# for the optimal seq of matches up to k, backpointers holds the final match (match.j == k).
# null means the sequence ends w/ a brute-force character.
backpointers = []
for k in [0...password.length]
# starting scenario to try and beat:
# adding a brute-force character to the minimum entropy sequence at k-1.
up_to_k[k] = (up_to_k[k-1] or 0) + @lg bruteforce_cardinality
backpointers[k] = null
for match in matches when match.j == k
[i, j] = [match.i, match.j]
# see if best entropy up to i-1 + entropy of this match is less than current minimum at j.
# 在calc_entropy中,entropy的计算方法因“匹配类型”而异
candidate_entropy = (up_to_k[i-1] or 0) + @calc_entropy(match)
if candidate_entropy < up_to_k[j]
up_to_k[j] = candidate_entropy
backpointers[j] = match
# walk backwards and decode the best sequence
match_sequence = []
k = password.length - 1
while k >= 0
match = backpointers[k]
if match
match_sequence.push match
k = match.i - 1
else
k -= 1
match_sequence.reverse()
#match_sequence就是不重叠的匹配列表,但是中间可能有没有覆盖的字符串
# fill in the blanks between pattern matches with bruteforce "matches"
# that way the match sequence fully covers the password:
# match1.j == match2.i - 1 for every adjacent match1, match2.
make_bruteforce_match = (i, j) =>
pattern: 'bruteforce'
i: i
j: j
token: password[i..j]
entropy: @lg Math.pow(bruteforce_cardinality, j - i + 1)
cardinality: bruteforce_cardinality
# 使用bruteforce计算其中没有覆盖到的字符的entropy
k = 0
match_sequence_copy = []
for match in match_sequence
[i, j] = [match.i, match.j]
if i - k > 0
match_sequence_copy.push make_bruteforce_match(k, i - 1)
k = j + 1
match_sequence_copy.push match
if k < password.length
match_sequence_copy.push make_bruteforce_match(k, password.length - 1)
match_sequence = match_sequence_copy
min_entropy = up_to_k[password.length - 1] or 0 # or 0 corner case is for an empty password ''
crack_time = @entropy_to_crack_time min_entropy
# final result object
password: password
entropy: @round_to_x_digits(min_entropy, 3)
match_sequence: match_sequence
crack_time: @round_to_x_digits(crack_time, 3)
crack_time_display: @display_time crack_time
score: @crack_time_to_score crack_time
下面再看一下不同的匹配计算entropy的方式:
calc_entropy: (match) ->
return match.entropy if match.entropy? # a match's entropy doesn't change. cache it.
entropy_functions =
dictionary: @dictionary_entropy
spatial: @spatial_entropy
repeat: @repeat_entropy
sequence: @sequence_entropy
regex: @regex_entropy
date: @date_entropy
match.entropy = entropy_functions[match.pattern].call this, match
dictionary_entropy: (match) ->
## 依赖于字典中字段的排位
match.base_entropy = @lg match.rank # keep these as properties for display purposes
match.uppercase_entropy = @extra_uppercase_entropy match
match.reversed_entropy = match.reversed and 1 or 0
match.l33t_entropy = @extra_l33t_entropy(match)
match.base_entropy + match.uppercase_entropy + match.l33t_entropy + match.reversed_entropy
START_UPPER: /^[A-Z][^A-Z]+$/
END_UPPER: /^[^A-Z]+[A-Z]$/
ALL_UPPER: /^[^a-z]+$/
ALL_LOWER: /^[^A-Z]+$/
extra_uppercase_entropy: (match) ->
word = match.token
return 0 if word.match @ALL_LOWER
# a capitalized word is the most common capitalization scheme,
# so it only doubles the search space (uncapitalized + capitalized): 1 extra bit of entropy.
# allcaps and end-capitalized are common enough too, underestimate as 1 extra bit to be safe.
for regex in [@START_UPPER, @END_UPPER, @ALL_UPPER]
return 1 if word.match regex
# otherwise calculate the number of ways to capitalize U+L uppercase+lowercase letters
# with U uppercase letters or less. or, if there's more uppercase than lower (for eg. PASSwORD),
# the number of ways to lowercase U+L letters with L lowercase letters or less.
U = (chr for chr in word.split('') when chr.match /[A-Z]/).length
L = (chr for chr in word.split('') when chr.match /[a-z]/).length
possibilities = 0
possibilities += @nCk(U + L, i) for i in [1..Math.min(U, L)]
@lg possibilities
extra_l33t_entropy: (match) ->
return 0 if not match.l33t
extra_entropy = 0
for subbed, unsubbed of match.sub
# lower-case match.token before calculating: capitalization shouldn't affect l33t calc.
chrs = match.token.toLowerCase().split('')
S = (chr for chr in chrs when chr == subbed).length # num of subbed chars
U = (chr for chr in chrs when chr == unsubbed).length # num of unsubbed chars
if S == 0 or U == 0
# for this sub, password is either fully subbed (444) or fully unsubbed (aaa)
# treat that as doubling the space (attacker needs to try fully subbed chars in addition to
# unsubbed.)
extra_entropy += 1
else
# this case is similar to capitalization:
# with aa44a, U = 3, S = 2, attacker needs to try unsubbed + one sub + two subs
p = Math.min(U, S)
possibilities = 0
possibilities += @nCk(U + S, i) for i in [1..p]
extra_entropy += @lg possibilities
extra_entropy
其他匹配的计算方法就不罗列了,根据entropy计算score仅仅是数值上的变换,所以这里也略去了。
##4
我们再看一下计算字典匹配entropy的函数,其中计算base_entroy的公式可以再讨论一下。这种密码的列表很容易从Mark Burnett的网站中找到,甚至也可以自己统计现有泄露的密码库,但是转成这种列表之后实际上丢失了概率信息,仅仅保留了一个排名信息。
如果攻击者有几个完整的泄露密码库,他们还能从其中提取哪些有用的信息呢?我们如何利用这些信息评估未知密码的强度呢?
下一篇再谈。