关于(我所理解的)加密学入门的笔记

复习co487的时候想着也写个自个用笔记.没有记载具体,草草概括一下.

Aside:使用了mathjax,可以更好得写数学公式,关于mathjax的tips.
感谢yihui大佬提供的hexo与mathjax不兼容的解决方式.

加密,是不同的算法.将想要的信息按照算法和自己的密匙加密,再也通过算法和密匙解密.
破解,是在已知算法的前提下,试图找算法的破绽,也用大量的尝试不同密匙来暴力破解.

不过加密方式是实时更新的.这门课我上了两次,隔了半年教材也少许有更改.
所以如果阅读这篇文章的时候距离创造时间够久了,实用意义就会减少.比如如果量子计算机使用起来,现在的加密方法肯定也得重新考虑安全性了.

Symmetric-key encryption scheme (SKES)

流密码上次有简单记述.
借着上次的三种传统加密方法以外,这里是一些经常在stream cipher中出现的辅助定义.

A pseudorandom bit generator (PRBG)

Definition
A pseudorandom bit generator (PRBG) is a deterministic algorithm that takes as input a (random) seed, and outputs a longer “pseudorandom” sequence called the keystream.

用于one-time pad.使用一次性密匙的流密码.

Indistinguishability: The keystream should be “indistinguishable” from a random sequence.
Unpredictability: If an adversary knows a portion c1 of ciphertext and the corresponding plaintext m1, then she can easily find the corresponding portion $k1 = c1 ⊕ m1$ of the keystream. Thus, given portions of the keystream, it should be infeasible to learn any information about the rest of the keystream.

The keystream generator is the critical security component of stream cipher. View generator as a finite state machine:

State: storage of internal values
State update function: changes stored internal value at each time interval
Output function: keystream output is function of current stored value

Linear feedback shift registers (LFSR)

可以用来generate key的方式.使用initial vector,来生成更长的key.

One application: A5 cipher.

Definition:
Any infinite periodic sequence (st) can be defined via a recurrence relation: ${ s_{t+n} = c_{n−1}s_{t+n−1} ⊕ c_{n−2}s_{t+n−2} . . . c_1s_{t+1} ⊕ c_0s_t }$ where ci are binary constants.
The vector ( ${ s_0, s_1, . . . , s_{n−1} }$ ) is called the initial state vector.

Definition:
The period of a binary sequences(t) is said to be p if $S_{t+p} = S_{n}$ for all t, and p is the smallest such number.

对于LFSRs我们可以attack on small period.

If the period of the keystream is less than the length of a ciphertext then two sections of message are encrypted using the same portion of keystream.
By XORing these two sections, the keystream cancels and one obtains the XOR of the plaintext strings.

RC4 stream cipher

Pros:

extremely simple
extremely fast
variable key length
No catastrophic weakness has been found

Cons:

design criteria are proprietary
not much public scrutiny until the year 2001
has small biases exploitable with ≈millions of ciphertexts

RC4 has two components: A key scheduling algorithm, and a keystream generator.

Aside:
Birthday paradox: Suppose that an urn contains n numbered balls.
Suppose that balls are drawn from the urn, one at a time, with replacement. The expected number of draws before a ball is selected for a second time (called a collision) is approximately ${\sqrt {πn/2}} ≈ {\sqrt n}.$

Block ciphers

同样也是流密码,但把长度分为一段一段的block.

Definition:
A block cipher is a symmetric-key encryption scheme in which a fixed-length block of plaintext determines an equal-sized block of ciphertext.

Security:
- Diffusion: each ciphertext bit should depend on all plaintext bits.
- Confusion: the relationship between key and ciphertext bits should be complicated.
- Key length: should be small, but large enough to preclude exhaustive key search.
Efficiency:
- Simplicity (easier to implement and analyze).
- High encryption and decryption rate.
- Suitability for hardware or software.

Data Encryption Standard (DES)

Note: Basically, DES is (believed to be) secure against everything except brute-force key search.

关于Feistel network:
用于DES.

Underlying principle: Take something “simple” and use it several times; hope that the result is “complicated”

DES Problem:

Small Key Size
Small Block Size

DES的Multiple encryption:

Double-DES:

k = (k1, k2), k1, k2 ∈ {\{0, 1\}}^{56}

c = E_{k2} (E_{k1} (m))

Double-DES attack: Meet-in-the-middle.
Main idea: If

c = E_{k1} (E_{k2} (m))

, then

E^{−1}_{k2}(c) = E_{k1}(m)

About meet-in-the-middle attack: time-memory tradeoff. The attack can be modified to decrease the storage requirements at the expense of time.

虽然两倍的DES不大行,但三倍好了些.
Three-key Triple encryption:

k = (k1, k2, k3), k1, k2, k3 ∈_R {\{0, 1\}}^{56}

c = E_{k3} (E_{k2} (E_{k1} (m)))

Meet-in-the-middle attack takes ≈ $2^{112}$ steps
No proof that Triple-DES is more secure than DES.但是有大量运用.

The Advanced Encryption Standard (AES)

关于Substitution-Permutation Networks:

AES比DES增加了Bitwise-XOR.

关于更多的Substitution-Permutation Network加密方式:

Rijndael (AES)
Heys cipher

Key whitening: Subkey prevents an adversary from reversing the final round of substitution and permutation.
Substitution-Permutation Networks的attack:寻找bias

更多->Piling-up Lemma的wiki.

The basic idea is to look for linear (boolean) relations among the bits which hold with probability much different from 50%.

Padding

Some modes, namely ECB and CBC, require the plaintext to consist of one or more complete blocks.
The padding bits can be removed unambiguously, if the receiver knows that this padding method is used.

中文翻译是填充,因为ECB CBC这样的block cipher对字数有特定的限制,于是得填充无用的字符来达到字数要求.

Example:

remove all trailing ‘0’ bits after the last ‘1’ bit 2.
remove a single ‘1’ bit.

Hashing

Hash的最初目的是单向的mapping.

A hash function is a checksum designed to be safe from malicious tampering.
Note: The description of a hash function is public. There are no secret keys.

Preimage resistance: Hard to invert given just an output.
2nd preimage resistance: Hard to find a second input that has the same hash value
as a given first input.
Collision resistance: Hard to find two different inputs that have the same hash values.

A hash function that is preimage resistant is sometimes called a one-way hash function (OWHF).
A hash function that is preimage-, second preimage-, and collision-resistant is called a cryptographic hash function.

Applications of hash functions:

Password protection
Modification Detection Codes (MDCs)
Message digests for digital signature schemes
Message Authentication Codes (MACs)
Pseudorandom bit generation
Key derivation function (KDF)

Theorem:

Collision resistance implies 2nd preimage resistance.
2nd preimage resistance does not guarantee collision resistance.
Collision resistance does not guarantee preimage resistance.

Popular hash functions:

MD5, SHA-1, SHA-2, SHA-3.

Generic attack for finding collisions

暴力测试 -> $2^{n}$
测试 (H(x), x) -> $\sqrt { 2^{n} }$

原理->Pollard’s rho algorithm & Floyd’s cycle-finding algorithm.
除了单线计算等待重复的search,也可以parallel(VW Parallel Collision Search).

Password hashing

不过常见的词被hash后有收录,所以有些可以知道hash前的内容 -> data branch

进阶版寻找hash的方法: Rainbow table
Rainbow tables are an example of a time-space tradeoff using hash chains.

阻碍rainbow table的是加salt(每次运算hash加入无意义的杂数).

Message authentication code (MAC)

用来验证信息是否被更改或被攻击.

Definition

A message authentication code (MAC) scheme is an efficiently computable function
M : ${ \{0, 1\} }^l × { \{0, 1\} }^* → { \{0, 1\} }^n$
written ` $M(k,m) = t$ , where k is the key, m is the message, and t is the tag.
Used for providing (symmetric-key) data integrity and data origin authentication.

To provide data integrity and data origin authentication:

Alice and Bob establish a secret key ${ \{ 0, 1 \} }^l$
Alice compute $t = (m, t)$ to Bob.
Bob verifies that $t = M(k, m)$ .

No confidentiality or non-repudiation.

Secure

A MAC scheme is secure if:

Given some number of MAC tags $M(k, m_i)$ for messages mi chosen by the adversary
(interaction),
it is computationally infeasible (computational resources)
to compute (with non-negligible probability of success) the value of $M(k, m)$ for any message $m \neq m_i$ (goal).
In other words, a MAC scheme is secure if it is existentially unforgeable against chosen-message attack.

常见的MAC

CBC-MAC, Encrypted CBC-MAC (EMAC)

以及跟hash function组合:

Secret prefix method:

M(k, m) = H(K||m)

(insecure)

M(k, m) = H(m||K)

(H is not collision resistant, insecure)

Envelope method a.k.a. Sandwich method:

M(k, m) = H(K∥m∥K)

Suppose that the compression function used in H is a secure MAC with fixed length messages and a secret IV as the key. Then Envelope MAC with m padded to a multiple of the block length of H is a secure MAC algorithm.

HMAC(“Hash-based” MAC):

M(k, m) = H (K⊕opad)∥H (K⊕ipad)∥m

Suppose that the compression function used in H is a secure MAC with fixed length messages and a secret IV as the key. Then HMAC is a secure MAC algorithm.

Pseudorandom generators

伪随机数生成器,生成看上去是随机的数列但实际是有其算法.使用seed来生成.

PRG

A pseudorandom generator is a deterministic function that takes as input a random seed $k ∈ { \{ 0, 1 \} }^λ$ and outputs a random-looking binary string of length l. PRG :

{\{0, 1\}}^λ → {\{0, 1\}}^l

PRF

A pseudorandom function is a deterministic function that takes as input a random seed $k ∈ { \{ 0, 1 \} }^λ$ and a (non-secret) label in ${ \{ 0, 1 \} }^*$ and outputs a random-looking binary string of length l. PRF :

{ \{ 0, 1 \} }^λ × { \{ 0, 1 \} }^* → { \{ 0, 1 \} }^l

KDF

A key derivation function is a deterministic function that takes as input a random seed $k ∈ { \{ 0, 1 \} }^λ$ and a (non-secret) label in ${ \{ 0, 1 \} }^*$ and outputs a random-looking binary string of length l. KDF :

{ \{ 0, 1 \} }^λ × { \{ 0, 1 \} }^* → { \{ 0, 1 \} }^l

Difference between KDFs and PRFs: KDF output should be indistinguishable from random even if the key k is non-uniform but sufficiently high entropy.

Applications

Key streching: Password-Based Key Derivation Function 2 (PBKDF2)

k = PBKDF2(F, p, s, c, l)

where
F = keyed hash function
p = passphrase
s = salt
c = number of iterations
l = output length

Authenticated encryption

结合加密和认证.

Composed primitives:

MAC-then-encrypt (MtE) : SSL/TLS up to v1.2
encrypt-then-MAC (EtM) : IPsec (safe)
encrypt-and-MAC (E&M) : SSH

Goal:

Confidentiality: semantic security under chosen plaintext or chosen ciphertext attack
Integrity: existential unforgeability under chosen message attack

再加上padding:

Apply MAC, then pad, then encrypt(insecure)
Apply padding, then MAC, then encrypt

Public-Key Cryptography

这个是我觉得加密学比较罗曼的一个部分(…).解密方式都是公开的,因为公开,所以可以被大量验证安全性.
(自己做的更容易被解答出来,不如说所有加密的内容都会有被解答的可能,只是那是否infeasible是判定安全性的标准.)
在公开的解密方式下,保证安全性的是加密方和解密方的密匙(key).其中的问题是,密匙的安全性.

Public key如字面意思,不需要私下的密匙,所有的东西都是可公开,在此前提下也能做到只有加密方和解密方能明白传递信息.
这不就是罗曼!

所做到的方式是创造大量的puzzle.
解密方只需要解答其中一个puzzle就能得到解码方式,而其他人要解码得在所有的puzzle中寻找解出那个有解码方式的puzzle.通过puzzle的数量来达到infeasible的标准.

Digital Signatures

类似的思路出现在Digital Signatures里.

RSA

RSA也是可以用在Digital Signatures里的.

$(n, e)$ is the public key and $(n, d)$ is the private key.
where
1.random primes $p$ and $q$ with $\log_2 p ≈ \log_2 q ≈ l/2$
2. $n = pq$
3. $φ(n) = (p−1)(q−1)$
4.integer $e$ with $1 < e < φ(n)$ and $gcd(e, φ(n)) = 1$
5. $d = e−1$ in $\Bbb Z_φ(n)$
Note: $de ≡ 1 (mod φ(n))$ by definition of $e^{−1}$

Encryption: $E((n, e), m) = m^e mod n$ .
Decryption: $D((n, d), c) = c^d mod n$ .

Diffie–Hellman key exchange

1. $p$ is a prime
2. $\Bbb Z^{∗}_{p}$ is the set $\{1,2,...,p−1 \}$
3. $g ∈ \Bbb Z^{∗}_{p}$ is an element of $\Bbb Z^{∗}_{p}$ of large prime order $q$

Note: $\Bbb Z^{∗}_{n}$ = $\{ x∈ \Bbb Z_{n} :gcd(x,n)=1 \}$
The order of an element $x∈ \Bbb Z^{∗}_{n}$ is defined to be the smallest positive integer $t$ such that $x^t =1$ in $\Bbb Z^{∗}_{n}$ .
An element of $\Bbb Z^{∗}_{n}$ is defined to be a generator if it has the maximum possible order.

The exchange:

Diffie–Hellman assumption(DH):
Let $a, b ∈ \Bbb Z_{q}$ . Given $g, g^a , g^b$ , it is computationally infeasible to determine $g^{ab}$ .
Discrete logarithm assumption(DLOG):
Let $a ∈ \Bbb Z_{q}$ . Given $g$ and $g^a$ , it is computationally infeasible to determine $a$ .
Decisional Diffie-Hellman assumption(DDH):
Let $x, y, z ∈ \Bbb Z_{q}$ . Given $g, g^x , g^y$ , and either $g^{xy}$ or $g^z$ , it is computationally infeasible to determine whether you were given the real $g^{xy}$ or the random $g^z$ .

Underlake Reading Room