Hashcash

What is Hashcash?

Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin (and other cryptocurrencies) as part of the mining algorithm. The original idea was first proposed by Cynthia Dwork and Moni Naor and Eli Ponyatovski in their 1992 paper "Pricing via Processing or Combatting Junk Mail". Later a similar proposal called Hashcash was proposed in 1997 by Adam Back.

哈希现金是一个工作量证明系统。这个系统用来限制垃圾邮件(email spam)和拒绝服务攻击(denial-of-service attacks)。最近由于被用于比特币以及其他加密数字货币中作为挖矿算法的一部分而备受瞩目。这个想法最初是1992年由 Cynthia Dwork、Moni Naor 和 Eli Ponyatovski 在他们的论文《通过处理和对抗垃圾邮件来定价》中提出(propose)。随后在1997年,一个相似的名为“哈希现金”的概念被Adam Back提出。

How it works

Hashcash is a proof-of-work algorithm that requires a selectable amount of work to compute, but the proof can be verified efficiently. For email uses, a textual encoding of a hashcash stamp is added to the header of an email to prove the sender has expended a modest amount of CPU time calculating the stamp prior to sending the email. In other words, as the sender has taken a certain amount of time to generate the stamp and send the email, it is unlikely that they are a spammer. The receiver can, at negligible computational cost, verify that the stamp is valid. However, the only known way to find a header with the necessary properties is brute force, trying random values until the answer is found; though testing an individual string is easy, if satisfactory answers are rare enough it will require a substantial number of tries to find the answer.

哈希现金是一个工作量证明算法。这个算法需要定量的计算工作,但是证据的验证过程非常高效。为了用于电子邮件中,需要把哈希现金戳的文本编码添加到邮件的头部。这样就能证明发送方在发送邮件前花费了适当的(a modest amount of)CPU时钟来计算哈希现金戳。换句话说(In other words),由于发送方花费了一定时间去生成哈希现金戳并发送了邮件,这说明他不太可能是一个垃圾邮件发送者。接收方可以凭借微不足道的(negligible)计算开销验证受到的哈希现金戳是否有效。但是,已知的寻找一个包含必备属性的头部的唯一方法就是暴力求解(brute force)。尝试不同的随机数直至找到合适的答案。虽然验证一个字符串是容易的,但是合格的(satisfactory)答案是非常稀少的,需要进行大量的(substantial number of)尝试才能找到。

The hypothesis is that spammers, whose business model relies on their ability to send large numbers of emails with very little cost per message, will cease to be profitable if there is even a small cost for each spam they send. Receivers can verify whether a sender made such an investment and use the results to help filter email.

这个假设是基于这样的事实:垃圾邮件发送者为了盈利必须发送大量的电子邮件。每封邮件消耗少量的资源,但是一旦加在一起就会是个不小的数字,使得他们无法借此盈利(cease to be profitable)。接收方可以验证邮件的发送方是否付出了这种小型“投资”,借此过滤垃圾邮件。

Technical details

The header line looks something like this:
头部格式如下:

X-Hashcash: 1:20:1303030600:adam@cypherspace.org::McMybZIhxKXu57jd:ckvi

The header contains:
头部包含如下部分:

ver: Hashcash format version, 1 (which supersedes version 0).
ver: 哈希现金格式版本, 填1(替换了版本0)

bits: Number of "partial pre-image" (zero) bits in the hashed code.
bits: 哈希值中前象的位数(即哈希值前几位为全0)

date: The time that the message was sent, in the format YYMMDD[hhmm[ss]].
date: 邮件发送时间,格式为 YYMMDD[hhmm[ss]]。

resource: Resource data string being transmitted, e.g., an IP address or email address.
resource: 资源数据串。例如:IP地址或邮箱地址。

ext: Extension (optional; ignored in version 1).
ext: 扩展名(可选填,如果是版本1则不需填写)

rand: String of random characters, encoded in base-64 format.
rand: 随机数,以base-64编码。

counter: Binary counter (up to 220 ), encoded in base-64 format.
counter: 计数器(最大为220),以base-64编码。

The header contains the recipient's email address, the date of the message, and information proving that the required computation has been performed. The presence of the recipient's email address requires that a different header be computed for each recipient. The date allows the recipient to record headers received recently and to ensure that the header is unique to the email message.

头部包含接收方的邮箱地址,邮件发送时间和用于证明发送方已经执行相关计算的信息。接收方邮箱地址的存在使得不同的接收方可以计算产生不同的头部。日期使得接收方可以记录下近期受到的头部并保证这个头部是当前的邮件专属的(is unique to the email message)。

Sender's side

The sender prepares a header and appends a counter value initialized to a random number. It then computes the 160-bit SHA-1 hash of the header. If the first 20 bits (i.e. the 5 most significant hex digits) of the hash are all zeros, then this is an acceptable header. If not, then the sender increments the counter and tries the hash again. Out of 2160 possible hash values, there are 2140 hash values that satisfy this criterion. Thus the chance of randomly selecting a header that will have 20 zeros as the beginning of the hash is 1 in 220. The number of times that the sender needs to try to get a valid hash value is modeled by geometric distribution. Hence the sender will on average have to try 220 values to find a valid header. Given reasonable estimates of the time needed to compute the hash, this would take about 1 second to find. At this time, no more efficient method is known to find a valid header.

发送方准备了一个头部,并且在头部末尾加上一个计数器值。这个值使用一个随机数初始化。随后使用这个头部计算出一个160位的SHA-1哈希值。如果前20位(亦即前5位十六进制数)全为0,则这是一个合格的头部。否则,发送方递增计数器的值并重新计算哈希值。在 2160 个可能出现的哈希值中(out of),有 2140 个哈希值符合条件(criterion)。那么随机选择一个开头20位全0的头部的概率为 1/220。发送方要选中一个有效哈希值所需的尝试次数遵从几何分布(geometric distribution)。因此发送方平均需要尝试 220 个值才能找到一个合格的头部。估计计算这样一个有效哈希值所需要的时间大概是1秒。当前,获取一个合格的头部没有更行之有效的方法。

A normal user on a desktop PC would not be significantly inconvenienced by the processing time required to generate the Hashcash string. However, spammers would suffer significantly due to the large number of spam messages sent by them.

一个PC端的普通用户不会因为生成一个哈希现金串而消耗的处理时间产生巨大不便(significantly inconvenienced)。但是,一个垃圾邮件发送者则不得不为(due to)发送大量的垃圾邮件承受巨大的折磨(suffer significantly)。

Recipient's side

Technically the system is implemented with the following steps:

一般系统会经历如下几步:

  1. The recipient's computer calculates the 160-bit SHA-1 hash of the entire string.
    1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
    This takes about two microseconds on a 1 GHz machine, far less time than the time it takes for the rest of the e-mail to be received. If the first 20 bits are not all zero, the hash is invalid. (Later versions may require more bits to be zero as machine processing speeds increase.)
  • 接收方的电脑计算整个字串的160位SHA-1哈希值。
    1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
    在一台主频1GHz的机器上大概会消耗两毫秒的时间来计算这个哈希值,远低于收取剩余邮件的时间。如果开头的20位非全0,那么此哈希值无效。(随着硬件处理速度的提升,后续版本可能会要求更多的位数为0)
  1. The recipient's computer checks the date in the header (e.g., "060408", which represents the date 8 Apr 2006). If it is not within two days of the current date, it is invalid. (The two-day window compensates for clock skew and network routing time between different systems.)
  • 接收者的计算检查头部的时间戳(如"060408",代表2006年4月8日)。如果这个日期不在当前日期的两天以内,那么哈希值无效。(两天长度的时间窗口补偿了(compensate for)时钟偏差(clock skew)和邮件在双方之间网络路由的时间(network routing time))
  1. The recipient's computer checks whether the e-mail address in the hash string matches any of the valid e-mail addresses registered by the recipient, or matches any of the mailing lists to which the recipient is subscribed. If a match is not found, the hash string is invalid.
  • 接收方的电脑检查哈希值中的邮件地址是否符合接收方登记的有效邮件地址之一,或者是否匹配接收方注册的邮箱列表之一。如果没有匹配的选项,那么这个哈希串是无效的。
  1. The recipient's computer inserts the hash string into a database. If the string is already in the database (indicating that an attempt is being made to re-use the hash string), it is invalid.
  • 接收方的电脑将哈希串插入数据库中。如果该串已经存在与数据库中,说明发送方有意重用这枚哈希串,那么该串无效。

If the hash string passes all of these tests, it is considered a valid hash string. All of these tests take far less time and disk space than receiving the body content of the e-mail.

如果哈希串通过了以上所有验证,那么可以认定这是一条合格的哈希串。所有以上测试所使用的时间和空间远低于接收邮件内容(body content)的消耗。

Required effort

The time needed to compute such a hash collision is exponential with the number of zero bits. So zero bits can be added (doubling the amount of time needed to compute a hash with each additional zero bit) until it is too expensive for spammers to generate valid header lines.

用于计算哈希碰撞(hash collision)的时间是哈希值规定全0位数的指数级别。全0位数可以递增(每增加一个0位则计算一个合格哈希值所需的时间将会翻番)直到令垃圾邮件发送者感到生成合格的头部开销太大。

Confirming that the header is valid always takes the same amount of time, no matter how many zero bits are required for a valid header, since this requires only a single hashing operation.

无论条件中要求多少个全0位,我们总是消耗相同的时间来确认头部是否合格,因为接收方只需要执行一次哈希运算(hashing operation)。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 206,968评论 6 482
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 88,601评论 2 382
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 153,220评论 0 344
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 55,416评论 1 279
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 64,425评论 5 374
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,144评论 1 285
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,432评论 3 401
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,088评论 0 261
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 43,586评论 1 300
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,028评论 2 325
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,137评论 1 334
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,783评论 4 324
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,343评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,333评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,559评论 1 262
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,595评论 2 355
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,901评论 2 345

推荐阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 7,294评论 0 10
  • 2016-7-14 闲来无事,偶然翻到去年给文澜副刊写的文,当年送大四学长学姐毕业,如今轮到了我们,等到毕业尘埃落...
    时辰君阅读 914评论 0 0
  • 集中营,一个可怕的字眼。 我们通常以一种同情的眼光看待在集中营中受凌辱的人。 但是,似乎任何一个团体都是一个微型社...
    Vera_2546阅读 350评论 0 0