Understanding Transaction Logs, Soft Commit and Commit in SolrCloud

Hard commits, soft commits and transaction logs

As of Solr 4.0, there is a new “soft commit” capability and a new parameter for hard commits: openSearcher. Currently there’s quite a bit of confusion about the interplay between soft and hard commit actions, and especially about what it all means for the transaction log. The stock solrconfig.xml file explains the options, but with the usual documentation-in-example limits: if it gave a full explanation of everything, the example file would be about 10 MB and nobody would ever read through the whole thing. This article outlines the consequences of hard and soft commits and of the new openSearcher option for hard commits. The release documentation can be found in the Solr Reference Guide; this post is a more leisurely overview of the topic. I persuaded a couple of the committers to give me some details. I’m sure I was told accurate information; any transcription errors are mine!

The mantra

Repeat after me: “Hard commits are about durability, soft commits are about visibility”. Hard and soft commits are related concepts, but serve different purposes. Concealed in this simple statement are many details; we’ll try to illuminate some of them. First, some definitions:

Transaction log (tlog): A file where the raw documents are written for recovery purposes. In SolrCloud, each node has its own tlog. On update, the entire document gets written to the tlog. For Atomic Updates, it’s still the entire document, including data read from the old version of the document. In other words, the document written to the tlog is not the “delta” for atomic updates. Tlogs are critical for consistency; they are used to bring an index up to date if the JVM is stopped before segments get closed.

NOTE: The transaction log will be replayed on server restart if the server was not gracefully shut down! So if your tlog is huge (and we’ve seen it in gigabyte ranges), restarting your server can be very, very slow. As in hours.

Hard commit: This is governed by the <autoCommit> option in solrconfig.xml or explicit calls from a client (SolrJ or HTTP via the browser, cURL or similar). Hard commits close the current segment and open a new segment in your index.

openSearcher: A boolean sub-property of <autoCommit> that governs whether the newly-committed data is made visible to subsequent searches.

Soft commit: A less-expensive operation than a hard commit (openSearcher=true) that also makes documents visible to search. Soft commits do not truncate the transaction log, however.

Important: Soft commits are “less expensive”, but they still aren’t free. You should make the soft commit interval as long as is reasonable for best performance!

fsync: Low-level I/O-speak. When an fsync call returns, the bits have actually been flipped on disk. This is different from simply having a Java program write data, in that the return from a Java-level write only guarantees that the new data has been handed over to the operating system, which will change the bits on disk in its own good time.

flush: The Java operation that hands the data over to the op system. Upon return, the bits on the disk will not necessarily have been changed, and the data will be lost in the event of an op system crash that happens at just the wrong time. Note that, especially in SolrCloud where there is more than one replica per shard, losing data requires that all of the replicas go down at the same time in such a manner that none of them manages to complete the write to disk, which is very unlikely.

The op system flips the bits after a flush in a few milliseconds (say 10-50 ms). If the JVM crashes, the op system will still change the bits on disk after a flush. But if the op system crashes after Java “writes” the file but before the I/O subsystem gets around to actually changing the bits on disk, the data can be lost. This is usually not something you need to be concerned about; it’s important only when you need to be absolutely sure that no data ever gets lost.
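To make the flush versus fsync distinction concrete, here is a minimal sketch in Python rather than Java. It is only an illustration of the two operating-system-level steps, not how Solr writes its tlog; the function name `write_record` and the file name `example.tlog` are made up for this example.

```python
import os
import tempfile

def write_record(path: str, record: bytes, durable: bool) -> None:
    """Append a record; always flush, and fsync only when durable=True."""
    with open(path, "ab") as f:
        f.write(record)
        # flush(): hand the bytes from the program's buffer to the operating
        # system. They survive a crash of this process, but not necessarily
        # a crash of the operating system itself.
        f.flush()
        if durable:
            # fsync(): block until the operating system reports that the
            # bytes have reached the storage device.
            os.fsync(f.fileno())

# Usage: both calls leave the same bytes in the file. They differ only in
# what is guaranteed if the machine goes down right after the call returns.
path = os.path.join(tempfile.mkdtemp(), "example.tlog")  # hypothetical name
write_record(path, b"doc-1\n", durable=False)
write_record(path, b"doc-2\n", durable=True)
with open(path, "rb") as f:
    contents = f.read()
```

The durable variant is slower because every call waits on the device, which is the throughput price mentioned for fsync-on-every-update setups.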

Transaction Logs

Transaction logs are integral to the data guarantees of Solr 4, and also a place where people get into trouble, so let’s talk about them a bit. The indexing flow in SolrCloud is as follows:

  • Incoming documents are received by a node and forwarded to the proper leader.
  • From the leader they’re sent to all replicas for the relevant shard.
  • The replicas respond to their leader.
  • The leader responds to the originating node.
  • After all the leaders have responded, the originating node replies to the client. At this point, all documents have been flushed to the tlog for all the nodes in the cluster!
  • If the JVM crashes, the documents are still safely written to disk. If the op system crashes, they may not be.
  • If the JVM crashes (or, say, is killed with a -9), then on restart, the tlog is replayed.
  • You can alter the configuration in solrconfig.xml to fsync rather than flush before return, but this is rarely necessary. With leaders and replicas, the chance that all of the replicas suffer a hardware crash at the same time and lose data for all of them is small. Some use-cases cannot tolerate even this tiny chance, so they may choose to pay the price of decreased throughput.

Note: tlogs are “rolled over” automatically on hard commit (openSearcher true or false). The old one is closed and a new one is opened. Enough tlogs are kept around to contain 100 documents [1], and older tlogs are deleted. Consider the case where you are indexing in batches of 25 documents and hard committing after each one (not that you should commit that often, but just saying). You should have 5 tlogs at any given time: the oldest four (closed) contain 25 documents each, totaling 100, plus the current tlog file. When the current tlog is closed, the oldest tlog will be deleted and a new one opened.

Note particularly that there is no attempt on Solr’s part to put only 100 documents in any particular tlog. Tlogs are only rolled over when you tell Solr to, i.e. when you issue a hard commit (or autoCommit happens, as configured in solrconfig.xml). So in bulk-loading situations where you are loading, say, 1,000 docs/second and you don’t do a hard commit for an hour, your single tlog will contain 3,600,000 documents. And an un-graceful shutdown may cause it to be entirely replayed before the Solr node is open for business. This may take hours. And since I don’t have the patience to just wait, I think there’s something wrong and restart Solr. Which starts replaying the tlog all over again. You see where this is going. If you have very large tlogs, this is A Bad Thing and you should change your hard commit settings! This is especially trappy for people coming from the 3.x days, when hard commit intervals were often very long because hard commits were always expensive; there was no openSearcher=false option.
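The rollover arithmetic above can be checked with a toy model. This is a sketch of the retention rule as described in this article only (keep enough closed tlogs to cover 100 documents); the class `TlogModel` and its methods are invented for illustration and do not mirror Solr’s actual implementation or any other retention limits it may apply.

```python
from collections import deque

class TlogModel:
    """Toy model of tlog rollover and retention (not Solr's actual code)."""

    def __init__(self, records_to_keep: int = 100) -> None:
        self.records_to_keep = records_to_keep
        self.closed = deque()  # document counts of closed tlogs, oldest first
        self.current = 0       # document count of the open tlog

    def add_docs(self, n: int) -> None:
        # Updates only ever append to the open tlog.
        self.current += n

    def hard_commit(self) -> None:
        # Roll over: close the current tlog and open a new, empty one.
        self.closed.append(self.current)
        self.current = 0
        # Delete the oldest closed tlog while the newer closed tlogs
        # still hold at least records_to_keep documents.
        while (len(self.closed) > 1
               and sum(self.closed) - self.closed[0] >= self.records_to_keep):
            self.closed.popleft()

    def tlog_files(self) -> int:
        return len(self.closed) + 1  # closed tlogs plus the open one

# Batches of 25 documents with a hard commit after each batch.
steady = TlogModel()
for _ in range(10):
    steady.add_docs(25)
    steady.hard_commit()

# Bulk load: 1,000 docs/second for an hour with no hard commit at all.
bulk = TlogModel()
for _ in range(3600):
    bulk.add_docs(1000)

print(steady.tlog_files(), list(steady.closed))
print(bulk.tlog_files(), bulk.current)
```

In the first case the model settles at four closed tlogs of 25 documents plus the open one; in the second, everything piles up in a single open tlog because nothing ever triggers a rollover.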

Soft commit

Soft commits are about visibility, hard commits are about durability. The thing to understand most about soft commits is that they will make documents visible, but at some cost. In particular, the “top level” caches, which include what you configure in solrconfig.xml (filterCache, queryResultCache, etc.), will be invalidated! Autowarming will be performed on those top-level caches, and any newSearcher queries will be executed. Also, the FieldValueCache is invalidated, so facet queries will have to wait until the cache is refreshed. With very frequent soft commits, it’s often the case that your top-level caches are little used and may, in some cases, be eliminated. However, “segment level caches”, used for function queries, sorting, etc., are “per segment”, so they will not be invalidated on soft commit; they can continue to be used [2].

So what does all this mean?

Consider a soft commit. On execution you have the following:

  • The tlog has NOT been truncated. It will continue to grow.
  • The documents WILL be visible.
  • Some caches will have to be reloaded:
    • Your top-level caches will be invalidated.
    • Autowarming will be performed.
  • New segments are created that will be merged.

Note, I haven’t said a thing about index segments! That’s for hard commits. And again, soft commits are “less expensive” than hard commits (openSearcher=true), but they are not free. The motto of the Lunar Colony in a science fiction novel (“The Moon Is a Harsh Mistress” by Robert Heinlein) was TANSTAAFL: There Ain’t No Such Thing As A Free Lunch. Soft commits are there to support Near Real Time, and they do. But they do have a cost, so use the longest soft commit interval you can for best performance.

Hard commit

Hard commits are about durability, soft commits are about visibility. There are really two flavors here, openSearcher=true and openSearcher=false. First we’ll talk about what happens in both cases. Whether openSearcher is true or false, the following consequences are most important:

  • The tlog is truncated: A new tlog is started. Old tlogs will be deleted if there are more than 100 documents in newer, closed tlogs.
  • The current index segment is closed and flushed.
  • Background segment merges may be initiated.

The above happens on all hard commits. That leaves the openSearcher setting:

  • openSearcher=true: The Solr/Lucene searchers are re-opened and all caches are invalidated. Autowarming is done, etc. This used to be the only way you could see newly-added documents.
  • openSearcher=false: Nothing further happens other than the points above. To search the newly added docs, a soft commit (or a later hard commit with openSearcher=true) is necessary.
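The differences between the three kinds of commit discussed so far can be condensed into a small lookup. This is just a summary of the statements in this article expressed as code; the names `CommitEffects` and `commit_effects` are hypothetical and nothing here calls Solr.

```python
from typing import NamedTuple

class CommitEffects(NamedTuple):
    tlog_rolled_over: bool    # old tlog closed, new one started
    docs_visible: bool        # newly indexed docs can be searched
    caches_invalidated: bool  # top-level caches dropped and autowarmed

def commit_effects(kind: str, open_searcher: bool = True) -> CommitEffects:
    """Summarize the effects this article lists for one commit.

    kind is "soft" or "hard"; open_searcher only matters for hard commits.
    """
    if kind == "soft":
        # Visibility only: the tlog keeps growing.
        return CommitEffects(tlog_rolled_over=False,
                             docs_visible=True,
                             caches_invalidated=True)
    if kind == "hard":
        # Durability always; visibility only if a new searcher is opened.
        return CommitEffects(tlog_rolled_over=True,
                             docs_visible=open_searcher,
                             caches_invalidated=open_searcher)
    raise ValueError(f"unknown commit kind: {kind!r}")

for args in (("soft",), ("hard", True), ("hard", False)):
    print(args, commit_effects(*args))
```

Read this way, a hard commit with openSearcher=false is the only one that touches durability without paying the cache-invalidation cost, which is why it can run on a short interval.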

Recovery

I’ve talked above about durability, so let’s expand on that a bit. When a machine crashes, the JVM quits, whatever, here’s the state of your cluster.

  • The last update call that returned successfully has all the documents written to the tlogs in the cluster. The default is that the tlog has been flushed, but not fsync’d. As discussed above, you can override this default behavior but it is not recommended.

  • On restart of the affected machine, it contacts the leader and either

    • Replays the documents from its own tlog if < 100 new updates have been received by the leader. Note that while this replay is going on, additional updates that come in are written to the end of the tlog and they’ll need to be replayed too.
    • Does an old-style full replication from the leader to catch up if the leader received > 100 updates while the node was offline.
  • Recovery can take some time. This is one of the hidden “gotchas” people are running into as they work with SolrCloud. They are experimenting, so they’re bouncing servers up and down all over the place, killing Solrs with ‘kill -9’, etc. On the one hand, this is great, since it exercises the whole SolrCloud recovery process. On the other hand, it’s not very great, as it’s a highly artificial experience. If you have nodes disappearing many times a day, you have bigger problems than Solr taking some time to recover on startup, and those should be fixed first!
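The choice a restarting node makes, as described in the list above, boils down to one comparison. The sketch below is a simplification under stated assumptions: the function name `choose_recovery` is made up, the threshold of 100 is the default discussed in this article, and since the text only covers the “< 100” and “> 100” cases, treating exactly 100 as a full replication is an arbitrary choice made here.

```python
def choose_recovery(missed_updates: int, records_kept: int = 100) -> str:
    """Pick a recovery path from the number of updates the leader
    received while this node was offline (simplified model)."""
    if missed_updates < 0:
        raise ValueError("missed_updates cannot be negative")
    if missed_updates < records_kept:
        # Few enough updates: catch up by replaying documents from the tlog.
        return "tlog replay"
    # Too far behind (boundary case is this sketch's assumption):
    # copy the index from the leader with old-style replication.
    return "full replication"

print(choose_recovery(40))
print(choose_recovery(5000))
```

The practical point is that a node that was down through heavy indexing will almost always land in the second branch, so recovery time is then governed by index size rather than tlog size.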

Recommendations

I always shudder at this, because any recommendation will be wrong in some cases. My first recommendation would be to not overthink the problem. Some very smart people have tried to make the entire process robust. Try the simple things first and only tweak things as necessary. In particular, look at the size of your transaction logs and adjust your hard commit intervals to keep these “reasonably sized”. Remember that the penalty is mostly the replay time involved if you restart after a JVM crash. Is 15 seconds tolerable? Why go smaller then? We’ve seen situations in which the hard commit interval is much shorter than the soft commit interval; see the bulk indexing bit below.

Shut down gracefully. In other words, “kill -9” while indexing is just asking for trouble.

This means:

  • Stop ingesting documents.
  • Issue a hard commit or wait until the autoCommit interval expires.
  • Stop the Solr servers.

The settings below are ones to start with and tweak to fit your situation.

Heavy (bulk) indexing

The assumption here is that you’re interested in getting lots of data into the index as quickly as possible for search sometime in the future. I’m thinking of initial loads of a data source, etc.

  • Set your soft commit interval quite long. As in 10 minutes or even longer (-1 for no soft commits at all). Soft commit is about visibility, and my assumption here is that bulk indexing isn’t about near real time searching, so don’t do the extra work of opening any kind of searcher.
  • Set your hard commit intervals to 15 seconds, openSearcher=false. Again, the assumption is that you’re going to be just blasting data at Solr. The worst case here is that you restart your system and have to replay 15 seconds or so of data from your tlog. If your system is bouncing up and down more often than that, fix the reason for that first.
  • Only after you’ve tried the simple things should you consider refinements; they’re usually only required in unusual circumstances. But they include:
    • Turning off the tlog completely for the bulk-load operation
    • Indexing offline with some kind of map-reduce process
    • Only having a leader per shard, no replicas for the load, then turning on replicas later and letting them do old-style replication to catch up. Note that this is automatic: if the node discovers it is “too far” out of sync with the leader, it initiates an old-style replication. After it has caught up, it’ll get documents as they’re indexed to the leader and keep its own tlog.
    • etc.
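The first two bullets above translate into a couple of elements in solrconfig.xml. The sketch below renders such a fragment from Python so the intervals can be parameterized. It assumes the element names used in the stock configuration (`<autoCommit>`, `<autoSoftCommit>`, `<maxTime>` in milliseconds, `<openSearcher>`), so check them against the solrconfig.xml shipped with your Solr version; the helper name `commit_settings` is made up.

```python
def commit_settings(hard_ms: int, open_searcher: bool, soft_ms: int) -> str:
    """Render autoCommit/autoSoftCommit elements for solrconfig.xml.

    Intervals are in milliseconds; soft_ms=-1 means no automatic
    soft commits at all.
    """
    searcher = "true" if open_searcher else "false"
    return (
        "<autoCommit>\n"
        f"  <maxTime>{hard_ms}</maxTime>\n"
        f"  <openSearcher>{searcher}</openSearcher>\n"
        "</autoCommit>\n"
        "<autoSoftCommit>\n"
        f"  <maxTime>{soft_ms}</maxTime>\n"
        "</autoSoftCommit>"
    )

# Bulk-indexing starting point: hard commit every 15 seconds without
# opening a searcher, and no soft commits.
bulk_fragment = commit_settings(hard_ms=15_000, open_searcher=False, soft_ms=-1)
print(bulk_fragment)
```

The same helper covers the other scenarios by changing only the arguments, for example a long but finite `soft_ms` when documents need to become visible eventually.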

Index-heavy, Query-light

By this I mean, say, searching log files. This is the case where you have a lot of data coming at the system pretty much all the time. But the query load is quite light, often to troubleshoot or analyze usage.

  • Set your soft commit interval quite long, up to the maximum latency you can stand for documents to be visible. This could be just a couple of minutes or much longer. Maybe even hours with the capability of issuing a hard commit (openSearcher=true) or soft commit on demand.
  • Set your hard commit to 15 seconds, openSearcher=false.

Index-light, Query-light or heavy

This is a relatively static index that sometimes gets a small burst of indexing. Say every 5-10 minutes (or longer) you do an update.

  • Unless NRT functionality is required, I’d omit soft commits in this situation and do hard commits every 5-10 minutes with openSearcher=true. This is a situation in which, if you’re indexing with a single external indexing process, it might make sense to have the client issue the hard commit.

Index-heavy, Query-heavy

This is the Near Real Time (NRT) case, and is really the trickiest of the lot. This one will require experimentation, but here’s where I’d start:

  • Set your soft commit interval to as long as you can stand. Don’t listen to your product manager who says “we need no more than 1 second latency”. Really. Push back hard and see if the user is best served or will even notice. Soft commits and NRT are pretty amazing, but they’re not free.
  • Set your hard commit interval to 15 seconds.

SolrJ and HTTP and client indexing

Generally, all the options available automatically are also available via SolrJ or HTTP. The HTTP commands are documented here. The SolrJ commands are in the Javadocs, SolrServer class.

Late edit (Jun, 2014): Be very careful committing from the client! In fact, don’t do it. By and large, do not issue commits from any client indexing to Solr; it’s almost always a mistake. And especially in those cases where you have multiple clients indexing at once, it is A Bad Thing. What happens is that commits come in unpredictably close to each other, generating work as above. You’ll possibly see warnings in your log about “too many warming searchers”. Or you’ll see a zillion small segments. Or… Let your autocommit settings (both soft and hard) in solrconfig.xml handle the commit frequency.

If you absolutely must control the visibility, say you want to search docs right after the indexing run happens and you can’t afford to wait for your autocommit settings to kick in, commit once at the end. In fact, I would only do that if I had only one indexing client. Otherwise, I’d wait until they were all finished and submit a “manual” commit, something like:

http://host:port/solr/collection/update?commit=true

should do it: cURL it in, send it from a browser, etc.

You can also, in the SolrJ world, add documents to Solr and specify a “commitWithin” interval, measured in milliseconds. This is perfectly reasonable to do from as many SolrJ clients as you want, as it’s all the same to Solr. What happens on the server is that the timer is started when the first update that specifies commitWithin is received. commitWithin milliseconds later, all docs that have been received from any client are committed, regardless of whether they have commitWithin specified or not. The next update that has commitWithin specified starts a new timer.

And remember, optimizing an index is rarely necessary! Happy Indexing!
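As a postscript to the commitWithin description above, the timer behavior can be modeled in a few lines. This is a toy model of the prose in this article only, not of Solr’s scheduler: the function name `commit_times` is invented, and the handling of an update that arrives exactly when a timer expires is an assumption of the sketch.

```python
from typing import List, Optional, Tuple

def commit_times(updates: List[Tuple[int, Optional[int]]]) -> List[int]:
    """Return the times (ms) at which commits fire.

    Each update is (arrival_ms, commit_within_ms or None), sorted by
    arrival time. Updates without commitWithin never start a timer.
    """
    commits: List[int] = []
    deadline: Optional[int] = None  # pending commit time, if a timer runs
    for arrival, commit_within in updates:
        # A timer that expired at or before this arrival fires first.
        if deadline is not None and deadline <= arrival:
            commits.append(deadline)
            deadline = None
        # Only an update specifying commitWithin starts a timer, and only
        # when no timer is already running.
        if commit_within is not None and deadline is None:
            deadline = arrival + commit_within
    if deadline is not None:
        commits.append(deadline)
    return commits

# Three clients: one starts a 1-second timer at t=0, one sends plain
# updates, and one asks for commitWithin while the timer is running.
schedule = commit_times([(0, 1000), (200, None), (500, 5000), (1500, 2000)])
print(schedule)
```

The update at 200 ms never asked for a commit but is covered by the commit at 1000 ms anyway, which is the “all the same to Solr” behavior described above.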

[1] I’ve used 100 for the size of the tlog in all the examples. This is the default value and was the only choice in the original implementation. Since Solr 5.2, this is configurable in solrconfig.xml by specifying numRecordsToKeep in the <updatelog…> section.

[2] Three years after writing this post, a thoughtful reader asked a question at the 2016 Lucene Revolution “Stump the chump” session. The question was roughly “Erick Erickson says that per-segment caches are NOT invalidated on soft commit, does that mean facet counts and the like are inaccurate after a soft commit?”. Re-reading that section, I realized the confusion. What I was trying to say (clearly unclearly) was that they were not invalidated because there was no need to invalidate them: the per-segment caches are still accurate since the segments they reference haven’t changed. So the uses of per-segment caches for faceting etc. continue to be accurate without needing to reload them. Pedantic note: The fact that docs can be deleted from such a segment is handled. Lucene/Solr “does the right thing” with deleted docs.
