Hard commits, soft commits and transaction logs
As of Solr 4.0, there is a new “soft commit” capability, and a new parameter for hard commits: openSearcher. Currently, there’s quite a bit of confusion about the interplay between soft and hard commit actions, and especially what it all means for the transaction log. The stock solrconfig.xml file explains the options, but with the usual limits of documentation-in-an-example: if there were a full explanation of everything, the example file would be about 10 MB and nobody would ever read through the whole thing. This article outlines the consequences of hard and soft commits and of the new openSearcher option for hard commits. The release documentation can be found in the Solr Reference Guide; this post is a more leisurely overview of the topic. I persuaded a couple of the committers to give me some details. I’m sure I was given accurate information; any transcription errors are mine!
The mantra
Repeat after me: “Hard commits are about durability, soft commits are about visibility.” Hard and soft commits are related concepts, but they serve different purposes. Concealed in this simple statement are many details; we’ll try to illuminate some of them. First, some definitions:
Transaction log (tlog): A file where the raw documents are written for recovery purposes. In SolrCloud, each node has its own tlog. On update, the entire document gets written to the tlog. For Atomic Updates, it’s still the entire document, including data read from the old version of the document. In other words, the document written to the tlog is not the “delta” for atomic updates. Tlogs are critical for consistency: they are used to bring an index up to date if the JVM is stopped before segments get closed. NOTE: The transaction log will be replayed on server restart if the server was not gracefully shut down! So if your tlog is huge (and we’ve seen it in gigabyte ranges), restarting your server can be very, very slow. As in hours.
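To make the “not a delta” point concrete, here is a minimal Python sketch of what conceptually lands in the tlog for an atomic update. It assumes a plain dict stands in for the stored document and a list stands in for the tlog, and it models only the `set` and `inc` modifiers; `apply_atomic_update` is a hypothetical illustration, not Solr internals.

```python
def apply_atomic_update(stored, update):
    """Merge an atomic update into the stored document (toy model, not Solr code).

    Only the 'set' and 'inc' modifiers are modeled in this sketch.
    """
    doc = dict(stored)  # build the new version; the old one is left untouched
    for field, change in update.items():
        if field == "id":
            continue
        if not isinstance(change, dict) or len(change) != 1:
            raise ValueError(f"expected a single modifier for field {field!r}")
        op, value = next(iter(change.items()))
        if op == "set":
            doc[field] = value
        elif op == "inc":
            doc[field] = doc.get(field, 0) + value
        else:
            raise ValueError(f"modifier {op!r} is not modeled in this sketch")
    return doc

tlog = []  # stands in for the transaction log
stored = {"id": "1", "title": "Example title", "price": 40, "stock": 3}
update = {"id": "1", "price": {"set": 35}, "stock": {"inc": -1}}

# The logged entry is the complete new version, including the untouched title.
tlog.append(apply_atomic_update(stored, update))
```

The tlog entry carries `title` even though the update never mentioned it, which is why tlog size tracks full document size rather than update size.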
Hard commit: This is governed by the <autoCommit> option in solrconfig.xml or by explicit calls from a client (SolrJ or HTTP via the browser, cURL or similar). Hard commits close the current segment and open a new segment in your index; they also roll over the transaction log.
openSearcher: A boolean sub-property of <autoCommit> that governs whether the newly-committed data is made visible to subsequent searches.
Soft commit: A less expensive operation than a hard commit (openSearcher=true) that also makes documents visible to search. Soft commits do not truncate the transaction log, however. Important: Soft commits are “less expensive”, but they still aren’t free. You should make the soft commit interval as long as is reasonable for best performance!
fsync: Low-level I/O-speak. When an fsync call returns, the bits have actually been flipped on disk. This differs from simply having a Java program write data: the return from a Java-level write only guarantees that the new data has been handed over to the operating system, which will change the bits on disk in its own good time.
flush: The Java operation that hands the data over to the operating system. Upon return, the bits on the disk may not yet have been changed, and the data will be lost if the operating system crashes at just the wrong time. Note that, especially in SolrCloud where there is more than one replica per shard, losing data requires that all replicas of the shard go down at the same time in such a way that none of them manages to complete the write to disk, which is very unlikely.
The operating system flips the bits after a flush in a few milliseconds (say 10-50 ms). If the JVM crashes, the operating system will still change the bits on disk after a flush. But if the operating system crashes after Java “writes” the file and before the I/O subsystem gets around to actually changing the bits on disk, the data can be lost. This is usually not something you need to be concerned about; it’s important only when you need to be absolutely sure that no data ever gets lost.
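The flush/fsync distinction is not specific to Java. As an analogy only (this is Python, not Solr’s tlog code), `f.flush()` hands buffered bytes to the operating system, while `os.fsync()` returns only once the OS reports them written to the storage device. The `append_record` helper, its `durable` flag, and the file name are hypothetical.

```python
import os
import tempfile

def append_record(path, record, durable=False):
    """Append one line to a log file (toy analogue of a tlog write)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()  # bytes now belong to the OS: they survive a crash of this process
        if durable:
            os.fsync(f.fileno())  # also survive an OS crash, at a throughput cost

# Usage: one flushed-only write, one fsync'd write.
path = os.path.join(tempfile.mkdtemp(), "toy.tlog")
append_record(path, "doc1")
append_record(path, "doc2", durable=True)
with open(path, encoding="utf-8") as f:
    contents = f.read()  # "doc1\ndoc2\n"
```

Both writes read back identically; the difference only shows up if the machine itself goes down between the write and the OS flushing its cache.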
Transaction Logs
Transaction logs are integral to the data guarantees of Solr 4, and also a place where people get into trouble, so let’s talk about them a bit. The indexing flow in SolrCloud is as follows:
- Incoming documents are received by a node and forwarded to the proper leader.
- From the leader they’re sent to all replicas for the relevant shard.
- The replicas respond to their leader.
- The leader responds to the originating node.
- After all the leaders have responded, the originating node replies to the client. At this point, all documents have been flushed to the tlog for all the nodes in the cluster!
- If the JVM crashes, the documents are still safely written to disk. If the operating system crashes, they may not be.
- If the JVM crashes (or, say, is killed with a -9), then on restart, the tlog is replayed.
- You can alter the configuration in solrconfig.xml to fsync rather than flush before returning, but this is rarely necessary. With leaders and replicas, the chance that all of the replicas suffer a hardware crash at the same time that loses data for all of them is small. Some use-cases cannot tolerate even this tiny chance, so they may choose to pay the price of decreased throughput.
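The acknowledgement chain in the steps above can be sketched as a toy, in-process model. Everything here is hypothetical: `Replica`, `Shard`, and the CRC32-modulo routing are illustrations (Solr’s own router hashes document IDs differently), and appending to a list stands in for flushing to a node’s tlog.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    tlog: list = field(default_factory=list)

    def apply(self, doc):
        self.tlog.append(doc)  # stands in for "flushed to this node's tlog"
        return True            # acknowledgement

@dataclass
class Shard:
    leader: Replica
    replicas: list

    def index(self, doc):
        self.leader.apply(doc)
        # Collect every replica's ack before the leader answers.
        acks = [r.apply(doc) for r in self.replicas]
        return all(acks)

def route(doc_id, shards):
    """Hypothetical routing: CRC32 of the id modulo the shard count."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]

def handle_update(docs, shards):
    """Originating node: forward each doc to its leader; reply once all leaders replied."""
    results = [route(doc["id"], shards).index(doc) for doc in docs]
    return all(results)

shards = [Shard(Replica("s1-leader"), [Replica("s1-replica")]),
          Shard(Replica("s2-leader"), [Replica("s2-replica")])]
ok = handle_update([{"id": f"doc{i}"} for i in range(10)], shards)
```

By the time `handle_update` returns, every leader and replica holding a document has already recorded it, which is the “flushed to the tlog for all the nodes” guarantee in miniature.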
Note: tlogs are “rolled over” automatically on hard commit (openSearcher true or false). The old one is closed and a new one is opened. Enough tlogs are kept around to contain 100 documents [1], and older tlogs are deleted. Consider indexing in batches of 25 documents and hard committing after each one (not that you should commit that often, but just saying). You should have 5 tlogs at any given time: the oldest four (closed) contain 25 documents each, totaling 100, plus the current tlog file. When the current tlog is closed, the oldest tlog will be deleted and a new one opened. Note particularly that Solr makes no attempt to put only 100 documents in any particular tlog. Tlogs are only rolled over when you tell Solr to, i.e. issue a hard commit (or when autoCommit happens, as configured in solrconfig.xml).
So in bulk-loading situations where you are loading, say, 1,000 docs/second and you don’t do a hard commit for an hour, your single tlog will contain 3,600,000 documents. An un-graceful shutdown may then cause it to be replayed in its entirety before the Solr node is open for business. This may take hours. And since I don’t have the patience to just wait, I think something’s wrong and restart Solr, which starts replaying the tlog all over again. You see where this is going. If you have very large tlogs, this is A Bad Thing and you should change your hard commit settings! This is especially trappy for people coming from the 3.x days, when hard commit intervals were often very long because hard commits were always expensive; there was no openSearcher=false option.
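The retention rule and both scenarios above can be checked with a small simulation. This is a toy model, not Solr’s actual implementation: it keeps a closed tlog only until the newer closed tlogs already hold `keep` documents (100 by default, per [1]), which reproduces the 25-document example; `TlogSimulator` is a hypothetical name.

```python
from collections import deque

class TlogSimulator:
    """Toy model of tlog rollover on hard commit (not Solr's implementation)."""

    def __init__(self, keep=100):
        self.keep = keep        # cf. numRecordsToKeep, default 100
        self.closed = deque()   # doc counts of closed tlogs, oldest first
        self.current = 0        # docs in the open tlog

    def add(self, n_docs):
        self.current += n_docs

    def hard_commit(self):
        # Roll over: close the open tlog and start a new, empty one.
        self.closed.append(self.current)
        self.current = 0
        # Delete the oldest tlog while the newer closed ones already hold `keep` docs.
        while len(self.closed) > 1 and sum(self.closed) - self.closed[0] >= self.keep:
            self.closed.popleft()

    def tlog_count(self):
        return len(self.closed) + 1  # closed tlogs plus the open one

# 25-doc batches with a hard commit after each one.
batches = TlogSimulator()
for _ in range(10):
    batches.add(25)
    batches.hard_commit()

# One hour at 1,000 docs/second with no hard commit at all.
bulk = TlogSimulator()
bulk.add(1_000 * 3_600)
```

The batch case settles at four closed tlogs of 25 plus the open one, while the bulk case never rolls over: nothing in the rule caps the size of a single tlog.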
Soft commit
Soft commits are about visibility, hard commits are about durability. The most important thing to understand about soft commits is that they make documents visible, but at some cost. In particular, the “top level” caches, which include what you configure in solrconfig.xml (filterCache, queryResultCache, etc.), will be invalidated! Autowarming will be performed on your top-level caches (e.g. filterCache, queryResultCache), and any newSearcher queries will be executed. Also, the FieldValueCache is invalidated, so facet queries will have to wait until the cache is refreshed. With very frequent soft commits, your top-level caches are often little used and may, in some cases, be eliminated. However, “segment level caches”, used for function queries, sorting, etc., are “per segment”, so they will not be invalidated on soft commit; they can continue to be used [2].
So what does all this mean?
Consider a soft commit. On execution you have the following:
- The tlog has NOT been truncated. It will continue to grow.
- The documents WILL be visible.
- Some caches will have to be reloaded.
- Your top-level caches will be invalidated.
- Autowarming will be performed.
- New segments are created that will be merged.
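The difference between top-level and per-segment caches (see also [2]) can be sketched with a toy searcher. The `Searcher` class, its term-count “query”, and the dict-based caches are hypothetical illustrations, not Solr’s cache classes; deletes and autowarming are ignored.

```python
class Searcher:
    """Toy point-in-time view over immutable segments (not Solr code)."""

    def __init__(self, segments, segment_cache):
        self.segments = list(segments)      # snapshot taken when the searcher opens
        self.top_level_cache = {}           # like filterCache: discarded with the searcher
        self.segment_cache = segment_cache  # shared, keyed by (segment, term)

    def count(self, term):
        if term in self.top_level_cache:
            return self.top_level_cache[term]
        total = 0
        for seg_id, docs in self.segments:
            key = (seg_id, term)
            if key not in self.segment_cache:  # computed once per immutable segment
                self.segment_cache[key] = sum(term in doc for doc in docs)
            total += self.segment_cache[key]
        self.top_level_cache[term] = total
        return total

segment_cache = {}
seg0 = ("seg0", [("solr", "lucene"), ("solr",)])
before = Searcher([seg0], segment_cache)
before.count("solr")

# A soft commit opens a new searcher that also sees a newly created segment.
seg1 = ("seg1", [("solr", "cloud")])
after = Searcher([seg0, seg1], segment_cache)
```

The new searcher starts with an empty top-level cache (autowarming would refill it), but its first query reuses the `seg0` entry and computes only the one for `seg1`, because `seg0` has not changed.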
Note, I haven’t said a thing about index segments! That’s for hard commits. And again, soft commits are “less expensive” than hard commits (openSearcher=true), but they are not free. The motto of the Lunar Colony in a science fiction novel (“The Moon Is a Harsh Mistress” by Robert Heinlein) was TANSTAAFL, There Ain’t No Such Thing As A Free Lunch. Soft commits are there to support Near Real Time, and they do. But they do have cost, so use the longest soft commit interval you can for best performance.
Hard commit
Hard commits are about durability, soft commits are about visibility. There are really two flavors here, openSearcher=true and openSearcher=false. First we’ll talk about what happens in both cases. If openSearcher=true or openSearcher=false, the following consequences are most important:
- The tlog is truncated: A new tlog is started. Old tlogs will be deleted if there are more than 100 documents in newer, closed tlogs.
- The current index segment is closed and flushed.
- Background segment merges may be initiated.
The above happens on all hard commits. That leaves the openSearcher setting:
- openSearcher=true: The Solr/Lucene searchers are re-opened and all caches are invalidated. Autowarming is done etc. This used to be the only way you could see newly-added documents.
- openSearcher=false: Nothing further happens beyond the three points above. To search the docs, a soft commit is necessary.
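The durability/visibility split can be summarized in a toy state model. `ToyCore` and its fields are hypothetical; tlog rollover is simplified to clearing the current tlog (retention of older tlogs is ignored), and the new segments a real soft commit creates are not modeled.

```python
class ToyCore:
    """Toy model of soft vs. hard commit effects (not Solr code)."""

    def __init__(self):
        self.tlog = []       # current transaction log
        self.buffer = []     # docs in the segment currently being written
        self.segments = []   # closed segments
        self.visible = 0     # docs visible to the current searcher

    def add(self, doc):
        self.tlog.append(doc)
        self.buffer.append(doc)

    def _indexed(self):
        return sum(len(s) for s in self.segments) + len(self.buffer)

    def soft_commit(self):
        # Visibility: open a new searcher over everything indexed; the tlog keeps growing.
        self.visible = self._indexed()

    def hard_commit(self, open_searcher):
        # Durability: close the current segment and roll over the tlog.
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []
        self.tlog = []
        if open_searcher:
            self.visible = self._indexed()

core = ToyCore()
for i in range(3):
    core.add(f"doc{i}")
core.hard_commit(open_searcher=False)  # durable, but not yet searchable
```

After the openSearcher=false commit the documents sit in a closed segment with an empty tlog, yet `visible` is still 0 until a soft commit (or an openSearcher=true hard commit) opens a new searcher.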
Recovery
I’ve talked above about durability, so let’s expand on that a bit. When a machine crashes, the JVM quits, whatever, here’s the state of your cluster.
The last update call that returned successfully has all the documents written to the tlogs in the cluster. The default is that the tlog has been flushed, but not fsync’d. As discussed above, you can override this default behavior but it is not recommended.
On restart, the affected machine contacts the leader and either:
- Replays the documents from its own tlog if < 100 new updates have been received by the leader. Note that while this replay is going on, additional updates that come in are written to the end of the tlog and they’ll need to be replayed too.
- Does an old-style full replication from the leader to catch up if the leader received > 100 updates while the node was offline.
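The two recovery paths, and the note that updates arriving mid-replay join the end of the tlog, can be sketched as follows. The threshold mirrors the 100-update figure above (configurable per [1]); the text does not say what happens at exactly 100, so the `<` comparison is an assumption, as is the one-arrival-per-replayed-entry timing in the hypothetical `replay` helper.

```python
from collections import deque

def recovery_path(updates_missed, num_records_to_keep=100):
    """Pick a recovery path from the number of updates the leader saw meanwhile."""
    if updates_missed < num_records_to_keep:  # boundary choice is an assumption
        return "tlog-replay"
    return "full-replication"

def replay(tlog, arrivals):
    """Replay tlog entries; each update arriving during replay is appended and replayed too.

    Hypothetical timing: one new update arrives after each replayed entry
    until `arrivals` is exhausted.
    """
    pending = deque(tlog)
    incoming = iter(arrivals)
    applied = []
    while pending:
        applied.append(pending.popleft())
        new_update = next(incoming, None)
        if new_update is not None:
            pending.append(new_update)  # lands at the end of the tlog
    return applied
```

Even in this tiny model, a steady stream of incoming updates stretches replay: the node is not done until it has also worked through everything that arrived while it was catching up.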
Recovery can take some time. This is one of the hidden “gotchas” people are running into as they work with SolrCloud. They are experimenting, so they’re bouncing servers up and down all over the place, killing Solrs with ‘kill -9’, etc. On the one hand, this is great, since it exercises the whole SolrCloud recovery process. On the other hand, it’s not so great, as it’s a highly artificial experience. If you have nodes disappearing many times a day, you have bigger problems that should be fixed than Solr taking some time to recover on startup!
Recommendations
I always shudder at this, because any recommendation will be wrong in some cases. My first recommendation would be to not overthink the problem. Some very smart people have tried to make the entire process robust. Try the simple things first and only tweak things as necessary. In particular, look at the size of your transaction logs and adjust your hard commit intervals to keep these “reasonably sized”. Remember that the penalty is mostly the replay-time involved if you restart after a JVM crash. Is 15 seconds tolerable? Why go smaller then? We’ve seen situations in which the hard commit interval is much shorter than the soft commit interval, see the bulk indexing bit below.
Shut down gracefully. In other words, “kill -9” while indexing is just asking for trouble. This means:
- Stop ingesting documents.
- Issue a hard commit or wait until the autoCommit interval expires.
- Stop the Solr servers.
The following are settings to start with and tweak to fit your situation.
Heavy (bulk) indexing
The assumption here is that you’re interested in getting lots of data into the index as quickly as possible for search sometime in the future. I’m thinking of initial loads of a data source, etc.
- Set your soft commit interval quite long. As in 10 minutes or even longer (-1 for no soft commits at all). Soft commit is about visibility, and my assumption here is that bulk indexing isn’t about near real time searching, so don’t do the extra work of opening any kind of searcher.
- Set your hard commit intervals to 15 seconds, openSearcher=false. Again the assumption is that you’re going to be just blasting data at Solr. The worst case here is that you restart your system and have to replay 15 seconds or so of data from your tlog. If your system is bouncing up and down more often than that, fix the reason for that first.
- Only after you’ve tried the simple things should you consider refinements; they’re usually only required in unusual circumstances. But they include:
  - Turning off the tlog completely for the bulk-load operation.
  - Indexing offline with some kind of map-reduce process.
  - Having only a leader per shard, with no replicas, for the load, then turning on replicas later and letting them do old-style replication to catch up. Note that this is automatic: if a node discovers it is “too far” out of sync with the leader, it initiates an old-style replication. After it has caught up, it’ll get documents as they’re indexed to the leader and keep its own tlog.
  - etc.
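The worst case quoted in this section is simple arithmetic: if the JVM dies just before a hard commit, the open tlog holds roughly one commit interval’s worth of documents. The hypothetical helper below ignores older, closed tlogs and compares the recommended 15-second interval with the one-hour example from the transaction-log discussion, at the same 1,000 docs/second.

```python
def docs_in_open_tlog(docs_per_second, hard_commit_interval_s):
    """Approximate upper bound on documents in the open tlog at an un-graceful shutdown."""
    return docs_per_second * hard_commit_interval_s

fifteen_seconds = docs_in_open_tlog(1_000, 15)  # 15,000 docs
one_hour = docs_in_open_tlog(1_000, 3_600)      # 3,600,000 docs
ratio = one_hour // fifteen_seconds             # 240 times as much to replay
```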
Index-heavy, Query-light
By this I mean, say, searching log files. This is the case where you have a lot of data coming at the system pretty much all the time. But the query load is quite light, often to troubleshoot or analyze usage.
- Set your soft commit interval quite long, up to the maximum latency you can stand for documents to be visible. This could be just a couple of minutes or much longer. Maybe even hours with the capability of issuing a hard commit (openSearcher=true) or soft commit on demand.
- Set your hard commit to 15 seconds, openSearcher=false
Index-light, Query-light or heavy
This is a relatively static index that sometimes gets a small burst of indexing. Say you do an update every 5-10 minutes (or longer).
- Unless NRT functionality is required, I’d omit soft commits in this situation and do hard commits every 5-10 minutes with openSearcher=true. This is a situation in which, if you’re indexing with a single external indexing process, it might make sense to have the client issue the hard commit.
Index-heavy, Query-heavy
This is the Near Real Time (NRT) case, and is really the trickiest of the lot. This one will require experimentation, but here’s where I’d start:
- Set your soft commit interval to as long as you can stand. Don’t listen to your product manager who says “we need no more than 1 second latency”. Really. Push back hard and see if the user is best served or will even notice. Soft commits and NRT are pretty amazing, but they’re not free.
- Set your hard commit interval to 15 seconds.
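The settings in these scenarios live in the `<updateHandler>` section of solrconfig.xml as `<autoCommit>` (with `maxTime` in milliseconds and `openSearcher`) and `<autoSoftCommit>`. The hypothetical helper below renders both elements so the scenarios can be compared side by side; the two examples use the values from the bulk-indexing and index-light recommendations, and the parse step only checks that the output is well-formed XML.

```python
import xml.etree.ElementTree as ET

def commit_config(hard_ms, open_searcher, soft_ms):
    """Render <autoCommit>/<autoSoftCommit> for solrconfig.xml (sketch).

    soft_ms=-1 means no automatic soft commits.
    """
    return (
        "<autoCommit>\n"
        f"  <maxTime>{hard_ms}</maxTime>\n"
        f"  <openSearcher>{str(open_searcher).lower()}</openSearcher>\n"
        "</autoCommit>\n"
        "<autoSoftCommit>\n"
        f"  <maxTime>{soft_ms}</maxTime>\n"
        "</autoSoftCommit>"
    )

# Heavy (bulk) indexing: hard commit every 15 s without a new searcher, no soft commits.
bulk = commit_config(hard_ms=15_000, open_searcher=False, soft_ms=-1)

# Index-light: hard commit every 5 minutes that opens a searcher, no soft commits.
index_light = commit_config(hard_ms=300_000, open_searcher=True, soft_ms=-1)

# Sanity check: wrapped in <updateHandler>, the snippet is well-formed XML.
parsed = ET.fromstring(f"<updateHandler>{bulk}</updateHandler>")
```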
SolrJ and HTTP and client indexing
Generally, all the options available automatically are also available via SolrJ or HTTP. The HTTP commands are documented here. The SolrJ commands are in the Javadocs, SolrServer class.
Late edit (Jun, 2014): Be very careful committing from the client! In fact, don’t do it. By and large, do not issue commits from any client indexing to Solr; it’s almost always a mistake. And especially in those cases where you have multiple clients indexing at once, it is A Bad Thing. What happens is that commits come in unpredictably close to each other, generating work as above. You’ll possibly see warnings in your log about “too many warming searchers”. Or you’ll see a zillion small segments. Or… Let your autocommit settings (both soft and hard) in solrconfig.xml handle the commit frequency. If you absolutely must control the visibility, say you want to search docs right after the indexing run happens and you can’t afford to wait for your autocommit settings to kick in, commit once at the end. In fact, I would only do that if I had only one indexing client. Otherwise, I’d wait until they were all finished and submit a “manual” commit, something like:
http://host:port/solr/collection/update?commit=true
should do it; cURL it in, send it from a browser, etc. In the SolrJ world, you can also add documents to Solr and specify a “commitWithin” interval, measured in milliseconds. This is perfectly reasonable to do from as many SolrJ clients as you want, as it’s all the same to Solr. What happens on the server is that a timer is started when the first update that specifies commitWithin is received. commitWithin milliseconds later, all docs that have been received from any client are committed, regardless of whether they had commitWithin specified or not. The next update that has commitWithin specified starts a new timer. And remember, optimizing an index is rarely necessary! Happy Indexing!
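Two hypothetical Python sketches for this section. The first builds the one-off commit URL shown above (`localhost:8983` and `mycollection` are placeholders; `commitWithin`, in milliseconds, can likewise be passed as a request parameter on `/update`). The second is a toy model of the server-side commitWithin timer exactly as described in the text; `CommitWithinTimer` is an illustration, not Solr’s implementation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def update_url(base, collection, **params):
    """Build an /update URL, e.g. with commit="true" or commitWithin=10000."""
    query = urlencode(params)
    url = f"{base.rstrip('/')}/{collection}/update"
    return f"{url}?{query}" if query else url

def final_commit(base, collection, timeout=30):
    """Send the single 'manual' commit once every indexing client has finished."""
    with urlopen(update_url(base, collection, commit="true"), timeout=timeout) as resp:
        return resp.status

class CommitWithinTimer:
    """Toy model of server-side commitWithin handling (not Solr code)."""

    def __init__(self):
        self.deadline = None  # time (ms) when the pending commit fires
        self.pending = []     # received but not yet committed, from any client
        self.committed = []

    def _fire_if_due(self, now_ms):
        if self.deadline is not None and now_ms >= self.deadline:
            self.committed.extend(self.pending)  # with or without commitWithin
            self.pending = []
            self.deadline = None

    def receive(self, now_ms, doc, commit_within_ms=None):
        self._fire_if_due(now_ms)
        self.pending.append(doc)
        if commit_within_ms is not None and self.deadline is None:
            self.deadline = now_ms + commit_within_ms  # only the first one starts the timer

    def tick(self, now_ms):
        self._fire_if_due(now_ms)

url = update_url("http://localhost:8983/solr/", "mycollection", commit="true")
```

In the timer model, a document sent without commitWithin still rides along with the next commit, and a second commitWithin update that arrives while a timer is running does not restart it.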
[1] I’ve used 100 for the size of the tlog in all the examples. This is the default value and was the only choice in the original implementation. Since Solr 5.2, this is configurable in solrconfig.xml by specifying numRecordsToKeep in the <updateLog…> section.
[2] Three years after writing this post, a thoughtful reader asked a question at the 2016 Lucene Revolution “Stump the chump” session. The question was roughly “Erick Erickson says that per-segment caches are NOT invalidated on soft commit; does that mean facet counts and the like are inaccurate after a soft commit?” Re-reading that section, I realized the confusion. What I was trying to say (clearly unclearly) was that they were not invalidated because there was no need to invalidate them: the per-segment caches are still accurate, since the segments they reference haven’t changed. So per-segment caches used for faceting, etc. continue to be accurate without needing to be reloaded. Pedantic note: The fact that docs can be deleted from a segment is handled. Lucene/Solr “does the right thing” with deleted docs.