A fairly important article on optimizing Solr index building

Worth translating when I have time; it's very good.
ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud
https://community.hortonworks.com/questions/9611/concurrentupdatesolrclient-vs-cloudsolrclient-for.html

Question:
We have a customer that needs to update a few billion documents in SolrCloud. I know the suggested client is CloudSolrClient, because of its load-balancing feature.
As per the CloudSolrClient docs:
SolrJ client class to communicate with SolrCloud. Instances of this class communicate with Zookeeper to discover Solr endpoints for SolrCloud collections, and then use the LBHttpSolrClient to issue requests. This class assumes the id field for your documents is called 'id' - if this is not the case, you must set the right name with setIdField(String).
As per the ConcurrentUpdateSolrClient docs:
ConcurrentUpdateSolrClient buffers all added documents and writes them into open HTTP connections. This class is thread safe. Params from UpdateRequest are converted to http request parameters. When params change between UpdateRequests a new HTTP request is started. Although any SolrClient request can be made with this implementation, it is only recommended to use ConcurrentUpdateSolrClient with /update requests. The class HttpSolrClient is better suited for the query interface.
Since ConcurrentUpdateSolrClient lets me use a queue and a pool of threads, it looks more attractive than CloudSolrClient, which uses an HttpSolrClient once it gets a set of nodes to send the updates to.
I would love to hear a more in-depth discussion of these two APIs.
Thanks
Shivaji
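The buffering described in the quoted ConcurrentUpdateSolrClient docs can be made concrete with a minimal, stdlib-only Java sketch of the same pattern. Callers put documents into a bounded queue, and a fixed pool of runner threads drains it. `BufferedSender`, `BufferedSenderDemo` and the `__STOP__` sentinel are hypothetical names chosen for illustration. None of this is SolrJ code, and the "send" step only records each document instead of writing it to an HTTP connection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the buffering pattern only; this is not SolrJ code.
class BufferedSender {
    private static final String STOP = "__STOP__"; // sentinel that tells one runner to exit
    private final BlockingQueue<String> queue;
    private final ExecutorService runners;
    private final int threadCount;
    // Records what the runners "sent"; a real client would stream to an HTTP connection.
    final List<String> sent = Collections.synchronizedList(new ArrayList<>());

    BufferedSender(int queueSize, int threadCount) {
        this.queue = new LinkedBlockingQueue<>(queueSize);
        this.threadCount = threadCount;
        this.runners = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            runners.execute(this::drain);
        }
    }

    // Thread-safe; blocks while the buffer is full, which throttles the producer.
    void add(String doc) {
        try {
            queue.put(doc);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while buffering", e);
        }
    }

    // Each runner takes documents until it sees the sentinel.
    private void drain() {
        try {
            while (true) {
                String doc = queue.take();
                if (STOP.equals(doc)) return;
                sent.add(doc);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Enqueue one sentinel per runner (after all real documents), then wait for them.
    void close() {
        try {
            for (int i = 0; i < threadCount; i++) queue.put(STOP);
            runners.shutdown();
            if (!runners.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("runners did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
    }
}

class BufferedSenderDemo {
    public static void main(String[] args) {
        BufferedSender sender = new BufferedSender(100, 4);
        for (int i = 0; i < 1_000; i++) sender.add("doc-" + i);
        sender.close();
        System.out.println("sent " + sender.sent.size() + " documents");
    }
}
```

`add` blocks once the queue is full, so the producer can only go as fast as the runners drain the queue. Raising throughput in this model therefore comes down to adding runner threads, which is the appeal the question describes.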


Answers:
@sdutta In SolrCloud you should be using the CloudSolrClient class. It takes care of everything you mentioned: it gets the active Solr servers from Zookeeper, and when you add a document it automatically sends it to the server hosting the shard for that id. It also keeps track of whether any Solr server is out of commission and automatically reconfigures itself.
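import org.apache.solr.client.solrj.impl.CloudSolrClient;
// zkHosts: ZooKeeper ensemble string, e.g. "zk1:2181,zk2:2181,zk3:2181" (SolrJ 5.x constructor; later releases deprecate it in favor of CloudSolrClient.Builder)
CloudSolrClient solrCloudClient = new CloudSolrClient(zkHosts);
solrCloudClient.setDefaultCollection(collectionName);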
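The routing Bosco describes, where each document goes to the server hosting the shard for its id, can be sketched without SolrJ. This is a simplified model. It assumes each shard owns one contiguous slice of the 32-bit hash space, and it substitutes `String.hashCode()` for the MurmurHash3 hash that Solr's compositeId router actually uses, so the shard numbers will not match a real cluster. `IdRouter` and `IdRouterDemo` are hypothetical names. The configurable `idField` mirrors the `setIdField(String)` note in the docs quoted in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of id-based routing; not SolrJ's implementation.
// Assumption: String.hashCode() stands in for Solr's MurmurHash3-based compositeId hash.
class IdRouter {
    private final int numShards;
    private final String idField; // analogous to the field name passed to setIdField(String)

    IdRouter(int numShards, String idField) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        this.numShards = numShards;
        this.idField = idField;
    }

    // Each shard owns one contiguous slice of the 32-bit hash space.
    int shardFor(String id) {
        long offset = (long) id.hashCode() - Integer.MIN_VALUE; // shift hash into [0, 2^32)
        return (int) ((offset * numShards) >>> 32);
    }

    // Group a batch so each group could be sent straight to that shard's leader.
    Map<Integer, List<Map<String, String>>> groupByShard(List<Map<String, String>> docs) {
        Map<Integer, List<Map<String, String>>> groups = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String id = doc.get(idField);
            if (id == null) {
                throw new IllegalArgumentException("document has no '" + idField + "' field");
            }
            groups.computeIfAbsent(shardFor(id), shard -> new ArrayList<>()).add(doc);
        }
        return groups;
    }
}

class IdRouterDemo {
    public static void main(String[] args) {
        IdRouter router = new IdRouter(4, "id");
        List<Map<String, String>> batch = new ArrayList<>();
        for (int i = 0; i < 20; i++) batch.add(Map.of("id", "doc-" + i, "title", "t" + i));
        router.groupByShard(batch).forEach((shard, docs) ->
                System.out.println("shard " + shard + " -> " + docs.size() + " docs"));
    }
}
```

Because the grouping happens on the client, each sub-batch can go directly to the node that will index it. Nothing in the model needs to be forwarded between nodes.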

Bosco, CloudSolrClient uses an LBHttpSolrClient (which load balances across the nodes), but I do not see that LBHttpSolrClient is multithreaded. So the question is: which has higher throughput?
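On the multithreading question, load balancing and parallelism are separate concerns. LBHttpSolrClient's Javadoc describes simple round-robin over live servers, with failing servers moved to a dead-server list. The rough stdlib sketch below works in that spirit. It decides which node a request goes to, but how many requests are in flight at once depends entirely on how many threads call it. `RoundRobinPicker`, `RoundRobinDemo` and the node names are hypothetical, and this is not LBHttpSolrClient's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin node picker; not LBHttpSolrClient's actual code.
// It decides WHERE a request goes, not how many requests run at once.
class RoundRobinPicker {
    private final List<String> nodes;
    private final Set<String> dead = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPicker(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    void markDead(String node) { dead.add(node); }

    void markAlive(String node) { dead.remove(node); }

    // Safe to call from many threads; skips nodes currently marked dead.
    String pick() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int i = Math.floorMod(next.getAndIncrement(), nodes.size());
            String node = nodes.get(i);
            if (!dead.contains(node)) return node;
        }
        throw new IllegalStateException("no live nodes");
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        RoundRobinPicker picker = new RoundRobinPicker(List.of("solr1:8983", "solr2:8983", "solr3:8983"));
        picker.markDead("solr2:8983");
        for (int i = 0; i < 4; i++) System.out.println(picker.pick());
    }
}
```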

You will first have to see where the bottleneck is. Regardless of how much you push to the Solr server, it can only index so much. If you feel transport is the main issue, you can create a couple of threads and give each thread its own SolrClient instance.
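Here is a minimal sketch of the "couple of threads, each with its own client instance" idea. It also fits the "discrete sections" approach that comes up later in the thread. The id range is split into disjoint sections, and each worker thread creates its own client and sends its section in batches. `CountingClient`, `PartitionedIndexer` and `PartitionedIndexerDemo` are hypothetical names. `CountingClient` only counts what a real SolrClient would have sent. The ids are `long` because the question talks about billions of documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a real SolrClient: it only counts documents "sent".
class CountingClient {
    long sent;
    void add(List<Long> batch) { sent += batch.size(); }
}

class PartitionedIndexer {
    // Index ids [0, total) with `threads` workers, each owning one disjoint section
    // and its own client. Returns the number of documents each worker sent.
    static List<Long> run(long total, int threads, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long sectionSize = (total + threads - 1) / threads; // ceiling division
        List<Future<Long>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final long start = Math.min(total, t * sectionSize);
            final long end = Math.min(total, start + sectionSize);
            futures.add(pool.submit(() -> {
                CountingClient client = new CountingClient(); // never shared between threads
                List<Long> batch = new ArrayList<>(batchSize);
                for (long id = start; id < end; id++) {
                    batch.add(id);
                    if (batch.size() == batchSize) {
                        client.add(batch);
                        batch = new ArrayList<>(batchSize); // hand off a fresh list each time
                    }
                }
                if (!batch.isEmpty()) client.add(batch); // flush the final partial batch
                return client.sent;
            }));
        }
        List<Long> perWorker = new ArrayList<>();
        try {
            for (Future<Long> f : futures) perWorker.add(f.get());
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return perWorker;
    }
}

class PartitionedIndexerDemo {
    public static void main(String[] args) {
        System.out.println(PartitionedIndexer.run(1_003, 4, 100));
    }
}
```

The sections never overlap, so the workers share no state and need no coordination beyond collecting their results.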

Secondly, you need to batch all your requests, and you shouldn't commit from the client side. Configure auto-commit on the Solr server side and let it do the final commit. As for Solr doing the buffering vs. you doing the batching, I am not sure what the difference would be.
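The following is a toy model of "batch on the client, commit on the server". The client sends fixed-size batches and never calls commit. A hypothetical `FakeServer` commits on its own once `maxDocs` uncommitted documents have piled up. This loosely mimics the `maxDocs` option of the `autoCommit` setting in `solrconfig.xml`. The real setting also supports `maxTime`, which this model ignores. `BatchingClient` and `BatchingDemo` are likewise hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server model: commits are decided server-side, loosely like
// autoCommit/maxDocs in solrconfig.xml (maxTime is not modelled here).
class FakeServer {
    final int maxDocs;
    int requests;     // update requests received
    int commits;      // commits triggered by the server itself
    int uncommitted;  // documents added since the last commit

    FakeServer(int maxDocs) { this.maxDocs = maxDocs; }

    void update(List<String> batch) {
        requests++;
        uncommitted += batch.size();
        if (uncommitted >= maxDocs) { // server-side "auto-commit"
            commits++;
            uncommitted = 0;
        }
    }
}

class BatchingClient {
    // Send docs in fixed-size batches. Note that the client never commits.
    static void sendAll(List<String> docs, int batchSize, FakeServer server) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(docs.size(), from + batchSize);
            server.update(new ArrayList<>(docs.subList(from, to)));
        }
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_500; i++) docs.add("doc-" + i);
        FakeServer server = new FakeServer(5_000);
        BatchingClient.sendAll(docs, 1_000, server);
        System.out.println(server.requests + " requests, " + server.commits + " commits, "
                + server.uncommitted + " uncommitted");
    }
}
```

With 10,500 documents, batches of 1,000 and `maxDocs` set to 5,000, the client makes 11 requests and the server commits twice. The last 500 documents stay uncommitted until a time-based or final commit picks them up.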

Throwing my 2 cents in since I've spent an insane amount of time working with Solr on this exact problem.
ConcurrentUpdateSolrClient is really easy to get going and you can get a high throughput just by increasing the number of threads. However, at some point it just won't be scalable or efficient once you have a bunch of Solr nodes.
If you are using SolrCloud, then the CloudSolrClient is definitely the recommended way to go but, in my experience, it is much, much harder to get high throughput. Batching documents is pretty much a requirement. You can't really just increase the number of threads because each one opens a connection to Zookeeper.
If you decide to go with CloudSolrClient, take a look at the code in storm-solr.

I posted this on the Solr community and got the following answer from a committer:
It's usually not all that difficult to write a multi-threaded client that uses CloudSolrClient, or even fire up multiple instances of the SolrJ client (assuming they can work on discrete sections of the documents you need to index).
That avoids the problem Shawn alludes to, plus other issues. If you do not use CloudSolrClient, then all the docs go to some node in the system (and you really should update in batches, see: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/). The node that receives the packet then sub-divides it into groups based on which shard each document belongs to and forwards them to the leaders for those shards, very significantly increasing the number of conversations carried on between Solr nodes. Multiply that by the number of threads you're specifying with CUSC (I really regret the renaming from ConcurrentUpdateSolrServer, I liked writing CUSS).
With CloudSolrClient, you can scale nearly linearly with the number of shards. Not so with CUSC.
FWIW,
Erick
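Erick's point about inter-node traffic can be put into rough numbers. The model below makes several simplifying assumptions. There is one leader per shard, each on its own node. Leader-to-replica traffic is ignored, since it happens either way. The router is a hypothetical hash-mod function rather than Solr's real one. Without client-side routing, every batch lands on one node, which forwards a sub-batch to each other shard present in it. With CloudSolrClient-style routing, the client splits the batch itself and leaders forward nothing to each other. `ForwardingModel` and `ForwardingDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ForwardingModel {
    // Hypothetical stand-in router: shard = hash mod numShards.
    static int shardOf(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    static Set<Integer> shardsIn(List<String> batch, int numShards) {
        Set<Integer> shards = new HashSet<>();
        for (String id : batch) shards.add(shardOf(id, numShards));
        return shards;
    }

    // Without client-side routing every batch lands on one node (the leader of
    // receivingShard), which forwards a sub-batch to each other shard present.
    static int unroutedForwards(List<List<String>> batches, int numShards, int receivingShard) {
        int forwards = 0;
        for (List<String> batch : batches) {
            Set<Integer> shards = shardsIn(batch, numShards);
            shards.remove(receivingShard); // its own documents are indexed locally
            forwards += shards.size();
        }
        return forwards;
    }

    // With client-side routing the client sends one request per shard per batch
    // straight to that shard's leader, so there is no leader-to-leader forwarding.
    static int routedClientRequests(List<List<String>> batches, int numShards) {
        int requests = 0;
        for (List<String> batch : batches) requests += shardsIn(batch, numShards).size();
        return requests;
    }
}

class ForwardingDemo {
    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        for (int b = 0; b < 10; b++) {
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) batch.add("doc-" + b + "-" + i);
            batches.add(batch);
        }
        System.out.println("unrouted forwards: " + ForwardingModel.unroutedForwards(batches, 8, 0));
        System.out.println("routed client requests: " + ForwardingModel.routedClientRequests(batches, 8));
    }
}
```

For large batches spread across all shards, the unrouted case costs close to `numShards - 1` extra forwards per batch. Each additional CUSC thread adds to that load. This is consistent with Erick's observation that CloudSolrClient scales nearly linearly with the number of shards while CUSC does not.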
