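First, a quick introduction to the renowned Two Sigma. Founded in 2001, it now manages as much as US$50 billion in assets and ranks fourth among hedge funds worldwide. Why would a well-known hedge fund keep contributing to the Apache Spark community?
Many people may find this puzzling. In fact, today's hedge funds make their investment decisions based on large-scale data analysis combined with AI techniques, so they are also heavy Spark users. Two Sigma even built a time-series library for the community, Flint (this is the financial giant Two Sigma's Flint, not to be confused with Flink, which is associated with Alibaba). Their website, https://opensource.twosigma.com, puts it this way: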
We depend on Apache Spark to scale our data-heavy analyses and we build tools on top of it (see our Flint project above). The Spark Summits each year are a calendar highlight.
WHY WE CONTRIBUTE
At Two Sigma, we use science and technology to tackle the world’s most complex problems. We balance IP concerns with the drive to give back to the community – wherever possible, we believe in open sourcing the tools we’ve developed to help others discover value in the world’s data.
In fact, they also promote their open-source philosophy wherever they go. See the slides: Why Two Sigma Contributes to Open Source
#1: Leveraging other people’s work
#2: Shaping the ecosystem
#3: Avoiding isolation
#4: Being cool
#5: Building your legacy
#6: Making the world a better place
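As a heavy Spark user and a role model in the investment world, Two Sigma contributes tirelessly to the community. The two blog posts below introduce its main contributions:
- A time-series library for Spark: Introducing Flint: A time-series library for Apache Spark
- Arrow-based Pandas UDFs: Introducing Pandas UDF for PySpark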