Talks & Hudi Users

Adoption

Uber

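Hudi was originally developed at Uber to achieve low-latency, highly efficient database ingestion. It has been in production since August 2016, powering about 100 highly business-critical tables on Hadoop that total several hundred TB of data (the top 10 include trips, riders, and drivers). Hudi also powers several incremental Hive ETL pipelines and is currently integrated into Uber's data dispersal system.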

EMIS Health

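EMIS Health is the largest provider of primary care IT software in the UK, with datasets that include more than 500 billion healthcare records. Hudi is used to manage their analytics datasets in production and to keep them in sync with upstream sources. Presto is used to query the data written in Hudi format.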

Yields.io

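Yields.io is the first FinTech platform to use AI for automated model validation and real-time monitoring at enterprise scale. Their data lake is managed by Hudi, and they also actively use Hudi to build infrastructure for incremental, cross-language/platform machine learning.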

Yotpo

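Hudi serves several purposes at Yotpo. First, it is integrated into their open-source ETL framework as the output writer of a CDC pipeline, in which events generated from database binlogs are streamed to Kafka and then written to S3.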

Talks & Presentations

  1. “Hoodie: Incremental processing on Hadoop at Uber” - By Vinoth Chandar & Prasanna Rajaperumal, Mar 2017, Strata + Hadoop World, San Jose, CA

  2. “Hoodie: An Open Source Incremental Processing Framework From Uber” - By Vinoth Chandar, Apr 2017, DataEngConf, San Francisco, CA

  3. “Incremental Processing on Large Analytical Datasets” - By Prasanna Rajaperumal, June 2017, Spark Summit 2017, San Francisco, CA

  4. “Hudi: Unifying storage and serving for batch and near-real-time analytics” - By Nishith Agarwal & Balaji Varadarajan, September 2018, Strata Data Conference, New York, NY

  5. “Hudi: Large-Scale, Near Real-Time Pipelines at Uber” - By Vinoth Chandar & Nishith Agarwal, October 2018, Spark+AI Summit Europe, London, UK

  6. “Powering Uber’s global network analytics pipelines in real-time with Apache Hudi” - By Ethan Guo & Nishith Agarwal, April 2019, Data Council SF19, San Francisco, CA.

  7. “Building highly efficient data lakes using Apache Hudi (Incubating)” - By Vinoth Chandar, June 2019, SF Big Analytics Meetup, San Mateo, CA

  8. “Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data Lake Architectures” - By Vinoth Chandar & Balaji Varadarajan, September 2019, ApacheCon NA 19, Las Vegas, NV, USA

  9. “Insert, upsert, and delete data in Amazon S3 using Amazon EMR” - By Paul Codding & Vinoth Chandar, December 2019, AWS re:Invent 2019, Las Vegas, NV, USA

  10. “Building Robust CDC Pipeline With Apache Hudi And Debezium” - By Pratyaksh, Purushotham, Syed and Shaik, December 2019, Hadoop Summit Bangalore, India

  11. “Using Apache Hudi to build the next-generation data lake and its application in medical big data” - By JingHuang & Leesf, March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  12. “Building a near real-time, high-performance data warehouse based on Apache Hudi and Apache Kylin” - By ShaoFeng Shi, March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  13. “Building large scale, transactional data lakes using Apache Hudi” - By Nishith Agarwal, June 2020, Berlin Buzzwords 2020.

  14. “Apache Hudi - Design/Code Walkthrough Session for Contributors” - By Vinoth Chandar, July 2020, Hudi community.

  15. “PrestoDB and Apache Hudi” - By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB Community Meetup.

  16. “Panel Discussion on Presto Ecosystem” - By Vinoth Chandar, Sep 2020, PrestoCon panel.

  17. “Next Generation Data lakes using Apache Hudi” - By Balaji Varadarajan and Sivabalan Narayanan, Sep 2020, ApacheCon.

  18. “Landing practice of Apache Hudi in T3go” - By VinoYang and XianghuWang, November 2020, QCon.

Articles

You can check out our blog pages for content written by our committers/contributors.

  1. “The Case for incremental processing on Hadoop” - O’Reilly Ideas article by Vinoth Chandar
  2. “Hoodie: Uber Engineering’s Incremental Processing Framework on Hadoop” - Engineering Blog By Prasanna Rajaperumal
  3. “New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi” - AWS Blog by Danilo Poccia
  4. “The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project” - ASF Graduation announcement
  5. “Apache Hudi grows cloud data lake maturity”
  6. “Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi” - Uber eng blog by Nishith Agarwal
  7. “Hudi On Hops” - By Netsanet Gebretsadkan Kidane
  8. “开源数据湖存储框架 Apache Hudi 如何玩转增量处理” - InfoQ CN article by Yanghua
  9. “Origins of Data Lake at Grofers” - by Akshay Agarwal
  10. “Data Lake Change Capture using Apache Hudi & Amazon AMS/EMR” - Towards Data Science article, Oct 20
  11. “How nClouds Helps Accelerate Data Delivery with Apache Hudi on Amazon EMR” - published by nClouds in partnership with AWS
  12. “Apply record level changes from relational databases to Amazon S3 data lake using Apache Hudi on Amazon EMR and AWS Database Migration Service” - AWS blog