DataWorks Summit
APACHE METRON (INCUBATING) AS A CASE STUDY OF A MODERN STREAMING ARCHITECTURE ON HADOOP
Many voices have discussed how to architect streaming
applications on Hadoop, but until now there have been very few worked
examples in open source. Apache Metron (Incubating) is a
streaming advanced-analytics cybersecurity application that uses
components of the Hadoop stack as its platform.
Link to Slides: www.slideshare.net/Hadoop_Summit/bringing-it-all-together-apache-metron-incubating-as-a-case-study-of-a-modern-streaming-architecture-on-hadoop
We will go beyond theoretical discussions of Kappa vs. Lambda
architectures and describe the nuts and bolts of a streaming
architecture that enables advanced analytics on Hadoop. We will cover
the components we had to build and those we could reuse, why we
made the architectural decisions we did, and how those pieces fit
together into a coherent application on top of many different Hadoop
ecosystem projects.
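For readers unfamiliar with the two terms, a toy sketch of the distinction may help. This is not Metron code. It assumes a hypothetical in-memory list standing in for a durable, replayable log (for example, a Kafka topic) and uses a per-key event count as the computed view. In the Lambda style, a batch layer and a speed layer run separate code paths, and their results are merged at query time. In the Kappa style, one streaming code path handles both live data and reprocessing, and reprocessing is done by replaying the log.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical stand-in for a replayable event log (e.g. a Kafka topic).
EventLog = List[str]


def lambda_query(log: EventLog, batch_cutoff: int, key: str) -> int:
    """Lambda style: two separately computed views merged at query time."""
    # Batch layer: periodically recomputed over all data up to the cutoff.
    batch_view = Counter(log[:batch_cutoff])
    # Speed layer: incremental view over events the batch layer has not seen yet.
    speed_view = Counter(log[batch_cutoff:])
    return batch_view[key] + speed_view[key]


def kappa_view(log: Iterable[str]) -> Counter:
    """Kappa style: a single streaming code path.

    Reprocessing means replaying the log through this same function
    rather than maintaining a separate batch job.
    """
    view: Counter = Counter()
    for event in log:  # identical logic for live processing and for replay
        view[event] += 1
    return view
```

For example, with `log = ["login", "alert", "login"]`, both `lambda_query(log, 2, "login")` and `kappa_view(log)["login"]` return 2. The answer is the same either way. What differs is how many code paths must be kept consistent.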
We will also discuss the domain-specific language we created out of
necessity to provide a pluggable layer for user-defined enrichments,
and how it helped make Metron less rigid and easier to use. Finally,
we will candidly discuss mistakes we made early on.
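The toy sketch below gives a rough sense of how a small expression language can drive a pluggable enrichment layer. It is not Metron's actual DSL (Stellar) or its API. The function names `TO_LOWER` and `IS_PRIVATE_IP`, the `FUNC(field, ...)` grammar, the message field names and the config shape are all illustrative choices made for this sketch. The design idea is that users register functions and declare enrichments in configuration, so adding a new enrichment does not require changing pipeline code.

```python
import ipaddress
import re
from typing import Any, Callable, Dict

# Registry of user-defined enrichment functions, keyed by the name used in expressions.
FUNCTIONS: Dict[str, Callable[..., Any]] = {}


def enrichment(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function so expressions can call it by name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        FUNCTIONS[name] = fn
        return fn
    return register


# Toy grammar: a single call such as TO_LOWER(host); arguments are message field names.
CALL = re.compile(r"^\s*([A-Z_]+)\(([\w\s,]*)\)\s*$")


def evaluate(expr: str, message: Dict[str, Any]) -> Any:
    """Evaluate one expression against a message.

    Raises ValueError for unsupported syntax and KeyError for unknown
    functions or missing fields.
    """
    match = CALL.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    name, raw_args = match.groups()
    args = [message[a.strip()] for a in raw_args.split(",") if a.strip()]
    return FUNCTIONS[name](*args)


def enrich(message: Dict[str, Any], config: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the message with one new field per (output field -> expression) entry."""
    enriched = dict(message)
    for field, expr in config.items():
        enriched[field] = evaluate(expr, message)
    return enriched


# Example user-defined enrichments (hypothetical names for this sketch).
@enrichment("TO_LOWER")
def to_lower(value: str) -> str:
    return value.lower()


@enrichment("IS_PRIVATE_IP")
def is_private_ip(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_private
```

For example, calling `enrich({"ip_src_addr": "10.0.0.5", "host": "WWW.Example.COM"}, {"is_internal": "IS_PRIVATE_IP(ip_src_addr)", "host_normalized": "TO_LOWER(host)"})` adds `is_internal=True` and `host_normalized="www.example.com"`. The original message is left unmodified.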
Speaker:
CASEY STELLA
Principal Software Engineer/Data Scientist
Hortonworks
Link to event session page: dataworkssummit.com/san-jose-2017/sessions/bringing-it-all-together-apache-metron-incubating-as-a-case-study-of-a-modern-streaming-architecture-on-hadoop/
Views: 4,820