January 18, 2024 Meetup
St. Louis Linux Users Group
Hadoop for big data. An intro.
Presented By: Steven Lembark
Apache Hadoop is an open-source, Java-based software platform and ecosystem that manages data processing and storage for big data applications. It handles datasets ranging in size from gigabytes to petabytes. The data is spread across many disks on many machines in a cluster, and the processing is distributed across those same machines so that, where possible, each node works on the data it stores locally.
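To make the storage side concrete, here is a minimal Python sketch of how a distributed file system such as HDFS splits a file into fixed-size blocks and keeps several copies of each block on different machines. It assumes the commonly cited defaults of 128 MiB blocks (`dfs.blocksize`) and three replicas (`dfs.replication`); `plan_blocks` and the node names are hypothetical, and the round-robin placement is a deliberate simplification of HDFS's rack-aware placement policy.

```python
import math

# Assumed defaults, mirroring dfs.blocksize and dfs.replication; both are configurable.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB
REPLICATION = 3


def plan_blocks(file_size: int, nodes: list[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> list[list[str]]:
    """Split a file into fixed-size blocks and assign each block to `replication` nodes.

    Hypothetical helper: placement is simple round-robin for illustration only;
    real HDFS lets the NameNode choose nodes with a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    # The last block may be partially filled, hence the ceiling.
    num_blocks = math.ceil(file_size / block_size)
    placement = []
    for b in range(num_blocks):
        # Each replica of block b lands on a different node.
        placement.append([nodes[(b + r) % len(nodes)] for r in range(replication)])
    return placement


if __name__ == "__main__":
    plan = plan_blocks(1024 ** 3, ["dn1", "dn2", "dn3", "dn4"])
    print(len(plan), "blocks,", sum(len(p) for p in plan), "stored copies")
```

With those defaults, a 1 GiB file becomes 8 blocks and 24 stored block copies spread over the four example nodes, so losing any single machine still leaves at least two copies of every block.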
Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data.
In the infancy of the Internet, there was a quest to 'find stuff', and 'search engines' were needed. Google, AltaVista, Yahoo, Ask Jeeves, ... all had ideas about how to do it.
Hadoop grew out of Apache Nutch, an open-source web crawler that Doug Cutting and Mike Cafarella started in 2002. Inspired by Google's papers on the Google File System (2003) and MapReduce (2004), a programming model that divides an application into small pieces that run in parallel on different nodes, they added a distributed file system and a MapReduce implementation to Nutch.
In 2006 that code was split out of Nutch into its own open-source Apache Software Foundation project, named Hadoop.
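The MapReduce model splits a job into a map step that runs independently on each chunk of input, a shuffle that groups the intermediate results by key, and a reduce step that combines each group. The sketch below runs the classic word-count example in a single Python process; the function names are hypothetical, and a real Hadoop job would instead use Hadoop's Java `Mapper` and `Reducer` classes (or Hadoop Streaming), with the phases running on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a cluster, each map_phase call would run on the node holding that input split,
    # and each key's reduce_phase call could run on a different node.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "Hadoop handles big data"]))
```

Because each map call touches only its own line and each reduce call only its own key, both phases can be spread across as many machines as there are input splits and keys.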
Although other tools are now used for such large data (e.g. Apache Hive, Apache Spark, Amazon EMR, Azure Data Lake Storage, IBM Analytics Engine, Hortonworks Data Platform, Apache Pig, Clarissa, ...), some organizations, including Netflix, still depend on Hadoop.
So, Steven will tell us…
An Overview of the Apache Hadoop Ecosystem:
There is stuff that's growing on your data warehouse hard disks.
In the beginning was MapReduce, and it was, well, Google's. And everyone tried it.
But while Google dropped the approach as ineffective, lots of other folks had found ways to make pieces of it work, added new pieces to it, and out of the ashes of single-purpose Hadoop grew the Apache Hadoop ecosystem.
Today this includes a variety of software for intake, querying, mapping SQL to key:value stores, and a few other cute tricks.
This talk will look at the pieces of this ecosystem, a bit about how they fit together, and how they can be used for Really Truly HUUUUUUGE data processing.
Spread the word
Don't miss Steven Lembark's talk on Jan 18, 2024: 'Hadoop for big data. An intro.' Discover how the Apache Hadoop ecosystem has evolved to tackle REALLY BIG data! #Hadoop #BigData @SLUUG_Org https://www.meetup.com/saint-louis-unix-users-group/events/298136400/
Meeting Artifacts and Media
Meeting Agenda
At 6:00 p.m. Central Time the meeting opens. Participants are encouraged to join at this time if they need to test their microphone, screen sharing, and video camera.
At 6:30 p.m. Central Time we attempt a quick welcome, introductions, announcements, current events of interest, and a general CALL FOR HELP (Questions and Answers) segment.
At 6:45 p.m. Central Time the presentation begins.