How Much Data Does Google Handle?

SIBAYAN BAG
Sep 17, 2020


This is one of those questions whose answer can never be exact. On a lighter note, it is like a child asking how many stars there are in the sky; asking "how much data does Google handle?" is a similar kind of question.

We all know Google can answer almost any question we throw at it, so it is tempting to conclude that Google knows everything. Naturally, you must be wondering how much data Google handles to answer all of these questions. It does hold a whole lot of data, but nobody outside the company knows exactly how much, because Google does not publish numbers on how much data it stores.

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and roughly 1.2 trillion searches per year worldwide. On top of that, Google processes over 20 petabytes of data per day.

What are these new terms, petabytes and exabytes? For most of us, the largest unit of data we have heard of is the terabyte (TB). The scale continues upward:

1 petabyte (PB) = 1,024 TB
1 exabyte (EB) = 1,024 PB

In other words, an exabyte is roughly a million terabytes. Data at this scale is hard to even picture, but piecing the public figures together, a reasonable conclusion is that Google holds around 10 to 15 exabytes of data. That is roughly the combined storage of 30 million PCs.
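To make the arithmetic concrete, here is a quick back-of-the-envelope check in Python. The 500 GB per PC figure is an assumption of mine, used only to show how a "30 million PCs" comparison can be derived; none of these are official numbers.

```python
# Back-of-the-envelope math for the figures above (binary units).

TB = 1024                  # 1 TB in GB
PB = 1024 * TB             # 1 PB = 1,024 TB
EB = 1024 * PB             # 1 EB = 1,024 PB, i.e. ~1 million TB

google_estimate_eb = 15                      # upper end of the 10-15 EB estimate
google_estimate_gb = google_estimate_eb * EB

# Assumption: an average PC holds about 500 GB of storage.
gb_per_pc = 500
pcs_equivalent = google_estimate_gb / gb_per_pc

print(f"1 EB = {EB / TB:,.0f} TB")  # ~1 million TB
print(f"15 EB is roughly {pcs_equivalent / 1e6:,.0f} million PCs at 500 GB each")

# Sanity check on the search numbers, too:
queries_per_second = 40_000
per_day = queries_per_second * 60 * 60 * 24
print(f"{queries_per_second:,} queries/s is about {per_day / 1e9:.1f} billion/day")
print(f"...and about {per_day * 365 / 1e12:.2f} trillion/year")
```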

How Does Google Handle All of This?

The January 2008 MapReduce paper gives new insight into the hardware and software Google uses to crunch tens of petabytes of data per day. Google converted its search indexing systems to MapReduce in 2003, and by 2008 it was processing over 20 petabytes of raw data every day. It is the kind of large-scale processing that makes your head spin, and makes you appreciate the years of fine-tuning in distributed computing that are applied to today's large problems.

What Is MapReduce?

The system described in Google's paper inspired an open-source implementation, and MapReduce is now a core component of the Apache Hadoop software framework.

Hadoop enables resilient, distributed processing of massive unstructured data sets across clusters of commodity computers, where each node in the cluster has its own storage. MapReduce serves two essential functions: it filters and parcels out work to the various nodes in the cluster (the map step, handled by the mapper), and it collects and combines the results from each node into a cohesive answer to a query (the reduce step, handled by the reducer).
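To make the mapper and reducer concrete, here is a minimal word-count sketch in plain Python. It only simulates the flow Hadoop automates; the function names (map_words, shuffle, reduce_counts) are mine, not part of the Hadoop API.

```python
from collections import defaultdict

# --- Map: each "node" turns its chunk of text into (word, 1) pairs ---
def map_words(chunk):
    return [(word.lower(), 1) for word in chunk.split()]

# --- Shuffle: group all pairs by key, so each reducer sees one word ---
def shuffle(mapped_pairs):
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    return groups

# --- Reduce: combine each word's counts into a single total ---
def reduce_counts(groups):
    return {word: sum(counts) for word, counts in groups.items()}

chunks = ["the quick brown fox", "the lazy dog", "the fox"]  # one chunk per node

mapped = [pair for chunk in chunks for pair in map_words(chunk)]
counts = reduce_counts(shuffle(mapped))
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```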

How MapReduce works

The original version of Hadoop MapReduce (known as MRv1) involved several component daemons, including:

  • JobTracker — the master node that manages all the jobs and resources in a cluster;
  • TaskTrackers — agents deployed to each machine in the cluster to run the map and reduce tasks; and
  • JobHistory Server — a component that tracks completed jobs and is typically deployed as a separate function or with JobTracker.
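As a rough mental model of that division of labor, consider the toy sketch below. This is not real Hadoop code; every class and method name is invented purely for illustration.

```python
# Toy model of the MRv1 division of labor. All names are illustrative.

class TaskTracker:
    """Worker agent: runs the map and reduce tasks on one machine."""

    def __init__(self, name):
        self.name = name

    def run(self, task):
        # In real Hadoop this would execute a map or reduce task locally.
        return f"{task} done on {self.name}"

class JobTracker:
    """Master node: hands tasks out and records what has finished."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.history = []          # stands in for the JobHistory Server

    def run_job(self, tasks):
        for i, task in enumerate(tasks):
            tracker = self.trackers[i % len(self.trackers)]  # naive scheduling
            self.history.append(tracker.run(task))
        return self.history

job_tracker = JobTracker([TaskTracker("node-1"), TaskTracker("node-2")])
print(job_tracker.run_job(["map-0", "map-1", "map-2", "reduce-0"]))
```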

To distribute input data and collate results, MapReduce operates in parallel across massive clusters. Because the size of the cluster does not affect a processing job's final results, a job can be split across almost any number of servers. That is what lets MapReduce and the wider Hadoop framework simplify software development: developers write the map and reduce logic, and the framework takes care of distributing it.

The power of MapReduce is in its ability to tackle huge data sets by distributing processing across many nodes, and then combining or reducing the results of those nodes.
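That claim, that the number of workers does not change the final answer, is easy to demonstrate. Here is a small sketch using Python's multiprocessing module; the chunking and merging logic is heavily simplified compared to what Hadoop actually does.

```python
from collections import Counter
from multiprocessing import Pool

def map_words(chunk):
    # The map step: emit a count of each word in this chunk.
    return Counter(chunk.split())

def word_count(chunks, workers):
    # Run the map step in parallel, then reduce by merging the Counters.
    with Pool(workers) as pool:
        partials = pool.map(map_words, chunks)
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the fox", "the dog"]
    # The final result is identical no matter how many workers we use.
    assert word_count(chunks, 1) == word_count(chunks, 2) == word_count(chunks, 4)
    print(word_count(chunks, 4))
```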

As a basic example, you could count the number of times every word appears in a novel with a single program on a single server, but that is time-consuming. Instead, you can split the task among 26 people: each person takes a page, writes every word on it onto a separate sheet of paper, and grabs a new page when they finish. This is the map aspect of MapReduce. And if a person leaves, another person takes their place, which illustrates MapReduce's fault-tolerant nature.

When all the pages have been processed, the sheets are sorted into 26 boxes, one for each first letter. Each person then takes a box, sorts the words in it alphabetically, and tallies the number of sheets carrying the same word. That tallying step is the reduce aspect of MapReduce.
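Translated into code, the 26 boxes are a partition (shuffle) step: each word is routed to a bucket keyed by its first letter before the per-bucket sorting and counting happens. The sketch below just illustrates the analogy; it is not how Hadoop's real partitioner works.

```python
from collections import Counter, defaultdict
from string import ascii_lowercase

pages = ["the quick brown fox", "jumps over the lazy dog"]  # one "page" per person

# Map: every person writes each word onto its own sheet of paper.
sheets = [word.lower() for page in pages for word in page.split()]

# Shuffle: sort the sheets into 26 boxes by first letter.
boxes = defaultdict(list)
for word in sheets:
    boxes[word[0]].append(word)

# Reduce: each person takes one box, sorts it, and counts duplicates.
for letter in ascii_lowercase:
    if boxes[letter]:
        print(letter, dict(Counter(sorted(boxes[letter]))))
```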

So, from this article we get a sense of how much data Google holds, and of how it handles and tackles that huge amount of data.

Thank you for reading this article. I hope it helps you understand the Big Data problem.
