Leopards Blog


How Google manages data.

1. Collect
MapReduce doesn’t depend on a traditional structured database, where information is categorized as it’s collected. We’ll just gather up the full text of every book Google has scanned.

2. Map
You write a function to map the data: “Count every use of every word in Google Books.” That request is then split among all the computers in your army, and each agent is assigned a hunk of data to work with. Computer A gets War and Peace, for example. That machine knows what words that book contains, but not what’s inside Anna Karenina.
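
A minimal sketch of the map step in Python, to make the idea concrete. This is not Google's code: the function name map_book, the regular expression used to split words, and the two short sample texts are all assumptions made for illustration. Each call sees only its own book, just as Computer A sees only War and Peace.

```python
import re
from collections import Counter

def map_book(title, text):
    """Map step (hypothetical helper): count the words of ONE book.

    The function knows nothing about any other book, which is what
    lets many machines run it at the same time on different data.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return title, Counter(words)

# Placeholder texts, not real excerpts from the novels.
books = {
    "War and Peace": "Moscow was burning and Moscow was empty",
    "Anna Karenina": "She left Moscow for Petersburg",
}

# On a real cluster each book would go to a different machine;
# here the "machines" simply run one after another.
mapped = [map_book(title, text) for title, text in books.items()]
```

Because map_book takes one book and returns one result, handing it to hundreds of machines is only a matter of splitting up the list of books.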

3. Save
Each of the hundreds of PCs doing a map writes the results to its local hard drive, cutting down on data transfer time. The computers that have been assigned “reduce” functions grab the lists from the mappers.
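
The save step could look roughly like the sketch below. It assumes a fixed number of reducers, JSON files named part-0.json, part-1.json and so on, and a CRC32 checksum to decide which reducer owns which word. None of those details come from the article; they are stand-ins so the example runs on one machine, with temporary directories playing the role of each mapper's local hard drive.

```python
import json
import tempfile
import zlib
from pathlib import Path

NUM_REDUCERS = 2  # assumed cluster setting, for illustration only

def partition(word):
    """Pick the reducer that owns a word (stable across machines)."""
    return zlib.crc32(word.encode("utf-8")) % NUM_REDUCERS

def save_map_output(local_dir, title, counts):
    """Mapper side: write one file per reducer to the local disk."""
    buckets = [{} for _ in range(NUM_REDUCERS)]
    for word, n in counts.items():
        buckets[partition(word)][word] = n
    for r, bucket in enumerate(buckets):
        path = Path(local_dir) / f"part-{r}.json"
        path.write_text(json.dumps({"title": title, "counts": bucket}))

def fetch_for_reducer(mapper_dirs, r):
    """Reducer side: grab only reducer r's file from every mapper."""
    return [
        json.loads((Path(d) / f"part-{r}.json").read_text())
        for d in mapper_dirs
    ]

# Two temporary directories stand in for two mappers' local disks.
mapper_a = tempfile.mkdtemp()
mapper_b = tempfile.mkdtemp()
save_map_output(mapper_a, "War and Peace", {"moscow": 2, "was": 2})
save_map_output(mapper_b, "Anna Karenina", {"moscow": 1, "left": 1})

r = partition("moscow")
fetched = fetch_for_reducer([mapper_a, mapper_b], r)
```

Every mapper sends a given word to the same reducer, so each reducer only has to pull the slice it is responsible for instead of the full output of every map.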

4. Reduce
The Reduce computers correlate the lists of words. Now you know how many times a particular word is used, and in which books.
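
As a rough illustration of the reduce step, the sketch below merges per-book word lists into a single index. The name reduce_counts and the sample counts are made up for the example; the point is only that the merged result records both how often a word appears and in which books.

```python
from collections import defaultdict

def reduce_counts(mapped):
    """Reduce step (hypothetical helper): merge per-book word counts.

    Input:  a list of (title, {word: count}) pairs from the mappers.
    Output: {word: {title: count}}, so totals and per-book figures
            are both available.
    """
    index = defaultdict(dict)
    for title, counts in mapped:
        for word, n in counts.items():
            index[word][title] = index[word].get(title, 0) + n
    return dict(index)

# Sample mapper output with invented numbers.
mapped_lists = [
    ("War and Peace", {"moscow": 2, "was": 2}),
    ("Anna Karenina", {"moscow": 1, "left": 1}),
]

word_index = reduce_counts(mapped_lists)
moscow_total = sum(word_index["moscow"].values())
```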

5. Solve
The result? A data set about your data. In our example, the final list of words is stored separately so it can be quickly referenced or queried: “How often does Tolstoy mention Moscow? Paris?” You don’t have to plow through unrelated data to get the answer.
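
To show what querying that stored list might look like, here is a small sketch. The index contents, the book-to-author table and the function name mentions are all assumptions for illustration, and the counts are invented rather than taken from the novels.

```python
# A tiny stand-in for the stored result of the reduce step.
final_index = {
    "moscow": {"War and Peace": 2, "Anna Karenina": 1},
    "left": {"Anna Karenina": 1},
}

# Assumed metadata table linking each book to its author.
book_authors = {
    "War and Peace": "Tolstoy",
    "Anna Karenina": "Tolstoy",
}

def mentions(index, authors, author, word):
    """Total uses of a word across every book by one author."""
    per_book = index.get(word.lower(), {})
    return sum(
        n for title, n in per_book.items()
        if authors.get(title) == author
    )

moscow = mentions(final_index, book_authors, "Tolstoy", "Moscow")
paris = mentions(final_index, book_authors, "Tolstoy", "Paris")
```

The lookup touches a single entry in the index and never reopens the books themselves, which is why the answer comes back without plowing through unrelated data.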


 Wired