Monday, December 26, 2011

Input Format

Programming Lab 2: Input Format
  1. We will learn how to join different data sets on the same key using the regular Map-Reduce approach.
  2. This lab will also show the limitations of the regular Map-Reduce approach and what a developer needs to do to overcome them.
Problem to Solve
  1. We have two sets of data: position information and salary information
  2. We need to join them together
  3. Produce:
Create Project and Link with libraries
  1. Copy provided libraries and java code from USB drives
  2. Create project in NetBeans or Eclipse
  3. Link with libraries
  4. Create new class and copy provided code
  5. Modify the input and output directories
  6. Run code and examine result
  7. For detailed instructions on creating a project, refer to Lab 1.
  8. Our key is the position in the company (CEO, Engineer, etc.)
  9. Our data is separated by “=”. We will use KeyValueTextInputFormat.
  10. We also need to set a custom separator, since it differs from the default tab.
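To make the role of the separator concrete, here is a minimal plain-Java sketch of the splitting rule that KeyValueTextInputFormat applies to each input line: everything before the first separator becomes the key, everything after it becomes the value, and a line with no separator becomes a key with an empty value. This is an illustration only, not Hadoop's own code. The class name, the method name, and the sample records are made up for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration of splitting a line into key and value.
// Not Hadoop code: class, method, and sample data are hypothetical.
class KeyValueSplitSketch {

    // Splits at the FIRST occurrence of the separator. If the separator is
    // absent, the whole line is the key and the value is empty.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Hypothetical records; the lab's real data files are not reproduced here.
        String[] lines = {"CEO=John Smith", "Engineer=Jane Doe", "Intern"};
        for (String line : lines) {
            Map.Entry<String, String> kv = split(line, '=');
            System.out.println("key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

In the real job the separator is passed through the job configuration rather than as a method argument. Older Hadoop releases read it from the `key.value.separator.in.input.line` property, and newer ones use `mapreduce.input.keyvaluelinerecordreader.key.value.separator`, so check the name against the version you link with.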
Expected output
Please note that the order is not guaranteed. We need to do special processing to guarantee order.
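One common way to get a stable order, reading the note as being about the order of the joined values within one key, is to tag every value with its source in the map step and sort by that tag in the reduce step. The sketch below simulates this in plain Java without Hadoop. It is an assumption about what the "special processing" could look like, not the code provided with the lab, and the tags ("1" for position, "2" for salary), names, and sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join with source tags.
// Hypothetical names and data; not the lab's provided code.
class TaggedJoinSketch {

    // "Map" step: prefix the value with a source tag, e.g. "1:John Smith".
    static String tag(String source, String value) {
        return source + ":" + value;
    }

    // "Reduce" step: values may arrive in any order, so sort by tag first.
    // Single-character tags followed by ':' sort correctly as plain strings.
    static String reduce(String key, List<String> taggedValues) {
        List<String> sorted = new ArrayList<>(taggedValues);
        Collections.sort(sorted);
        StringBuilder out = new StringBuilder(key);
        for (String tv : sorted) {
            // Strip the tag and keep only the original value.
            out.append('\t').append(tv.substring(tv.indexOf(':') + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated shuffle output: key -> tagged values, salary added first on purpose.
        Map<String, List<String>> shuffled = new TreeMap<>();
        shuffled.computeIfAbsent("CEO", k -> new ArrayList<>()).add(tag("2", "200000"));
        shuffled.computeIfAbsent("CEO", k -> new ArrayList<>()).add(tag("1", "John Smith"));
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Without the tag, the reducer cannot tell which value came from which file, so the joined line could come out with the columns swapped from one run to the next.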
Task
  1. An additional input folder contains vacation time data
  2. An additional input folder can be specified by calling FileInputFormat.addInputPath
  3. We need to add that data to every record
