Tuesday, October 8, 2013

Interesting Big Data news from around the world: October 2013 - Week 1

Big Data is in fashion. It has become the in-thing! So it is bound to make some news.

Here are some of the interesting reads I have for you from around the world for the first week of October 2013:
  1. Work4 Exploits Facebook's Graph Search - Who's Next?
    Think the only person that can solicit you on Facebook is a friend or friend of a friend? Think again.
    Today, Work4, a Facebook recruiting solution, is unveiling Graph Search Recruiter, a service which gives companies the ability to search for and contact potential job candidates from across Facebook’s entire membership. Except those whose privacy settings prevent it, that is.
    From a recruiter’s perspective, this seems to be as sexy as it gets.

  2. Why big data has made your privacy a thing of the past
    Despite the efforts of European regulators to protect citizens' personal data, predictive analytics has made it too easy to piece together information about individuals regardless of the law.

  3. Big data blocks gaming fraud
    The explosion of online games has resulted in the creation of a new industry: in-game currency, currently valued at $1 billion in the U.S. alone. But game developers, particularly startups that are rising and falling on a single game, are losing significant revenue as savvy social game players figure out how to “game” the system by stealing currency rather than buying it.

  4. How to Find Out What Big Data Knows About You
    The world of Big Data is a world of pervasive data collection and aggressive analytics. Some see the future and cheer it on; others rebel. Behind it all lurks a question most of us are asking—does it really matter? I had a chance to find out recently, as I got to see what Acxiom, a large-scale commercial data aggregator, had collected about me.
    At least in theory large scale data collection matters quite a bit. Large data sets can be used to create social network maps and can form the seeds for link analysis of connections between individuals. Some see this as a good thing; others as a bad one—but whatever your viewpoint, we live in a world which sees increasing power and utility in Big Data’s large scale data sets.

  5. Deutsche Telekom speeds up big data with hosted HANA
    Enterprises have another option for accessing SAP’s in-memory database technology: Deutsche Telekom subsidiary T-Systems has been approved to offer HANA Enterprise Cloud.
    The in-memory database technology can process large data volumes from business applications more quickly than standard server implementations, and also supports new integrated analytical methods, according to T-Systems.

Debugging Hadoop MR Java code in a local Eclipse dev environment

I have been asked multiple times to blog about this. One of my esteemed colleagues has already blogged about it, so here I am just re-blogging it.

The basic thing to remember here is that debugging a Hadoop MR job is similar to remotely debugging any other application in Eclipse.

A debugger, or debugging tool, is a computer program used to test and debug other programs (the "target" program). It is especially useful in a Hadoop environment, where there is little room for error and one small mistake can cause a huge loss.

Debugging custom Java code for Hadoop in your local Eclipse environment is pretty straightforward and does not take much time to set up.

As you may know, Hadoop can be run in the local environment in three different modes:

1. Local Mode
2. Pseudo Distributed Mode
3. Fully Distributed Mode (Cluster)

Typically you will run your local Hadoop setup in Pseudo Distributed Mode to leverage HDFS and MapReduce (MR). However, you cannot debug MR programs in that mode, because each map/reduce task runs in a separate JVM process. To debug, switch back to Local Mode, where your MR program runs in a single JVM process.

Here are the quick and simple steps to debug this in your local environment:

1. Run Hadoop in Local Mode for debugging, so mapper and reducer tasks run in a single JVM instead of separate JVMs. The steps below show how.

2. Configure HADOOP_OPTS to enable debugging, so that when you run your Hadoop job it waits for the debugger to connect. Below is the command to enable debugging on port 8008:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8008"
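Before launching the job, you can double-check that the debug agent options stuck by echoing the variable (a quick sanity check, not a required step):

```shell
# Set the JDWP debug agent options; suspend=y makes the JVM wait for the debugger.
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8008"

# Verify the variable is set in the current shell.
echo "$HADOOP_OPTS"
```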

3. Configure the fs.default.name value in core-site.xml to file:/// instead of hdfs://. You won't be using HDFS in Local Mode.
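The change in core-site.xml might look like the following (this is the classic Hadoop 1.x property name; on newer releases the equivalent property is fs.defaultFS):

```xml
<!-- core-site.xml: point the default filesystem at local disk instead of HDFS -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```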

4. Configure the mapred.job.tracker value in mapred-site.xml to local. This instructs Hadoop to run MR tasks in a single JVM.
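The corresponding mapred-site.xml entry might look like this:

```xml
<!-- mapred-site.xml: use the local, single-JVM job runner instead of a JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>
```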

5. Create a debug configuration for Eclipse and set the port to 8008 – typical stuff. Go to the debug configurations, create a new "Remote Java Application" type of configuration, and set the port to 8008 in the settings.

6. Run your Hadoop job (it will wait for the debugger to connect), then launch Eclipse in debug mode with the above configuration. Do make sure to set a breakpoint first.

That is all you need to do.

Any feedback, good or bad, is most welcome.
