Sunday, March 31, 2013

Hadoop: How to have nested directories as the input path?


At times our input directories have sub-directories in them.
When we add such a directory as our input path, we are bound to get an exception like the following:
java.io.IOException: Not a file
Say we have the following directory structure:
mainInputDirectory
                inputSubDirectory1
                                1234
                                abcdefg
                inputSubDirectory2
                                1234
                                hikjlmno
                inputDirectoryNotNeededAsInput
                                somefolderName
Hadoop supports glob patterns (wildcards, not full regular expressions) in input paths. I haven’t experimented with many complex patterns, but the simple placeholders ? and * do work.
Now, say, we only want the input to come from the directories with longer names, such as abcdefg and hikjlmno. This can be achieved with an input path like the following:
[s3-bucket-path-or-hdfs-path]/mainInputDirectory/inputSubDirectory*/?????*/*
To explain this:
To exclude the sub-directory inputDirectoryNotNeededAsInput and include all the directories whose names start with inputSubDirectory, we have the following construct in the input path:
/inputSubDirectory*/
To exclude the sub-directories named 1234 (note that these names have just 4 characters) and include all the directories whose names have more than 4 characters, e.g. abcdefg, we have the following construct in the input path. Each ? matches exactly one character and * matches zero or more, so ?????* matches any name of 5 or more characters:
/?????*/
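The selection can be checked locally before submitting a job. The sketch below is not Hadoop code: it uses the JDK's own glob matcher (java.nio.file.PathMatcher), which treats ? and * the same way for a simple pattern like this one, so take it as an approximation of what Hadoop would pick up. The class name InputGlobDemo and the file name part-00000 are made up for illustration; the post only lists directories.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputGlobDemo {
    // The input path from the post, without the bucket or HDFS prefix.
    // The trailing /* matches the files inside the selected directories.
    static final String PATTERN = "mainInputDirectory/inputSubDirectory*/?????*/*";

    // Hypothetical files, one per leaf directory of the example tree.
    static final List<String> SAMPLE_PATHS = Arrays.asList(
        "mainInputDirectory/inputSubDirectory1/1234/part-00000",
        "mainInputDirectory/inputSubDirectory1/abcdefg/part-00000",
        "mainInputDirectory/inputSubDirectory2/1234/part-00000",
        "mainInputDirectory/inputSubDirectory2/hikjlmno/part-00000",
        "mainInputDirectory/inputDirectoryNotNeededAsInput/somefolderName/part-00000"
    );

    // Returns the paths that the glob selects, in their original order.
    static List<String> matching(String glob, List<String> paths) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> selected = new ArrayList<>();
        for (String p : paths) {
            if (matcher.matches(Paths.get(p))) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        for (String p : matching(PATTERN, SAMPLE_PATHS)) {
            System.out.println(p);
        }
    }
}
```

Running it lists only the files under abcdefg and hikjlmno; both 1234 directories and inputDirectoryNotNeededAsInput are skipped.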
