Apache Hadoop is a Java-based, cross-platform, open-source software framework used in the increasingly popular world of ‘big data’ for the distributed storage and processing of large data sets. Prominent users of Hadoop include Yahoo!, Facebook, public cloud platforms such as Microsoft Azure, Google Compute Engine, and Amazon Web Services, and over half of the Fortune 50.
Hadoop is relatively new: Doug Cutting and Mike Cafarella created it in 2005. It is therefore not surprising that only a few database administrators can call themselves Hadoop experts. If you plan to outsource database administration to a team of expert DBAs, there are several questions you should ask during the interview.
You might find a well-trained and experienced DBA, but it pays to hire one whose experience is with Hadoop rather than with other big data solutions. Ask whether the DBA has more than a basic understanding of the base Hadoop framework, which comprises:
- Hadoop Common, which consists of the utilities and libraries used by the other modules
- HDFS (Hadoop Distributed File System), a distributed file system that stores data on commodity machines and provides high aggregate bandwidth across the cluster
- Hadoop YARN, a resource-management platform that manages the cluster's computing resources and uses them to schedule users' applications
- Hadoop MapReduce, a programming model for processing big data at large scale
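To make the MapReduce programming model concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only imitates the map, shuffle, and reduce phases in memory; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and this is not the Hadoop Java API or Hadoop Streaming. A candidate with real Hadoop experience should be able to explain how each of these phases maps onto an actual cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical in-memory driver. On a real cluster, Hadoop tries to run each
    # map task on a node holding its HDFS input block, YARN schedules the tasks,
    # and the framework itself performs the shuffle between mappers and reducers.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes"]))
    # prints {'big': 2, 'clusters': 1, 'data': 2, 'nodes': 1}
```

The sketch keeps the three phases as separate functions because that separation is the point of the model: mappers and reducers are independent, so the framework can spread them across many machines.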
A good DBA team is one that has been in business for several years, since longevity suggests good work and relevant experience. You can also gauge a Hadoop DBA team's experience through discussion forums and review sites.
Do you know the commercial applications of Hadoop?
Hadoop is used for far more than MapReduce jobs, and your DBA should be aware of these other uses so that you get the maximum benefit from the platform. Related Apache projects, some of which are still under development, include the Apache Hive data warehouse system, the HBase database, and the Apache Mahout machine learning system. Some of the commercial applications you can use Hadoop for are:
- Clickstream analysis
- Log analysis
- Marketing analytics
- Sophisticated data mining
- Machine learning
- Image processing
- XML message processing
- Web crawling
- Text processing
- General archiving, including archiving of tabular/relational data for compliance
What is the most important role of a DBA?
This question will help you find out whether the DBA understands his/her role in the organization, knows how to work with other departments, and appreciates the importance of Hadoop to the organization. Hadoop database administration is more than managing MapReduce jobs in a back room. Your DBA wears many hats.
How will you manage upgrades?
Hadoop is open source, so you do not pay for the software. That does not mean, however, that you can ignore updates. Your DBA should have a plan for updates, since they fix bugs and add functionality. You should also ask about plans and timelines for migrations and projects, as well as lessons learned from past failures and successes.
What is the trend in Big Data?
A good DBA keeps up with what is happening in the world of big data, since this knowledge can give you an edge over your competitors. Ask whether the DBA takes part in relevant discussion forums and developer communities. You want a DBA who is flexible and receptive to new ideas.
How do you troubleshoot in your current or previous role?
Troubleshooting is one of the most important skills for a DBA, since downtime means lost revenue. Although one can learn from failures and successes over time, troubleshooting skill is largely a talent. A good DBA is methodical when troubleshooting. A good candidate can explain his/her thought process clearly, stand by the decisions made, and listen to team members. Ask the DBA how he/she will work with the vendor, since vendor support is not free.
Tell me about yourself.
Although this question will not reveal your DBA's technical abilities, it will give you insight into the type of person you are interviewing. Look for a well-rounded DBA to avoid conflict in the workplace. This question will also help you determine whether the candidate is confident and bold enough for the critical tasks ahead.
Tell me about your firm.
It is usually better to hire a remote DBA expert who is part of a larger team within an organization than to hire an individual. A team member can get help from colleagues, and an organization is easier to vet than an individual.
Which tools do you use?
To succeed with big data, your DBA also needs related skills, the most important of which are data modeling and networking. He/she should also have the tools needed to make the work easier. Beyond Hadoop itself, your DBA should be familiar with tools for tasks such as:
- Performance monitoring
- Data modeling
- Backup compression
- Change management
Throw in one or two particularly difficult or stressful questions to see how your candidate handles pressure.
Are you dealing with other clients?
If you are outsourcing the service, your DBA is most likely working with other clients. This question will help you gauge the candidate's trustworthiness. A good candidate is honest about other clients and explains what he/she will do to avoid any conflict of interest. It is, however, your responsibility to make sure the DBA you hire is not also working for your direct competitors.
What do you know about our company/organization?
A good DBA will have researched your company/organization beforehand, keeping in mind that different companies/organizations have different needs. A well-informed answer is the mark of a thorough candidate.