Data Science Lead (NLP & GenAI)

Summary
We are seeking an experienced, innovative Data Science Lead with 8+ years of experience in core data science and 2+ years of focused, hands-on experience in Natural Language Processing (NLP) and Generative AI (GenAI). You will lead strategic AI/ML initiatives, mentor junior data scientists, and deliver intelligent solutions that drive business value using both classical and modern machine learning techniques.

Key Responsibilities
  • Lead end-to-end design and delivery of data science solutions, from problem definition to deployment.
  • Design, build, and fine-tune NLP and GenAI models for tasks such as summarization, classification, question answering, translation, and chatbot applications.
  • Apply statistical modeling, predictive analytics, and machine learning algorithms on structured and unstructured datasets.
  • Collaborate with product, engineering, and business teams to translate high-level business problems into data science solutions.
  • Ensure scalability, reproducibility, and performance optimization in all machine learning workflows.
  • Work with large-scale data processing tools and frameworks in cloud-based environments.
  • Mentor junior data scientists, review their work, and collaborate on research and experimentation.
  • Track advancements in GenAI, LLMs, and NLP frameworks and bring innovation to enterprise AI use cases.

Mandatory Skills
  • Python: Strong proficiency in Python for data science, modeling, and scripting
  • Machine Learning: Hands-on experience with classical and ensemble models (e.g., Random Forest, XGBoost)
  • NLP (2+ years): Experience with transformers, tokenization, embeddings, sentiment analysis
  • GenAI & LLMs: Experience with GPT-like models, fine-tuning, and prompt engineering
  • Deep Learning (PyTorch / TensorFlow): Building and training deep learning models for NLP and other domains
  • Model Deployment: Deploying models via REST APIs, Docker, or cloud-native services
  • SQL & Data Manipulation: Strong ability to query, clean, and process data
  • Statistical Analysis: Applied statistics, hypothesis testing, and A/B testing
  • Version Control (Git): Experience using Git in collaborative environments

Optional / Nice-to-Have Skills
  • Vector Databases: Experience with FAISS, Pinecone, or ChromaDB for semantic search
  • RAG Architecture: Building Retrieval-Augmented Generation pipelines
  • LLM Orchestration: LangChain, LlamaIndex, or similar frameworks
  • Cloud Platforms (Azure/GCP/AWS): Cloud-based ML workflows, pipelines, and infrastructure
  • MLOps: Model tracking, monitoring, CI/CD with MLflow, Kubeflow, etc.
  • Big Data Tools: Spark, Databricks, or Hadoop ecosystem familiarity
  • Experiment Tracking: Tools like Weights & Biases, MLflow
  • Academic Research / Publications: Experience publishing whitepapers or research contributions
  • Databricks: Hands-on experience with Databricks, preferably Azure Databricks
  • Delta Lake: Hands-on experience with Delta Lake, preferably on Azure Databricks and ADLS Gen2

Educational Qualifications
Master’s or PhD in Computer Science, Data Science, AI/ML, Statistics, or a related field.

Certifications (preferred but not mandatory)
  • Google Cloud or Azure AI Engineer / Data Scientist Associate
  • Databricks Certified Machine Learning Professional
  • DeepLearning.AI Generative AI certification
  • Hugging Face Transformers certification

Required Skills
  • Generative AI
  • LLM
  • ML
  • Lead and Drive Outcomes
  • Communicates Effectively
  • Creative Problem Solving
  • Transformative and Strategic Thinking