Lead Data Scientist

Data Science Lead (NLP & GenAI)

Summary

We are seeking a highly experienced and innovative Data Science Lead with at least 8 years of experience in core data science and at least 2 years of focused, hands-on experience in Natural Language Processing (NLP) and Generative AI (GenAI). You will lead strategic AI/ML initiatives, mentor junior data scientists, and deliver intelligent solutions that drive business value using both classical and modern machine learning techniques.

Key Responsibilities

Lead end-to-end design and delivery of data science solutions, from problem definition to deployment.
Design, build, and fine-tune NLP and GenAI models for tasks such as summarization, classification, question answering, translation, and chatbot applications.
Apply statistical modeling, predictive analytics, and machine learning algorithms to structured and unstructured datasets.
Collaborate with product, engineering, and business teams to translate high-level business problems into data science solutions.
Ensure that all machine learning workflows are scalable, reproducible, and optimized for performance.
Work with large-scale data processing tools and frameworks in cloud-based environments.
Mentor junior data scientists, review their work, and collaborate with them on research and experimentation.
Track advancements in GenAI, LLMs, and NLP frameworks and apply them to enterprise AI use cases.

Mandatory Skills

Python: Strong proficiency in Python for data science, modeling, and scripting
Machine Learning: Hands-on experience with classical and ensemble models (e.g., Random Forest, XGBoost)
NLP (2+ years): Experience with transformers, tokenization, embeddings, and sentiment analysis
GenAI & LLMs: Experience working with GPT-like models, including fine-tuning and prompt engineering
Deep Learning (PyTorch / TensorFlow): Building and training deep learning models for NLP and other domains
Model Deployment: Deploying models via REST APIs, Docker, or cloud-native services
SQL & Data Manipulation: Strong ability to query, clean, and process data
Statistical Analysis: Applied statistics, hypothesis testing, and A/B testing
Version Control (Git): Experience using Git in collaborative environments

Optional/Nice-to-Have Skills

Vector Databases: Experience with FAISS, Pinecone, or ChromaDB for semantic search
RAG Architecture: Building Retrieval-Augmented Generation pipelines
LLM Orchestration: LangChain, LlamaIndex, or similar frameworks
Cloud Platforms (Azure/GCP/AWS): Cloud-based ML workflows, pipelines, and infrastructure
MLOps: Model tracking, monitoring, CI/CD with MLflow, Kubeflow, etc.
Big Data Tools: Spark, Databricks, or Hadoop ecosystem familiarity
Experiment Tracking: Tools such as Weights & Biases or MLflow
Academic Research / Publications: Experience publishing whitepapers or research contributions
Databricks: Hands-on experience, preferably on the Azure Databricks platform
Delta Lake: Hands-on experience, preferably on Azure Databricks and ADLS Gen2

Educational Qualifications

Master’s or PhD in Computer Science, Data Science, AI/ML, Statistics, or a related field.

Certifications (preferred but not mandatory)

Google Cloud or Azure AI Engineer / Data Scientist Associate
Databricks Certified Machine Learning Professional
DeepLearning.AI Generative AI certification
Hugging Face Transformers certification

Required Skills

Generative AI
Lead and Drive Outcomes
LLM
Communicates Effectively
Creative Problem Solving
ML
Transformative and Strategic Thinking