Overview
- Develop and implement an automated data labeling system to streamline annotation of large datasets within Microsoft 365.
Goals
- Automatically apply sensitivity labels to documents in Microsoft 365 using classification techniques.
- Define and implement an optimal target architecture with a focus on performance and accuracy.
Technologies
- Microsoft Azure services, Python, regular expressions (regex), open-source LLMs.
Responsibilities
- Build an AI system to identify, categorize, and label data in Microsoft 365 per guidelines and standards.
- Analyze complex datasets and troubleshoot issues efficiently.
- Perform quality checks on labeled data to ensure accuracy and consistency.
- Optimize the system's performance so that it operates within constrained resources.
Basic Qualifications
- Ability to work in a collaborative team environment.
- Excellent oral and written communication skills.
- Motivation and strong commitment to delivering results.
- Strong organizational skills and ability to make independent decisions.
- Ability to document processes and findings clearly.
Technical Qualifications
- Python programming.
- Familiarity with NLP and machine learning frameworks.
- Familiarity with cloud computing (Azure preferred).
- Bonus: Familiarity with Large Language Models (LLMs).
- Bonus: Experience with Dask/Polars.
📧 To apply: jobs@datahorizon.eu