Datahorizon
Tunisia

Data Labeling Project within Microsoft 365 environment

Data Analysis / Data Science, Data Science & Machine Learning, NLP/LLMs, Cloud computing (GCP), Microsoft 365 Copilot, Python

Posted 12 days ago

Internship
⏱️ 4-6 months
💼 Hybrid
📅 Expires in 2 days

Job description

Overview

  • Develop and implement an automated data labeling system to streamline annotation of large datasets within Microsoft 365.

Goals

  • Automatically apply sensitivity labels to documents in Microsoft 365 using classification techniques.
  • Define and implement an optimal target architecture with a focus on performance and accuracy.

Technologies

  • Microsoft Azure services, Python, RegEx, Open-source LLMs.
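
As a rough illustration of how Python and RegEx could support the labeling goal, the minimal sketch below suggests a sensitivity label from a document's text. The label names, the patterns, and the `suggest_label` helper are all hypothetical and are not taken from the project. Applying the suggested label to a file in Microsoft 365 is a separate step that is not shown here.

```python
import re

# Hypothetical labels and patterns, for illustration only.
# Rules are ordered from most to least restrictive.
RULES = [
    # Runs of 13 to 16 digits, optionally separated by spaces or hyphens
    ("Highly Confidential", re.compile(r"\b(?:\d[ -]?){13,16}\b")),
    # Simple email address pattern
    ("Confidential", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]
DEFAULT_LABEL = "General"


def suggest_label(text: str) -> str:
    """Return the label of the first rule whose pattern matches the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return DEFAULT_LABEL


if __name__ == "__main__":
    for sample in (
        "Card 4111 1111 1111 1111 on file",
        "Contact alice@example.com",
        "Lunch menu for Friday",
    ):
        print(sample, "->", suggest_label(sample))
```

A rule-based pass like this is cheap to run under constrained resources. An LLM-based classifier could then handle documents that no rule matches.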

Responsibilities

  • Build an AI system to identify, categorize, and label data in Microsoft 365 per guidelines and standards.
  • Analyze complex datasets and troubleshoot issues efficiently.
  • Perform quality checks on labeled data to ensure accuracy and consistency.
  • Optimize performance to operate within constrained resources.

Basic Qualifications

  • Ability to work in a collaborative team environment.
  • Excellent oral and written communication skills.
  • Motivation and strong commitment to delivering results.
  • Strong organizational skills and ability to make independent decisions.
  • Ability to document processes and findings clearly.

Technical Qualifications

  • Python programming.
  • Familiarity with NLP and machine learning frameworks.
  • Familiarity with cloud computing (Azure preferred).
  • Bonus: Familiarity with Large Language Models (LLMs).
  • Bonus: Experience with Dask/Polars.

📧 To apply: jobs@datahorizon.eu