azure-ai-textanalytics-py

npx skills add https://github.com/microsoft/skills --skill azure-ai-textanalytics-py

Azure AI Text Analytics SDK for Python

Client library for the NLP capabilities of the Azure AI Language service, including sentiment analysis, entity recognition, key phrase extraction, and more.

Installation

pip install azure-ai-textanalytics

Environment variables

AZURE_LANGUAGE_ENDPOINT=https://<resource>.cognitiveservices.azure.com  # Required for all authentication methods
AZURE_LANGUAGE_KEY=<your-api-key>  # Required only for AzureKeyCredential authentication
AZURE_TOKEN_CREDENTIALS=prod # Required only when DefaultAzureCredential is used in production
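Since a missing variable otherwise surfaces later as a `KeyError` or an opaque authentication failure, it can help to validate configuration up front. A minimal sketch (the `require_env` helper is our own convenience function, not part of the SDK):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Usage: endpoint = require_env("AZURE_LANGUAGE_ENDPOINT")
```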

Authentication

API key

import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

client = TextAnalyticsClient(endpoint, AzureKeyCredential(key))

Entra ID (Recommended)

import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

# Local dev: DefaultAzureCredential. Production: set AZURE_TOKEN_CREDENTIALS=prod or AZURE_TOKEN_CREDENTIALS=<specific_credential>
credential = DefaultAzureCredential(require_envvar=True)
# Or use a specific credential directly in production:
# Voir https://learn.microsoft.com/python/api/overview/azure/identity-readme?view=azure-python#credential-classes
# credential = ManagedIdentityCredential()
client = TextAnalyticsClient(
    endpoint=os.environ["AZURE_LANGUAGE_ENDPOINT"],
    credential=credential
)

Sentiment analysis

documents = [
    "I had a wonderful trip to Seattle last week!",
    "The food was terrible and the service was slow."
]

result = client.analyze_sentiment(documents, show_opinion_mining=True)

for doc in result:
    if not doc.is_error:
        print(f"Sentiment: {doc.sentiment}")
        print(f"Scores: pos={doc.confidence_scores.positive:.2f}, "
              f"neg={doc.confidence_scores.negative:.2f}, "
              f"neu={doc.confidence_scores.neutral:.2f}")

        # Opinion mining (aspect-based sentiment)
        for sentence in doc.sentences:
            for opinion in sentence.mined_opinions:
                target = opinion.target
                print(f"  Target: '{target.text}' - {target.sentiment}")
                for assessment in opinion.assessments:
                    print(f"    Assessment: '{assessment.text}' - {assessment.sentiment}")

Entity recognition

documents = ["Microsoft was founded by Bill Gates and Paul Allen in Albuquerque."]

result = client.recognize_entities(documents)

for doc in result:
    if not doc.is_error:
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"  Category: {entity.category}")
            print(f"  Subcategory: {entity.subcategory}")
            print(f"  Confidence: {entity.confidence_score:.2f}")

PII detection

documents = ["My SSN is 123-45-6789 and my email is john@example.com"]

result = client.recognize_pii_entities(documents)

for doc in result:
    if not doc.is_error:
        print(f"Redacted: {doc.redacted_text}")
        for entity in doc.entities:
            print(f"PII: {entity.text} ({entity.category})")

Key phrase extraction

documents = ["Azure AI provides powerful machine learning capabilities for developers."]

result = client.extract_key_phrases(documents)

for doc in result:
    if not doc.is_error:
        print(f"Key phrases: {doc.key_phrases}")

Language detection

documents = ["Ce document est en francais.", "This is written in English."]

result = client.detect_language(documents)

for doc in result:
    if not doc.is_error:
        print(f"Language: {doc.primary_language.name} ({doc.primary_language.iso6391_name})")
        print(f"Confidence: {doc.primary_language.confidence_score:.2f}")

Text Analytics for health

documents = ["Patient has diabetes and was prescribed metformin 500mg twice daily."]

poller = client.begin_analyze_healthcare_entities(documents)
result = poller.result()

for doc in result:
    if not doc.is_error:
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"  Category: {entity.category}")
            print(f"  Normalized: {entity.normalized_text}")

            # Entity links (UMLS, etc.)
            for link in entity.data_sources:
                print(f"  Link: {link.name} - {link.entity_id}")

Multiple analyses (batch)

from azure.ai.textanalytics import (
    RecognizeEntitiesAction,
    ExtractKeyPhrasesAction,
    AnalyzeSentimentAction
)

documents = ["Microsoft announced new Azure AI features at Build conference."]

poller = client.begin_analyze_actions(
    documents,
    actions=[
        RecognizeEntitiesAction(),
        ExtractKeyPhrasesAction(),
        AnalyzeSentimentAction()
    ]
)

results = poller.result()
for doc_results in results:
    for result in doc_results:
        if result.kind == "EntityRecognition":
            print(f"Entities: {[e.text for e in result.entities]}")
        elif result.kind == "KeyPhraseExtraction":
            print(f"Key phrases: {result.key_phrases}")
        elif result.kind == "SentimentAnalysis":
            print(f"Sentiment: {result.sentiment}")

Asynchronous client

import asyncio
import os

from azure.ai.textanalytics.aio import TextAnalyticsClient
from azure.identity.aio import DefaultAzureCredential

async def analyze():
    # Both the async credential and the async client should be closed when done
    async with DefaultAzureCredential() as credential:
        async with TextAnalyticsClient(
            endpoint=os.environ["AZURE_LANGUAGE_ENDPOINT"],
            credential=credential
        ) as client:
            result = await client.analyze_sentiment(documents)
            # Process the results...

asyncio.run(analyze())

Client types

Client Purpose
TextAnalyticsClient All text analysis operations
TextAnalyticsClient (aio) Asynchronous version

Available operations

Method Description
analyze_sentiment Sentiment analysis with opinion mining
recognize_entities Named entity recognition
recognize_pii_entities PII detection and redaction
recognize_linked_entities Entity linking to Wikipedia
extract_key_phrases Key phrase extraction
detect_language Language detection
begin_analyze_healthcare_entities Healthcare NLP (long-running)
begin_analyze_actions Multiple analyses in batch

Best practices

  1. Use batch operations for multiple documents (up to 10 per request)
  2. Enable opinion mining for detailed aspect-based sentiment
  3. Use the async client for high-throughput scenarios
  4. Handle per-document errors: the result list can contain errors for individual documents
  5. Specify the language when it is known to improve accuracy
  6. Use the context manager or close the client explicitly
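Practices 1 and 4 can be combined in plain Python. A hedged sketch (the `chunked` and `split_results` helpers are our own; the only SDK assumption is the documented `is_error` flag on each per-document result):

```python
def chunked(documents, size=10):
    """Yield successive batches of at most `size` documents (the per-request limit)."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]

def split_results(results):
    """Partition results into (successes, errors) using each item's is_error flag."""
    successes, errors = [], []
    for doc in results:
        (errors if doc.is_error else successes).append(doc)
    return successes, errors

# Usage sketch:
# for batch in chunked(documents):
#     successes, errors = split_results(client.analyze_sentiment(batch, language="en"))
```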

Similar skills