Azure AI Text Analytics SDK for Python
Client library for the NLP capabilities of the Azure AI Language service, including sentiment analysis, entity recognition, key phrase extraction, and more.
Installation
pip install azure-ai-textanalytics
Environment variables
AZURE_LANGUAGE_ENDPOINT=https://<resource>.cognitiveservices.azure.com # Required for all authentication methods
AZURE_LANGUAGE_KEY=<your-api-key> # Required only for AzureKeyCredential authentication
AZURE_TOKEN_CREDENTIALS=prod # Needed only when DefaultAzureCredential is used in production
Authentication
API key
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]
client = TextAnalyticsClient(endpoint, AzureKeyCredential(key))
Entra ID (recommended)
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

# Local dev: DefaultAzureCredential. Production: set AZURE_TOKEN_CREDENTIALS=prod or AZURE_TOKEN_CREDENTIALS=<specific_credential>
credential = DefaultAzureCredential()
# Or use a specific credential directly in production:
# See https://learn.microsoft.com/python/api/overview/azure/identity-readme?view=azure-python#credential-classes
# credential = ManagedIdentityCredential()
client = TextAnalyticsClient(
    endpoint=os.environ["AZURE_LANGUAGE_ENDPOINT"],
    credential=credential
)
Sentiment analysis
documents = [
    "I had a wonderful trip to Seattle last week!",
    "The food was terrible and the service was slow."
]
result = client.analyze_sentiment(documents, show_opinion_mining=True)
for doc in result:
    if not doc.is_error:
        print(f"Sentiment: {doc.sentiment}")
        print(f"Scores: pos={doc.confidence_scores.positive:.2f}, "
              f"neg={doc.confidence_scores.negative:.2f}, "
              f"neu={doc.confidence_scores.neutral:.2f}")
        # Opinion mining (aspect-based sentiment)
        for sentence in doc.sentences:
            for opinion in sentence.mined_opinions:
                target = opinion.target
                print(f"  Target: '{target.text}' - {target.sentiment}")
                for assessment in opinion.assessments:
                    print(f"    Assessment: '{assessment.text}' - {assessment.sentiment}")
Entity recognition
documents = ["Microsoft was founded by Bill Gates and Paul Allen in Albuquerque."]
result = client.recognize_entities(documents)
for doc in result:
    if not doc.is_error:
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"  Category: {entity.category}")
            print(f"  Subcategory: {entity.subcategory}")
            print(f"  Confidence: {entity.confidence_score:.2f}")
PII detection
documents = ["My SSN is 123-45-6789 and my email is john@example.com"]
result = client.recognize_pii_entities(documents)
for doc in result:
    if not doc.is_error:
        print(f"Redacted: {doc.redacted_text}")
        for entity in doc.entities:
            print(f"PII: {entity.text} ({entity.category})")
Key phrase extraction
documents = ["Azure AI provides powerful machine learning capabilities for developers."]
result = client.extract_key_phrases(documents)
for doc in result:
    if not doc.is_error:
        print(f"Key phrases: {doc.key_phrases}")
Language detection
documents = ["Ce document est en français.", "This is written in English."]
result = client.detect_language(documents)
for doc in result:
    if not doc.is_error:
        print(f"Language: {doc.primary_language.name} ({doc.primary_language.iso6391_name})")
        print(f"Confidence: {doc.primary_language.confidence_score:.2f}")
Text Analytics for health
documents = ["Patient has diabetes and was prescribed metformin 500mg twice daily."]
poller = client.begin_analyze_healthcare_entities(documents)
result = poller.result()
for doc in result:
    if not doc.is_error:
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"  Category: {entity.category}")
            print(f"  Normalized: {entity.normalized_text}")
            # Entity links (UMLS, etc.); data_sources can be None
            for link in entity.data_sources or []:
                print(f"  Link: {link.name} - {link.entity_id}")
Multiple analyses (batch)
from azure.ai.textanalytics import (
    RecognizeEntitiesAction,
    ExtractKeyPhrasesAction,
    AnalyzeSentimentAction
)

documents = ["Microsoft announced new Azure AI features at Build conference."]
poller = client.begin_analyze_actions(
    documents,
    actions=[
        RecognizeEntitiesAction(),
        ExtractKeyPhrasesAction(),
        AnalyzeSentimentAction()
    ]
)
results = poller.result()
for doc_results in results:
    for result in doc_results:
        if result.kind == "EntityRecognition":
            print(f"Entities: {[e.text for e in result.entities]}")
        elif result.kind == "KeyPhraseExtraction":
            print(f"Key phrases: {result.key_phrases}")
        elif result.kind == "SentimentAnalysis":
            print(f"Sentiment: {result.sentiment}")
Async client
import asyncio

from azure.ai.textanalytics.aio import TextAnalyticsClient
from azure.identity.aio import DefaultAzureCredential

async def analyze():
    # Async credentials and clients should both be closed; context managers handle it
    async with DefaultAzureCredential() as credential:
        async with TextAnalyticsClient(
            endpoint=endpoint,
            credential=credential
        ) as client:
            result = await client.analyze_sentiment(documents)
            # Process the results...

asyncio.run(analyze())
Client types
| Client | Purpose |
|--------|---------|
| TextAnalyticsClient | All text analysis operations |
| TextAnalyticsClient (aio) | Async version |
Available operations
| Method | Description |
|--------|-------------|
| analyze_sentiment | Sentiment analysis with opinion mining |
| recognize_entities | Named entity recognition |
| recognize_pii_entities | PII detection and redaction |
| recognize_linked_entities | Entity linking to Wikipedia |
| extract_key_phrases | Key phrase extraction |
| detect_language | Language detection |
| begin_analyze_healthcare_entities | Healthcare NLP (long-running) |
| begin_analyze_actions | Multiple batched analyses |
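Of the operations above, recognize_linked_entities is the only synchronous one without an example in this document. A minimal sketch in the same style as the other examples, reusing the client from the authentication section (attribute names follow the LinkedEntity and LinkedEntityMatch models in the current azure-ai-textanalytics package):

```python
documents = ["Old Faithful is a geyser at Yellowstone National Park."]
result = client.recognize_linked_entities(documents)
for doc in result:
    if not doc.is_error:
        for entity in doc.entities:
            # Each linked entity resolves to a well-known data source (e.g. Wikipedia)
            print(f"Entity: {entity.name} ({entity.data_source})")
            print(f"  URL: {entity.url}")
            for match in entity.matches:
                print(f"  Match: '{match.text}' (confidence {match.confidence_score:.2f})")
```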
Best practices
- Use batch operations for multiple documents (up to 10 per request)
- Enable opinion mining for detailed aspect-based sentiment
- Use the async client for high-throughput scenarios
- Handle document errors: the result list can contain errors for individual documents
- Specify the language when it is known to improve accuracy
- Use the context manager or close the client explicitly
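The per-request document limit in the first bullet can be respected with a small chunking helper. A sketch (the helper name `chunked` is illustrative, and the limit of 10 comes from this document; check the current service limits for your tier and API version):

```python
def chunked(documents, size=10):
    """Yield successive batches of at most `size` documents."""
    for i in range(0, len(documents), size):
        yield documents[i:i + size]

# Example: 25 documents split into batches of 10, 10, and 5
docs = [f"document {n}" for n in range(25)]
for batch in chunked(docs):
    # Each batch can then be passed to e.g. client.analyze_sentiment(batch)
    print(len(batch))
```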