Lesson 1: XML and Python: First Steps

What Is XML?

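XML stands for Extensible Markup Language, a standard format designed to describe the content and structure of data. Because it is flexible and supported by a wide range of tools, XML is commonly used for storing and exchanging data.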

Basic Structure of XML

An XML document is made up of elements, attributes, and text.

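Element: the basic building block of XML, delimited by a start tag and an end tag. For example, <book></book> is a book element.
Attribute: a name-value pair, written inside the start tag, that carries information about an element. For a book element, details such as the author can be recorded as attributes. For example, in <book author="Jane Doe"></book>, author is an attribute.
Text: the text between an element's start tag and end tag. For example, in <book>XML Basics</book>, XML Basics is the text.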

Example XML document

<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
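
As a preview, the three building blocks above map onto three attributes of an element object in Python. This is a minimal sketch, assuming the standard-library xml.etree.ElementTree module and a single book element taken from the example document.

```python
import xml.etree.ElementTree as ET

# Parse one book element taken from the example document.
book = ET.fromstring('<book author="Jane Doe">XML Basics</book>')

print(book.tag)     # element name: book
print(book.attrib)  # attributes as a dict: {'author': 'Jane Doe'}
print(book.text)    # text content: XML Basics
```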

Parsing and Searching XML: `xml.etree.ElementTree`

The xml.etree.ElementTree module, part of Python's standard library, lets you extract data from XML documents.

Import the xml.etree.ElementTree module. For brevity, import it under the name ET.

import xml.etree.ElementTree as ET
 
library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
 
root = ET.fromstring(library_xml)
 
first_book = root.find('book')
author = first_book.get('author')
title = first_book.text
print(f"{title} by {author}")

To parse an XML document from a string, use the ET.fromstring() function. It returns the document's top-level element (the root element). In the example below, the root element of library_xml is the library element.

import xml.etree.ElementTree as ET
 
library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
 
root = ET.fromstring(library_xml)  # <Element 'library' at 0x...>
 
first_book = root.find('book')
author = first_book.get('author')
title = first_book.text
print(f"{title} by {author}")
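
To make the return value of ET.fromstring() more concrete, the following sketch inspects the root element and shows what happens with malformed input. The shortened XML strings are made up for illustration; the flag parse_failed is a hypothetical helper variable.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<library><book>XML Basics</book></library>")
print(root.tag)   # library
print(len(root))  # 1 (number of direct child elements)

# Malformed input (the book element is never closed) raises ET.ParseError.
try:
    ET.fromstring("<library><book>XML Basics</library>")
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print(f"Invalid XML: {err}")
```

Catching ET.ParseError is one way to reject bad input early, before any searching is attempted.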

To search for elements, call find() or findall() on the element you want to search within.

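find() returns the first matching element. With a plain tag name such as 'book', it looks only at the direct children of the element it is called on, and it returns None if nothing matches.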

import xml.etree.ElementTree as ET
 
library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
 
root = ET.fromstring(library_xml)
 
first_book = root.find('book')  # <Element 'book' at 0x...>
author = first_book.get('author')
title = first_book.text
print(f"{title} by {author}")
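
When no element matches, find() returns None, so it can be worth checking the result before reading from it. A minimal sketch, assuming a hypothetical magazine tag that does not appear in the document:

```python
import xml.etree.ElementTree as ET

library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
</library>
"""
root = ET.fromstring(library_xml)

# "magazine" is a hypothetical tag that is not present in this document.
magazine = root.find('magazine')
if magazine is None:
    print("No magazine element found")
else:
    print(magazine.text)
```

The comparison uses `is None` rather than a plain truth test because an element without child elements is treated as false, so `if magazine:` could misreport an element that was actually found.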

To read an attribute of an element, use the get() method. To read the element's text, use its text attribute.

import xml.etree.ElementTree as ET
 
library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
 
root = ET.fromstring(library_xml)
 
first_book = root.find('book')
author = first_book.get('author')  # "Jane Doe"
title = first_book.text  # "XML Basics"
print(f"{title} by {author}")

Full output

XML Basics by Jane Doe
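
If the requested attribute is missing, get() returns None, or a fallback value when one is passed as the second argument. The sketch below illustrates this with a hypothetical year attribute that the example document does not define.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book author="Jane Doe">XML Basics</book>')

print(book.get('author'))           # Jane Doe
print(book.get('year'))             # None: there is no year attribute
print(book.get('year', 'unknown'))  # unknown: fallback supplied by the caller
```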

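findall() returns a list of all matching elements, in document order. With a plain tag name, it looks only at the direct children of the element it is called on.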

import xml.etree.ElementTree as ET
 
library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
 
root = ET.fromstring(library_xml)
 
books = root.findall('book')
for book in books:
    print(book.text)

Full output

XML Basics
Advanced XML
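
Because findall() returns an ordinary list, it combines naturally with get() and text. The following sketch builds (title, author) pairs and then narrows the search with an attribute predicate, part of the limited XPath syntax that ElementTree supports. The names catalog and by_smith are illustrative only.

```python
import xml.etree.ElementTree as ET

library_xml = """
<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""
root = ET.fromstring(library_xml)

# Build (title, author) pairs from every book element.
catalog = [(book.text, book.get('author')) for book in root.findall('book')]
print(catalog)  # [('XML Basics', 'Jane Doe'), ('Advanced XML', 'John Smith')]

# Keep only the books whose author attribute equals "John Smith".
by_smith = root.findall("book[@author='John Smith']")
print([book.text for book in by_smith])  # ['Advanced XML']
```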

Reading an External XML File

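To read an XML file, use the ET.parse() function. It returns an ElementTree object that represents the whole document; call getroot() on that object to obtain the root element.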

Example: reading library.xml

import xml.etree.ElementTree as ET
 
tree = ET.parse('library.xml')  # ElementTree object for the whole document
root = tree.getroot()  # root element of the document
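
To try file parsing without preparing a file by hand, the sketch below first writes the example document to a temporary directory and then reads it back. The temporary location and the variable names are assumptions made for illustration; in practice the path would point at an existing file.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

library_xml = """<library>
    <book author="Jane Doe">XML Basics</book>
    <book author="John Smith">Advanced XML</book>
</library>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the example document to a throwaway library.xml file.
    xml_path = Path(tmp_dir) / 'library.xml'
    xml_path.write_text(library_xml, encoding='utf-8')

    tree = ET.parse(str(xml_path))  # ElementTree object
    root = tree.getroot()           # root Element
    titles = [book.text for book in root.findall('book')]

print(root.tag)  # library
print(titles)    # ['XML Basics', 'Advanced XML']
```

Once getroot() has returned the root element, find(), findall(), get(), and text work exactly as they do on an element produced by ET.fromstring().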
