# Python dataclasses and pydantic: Data Parsing and Validation
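`dataclasses` is a standard-library module. It provides a decorator and a few functions that automatically add generated special methods, such as the familiar `__init__()`, to user-defined classes. pydantic, on the other hand, is one of the most popular data validation and settings management libraries for Python. It uses Python type hints to validate and parse data at runtime, and it is used by many frameworks, including FastAPI.

## dataclasses

The `@dataclass` decorator automatically adds the following methods to a class:

- `__init__`: the initializer
- `__repr__`: the string representation
- `__eq__`: equality comparison
- `__hash__`: hashing (when `frozen=True` is set)

In addition, if the class defines `__post_init__`, the generated `__init__` calls it for post-initialization processing.

### Initialization and assignment

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem('Apple', 5.6, 5)
print(item)               # InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=5)
print(item.total_cost())  # 28.0
```

Decorating the class above with `@dataclass` is equivalent to writing the following initializer by hand:

```python
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
```

If the class defines its own `__init__`, the decorator does not generate one, and the hand-written method is used instead. The same applies to `__eq__()` and the other generated methods.

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    # A hand-written __init__ takes precedence over the generated one
    def __init__(self, name: str, unit_price: float, quantity_on_hand: int):
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand + 5

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem('Apple', 5.6, 5)
print(item)               # InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=10)
print(item.total_cost())  # 56.0
```

### Post-initialization processing

The initializer generated by `@dataclass` only assigns the fields. If you need to do extra work after that basic assignment, use post-initialization processing: `__post_init__` is a special method that the generated `__init__` calls automatically once it has finished, and it is meant for additional initialization logic. The example below automatically adds 5 to the quantity:

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __post_init__(self):
        # Called right after the generated __init__ has assigned all fields
        self.quantity_on_hand += 5

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem('Apple', 5.6, 5)
print(item)               # InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=10)
print(item.total_cost())  # 56.0
```

### Dataclass inheritance

A subclass collects the fields of its base classes first, in declaration order. Redefining a field, as `C` does with `x` below, keeps the field in its original position but uses the new type and default.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

c = C()
print(c)          # C(x=15, y=0, z=10)
print(type(C.x))  # <class 'int'>
```

### Class variables

Because dataclass fields are declared without `self`, it is not always easy to tell class variables from instance variables. As a rule, a name that is assigned in a dataclass body without a type annotation is a class variable.

```python
from dataclasses import dataclass

@dataclass
class MyClass:
    # Instance variables: they have type annotations
    instance_var1: int
    instance_var2: str = "default"  # instance variable with a default value

    # Class variable: no type annotation
    class_var1 = "shared by all instances"

myclass1 = MyClass(1)
myclass2 = MyClass(2)
MyClass.class_var1 = "Looking"
print(myclass1.class_var1)  # Looking
print(myclass2.class_var1)  # Looking
```

A name that does have an annotation is also a class variable if the annotation is `ClassVar`.

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class MyClass:
    # Instance variables: they have type annotations
    instance_var1: int
    instance_var2: str = "default"  # instance variable with a default value

    # Note: annotated with ClassVar, so this is a class variable
    class_var3: ClassVar[int] = 100  # explicitly marked as a class variable

myclass1 = MyClass(1)
myclass2 = MyClass(2)
MyClass.class_var3 = 50
print(myclass1.class_var3)  # 50
print(myclass2.class_var3)  # 50
```

### Default factory functions

`@dataclass` does not allow a mutable default value, such as a list, for a field: all instances would share the same list object, so changing it through one instance would unexpectedly change it for the others. The check happens as soon as the class is defined:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InventoryItem:
    names: List[str] = []  # mutable default: rejected
    unit_price: float = 2
    quantity_on_hand: int = 0

item = InventoryItem(['Apple'], 5.6, 5)
print(item)
```

```
ValueError: mutable default <class 'list'> for field names is not allowed: use default_factory
```

The usual fix is a default factory function, which is called to create a fresh value for each instance:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InventoryItem:
    # default_factory=list creates a new empty list for every instance
    names: List[str] = field(default_factory=list)
    unit_price: float = 2
    quantity_on_hand: int = 0

item = InventoryItem(['Apple'], 5.6, 5)
print(item)  # InventoryItem(names=['Apple'], unit_price=5.6, quantity_on_hand=5)
```

### Field order

Fields become `__init__` parameters in the order they are declared, so a field with a default value must not be followed by a field without one:

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    quantity_on_hand: int = 0
    unit_price: float  # no default, but follows a field that has one

item = InventoryItem('Apple', 5.6, 5)
print(item)
```

```
TypeError: non-default argument 'unit_price' follows default argument
```

### Dictionary and tuple forms

`asdict()` and `astuple()` return a dataclass instance as a dictionary and as a tuple, respectively.

```python
from dataclasses import dataclass, asdict, astuple

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

item = InventoryItem('Apple', 5.6, 5)
print(asdict(item))   # {'name': 'Apple', 'unit_price': 5.6, 'quantity_on_hand': 5}
print(astuple(item))  # ('Apple', 5.6, 5)
```

## pydantic

For the API documentation, see https://pydantic.com.cn/api/base_model/

### Installation

Type hints and `dataclasses` come with Python itself, but pydantic has to be installed:

```bash
pip install pydantic
```

### Parsing and assignment

pydantic parses the input and assigns the values to the model's instance variables. This is the most basic feature. Note that a pydantic model cannot be initialized with positional arguments; only keyword arguments are accepted.

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    description: str | None = None
    price: float
    tax: float | None = None

item = Item(**{
    'name': 'Looking',
    'description': 'An optional description',
    'price': 2.3,
    'tax': 0.25,
})
print(item)  # name='Looking' description='An optional description' price=2.3 tax=0.25
```

If all you need is parsing and assignment, the standard library's `@dataclass` can do this as well. The difference is that a dataclass neither validates nor converts the values it receives.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    description: str
    price: float
    tax: float | None = None

item = Item(**{
    'name': 'Looking',
    'description': 'An optional description',
    'price': 2.3,
    'tax': 0.25,
})
print(item)  # Item(name='Looking', description='An optional description', price=2.3, tax=0.25)
```

### Type hints

A field needs a type annotation. A bare name without one is evaluated as an ordinary variable lookup when the class body runs, so Python raises `NameError`:

```python
from pydantic import BaseModel

class InventoryItem(BaseModel):
    name: str
    unit_price  # no annotation: evaluated as a plain name lookup
    quantity_on_hand: int = 0

item = InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=5)
print(item)
```

```
NameError: name 'unit_price' is not defined
```

### Default values

Along with the type hint you can set a default value, which is used when the input has no data for that field.

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    unit_price: float
    quantity_on_hand: int = 5

item = Item(name='Apple', unit_price=5.6)
print(item)  # name='Apple' unit_price=5.6 quantity_on_hand=5
```

### Type conversion

pydantic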
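automatically converts the input to the type declared for each field. In the example below, the numbers are passed as strings:

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    unit_price: float
    quantity_on_hand: int = 0

# The numeric fields arrive as strings and are converted by pydantic
item = Item(name='Apple', unit_price='5.6', quantity_on_hand='5')
print(item)  # name='Apple' unit_price=5.6 quantity_on_hand=5
```

Conversion can still fail, in which case an error is raised. For example, passing `unit_price='test5.6'` gives:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Item
unit_price
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='test5.6', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/float_parsing
```

### Strict mode

By default pydantic validates in lax mode: even when the input type differs from the declared type, the value is converted automatically as long as a conversion is possible. If the input must already have the declared type, enable strict mode with `strict`:

```python
from pydantic import BaseModel, ConfigDict

class Item(BaseModel):
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    # Strict mode: a str is no longer converted to float or int
    model_config = ConfigDict(strict=True)

item = Item(name='Apple', unit_price='5.6', quantity_on_hand=5)
print(item)
```

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Item
unit_price
  Input should be a valid number [type=float_type, input_value='5.6', input_type=str]
```

### Optional fields

A field declared with only a type hint is required, and leaving it out causes an error, for example for `Item(name='Apple')`:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Item
unit_price
  Field required [type=missing, input_value={'name': 'Apple'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
```

To make a field optional, give it a default such as `Optional[float] = None` or `float | None = None`. In pydantic v2 it is the `= None` default that makes the field optional; `Optional[float]` on its own only allows the value `None` but is still required.

```python
from typing import Optional
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    unit_price: Optional[float] = None
    quantity_on_hand: int = 5

item = Item(name='Apple')
print(item)  # name='Apple' unit_price=None quantity_on_hand=5
```

### Nested models

A field can itself be a model, or a list of models. Plain dictionaries in the input are validated and converted into the nested model instances.

```python
from typing import List
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True

class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class Company(BaseModel):
    name: str
    address: Address       # nested model
    employees: List[User]  # list of models

company = Company(
    name='Tech Corp',
    address={'street': '123 Main St', 'city': 'SF', 'zip_code': '94105'},
    employees=[{'id': 1, 'name': 'Bob', 'email': 'bob@example.com'}],
)
print(company)
# name='Tech Corp' address=Address(street='123 Main St', city='SF', zip_code='94105') employees=[User(id=1, name='Bob', email='bob@example.com', is_active=True)]
```

### Data validation

For example, you can require a field's value to fall within a given range:

```python
from pydantic import BaseModel, Field

class Item(BaseModel):
    name: str
    unit_price: float = Field(gt=3, lt=10)  # must satisfy 3 < unit_price < 10
    quantity_on_hand: int = 5

item = Item(name='Apple', unit_price=2)
print(item)
```

If the value does not meet the requirement, validation fails:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Item
unit_price
  Input should be greater than 3 [type=greater_than, input_value=2, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/greater_than
```

### Advanced validation

pydantic also provides ready-made types such as `EmailStr` and `HttpUrl`. `EmailStr` depends on the optional `email-validator` package, which can be installed with `pip install "pydantic[email]"`.

```python
from enum import Enum
from pydantic import BaseModel, EmailStr, HttpUrl, conint, condecimal

class Status(str, Enum):
    ACTIVE = 'active'
    INACTIVE = 'inactive'

class UserProfile(BaseModel):
    email: EmailStr                                    # validates the e-mail format
    website: HttpUrl                                   # validates the URL format
    age: conint(ge=0, le=150)                          # constrained integer
    score: condecimal(max_digits=5, decimal_places=2)  # constrained decimal (digits and places)
    status: Status                                     # enum

user = UserProfile(email='123@qq.com', website='http://test.com', age=20, score=123.45, status='active')
print(user)
# email='123@qq.com' website=HttpUrl('http://test.com/') age=20 score=Decimal('123.45') status=<Status.ACTIVE: 'active'>
```

### Custom validation

When the built-in checks are not enough, you can write your own validation logic: `@field_validator` checks individual fields, and `@model_validator` checks the model as a whole.

```python
from typing import List
from pydantic import BaseModel, field_validator, model_validator

class Order(BaseModel):
    items: List[str]
    total_price: float
    discount: float = 0.0

    @field_validator('discount')
    @classmethod
    def discount_valid(cls, v):
        # Runs for the 'discount' field only
        if v < 0 or v > 1:
            raise ValueError('discount must be between 0 and 1')
        return v

    @model_validator(mode='after')
    def check_total(self):
        # Runs after all fields are validated and receives the model instance
        if self.total_price < 0:
            raise ValueError('total price cannot be negative')
        return self

order = Order(items=['Apple', 'Banana'], total_price=-2, discount=.1)
print(order)
```

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Order
  Value error, total price cannot be negative [type=value_error, input_value={'items': ['Apple', 'Bana...e': -2, 'discount': 0.1}, input_type=dict]
```

### Loading and validating files

We can also load data from a configuration file and use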
pydantic to validate it and assign the values. This requires the separate `pydantic-settings` package:

```bash
pip install pydantic-settings
```

For example, given the following `.env` configuration file:

```
DB_HOST=localhost
DB_PORT=2345
DB_USER=root
DB_PASSWORD=secret
API_KEY=your_api_key_here
```

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    DB_HOST: str
    DB_PORT: int
    DB_USER: str
    DB_PASSWORD: str
    API_KEY: str

    # Read the values from the .env file
    model_config = SettingsConfigDict(env_file='.env')

settings = Settings()
print(settings)
# DB_HOST='localhost' DB_PORT=2345 DB_USER='root' DB_PASSWORD='secret' API_KEY='your_api_key_here'
```

### Catching errors

Validation errors are raised as `ValidationError`, which you can catch. Its `json()` method returns the error details in JSON form:

```python
from pydantic import BaseModel, Field, ValidationError

class Item(BaseModel):
    name: str
    unit_price: float = Field(gt=3, lt=10)
    quantity_on_hand: int = 5

try:
    item = Item(name='Apple', unit_price=2)
    print(item)
except ValidationError as e:
    print(e.json(indent=2))
```

```json
[
  {
    "type": "greater_than",
    "loc": [
      "unit_price"
    ],
    "msg": "Input should be greater than 3",
    "input": 2,
    "ctx": {
      "gt": 3.0
    },
    "url": "https://errors.pydantic.dev/2.11/v/greater_than"
  }
]
```

### Model information

`model_dump()` returns the model as a dictionary.

```python
from pydantic import BaseModel

class InventoryItem(BaseModel):
    name: str
    unit_price: float = 5.6
    quantity_on_hand: int = 0

item = InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=5)
print(item.model_dump())  # {'name': 'Apple', 'unit_price': 5.6, 'quantity_on_hand': 5}
```
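`model_dump()` is one of several conversion methods on a pydantic v2 model. The sketch below, assuming pydantic 2.x is installed, shows a few related methods of the `BaseModel` API: `model_dump(exclude=...)` leaves fields out of the dictionary, `model_dump_json()` serializes the model to a JSON string, and the class methods `model_validate()` and `model_validate_json()` work in the opposite direction, building a validated model from a dict or from JSON text. The `InventoryItem` model and its values are only illustrative.

```python
from pydantic import BaseModel


class InventoryItem(BaseModel):
    name: str
    unit_price: float = 5.6
    quantity_on_hand: int = 0


item = InventoryItem(name='Apple', unit_price=5.6, quantity_on_hand=5)

# Dictionary output that leaves out selected fields
partial = item.model_dump(exclude={'unit_price'})
print(partial)  # {'name': 'Apple', 'quantity_on_hand': 5}

# Serialize the whole model to a JSON string
as_json = item.model_dump_json()
print(as_json)

# The reverse direction: validate a dict or a JSON string into a new model
from_dict = InventoryItem.model_validate({'name': 'Pear', 'quantity_on_hand': '3'})
from_json = InventoryItem.model_validate_json(as_json)

print(from_dict)          # name='Pear' unit_price=5.6 quantity_on_hand=3
print(from_json == item)  # True
```

Since `model_validate()` runs the same validation as calling the model directly, the string `'3'` is converted to the integer `3` in the default lax mode, and the missing `unit_price` falls back to its default.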