Data serialization is a basic programming concept with great value in everyday programs. It refers to converting complex data objects to an intermediate format that can be saved and easily converted back to its original form. However, Python's common serialization modules, json and pickle, are limited: json handles only basic types, and neither module validates the data it loads. With structured programs and object-oriented programming, we need stronger support to handle data classes.
Marshmallow is one of the most famous data-handling libraries that is widely used by Python developers to develop robust software applications. It supports data serialization and provides a strong abstract solution for handling data validation in an object-oriented paradigm.
In this article, we use the running example below to understand how to use Marshmallow in existing projects. The code shows three classes representing a simple e-commerce model: Product, Customer, and Order. Each class defines only a minimal set of attributes. We’ll see how to save an instance of an object and ensure its correctness when we load it again in our code.
from typing import List


class Product:
    def __init__(self, _id: int, name: str, price: float):
        self._id = _id
        self.name = name
        self.price = price


class Customer:
    def __init__(self, _id: int, name: str):
        self._id = _id
        self.name = name


class Order:
    def __init__(self, _id: int, customer: Customer, products: List[Product]):
        self._id = _id
        self.customer = customer
        self.products = products
Getting Started with Marshmallow
Installation
Marshmallow is available as a Python library at PyPI and can be easily installed using pip. To install or upgrade the Marshmallow dependency, run the below command:
pip install -U marshmallow
This installs the latest stable version of Marshmallow in the active environment. If you want the development version of the library with all the latest functionality, you can install it using the command below:
pip install -U git+https://github.com/marshmallow-code/marshmallow.git@dev
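To confirm which version ended up in the active environment, a minimal sketch using only the standard library is shown below. The variable name installed_version is illustrative and not part of Marshmallow.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed distribution's version without importing the package.
try:
    installed_version = version("marshmallow")
except PackageNotFoundError:
    installed_version = None

print(installed_version or "marshmallow is not installed")
```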
Creating Schemas
Let’s start by adding Marshmallow functionality to the Product class. We need to create a new class that represents the schema that an instance of the Product class must follow. Think of a schema as a blueprint that defines the attributes of the Product class and their data types.
Let’s break down and understand the basic code below:
from marshmallow import Schema, fields


class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)
We create a new class that inherits from the Schema class in Marshmallow. Then, we declare the same attribute names as our Product class and define their field types. The fields module in Marshmallow supports various data types; here, we use the primitive types Int, Str, and Float.
Serialization
Now that we have a schema defined for our object, we can convert a Python class instance into a Python dictionary or a JSON string. Here’s the basic implementation:
product = Product(_id=4, name="Test Product", price=10.6)
schema = ProductSchema()
# For Python Dictionary object
result = schema.dump(product)
# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 10.6}
# For JSON-serializable string
result = schema.dumps(product)
# type(str) -> {"_id": 4, "name": "Test Product", "price": 10.6}
We create an instance of our ProductSchema, which converts a Product object to a serializable format: a dictionary or a JSON string.
Note the difference between the results of dump and dumps. dump returns a Python dictionary, which can be saved using pickle or passed to another serializer, while dumps returns a string in JSON format.
Deserialization
To reverse the serialization process, we use deserialization. An object is saved so it can be loaded and accessed later, and Marshmallow helps with that.
A Python dictionary can be validated using the load function, which verifies the keys and their associated data types. The code below shows how it works:
product_data = {
    "_id": 4,
    "name": "Test Product",
    "price": 50.4,
}
result = schema.load(product_data)
print(result)
# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 50.4}

faulty_data = {
    "_id": 5,
    "name": "Test Product",
    "price": "ABCD"  # Wrong input datatype
}
result = schema.load(faulty_data)
# Raises validation error
The schema validates that the dictionary has the correct keys and data types. If validation fails, a ValidationError is raised, so it’s essential to wrap the load call in a try-except block. If it succeeds, the result is still a dictionary, just like the input, which is not very helpful on its own. What we generally want is to validate the dictionary and convert it back to the original object it was serialized from.
To achieve this, we use the post_load decorator provided by Marshmallow:
from marshmallow import Schema, fields, post_load


class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)
We create a method in the schema class with the post_load decorator. This method takes the validated dictionary and converts it back to a Product object. Including **kwargs is important because Marshmallow passes extra keyword arguments, such as many and partial, to the decorated method.
This modification to the load functionality ensures that after validation, the Python dictionary is passed to the post_load method, which creates a Product object from the dictionary. This makes it possible to deserialize an object using Marshmallow.
Validation
Often, we need additional validation specific to our use case. While data type validation is essential, it doesn’t cover all the validation we might need. Even in this simple example, extra validation is needed for our Product object. We need to ensure that the price is greater than zero. We can also define more rules, such as ensuring that our product name is between 3 and 128 characters long. These rules help ensure our codebase conforms to a defined database schema.
Let us now see how we can implement this validation using Marshmallow:
from marshmallow import Schema, fields, validates, ValidationError, post_load


class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

    # **kwargs absorbs extra keyword arguments (such as data_key)
    # that some Marshmallow versions pass to validator methods.
    @validates('price')
    def validate_price(self, value, **kwargs):
        if value <= 0:
            raise ValidationError('Price must be greater than zero.')

    @validates('name')
    def validate_name(self, value, **kwargs):
        if len(value) < 3 or len(value) > 128:
            raise ValidationError('Name of Product must be between 3 and 128 characters.')
We modify the ProductSchema class to add two new methods. One validates the price field and the other validates the name field. We use the validates decorator and pass it the name of the field that the method is supposed to validate. The implementation of these methods is straightforward: if the value is invalid, we raise a ValidationError.
Nested Schemas
Now that we have validated the basic Product class, we have covered the core functionality provided by the Marshmallow library. Let us now add complexity and see how the other two classes are validated.
The Customer class is fairly straightforward, as it contains only basic attributes with primitive data types.
class CustomerSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)  # Customer.name is a string
However, defining the schema for the Order class requires a new concept: nested schemas. An order is associated with a specific customer, and the customer can order any number of products. This is reflected in the class definition, so when we validate an order, we also need to validate the Customer and Product objects passed to it.
Instead of redefining everything in the OrderSchema, we will avoid repetition and use nested schemas. The order schema is defined as follows:
class OrderSchema(Schema):
    _id = fields.Int(required=True)
    customer = fields.Nested(CustomerSchema, required=True)
    products = fields.List(fields.Nested(ProductSchema), required=True)
Within the Order schema, we include the ProductSchema and CustomerSchema definitions. This ensures that the validations defined for these schemas are applied automatically, following the DRY (Don’t Repeat Yourself) principle, which encourages reuse of existing code.
Wrapping Up
In this article, we covered a quick start and a running use case for Marshmallow, one of the most popular serialization and data validation libraries in Python. Although it is similar to Pydantic, many developers prefer Marshmallow for its schema definition style, which resembles validation libraries in other languages like JavaScript.
Marshmallow can be used with Python backend frameworks like FastAPI and Flask and with ORMs like SQLAlchemy, making it a popular choice for data validation in web applications.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.