What is data serialization in Python?
Serialization is the process of turning a data object (for example, a Python object or a TensorFlow model) into a form that can be saved or transmitted. Deserialization is the reverse operation: re-creating the object from that saved or transmitted form.
There are many ways to serialize data in Python, and one of the most common is JSON (JavaScript Object Notation). JSON is a lightweight, human-readable text format that is easy for computers to parse. You can use it to store data in a file or to send data over a network connection.
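Here is a minimal sketch of the serialize/deserialize round trip, using only the standard library's `json` module. `json.dumps` turns a Python object into a JSON string, and `json.loads` rebuilds an equal object from that string. The `planet` dictionary is illustrative data, not part of the original example.

```python
import json

# An ordinary Python dictionary (illustrative data)
planet = {"name": "Earth", "moons": 1, "has_rings": False}

# Serialize: Python object -> JSON text
text = json.dumps(planet)
print(text)  # {"name": "Earth", "moons": 1, "has_rings": false}

# Deserialize: JSON text -> a new Python object equal to the original
restored = json.loads(text)
print(restored == planet)  # True
```

Notice that Python's `False` becomes JSON's lowercase `false` in the text and comes back as `False` when loaded.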
Why do we need serialization and deserialization in Python?
Serialization and deserialization are essential to almost any nontrivial program. If you save something to a file, read a configuration file, or respond to an HTTP request, you are serializing or deserializing objects.
One common use case is saving a Python object to disk so it can be loaded again later. This is useful when you need to keep a large data set or a complex data structure between program runs.
Write JSON data to a file using Python
```python
import json

planets = [
    {
        "name": "Sun",
        "type": "star",
        "mass": "332,900 Earth masses",
        "radius": "697,000 kilometers"
    },
    {
        "name": "Mercury",
        "type": "planet",
        "mass": "0.055 Earth masses",
        "radius": "2,440 kilometers"
    },
    {
        "name": "Venus",
        "type": "planet",
        "mass": "0.815 Earth masses",
        "radius": "6,051 kilometers"
    },
    {
        "name": "Earth",
        "type": "planet",
        "mass": "1 Earth mass",
        "radius": "6,378 kilometers"
    },
    {
        "name": "Mars",
        "type": "planet",
        "mass": "0.107 Earth masses",
        "radius": "3,396 kilometers"
    },
    {
        "name": "Jupiter",
        "type": "planet",
        "mass": "317.8 Earth masses",
        "radius": "69,911 kilometers"
    },
    {
        "name": "Saturn",
        "type": "planet",
        "mass": "95.16 Earth masses",
        "radius": "58,232 kilometers"
    },
    {
        "name": "Uranus",
        "type": "planet",
        "mass": "14.54 Earth masses",
        "radius": "25,362 kilometers"
    },
    {
        "name": "Neptune",
        "type": "planet",
        "mass": "17.15 Earth masses",
        "radius": "24,622 kilometers"
    }
]

# Write the list to solar_system.json; the with block closes the file afterwards
with open('solar_system.json', 'w') as outfile:
    json.dump(planets, outfile)
```
Reading JSON data from a file
The JSON file contains information about the planets in our solar system. Let’s take a look at how we can read this file using Python.
First, we import the json module. Then we open the file and read the data with json.load:
```python
import json

# Open the file for reading; the with block closes it when we're done
with open("solar_system.json") as infile:
    solar_system = json.load(infile)

# Print the first two entries
print(solar_system[0:2])
```

```
[{'name': 'Sun', 'type': 'star', 'mass': '332,900 Earth masses', 'radius': '697,000 kilometers'}, {'name': 'Mercury', 'type': 'planet', 'mass': '0.055 Earth masses', 'radius': '2,440 kilometers'}]
```
Modifying and writing JSON data to a file using Python
OK, so you're mad that Pluto is no longer counted as the ninth planet, and you decide to add it back:
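```python
import json

pluto = {
    "name": "Pluto",
    "type": "planet",
    "mass": "0.00218 Earth masses",
    "radius": "1,188 kilometers"
}

# Add Pluto to the list loaded in the previous step
solar_system.append(pluto)

# 'w' overwrites the file; the with block closes it so the data is flushed to disk
with open("solar_system.json", "w") as outfile:
    json.dump(solar_system, outfile)
```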
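The files written above contain compact, single-line JSON. If you want a more readable file, the `indent` parameter of `json.dump` pretty-prints the output. Reading the file back is a quick way to confirm the new entry was saved. This is a minimal sketch: the file name `solar_system_pretty.json` and the two-entry `bodies` list are hypothetical stand-ins for the full data.

```python
import json

# Hypothetical two-entry list standing in for the full solar_system data
bodies = [
    {"name": "Earth", "type": "planet"},
    {"name": "Pluto", "type": "planet"},
]

# indent=2 puts each key on its own line, nested two spaces deep
with open("solar_system_pretty.json", "w") as outfile:
    json.dump(bodies, outfile, indent=2)

# Read the file back to confirm that Pluto was saved
with open("solar_system_pretty.json") as infile:
    saved = json.load(infile)

print([body["name"] for body in saved])  # ['Earth', 'Pluto']
```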
What modules can be used for serialization in Python?
There are many different modules that can be used for serialization in Python. json, pickle, shelve, and xml are part of the standard library. The others are third-party packages that you install with pip. Some of the most popular options include:
- json: serializes Python objects into JSON text.
- pickle: serializes Python objects into a Python-specific binary format.
- shelve: stores pickled Python objects in a persistent, dictionary-like, file-based database.
- cPickle: the C-accelerated version of pickle in Python 2. In Python 3 it no longer exists as a separate module, because pickle uses its C implementation automatically.
- bson: serializes data into BSON (Binary JSON), the format used by MongoDB. It is available, for example, through the pymongo package.
- msgpack: serializes data into the compact binary MessagePack format.
- protobuf: Google's Protocol Buffers. It serializes structured messages defined in a .proto schema.
- thrift: Apache Thrift. It serializes data types defined in a Thrift IDL file.
- xml: the standard library's xml package (for example, xml.etree.ElementTree) builds and parses XML documents.
- yaml: the PyYAML package serializes Python objects into the YAML format.
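Of the standard-library options, `shelve` behaves like a persistent dictionary. Its keys must be strings, and its values are pickled behind the scenes. This is a minimal sketch: the shelf name `planets_shelf` is hypothetical. Depending on your platform, the underlying `dbm` backend may create one or more files with extra extensions.

```python
import shelve

# flag="n" always creates a new, empty shelf; it acts like a dict stored on disk
with shelve.open("planets_shelf", flag="n") as db:
    db["earth"] = {"name": "Earth", "type": "planet"}
    db["mars"] = {"name": "Mars", "type": "planet"}

# Reopen the shelf (default flag "c") and read the values back
with shelve.open("planets_shelf") as db:
    earth = db["earth"]
    names = sorted(db.keys())

print(earth)  # {'name': 'Earth', 'type': 'planet'}
print(names)  # ['earth', 'mars']
```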
What is pickling in Python?
Pickling is the process of converting a Python object into a byte stream in a Python-specific binary format. You can store these bytes in a file or send them over a network connection, and unpickling turns them back into an object.
The pickle module provides a simple interface for pickling Python objects. The pickle module is included in the standard library, so it is available for all Python programs.
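The pickle interface also works in memory. `pickle.dumps` returns a `bytes` object, and `pickle.loads` rebuilds the object from it. This minimal sketch uses an illustrative dictionary shaped like the solar-system entries.

```python
import pickle

# Illustrative data in the same shape as the solar-system entries
pluto = {"name": "Pluto", "type": "planet", "radius": "1,188 kilometers"}

# Serialize to a bytes object instead of a file
data = pickle.dumps(pluto)
print(type(data))  # <class 'bytes'>

# Deserialize into a new dictionary equal to (but not the same object as) the original
restored = pickle.loads(data)
print(restored == pluto)  # True
print(restored is pluto)  # False
```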
What are the differences between Pickle and JSON?
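- Pickle is a binary format, while JSON is a text-based format.
- Pickle output is not human-readable, while JSON is easy to read and edit by hand.
- JSON is language-independent and widely supported, while pickle is specific to Python.
- Pickle can serialize almost any Python object, including sets, dates, and instances of your own classes. JSON natively handles only dicts, lists, strings, numbers, booleans, and None.
- Pickle is often faster for complex Python objects, although the difference depends on the data.
- Never unpickle data from an untrusted source, because loading a pickle can execute arbitrary code. Parsing JSON does not carry this risk.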
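The difference in supported types is easy to see with a value that JSON has no type for. In this sketch, the `record` dictionary is illustrative: it holds a `set` and a `datetime.date`. Pickle round-trips both unchanged, while `json.dumps` raises a `TypeError`.

```python
import json
import pickle
from datetime import date

# Data that uses a date and a set, neither of which JSON can encode
record = {
    "name": "Pluto",
    "discovered": date(1930, 2, 18),
    "moons": {"Charon", "Nix", "Hydra", "Kerberos", "Styx"},
}

# pickle round-trips both types unchanged
print(pickle.loads(pickle.dumps(record)) == record)  # True

# json.dumps rejects the unsupported types with a TypeError
try:
    json.dumps(record)
    json_ok = True
except TypeError as err:
    json_ok = False
    print("JSON error:", err)

# A tuple is accepted by JSON, but it comes back as a list
print(json.loads(json.dumps(("Charon", "Nix"))))  # ['Charon', 'Nix']
```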
Using pickle to write to disk
Let’s look at an example of using pickle to write data to disk. In this example, we’ll save our list of planets as “solar_system.pkl”:
```python
import pickle

# Pickle produces bytes, so open the file in binary write mode ('wb')
with open('solar_system.pkl', 'wb') as outfile:
    pickle.dump(solar_system, outfile)
```
This code will save the solar system object to a file called “solar_system.pkl”. The pickle module will convert the data into a binary format and write it to the file.
Reading a pickle file from disk
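```python
import pickle

# Open in binary read mode ('rb'); the with block closes the file afterwards
with open('solar_system.pkl', 'rb') as infile:
    new_solar_system = pickle.load(infile)

print(new_solar_system[1:2])
```

```
[{'name': 'Mercury', 'type': 'planet', 'mass': '0.055 Earth masses', 'radius': '2,440 kilometers'}]
```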
What is YAML?
YAML is a human-readable data serialization format. It can be used to store data in a file or transmit it over a network connection.
The yaml module comes from the third-party PyYAML package. It is not part of the standard library, so install it first with pip install pyyaml. It provides a simple interface for serializing Python objects into the YAML format.
What are the differences between YAML and JSON?
- YAML is generally more human-readable than JSON and supports comments. However, YAML parsers such as PyYAML are typically slower than the json module.
- JSON is more widely supported than YAML, especially in web APIs.
- Both are text-based formats. YAML 1.2 was designed as a superset of JSON, so most JSON documents are also valid YAML.
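One practical difference is that JSON has no comment syntax, while YAML allows `#` comments. The standard-library check below uses a hypothetical document to show the json parser rejecting a commented file.

```python
import json

# JSON has no comment syntax, so the '#' line makes this text invalid
text_with_comment = '{\n  # the ninth planet?\n  "name": "Pluto"\n}'

try:
    json.loads(text_with_comment)
    is_valid_json = True
except json.JSONDecodeError as err:
    is_valid_json = False
    print("Invalid JSON:", err.msg)

# The same document without the comment parses fine
print(json.loads('{\n  "name": "Pluto"\n}'))  # {'name': 'Pluto'}
```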
Using yaml to write to disk
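```python
import yaml  # third-party: pip install pyyaml

# yaml.dump writes block-style YAML and sorts keys alphabetically by default
with open('solar_system.yaml', 'w') as file:
    yaml.dump(solar_system, file)
```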
Using yaml to read from disk
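```python
import yaml

# safe_load builds only plain Python types; yaml.load without a Loader argument raises an error in PyYAML 6
with open('solar_system.yaml') as file:
    solar_system_yaml = yaml.safe_load(file)

print(solar_system_yaml[3:4])
```

```
[{'mass': '1 Earth mass', 'name': 'Earth', 'radius': '6,378 kilometers', 'type': 'planet'}]
```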