A Comprehensive Guide to Python Enumerations for Data Science
Written on
Chapter 1: Understanding Python Enumerations
Python's enumeration types are utilized across many programming languages, including C, C#, C++, Java, Go, Rust, and certainly Python itself. Despite their potential, Python enumerations are often overlooked by data scientists. I have frequently employed them and have witnessed their benefits, yet they remain underutilized. This article aims to help change that.
Enumerations in Python are a powerful tool that should not be disregarded. Many view them as complex, so it's essential to first grasp their functionality before attempting to implement them. After reading this piece, I hope you'll see that enumerations simplify code, making it shorter, less error-prone, and more efficient.
To illustrate the effectiveness of enumerations, we will use a data science example, comparing four classes: two that use a combination of strings and the typing.Literal type, and two that utilize enum.Enum, which is Python's standard library class for enumerations. Each class will represent a time-series model selected by the user.
This article will lay the groundwork for Python enumerations, providing you with the knowledge to use this programming tool effectively. While the basics will serve most scenarios, there is much more to explore. Future articles will delve into the more intricate aspects of Python enumerations.
Introduction to Python Enumerations
A valuable resource for learning about enumerations is the official Python documentation. Enumerations, commonly referred to as enums, are a data type that allows for the definition of a set of named values. For example, days of the week can be represented as:
- Monday (1)
- Tuesday (2)
- Wednesday (3)
- Thursday (4)
- Friday (5)
- Saturday (6)
- Sunday (7)
Similarly, months can be enumerated as:
- January (1)
- February (2)
- March (3)
- April (4)
- May (5)
- June (6)
- July (7)
- August (8)
- September (9)
- October (10)
- November (11)
- December (12)
In these examples, the items are arranged based on their natural order, creating a meaningful list. However, enumerations need not always be sequential, as seen with categories such as colors (red, green, blue) or crop species (wheat, barley, oat).
It is crucial to distinguish between enumerations and the enumerate() function in Python. In this context, "enumerations" refers to classes derived from one of the enumeration classes in the enum module.
Enums are not the only data types you can use, but they offer several distinct advantages, as we will analyze using our first enumeration class:
from enum import Enum
class DayOfWeek(Enum):
MONDAY = 1
TUESDAY = 2
WEDNESDAY = 3
THURSDAY = 4
FRIDAY = 5
SATURDAY = 6
SUNDAY = 7
The convention of using uppercase for member names (e.g., MONDAY, TUESDAY) should be adhered to unless there’s a compelling reason not to.
Alternatives to this enumeration might include:
# Global variables
MONDAY = 1
TUESDAY = 2
# and so forth...
# Integers
1, 2, 3, 4, 5, 6, 7
# Strings
"Monday", "Tuesday", "Wednesday", ...
# List / tuple
DAYS = ["Monday", "Tuesday", "Wednesday", ...]
# Dict
DAYS = {1: "Monday", 2: "Tuesday", 3: "Wednesday", ...}
# or
DAYS = {"Monday": 1, "Tuesday": 2, "Wednesday": 3, ...}
Advantages of Enumerations
The DayOfWeek enum class enhances code clarity and maintainability. Each member can be accessed as DayOfWeek.MONDAY, DayOfWeek.TUESDAY, etc. The term "member" refers to elements within the class, and each member has both a name and a value. For instance, DayOfWeek.MONDAY has the name "MONDAY" and the value of 1, which indicates the day’s position in the week.
The clear definition of the class and its members provides informative context, reducing ambiguity. The enumeration type explicitly communicates that it represents a limited set of values, as demonstrated by the DayOfWeek class, which indicates the days of the week are numbered from 1 to 7.
You can check if a specific value belongs to the DayOfWeek type using isinstance(value, DayOfWeek). This restriction helps to prevent errors, as only the defined members can be used. When enums are stored as integers, they can also enhance performance while maintaining clarity.
In summary, using enums can lead to more readable, maintainable, efficient, and error-resistant code.
Using Enums in Python
Creating an enumeration class in Python is straightforward:
>>> DayOfWeek.TUESDAY
>>> day = DayOfWeek.TUESDAY
>>> day.name
'TUESDAY'
>>> day.value
2
Enumerations come with built-in error handling, as shown below:
>>> DayOfWeek.TUESDAI
Traceback (most recent call last):
...
AttributeError: TUESDAI
When examining the attributes of an instance, you can use:
>>> dir(day)
['FRIDAY', 'MONDAY', 'SATURDAY', 'SUNDAY',
'THURSDAY', 'TUESDAY', 'WEDNESDAY',
'__class__', '__doc__', '__eq__', '__hash__',
'__module__', 'name', 'value']
The attributes of the class itself can be viewed as well:
>>> dir(DayOfWeek)
['FRIDAY', 'MONDAY', 'SATURDAY', 'SUNDAY', 'THURSDAY',
'TUESDAY', 'WEDNESDAY', '__class__', '__contains__',
'__doc__', '__getitem__', '__init_subclass__',
'__iter__', '__len__', '__members__', '__module__',
'__name__', '__qualname__']
Among these, we typically use the values, names, and the attributes of the class.
The DayOfWeek class is iterable:
>>> for day in DayOfWeek:
... print(day)
You can access members via their name using indexing:
>>> DayOfWeek["SUNDAY"]
>>> DayOfWeek["SUNDAI"]
Traceback (most recent call last):
...
KeyError: 'SUNDAI'
Additionally, the __contains__() method allows for membership checking:
>>> DayOfWeek.SUNDAY in DayOfWeek
True
>>> "SUNDAY" in DayOfWeek.__members__
True
>>> "SUNDAI" in DayOfWeek.__members__
False
Conclusion
In this guide, we explored the fundamentals of Python enumerations and highlighted their advantages, including improved readability, maintainability, and performance. If you aim to write clearer and more efficient code, consider implementing enumerations in your projects.
This video, titled "Refactoring Code with Enums: Enhance Your Python Projects," provides further insights on the benefits of using enums in Python programming.
In this video, "MetPy Mondays #283 - Demystifying Enum Data Type in Python," we demystify the enum data type and its applications in Python.