Python wrapper for OS X
.zip file from GitHub.
I’m working on getting the library onto PyPI soon.
File Metadata Query Expression Syntax
I have modeled the Python syntax on Apple’s original Spotlight query syntax. File metadata queries are constructed using a simple query language that takes advantage of Python’s flexible class construction. The syntax is relatively straightforward, including comparisons, language-agnostic options, and time and date variables.
The metadata library implements three custom classes (MDAttribute, MDComparison, and MDExpression) to represent the various units of mdfind’s Query Expression Syntax.
Query comparisons have the following basic format:
[attribute] [operator] [value]
The following sub-sections describe these three elements more fully, but any such comparison will generate an MDComparison object. If you ever want to see what a particular MDComparison object will look like as a query string, you can coerce it into a unicode string using the unicode() operation (or into a string using the str() operation).
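To make the [attribute] [operator] [value] format concrete, here is a minimal sketch of how Python's comparison operators can be overloaded so that a comparison builds a query string instead of returning True or False. The Attribute and Comparison classes are hypothetical stand-ins written for illustration, not the library's actual MDAttribute and MDComparison code, and the modifier suffixes are left out for brevity.

```python
class Comparison(object):
    """Hypothetical stand-in for one [attribute] [operator] [value] unit."""

    def __init__(self, attribute, operator, value):
        self.attribute = attribute
        self.operator = operator
        self.value = value

    def __str__(self):
        # Strings are wrapped in double quotes; numbers are written as-is.
        if isinstance(self.value, str):
            value = '"{}"'.format(self.value)
        else:
            value = str(self.value)
        return "{} {} {}".format(self.attribute.name, self.operator, value)


class Attribute(object):
    """Hypothetical stand-in for a Spotlight attribute."""

    def __init__(self, name):
        self.name = name

    # Each operator returns a Comparison object rather than a boolean.
    def __eq__(self, value):
        return Comparison(self, "==", value)

    def __ne__(self, value):
        return Comparison(self, "!=", value)

    def __lt__(self, value):
        return Comparison(self, "<", value)

    def __gt__(self, value):
        return Comparison(self, ">", value)


content_type = Attribute("kMDItemContentType")
size = Attribute("kMDItemFSSize")

print(str(content_type == "com.adobe.pdf"))
print(str(size > 1024))
```

Calling str() on the result is what turns the object back into a query string, which mirrors the coercion described above.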
The first element of a query comparison is the attribute, which is an MDAttribute object. metadata automatically generates MDAttribute objects for every Spotlight attribute on your system. You can view the names of all of these objects via the metadata.attributes variable. Attributes have a Pythonic naming scheme, so the examples in this post use names such as content_type, authors, and creation_date. The MDAttribute class is built on top of the metadata information retrieved from mdimport -A. If you wish to see all of the information for a metadata attribute, you can use the
As with all of the custom classes, you can coerce an MDAttribute object into a unicode string using the unicode() operation.
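The exact renaming algorithm is not spelled out in this post, so the following is only a guess at the general idea: strip the common kMDItem prefix and convert CamelCase to snake_case. The function name pythonic_name is hypothetical, and the real library may well produce different names for some attributes.

```python
import re


def pythonic_name(spotlight_name):
    """Convert a Spotlight attribute name into a snake_case name (assumed scheme)."""
    name = spotlight_name
    # Drop the common "kMDItem" prefix if it is present.
    if name.startswith("kMDItem"):
        name = name[len("kMDItem"):]
    # Put an underscore between a lowercase letter or digit and the capital after it.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.lower()


for spotlight_name in ("kMDItemContentType", "kMDItemTextContent", "kMDItemAuthors"):
    print(spotlight_name, "->", pythonic_name(spotlight_name))
```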
The operator can be any one of the following:
|==|equal|
|!=|not equal|
|<|less than (available for numeric values and dates only)|
|>|greater than (available for numeric values and dates only)|
|<=|less than or equal (available for numeric values and dates only)|
|>=|greater than or equal (available for numeric values and dates only)|
||numeric values within the range of min_value through max_value in the specified attribute|
The == and != operators allow for modification. These modifiers specify how the comparison is made.
||The comparison is case insensitive.|
||The comparison is insensitive to diacritical marks.|
Both modifiers are on by default. In order to turn one off, you need to set the property to False:

    import metadata

    metadata.content_type.ignore_case = False
    comparison = metadata.content_type == 'com.adobe.pdf'
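In Spotlight's own query syntax, these two modifiers appear as the letters c (case insensitive) and d (diacritic insensitive) appended directly after the quoted value. The sketch below shows that mapping with a hypothetical helper, format_comparison; the parameter name ignore_diacritics is an assumption, and how the library assembles its strings internally is not shown in this post.

```python
def format_comparison(attribute, operator, value,
                      ignore_case=True, ignore_diacritics=True):
    """Render a string comparison with Spotlight-style modifier suffixes."""
    suffix = ""
    if ignore_case:
        suffix += "c"  # case-insensitive comparison
    if ignore_diacritics:
        suffix += "d"  # diacritic-insensitive comparison
    return '{} {} "{}"{}'.format(attribute, operator, value, suffix)


# Both modifiers on (the default described above).
print(format_comparison("kMDItemContentType", "==", "com.adobe.pdf"))

# Case sensitivity switched back on by disabling ignore_case.
print(format_comparison("kMDItemContentType", "==", "com.adobe.pdf",
                        ignore_case=False))
```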
The value element of a query comparison can be a string or integer. Strings can use wildcard characters (* and ?) to make the search fuzzy. The * character matches multiple characters whereas the ? wildcard character matches a single character (Note: Even in the Terminal, I cannot get wildcard searches with ? to function properly. I would recommend using * as your only wildcard character). Here are some examples demonstrating how the wildcards function:
    # Matches attribute values that begin with “paris”. For example, matches “paris”, but not “comparison”.
    metadata.text_content == "paris*"

    # Matches attribute values that end with “paris”.
    metadata.text_content == "*paris"

    # Matches attributes that contain "paris" anywhere within the value. For example, matches “paris” and “comparison”.
    metadata.text_content == "*paris*"

    # Matches attribute values that are exactly equal to “paris”.
    metadata.text_content == "paris"
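If you want to get a feel for these four patterns without running a Spotlight search, Python's standard fnmatch module uses the same * wildcard. This is only an approximation: Spotlight's own matching engine may treat some values differently, and the matches helper and sample values are made up for the illustration.

```python
from fnmatch import fnmatchcase


def matches(value, pattern):
    """Glob-style match, lowercased to mimic a case-insensitive comparison."""
    return fnmatchcase(value.lower(), pattern.lower())


values = ["paris", "comparison", "Parisian"]

for pattern in ("paris*", "*paris", "*paris*", "paris"):
    hits = [value for value in values if matches(value, pattern)]
    print(pattern, "->", hits)
```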
In order to use any of the greater-than or less-than operators, your value needs to be either an integer (or float) or a date object. In order to make the API as intuitive as possible, metadata allows for human-readable date statements. That is, you do not need to pass datetime objects as the value of a comparison with a date attribute (like creation_date). metadata uses the parsedatetime library to convert human-readable dates into datetime objects. The following are all acceptable date comparisons:
    # Created before today
    metadata.creation_date < 'today'

    # Created after last month
    metadata.creation_date > 'one month ago'
If metadata cannot parse your datetime string, it will raise an Exception. The parsing engine is good, but not perfect, and can seem capricious. For example, one month ago is parsable, but a month ago is not. Datetime strings that are parsed are converted into an ISO 8601 compliant string.
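For reference, here is roughly what the end of that conversion looks like using only the standard library. It skips the parsedatetime step entirely and starts from a datetime object; the helper names are hypothetical, and the exact string layout the library emits is an assumption (a naive datetime treated as UTC).

```python
from datetime import datetime, timedelta


def to_iso8601(moment):
    """Format a naive datetime (assumed to be UTC) as an ISO 8601 string."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def one_week_before(moment):
    """Return the datetime exactly seven days before the given one."""
    return moment - timedelta(weeks=1)


# A fixed reference time keeps the example reproducible.
reference = datetime(2014, 5, 8, 12, 30, 0)
print(to_iso8601(one_week_before(reference)))
```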
You can combine MDComparison objects to create a more complex expression, represented by the MDExpression class. Comparison objects can be combined in one of two ways: using a conjunction (&) or using a disjunction (|). Not only can MDComparison objects be combined, but you can also nest and combine any mix of MDComparison objects and MDExpression objects. For example:
    # query for audio files authored by “stephen” (ignoring case)
    (metadata.authors == "stephen") & (metadata.content_type == "public.audio")

    # query for audio files authored by “stephen” or “daniel”
    ((metadata.authors == "daniel") | (metadata.authors == "stephen")) & (metadata.content_type == "public.audio")

    # query for audio or video files authored by “stephen” or “daniel”
    ((metadata.authors == "daniel") | (metadata.authors == "stephen")) & ((metadata.content_type == "public.audio") | (metadata.content_type == "public.video"))

    # you could also break the last expression into chunks
    author_exp = (metadata.authors == "daniel") | (metadata.authors == "stephen")
    type_exp = (metadata.content_type == "public.audio") | (metadata.content_type == "public.video")
    final_exp = author_exp & type_exp
Here’s a complex expression to find only audio or video files that have been changed in the last week authored by someone named either “Stephen” or “Daniel” (ignoring case and diacritics, so it would match a file authored by “danièl”):
    author_exp = (metadata.authors == "daniel") | (metadata.authors == "stephen")
    type_exp = (metadata.content_type == "public.audio") | (metadata.content_type == "public.video")
    time_comp = metadata.content_change_date > 'one week ago'
    query_expression = author_exp & type_exp & time_comp
Note: parentheses are needed for the first two expressions. Python’s | and & operators bind more tightly than ==, so without the parentheses you would get a TypeError: Python thinks you are trying to combine the string "daniel" with the authors attribute, which is an obviously unsupported expression.
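You can reproduce this precedence problem without the library. In the sketch below, Attr and Expr are hypothetical toy classes (not MDAttribute or MDExpression) that overload ==, & and | in the same spirit; the && and || in the output are the conjunction and disjunction symbols of Spotlight's query syntax.

```python
class Expr(object):
    """Toy expression that records its query text."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr("({} && {})".format(self.text, other.text))

    def __or__(self, other):
        return Expr("({} || {})".format(self.text, other.text))


class Attr(object):
    """Toy attribute whose == builds an Expr instead of a boolean."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Expr('{} == "{}"'.format(self.name, value))


authors = Attr("kMDItemAuthors")

# With parentheses, each comparison is built first and then combined.
combined = (authors == "daniel") | (authors == "stephen")
print(combined.text)

# Without them, Python evaluates "daniel" | authors first, which fails.
try:
    authors == "daniel" | authors == "stephen"
    raised = False
except TypeError as error:
    raised = True
    print("TypeError:", error)
```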
Once you have created your query expression (or even a simple comparison), you will pass it to metadata.find() in order to execute the file search.
The main function is metadata.find(). It takes one required argument, query_expression, which can be either an MDExpression object or an MDComparison object. In addition to this one required argument, metadata.find() also has the optional argument only_in, which lets you focus the scope of your search on a particular directory tree. This simply needs to be a full (non-relative) path passed as a Unicode string. Other than that, there’s nothing else to it. Build your query expression, pass it to find(), and get your results as a Python list. Here’s an example of building the sample expression above and passing it to find():
    import metadata

    author_exp = (metadata.authors == "daniel") | (metadata.authors == "stephen")
    type_exp = (metadata.content_type == "public.audio") | (metadata.content_type == "public.video")
    time_comp = metadata.content_change_date > 'one week ago'
    query_expression = author_exp & type_exp & time_comp
    results = metadata.find(query_expression)
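Since find() wraps the mdfind command, the only_in argument presumably corresponds to mdfind's -onlyin flag. The sketch below shows how such a command line could be assembled and how its output (one path per line) could be turned into a list. Both helper functions and the example paths are hypothetical; this is not the library's actual code.

```python
def build_mdfind_command(query, only_in=None):
    """Assemble an argument list for the mdfind command-line tool."""
    command = ["mdfind"]
    if only_in is not None:
        # -onlyin restricts the search to the given directory tree.
        command.extend(["-onlyin", only_in])
    command.append(query)
    return command


def parse_mdfind_output(output):
    """Split mdfind's output (one path per line) into a list of paths."""
    return [line for line in output.splitlines() if line]


query = 'kMDItemContentType == "public.audio"'
print(build_mdfind_command(query, only_in="/Users/me/Music"))

# On OS X, the command could then be run with something like
# subprocess.check_output(command), and its decoded output parsed:
print(parse_mdfind_output("/Users/me/Music/a.mp3\n/Users/me/Music/b.mp3\n"))
```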
In addition to find(), the metadata module has the list() function, which is a wrapper around the mdls command. You simply pass it a file path and it returns a dictionary of metadata attributes and values. Once again, the attribute names (the dictionary keys) are simplified using the same algorithm that converts Spotlight attributes to Pythonic names.
    import metadata

    file_metadata = metadata.list(file_path)
    print(file_metadata['name'])
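To give a rough idea of what wrapping mdls involves, here is a sketch that parses mdls-style text into a dictionary. The parse_mdls_output function is hypothetical, the sample text is hand-written to resemble mdls output rather than captured from a real file, and the keys are left as raw Spotlight names instead of being converted to Pythonic ones.

```python
def parse_mdls_output(text):
    """Parse 'key = value' lines (and parenthesized lists) into a dict."""
    result = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "=" not in line:
            continue
        key, _, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        if raw == "(":
            # Multi-value attribute: collect items until the closing parenthesis.
            items = []
            for item in lines:
                item = item.strip()
                if item == ")":
                    break
                items.append(item.rstrip(",").strip('"'))
            result[key] = items
        else:
            # Scalar values are kept as strings, with surrounding quotes removed.
            result[key] = raw.strip('"')
    return result


sample = """kMDItemContentType = "com.adobe.pdf"
kMDItemFSName      = "report.pdf"
kMDItemAuthors     = (
    "Stephen",
    "Daniel"
)
kMDItemFSSize      = 12345
"""

parsed = parse_mdls_output(sample)
print(parsed["kMDItemFSName"])
print(parsed["kMDItemAuthors"])
```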
Finally, there is an alpha version of a write() function, which allows you to write metadata to a file. Right now, I have it defaulted to writing to the kMDItemUserTags attribute, but a few others have worked. I need to test it more to make it more general.