Mongo has a list of filtering options comparable to SQL. The following code shows how to connect to a local version of Mongo and query for all products with an inventory status of A:
df = spark.read.format("mongo").option("uri", "mongodb://127.0.0.1/products.inventory").load()
pipeline = "{'deviceid':'8ea23889-3677-4ebe-80b6-3fee6e38a42c'}"
df = spark.read.format("mongo").option("pipeline", pipeline).load() df.show()
The next example shows how to do a complicated filter followed by a group by operation. It finally sums the information. The output will show the count of items with a status of A:
pipeline = "[ { '$match': { 'status': 'A' } }, { '$group': { '_id': '$item', 'total': { '$sum': '$qty' } } } ]"
df = spark.read.format("mongo").option("pipeline", pipeline).load()
df.show()