To get you started, we've prepared queries for you to run against the Ride Sharing dataset on the Varada catalog. These queries include various levels of selectivity, including a full scan query with no filters, a cohort analysis query, a selective query, and a query with a selective join.
Feel free to create queries of your own, or to edit these queries by changing the projected columns, the filtered columns, and the filter values.
This query looks at all the trips between 23-25 March 2018 and counts the number of unique trips per day.
SELECT extract(day from "d_date") as day, extract(month from "d_date") as month, count(distinct "tripid") as cnt_trips FROM trips_data_sample.trips_data WHERE "d_date" between date '2018-03-23' and date '2018-03-25' GROUP BY 1,2
This query looks at the cohort of 18-19 years old riders who took a ride on Thursdays. The query reports the trip distribution by rider gender.
SELECT "rider_gender", count(*) as cnt FROM trips_data_sample.trips_data WHERE "rider_age" IN (18,19) AND "d_weekday" = 5 GROUP BY 1 order by 2 desc;
This query looks at all the trips for rider_id
3380311 that took place between 7:00am-10:00am in a specific geographic fence, and reports each trip’s exact route, duration, fare, and more.
SELECT "tripid", "rider_id", "d_date", "t_hour", "fare", "duration", "distance", "rider_age", "rider_gender", "lon", "lat" FROM trips_data_sample.trips_data WHERE "rider_id" = 3380311 And lon < -122.4121814 And lon > -122.4284 And lat > 37.7660289 And lat < 37.7760289 AND "t_hour" between 7 and 10
This query joins the trips_data table with the riders_campaign dimension table by rider_id, and looks for riders who had fewer than nine trips.
The query shows the following stats by month:
- Number of unique trips
- Number of unique riders
- Average trip fare
- Average trip duration
SELECT extract(month from fact."ts") as month, count(distinct fact."tripid") as cnt_unq_trips , count(distinct fact."rider_id") as cnt_unq_riders, round(avg(fact."fare"),2) as avg_fare, round(avg(fact."duration")/60,2) as avg_duration_min FROM trips_data_sample.trips_data as fact, trips_data_sample.riders_campaign as dim WHERE dim."rider_id" = fact."rider_id" AND dim."segment" IN ('Students_Promotion','Churning_Riders_Last3Months') AND dim."num_trips" < 9 group by 1 order by 1;
This query gets the latitude, longitude, timestamp, and ID of each of the first 100 points of trip number 271004853.
SELECT lat, lon, ts, point_id FROM trips_data_sample.trips_data trips_data WHERE (trips_data.tripid = 271004853) ORDER BY point_id LIMIT 100
This query counts the unique trips in a specified geographic fence.
SELECT COUNT(DISTINCT trips_data.tripid) AS ctd_tripid_ok, trips_data.src_zone AS src_zone FROM trips_data_sample.trips_data trips_data WHERE (trips_data.distance >= 0.2341123202347175) AND (trips_data.t_hour BETWEEN 6 AND (6 + 3)) AND (trips_data.lon >= -122.4284) AND (trips_data.lon <= -122.4121814 + 0.0007764634342138149) AND (trips_data.lat >= 37.7660289) AND (trips_data.lat <= 37.7760289 + 0.0002670984564692044) GROUP BY 2
This query counts the trips with fares higher than 26.8136223835386, and reports the number of unique riders, and the average fare.
SELECT COUNT(*) AS num_trips, COUNT(DISTINCT rider_id) AS num_riders, AVG(fare) AS avg_fare FROM trips_data_sample.trips_data trips_data WHERE fare >= 26.8136223835386 AND last_point = 1 LIMIT 1
This query counts the number of trips made per hour by riders aged between18-22 years old during the weekends.
SELECT trips_data.t_hour, COUNT(*) FROM trips_data_sample.trips_data trips_data WHERE ( (trips_data.t_hour >= CAST ( CASE WHEN TRY_CAST((SPLIT_PART('20,24', ',', 1)) AS DOUBLE) >= 0 THEN FLOOR(TRY_CAST((SPLIT_PART('20,24', ',', 1)) AS DOUBLE)) ELSE CEIL(TRY_CAST((SPLIT_PART('20,24', ',', 1)) AS DOUBLE)) END AS BIGINT)) AND (trips_data.t_hour <= CAST ( CASE WHEN TRY_CAST((SPLIT_PART('20,24', ',', 2)) AS DOUBLE) >= 0 THEN FLOOR(TRY_CAST((SPLIT_PART('20,24', ',', 2)) AS DOUBLE)) ELSE CEIL(TRY_CAST((SPLIT_PART('20,24', ',', 2)) AS DOUBLE)) END AS BIGINT)) AND (trips_data.d_weekday >= 5) AND (trips_data.d_weekday <= 7) AND (trips_data.rider_age >= 18) AND (trips_data.rider_age < 22) AND (trips_data.last_point = 1)) GROUP BY 1 ORDER BY 2
Updated 4 months ago