Ride Sharing Dataset Overview

The public Ride Sharing dataset is a great way for you to get hands-on experience working with Varada to accelerate your data analytics workloads.

The dataset is inspired by a popular ride-sharing app. It includes many trip attributes, such as trip duration, distance, and fare, as shown in the following Tableau report, which analyzes these attributes.

Dataset Tables

The Ride Sharing dataset includes two tables:

  • trips_data: A fact table of 4.4 billion rows in which each row represents a key location (point) on a trip route, so a single trip spans multiple rows. The table includes trip details such as the route, duration, driver ID and other driver details, and fare. It is partitioned by the d_date column.

  • riders_campaign: A dimension table of 1,501 rows containing marketing campaign details, such as the segments of the ride-sharing app's user base that each campaign targets.
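Because trips_data stores one row per point on a route, per-trip analysis usually starts by collapsing those rows back into one record per tripid. The following minimal sketch does that in plain Python over a handful of hypothetical rows (invented values, not taken from the dataset). It orders each trip's points by the ts column, then reports the point count, the first and last coordinates, and the time elapsed between them. Against the actual table, you would more likely express the same logic as a SQL GROUP BY on tripid.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Hypothetical point rows shaped like trips_data (only a few of its columns).
# The values are invented for illustration and are not real dataset rows.
rows = [
    {"tripid": 1, "ts": "2018-01-04 23:59:38", "lat": 37.7411, "lon": -122.4428},
    {"tripid": 1, "ts": "2018-01-05 00:17:36", "lat": 37.7500, "lon": -122.4350},
    {"tripid": 1, "ts": "2018-01-05 00:05:10", "lat": 37.7450, "lon": -122.4400},
    {"tripid": 2, "ts": "2018-01-05 08:00:00", "lat": 37.7600, "lon": -122.4200},
    {"tripid": 2, "ts": "2018-01-05 08:12:30", "lat": 37.7650, "lon": -122.4150},
]


def summarize_trips(rows: List[dict]) -> Dict[int, dict]:
    """Collapse point-level rows into one summary record per tripid."""
    points_by_trip: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        points_by_trip[row["tripid"]].append(row)

    summary: Dict[int, dict] = {}
    for trip_id, points in points_by_trip.items():
        # Order the route by point timestamp; input rows may arrive unordered.
        points.sort(key=lambda r: datetime.fromisoformat(r["ts"]))
        first, last = points[0], points[-1]
        elapsed = datetime.fromisoformat(last["ts"]) - datetime.fromisoformat(first["ts"])
        summary[trip_id] = {
            "num_points": len(points),
            "start": (first["lat"], first["lon"]),
            "end": (last["lat"], last["lon"]),
            "elapsed_s": elapsed.total_seconds(),
        }
    return summary


for trip_id, info in sorted(summarize_trips(rows).items()):
    print(trip_id, info["num_points"], info["end"], info["elapsed_s"])
```

Ordering by ts rather than point_id is a deliberate choice in this sketch, because this page does not state that point_id values are sequential along the route.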

The data for both tables is stored in an S3 bucket as Parquet files with Snappy compression.
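Because trips_data is partitioned by d_date, a query that filters on a date range only has to read the files in the matching partitions. The sketch below illustrates that pruning step over a list of hypothetical S3 object keys. It assumes a Hive-style layout with one d_date=YYYY-MM-DD directory per partition; the key and file names are invented, since this page does not document the bucket's actual layout. In practice, the query engine typically performs this pruning itself when a query's WHERE clause filters on d_date.

```python
from datetime import date
from typing import Iterable, List, Optional

# Hypothetical S3 object keys; the real bucket layout is not documented here.
# Assumes Hive-style partition directories named d_date=YYYY-MM-DD.
KEYS = [
    "trips_data/d_date=2018-01-03/part-00000.snappy.parquet",
    "trips_data/d_date=2018-01-04/part-00000.snappy.parquet",
    "trips_data/d_date=2018-01-04/part-00001.snappy.parquet",
    "trips_data/d_date=2018-01-05/part-00000.snappy.parquet",
]


def partition_date(key: str) -> Optional[date]:
    """Extract the d_date partition value from an object key, if present."""
    for segment in key.split("/"):
        if segment.startswith("d_date="):
            return date.fromisoformat(segment[len("d_date="):])
    return None


def prune(keys: Iterable[str], start: date, end: date) -> List[str]:
    """Keep only files whose partition date falls within [start, end]."""
    selected = []
    for key in keys:
        d = partition_date(key)
        if d is not None and start <= d <= end:
            selected.append(key)
    return selected


files = prune(KEYS, date(2018, 1, 4), date(2018, 1, 4))
print(len(files), "of", len(KEYS), "files need to be read")
```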

Trips Data Table Schema

The trips_data table schema is:

Column Name    Column Type  Description                                         Sample Value
tripid         Integer      The unique identifier for the trip.                 33820
t_hour         Integer      The start hour of the trip.                         23
t_min          Integer      The start minute of the trip.                       38
d_weekday      Integer      The day of the week of the trip.                    3
t_start_ts     Timestamp    The start timestamp of the trip.                    2018-01-04 23:59:38
fare           Double       The fare for the trip.                              16.3217
src_zone       Integer      The source zone for the trip.                       3431
dst_zone       Integer      The destination zone for the trip.                  4304
duration       Double       The duration of the trip.                           1117.33
distance       Double       The distance of the trip.                           5.61613655
rider_id       Integer      The identifier of the rider.                        1043298
rider_age      Integer      The age of the rider.                               24
rider_first    String       The first name of the rider.                        Fabian
rider_last     String       The last name of the rider.                         Pearce
rider_gender   String       The gender of the rider.                            M
driver_id      Integer      The identifier of the driver.                       31100
driver_age     Integer      The age of the driver.                              60
driver_first   String       The first name of the driver.                       Jeremy
driver_last    String       The last name of the driver.                        COX
driver_gender  String       The gender of the driver.                           M
ts             Timestamp    The timestamp of the point in the trip course.      2018-01-05 00:17:36
lon            Double       The longitude of the point in the trip course.      -122.4427719
lat            Double       The latitude of the point in the trip course.       37.74106216
point_id       Integer      The identifier for the point in the trip course.    265
last_point     Integer      An indicator of the last point in the trip course.  0
d_date         Date         The date of the trip (partition column).            2018-01-04
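The column types in this table map directly onto Python types. As a minimal sketch, assuming values arrive as strings (for instance, from a CSV export, which is not part of this dataset), the following code converts a few of the sample values above into typed values. The CONVERTERS, SCHEMA, and parse_row names are hypothetical helpers for illustration, and SCHEMA lists only a subset of the columns.

```python
from datetime import date, datetime
from typing import Callable, Dict


def _timestamp(value: str) -> datetime:
    # strptime's %H also accepts a single-digit hour such as "0:17:36".
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Map each schema column type to a converter from its string form.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "Integer": int,
    "Double": float,
    "String": str,
    "Timestamp": _timestamp,
    "Date": date.fromisoformat,
}

# A subset of trips_data columns and their types, taken from the schema table.
SCHEMA: Dict[str, str] = {
    "tripid": "Integer",
    "fare": "Double",
    "rider_gender": "String",
    "t_start_ts": "Timestamp",
    "ts": "Timestamp",
    "d_date": "Date",
}


def parse_row(raw: Dict[str, str]) -> Dict[str, object]:
    """Convert string values to typed values according to SCHEMA."""
    return {col: CONVERTERS[SCHEMA[col]](value) for col, value in raw.items()}


row = parse_row({
    "tripid": "33820",
    "fare": "16.3217",
    "rider_gender": "M",
    "t_start_ts": "2018-01-04 23:59:38",
    "ts": "2018-01-05 0:17:36",
    "d_date": "2018-01-04",
})
print(row["ts"] - row["t_start_ts"])
```

Subtracting the two parsed timestamps gives the time between the trip start and the sampled point: 17 minutes 58 seconds for these two sample values.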

Riders Campaign Table Schema

The riders_campaign table schema is:

Column Name    Column Type  Description                                                  Sample Value
rider_id       Integer      The identifier of the rider.                                 4923013
num_trips      Integer      The number of trips the rider took as part of the campaign.  8
segment        String       The name of the campaign.                                    Churning_Riders_Last3Months
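The two tables share the rider_id column, so a typical analysis joins trip rows to campaign segments, for example to count trips per segment. The following sketch performs that join as an in-memory hash join over a few hypothetical rows. The IDs and the New_Riders segment are invented; only Churning_Riders_Last3Months comes from the sample value above. Because trips_data holds several rows per trip, the sketch counts distinct tripid values rather than rows.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical riders_campaign rows (dimension table); IDs are invented.
campaign = [
    {"rider_id": 1, "num_trips": 8, "segment": "Churning_Riders_Last3Months"},
    {"rider_id": 2, "num_trips": 3, "segment": "New_Riders"},  # invented segment
]

# Hypothetical trips_data rows (fact table): several point rows per trip.
trips = [
    {"tripid": 10, "rider_id": 1, "point_id": 1},
    {"tripid": 10, "rider_id": 1, "point_id": 2},
    {"tripid": 11, "rider_id": 1, "point_id": 1},
    {"tripid": 12, "rider_id": 2, "point_id": 1},
    {"tripid": 13, "rider_id": 3, "point_id": 1},  # rider 3 is in no campaign
]


def trips_per_segment(campaign: List[dict], trips: List[dict]) -> Dict[str, int]:
    """Inner-join trips to campaigns on rider_id; count distinct trips per segment."""
    # Build side: index the small dimension table by rider_id.
    segments_by_rider: Dict[int, List[str]] = defaultdict(list)
    for row in campaign:
        segments_by_rider[row["rider_id"]].append(row["segment"])

    # Probe side: stream the fact table and look up each row's rider.
    trip_ids: Dict[str, Set[int]] = defaultdict(set)
    for row in trips:
        for segment in segments_by_rider.get(row["rider_id"], []):
            # A set ensures the multiple point rows of one trip count once.
            trip_ids[segment].add(row["tripid"])
    return {segment: len(ids) for segment, ids in trip_ids.items()}


print(trips_per_segment(campaign, trips))
```

Building the lookup from the small riders_campaign table and streaming the large trips_data table through it mirrors how a query engine would typically plan a join between a 1,501-row dimension table and a multi-billion-row fact table. Mapping each rider_id to a list of segments keeps the counts correct if a rider appears in more than one campaign.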