Car Sharing Analyser
====================
This is a collection of scripts and tools to gather, prepare and analyse data about a
German car sharing company.

Requirements
------------
* Python 3.6 or newer (3.5 will probably work, too)
* geopy
* SQLite 3.7.11 or newer command-line client

Gathering
---------
The script `getData.sh` fetches a JSON dump from the webpage every 30 seconds and stores
it in the `data/` folder. A week's worth of data is about 3.3 GiB.

Edit the file first to configure your desired city, then let it run in a tmux session.
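
For planning disk space, the numbers above can be turned into a rough per-dump size. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and projecting to other polling intervals assumes that the size of a single dump stays constant.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
KIB_PER_GIB = 1024 * 1024


def dumps_per_week(interval_s=30):
    """Number of dumps collected in one week at the given polling interval."""
    return SECONDS_PER_WEEK // interval_s


def avg_dump_size_kib(weekly_gib=3.3, interval_s=30):
    """Average size of one dump, derived from the weekly volume quoted above."""
    return weekly_gib * KIB_PER_GIB / dumps_per_week(interval_s)


def weekly_volume_gib(interval_s, dump_kib):
    """Projected weekly volume for another interval (assumes constant dump size)."""
    return dumps_per_week(interval_s) * dump_kib / KIB_PER_GIB


if __name__ == "__main__":
    size = avg_dump_size_kib()
    print(f"{dumps_per_week()} dumps per week, about {size:.0f} KiB each")
    print(f"polling every 60 s: about {weekly_volume_gib(60, size):.2f} GiB per week")
```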
Preparing
---------
`init_db.sh` creates a SQLite database file according to `sql/dbschema.sql`.

Make sure you have Python 3 installed. Install the required Python packages by running:

    sudo -H pip3 install -r requirements.txt

(Or use virtualenv, pipenv, venv, etc.)

Then run `import.py` to import all the JSON dumps into the database. If you continue the data
collection, you can run `import.py` again later to import the new data. Successfully imported JSON
dumps can be deleted, of course.

Importing a week's worth of data takes about 5 minutes and results in a 14.5 MiB
SQLite3 database.
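
The idea of re-running the import for new data only can be sketched as follows. This is not how `import.py` actually works: the tables `imported_files` and `demo_positions` and the JSON layout (a list of objects with `plate`, `latitude` and `longitude`) are assumptions made purely for illustration. The real schema lives in `sql/dbschema.sql`.

```python
import json
import sqlite3
from pathlib import Path


def import_dumps(conn, data_dir):
    """Import every not-yet-seen *.json file in data_dir.

    Returns the number of newly imported files.
    """
    # Hypothetical bookkeeping and demo tables, not the project's real schema.
    conn.execute("CREATE TABLE IF NOT EXISTS imported_files (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_positions "
        "(source TEXT, plate TEXT, latitude REAL, longitude REAL)"
    )
    imported = 0
    for path in sorted(Path(data_dir).glob("*.json")):
        seen = conn.execute(
            "SELECT 1 FROM imported_files WHERE name = ?", (path.name,)
        ).fetchone()
        if seen:
            continue  # already imported in an earlier run
        cars = json.loads(path.read_text(encoding="utf-8"))
        with conn:  # one transaction per file: either all rows or none
            conn.executemany(
                "INSERT INTO demo_positions VALUES (?, ?, ?, ?)",
                [(path.name, c["plate"], c["latitude"], c["longitude"]) for c in cars],
            )
            conn.execute("INSERT INTO imported_files VALUES (?)", (path.name,))
        imported += 1
    return imported
```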
Analysing
---------
Run `calc_trips.py` to analyse the cars' state changes and calculate possible(!) trips from them.
The trip data is written back into the database. (Trips shorter than 70 seconds are filtered out.
See notes below.) A week's worth of trip data increases the database size by about 4 MiB.

You can also use this script as a starting point for your own analysis scripts.
(Pull requests are welcome.)
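
As a starting point for your own scripts, here is a minimal sketch of the basic idea: a car that is in use is missing from the dumps, so a gap between two consecutive sightings of the same car is a possible trip. The function name and the input format are assumptions, and `calc_trips.py` itself may work differently (this sketch ignores position and fuel changes, for instance).

```python
from datetime import datetime, timedelta

MIN_TRIP = timedelta(seconds=70)  # threshold mentioned in the text above


def possible_trips(sightings, min_trip=MIN_TRIP):
    """Return (start, end) pairs for every gap of at least min_trip.

    sightings: time-sorted list of datetimes at which one car was listed.
    """
    trips = []
    for previous, current in zip(sightings, sightings[1:]):
        if current - previous >= min_trip:
            trips.append((previous, current))
    return trips


if __name__ == "__main__":
    t0 = datetime(2018, 1, 14, 12, 0, 0)
    # Seen at 0 s, 30 s and 60 s, then missing until 960 s: one possible trip.
    seen = [t0 + timedelta(seconds=s) for s in (0, 30, 60, 960)]
    for start, end in possible_trips(seen):
        print(start, "->", end, "duration:", end - start)
```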
For working with the database itself, I recommend
[DB Browser for SQLite](http://sqlitebrowser.org/).
### Example Queries
To see the history of a specific car, you can run e.g. (`XYZ` = number plate):

    SELECT * FROM car_history WHERE plate='XYZ';

To see cars in a specific area, you can run:

    SELECT *
    FROM car_history
    WHERE latitude>=52.515652 AND longitude>=13.372373
    AND latitude<=52.516813 AND longitude<=13.378115;

Also check out the views in the database.
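
The same queries can be run from Python with the standard `sqlite3` module. The sketch below only relies on the `car_history` columns used in the queries above. The helper names and the database file name in the usage comment are placeholders, not part of this project.

```python
import sqlite3


def car_history(conn, plate):
    """All car_history rows for one number plate (parameterised query)."""
    return conn.execute(
        "SELECT * FROM car_history WHERE plate = ?", (plate,)
    ).fetchall()


def cars_in_area(conn, south, west, north, east):
    """All car_history rows inside a latitude/longitude bounding box."""
    return conn.execute(
        "SELECT * FROM car_history "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?",
        (south, north, west, east),
    ).fetchall()


# Usage (the file name is a placeholder for your own database file):
#   conn = sqlite3.connect("your_database.sqlite")
#   rows = cars_in_area(conn, 52.515652, 13.372373, 52.516813, 13.378115)
```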
Notes
=====
* The `distance_km` is the beeline from starting point to end point. If somebody runs errands and parks
  in the exact same spot, the distance is (almost) 0.
* You can't distinguish between these cases with distances of (almost) zero and no petrol spent:
    * a car that made a short trip and parked in the same spot,
    * a car that has been taken out of service for a while,
    * a car that has been reserved (possible for up to 30 minutes) but the reservation expired.
* "Trips" over several days are most probably cars that have been taken out of service.
* A car that has been "reserved" (for up to 30 minutes) disappears from the list of cars. The
  reservation time is included in the trip's `duration_minutes`.
* The calculated prices don't factor in additional fees (airport, drop-off) or the time a car was
  "reserved". They are also calculated for cars taken offline, i.e. where no money was paid by
  the customer.
* Small negative `fuel_spent` values are probably because the car was parked on a slope.
* The id for a car is its number plate. Theoretically, a `plate` could be put on another
  vehicle (with a different `vin`). But this is very unlikely.
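
To illustrate what a beeline distance like `distance_km` means, here is a small haversine sketch using only the standard library. It is an assumption that this matches the project's own calculation: the requirements list geopy, so the actual scripts may well use one of its distance functions, and results can differ slightly. The function name `beeline_km` is made up for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def beeline_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


if __name__ == "__main__":
    # Opposite corners of the bounding box from the example query above.
    print(f"{beeline_km(52.515652, 13.372373, 52.516813, 13.378115):.2f} km")
    # A round trip that ends where it started has a beeline distance of zero.
    print(beeline_km(52.515652, 13.372373, 52.515652, 13.372373))
```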