Car Sharing Analyser
====================

This is a collection of scripts and tools to gather, prepare and analyse data about a German car sharing company.

Requirements
------------

* Python 3.6 or newer (3.5 will probably work, too)
* geopy
* SQLite 3.7.11 or newer command-line client

Gathering
---------

The script `getData.sh` collects a JSON dump from the web page every 30 seconds and stores it in the `data/` folder. A week's worth of data is about 3.3 GiB.

Edit the file first to configure your desired city. Then leave it running in a tmux session.

Preparing
---------

`init_db.sh` will create a SQLite database file according to `sql/dbschema.sql`.

Make sure you have Python 3 installed. Install the required Python packages by running:

    sudo -H pip3 install -r requirements.txt

(Or use virtualenv, pipenv, venv, etc.)

Then run `import.py` to import all the JSON dumps into the database. If you continue the data collection, you can run `import.py` again later to import the new data. Successfully imported JSON dumps can be deleted, of course. Importing a week's worth of data takes about 5 minutes and results in a 14.5 MiB SQLite3 database.

Analysing
---------

Run `calc_trips.py` to analyse the cars' state changes and calculate possible(!) trips from them. The trip data is written back into the database. (Trips shorter than 70 seconds are filtered out. See the notes below.) A week's worth of trip data increases the database size by about 4 MiB.

You can also use this script as a starting point for your own analysis scripts. (Pull requests are welcome.)

For working with the database itself, I recommend [DB Browser for SQLite](http://sqlitebrowser.org/).

### Example Queries

To see the history of a specific car, you can run the following query
(`XYZ` = number plate):

    SELECT * FROM car_history WHERE plate = 'XYZ';

To see cars in a specific area (a bounding box of latitude and longitude), you can run:

    SELECT * FROM car_history
    WHERE latitude >= 52.515652 AND longitude >= 13.372373
      AND latitude <= 52.516813 AND longitude <= 13.378115;

Also check out the views in the database.

Notes
=====

* `distance_km` is the beeline (straight-line) distance from the starting point to the end point. If somebody runs errands and parks in the exact same spot, the distance is (almost) 0.
* You can't distinguish between the following cases, all of which have a distance of (almost) zero and no fuel spent:
    * a car that made a short trip and parked in the same spot,
    * a car that has been taken out of service for a while,
    * a car that was reserved (possible for up to 30 minutes) but whose reservation expired.
* "Trips" lasting several days are most probably cars that have been taken out of service.
* A car that has been reserved (for up to 30 minutes) disappears from the list of cars. The reservation time is therefore included in the trip's `duration_minutes`.
* The calculated prices don't factor in additional fees (airport, drop-off) or the time a car was reserved, and they are also calculated for cars that were taken offline, i.e. where the customer paid nothing.
* Small negative `fuel_spent` values are probably caused by the car having been parked on a slope.
* A car is identified by its number plate. Theoretically, a `plate` could be moved to another vehicle (with a different `vin`), but this is very unlikely.
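The example queries can also be run from Python with the standard-library `sqlite3` module, which is handy when writing your own analysis scripts. This is a minimal sketch that binds the plate and the bounding box as query parameters instead of pasting them into the SQL string. It only assumes that `car_history` has the `plate`, `latitude` and `longitude` columns used in the example queries; the helper names and the in-memory demo table with its sample rows are hypothetical stand-ins for the real database created by `init_db.sh`.

```python
import sqlite3


def car_history_for_plate(conn, plate):
    """Return all car_history rows for one number plate."""
    # The plate is bound as a parameter, so no manual quoting is needed.
    return conn.execute(
        "SELECT * FROM car_history WHERE plate = ?", (plate,)
    ).fetchall()


def cars_in_box(conn, south, west, north, east):
    """Return car_history rows whose position lies inside a lat/lon bounding box."""
    # BETWEEN is inclusive, matching the >= / <= comparisons of the SQL example.
    return conn.execute(
        "SELECT * FROM car_history"
        " WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?",
        (south, north, west, east),
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical stand-in for the real database: a minimal in-memory table
    # containing only the columns the example queries use.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE car_history (plate TEXT, latitude REAL, longitude REAL)")
    conn.executemany(
        "INSERT INTO car_history VALUES (?, ?, ?)",
        [("XYZ", 52.5160, 13.3750), ("ABC", 52.5200, 13.4000)],
    )
    print(car_history_for_plate(conn, "XYZ"))
    print(cars_in_box(conn, 52.515652, 13.372373, 52.516813, 13.378115))
```

To query the real data, pass a connection to your own database file instead of the in-memory demo table.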
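For reference, a beeline distance like `distance_km` can be computed with the haversine formula on a spherical Earth. This is a minimal standard-library sketch, not necessarily how the project computes the value: since geopy is a requirement, the scripts may well use one of its distance functions instead (for example `geopy.distance.geodesic`, which works on the WGS-84 ellipsoid and gives slightly different results). The function name `beeline_km` and the mean Earth radius are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in kilometres (assumption: spherical Earth model).
EARTH_RADIUS_KM = 6371.0088


def beeline_km(lat1, lon1, lat2, lon2):
    """Great-circle ("beeline") distance in km between two points, via the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


if __name__ == "__main__":
    # Opposite corners of the bounding box from the example query: about 0.41 km.
    print(round(beeline_km(52.515652, 13.372373, 52.516813, 13.378115), 3))
```

This also illustrates the first note: a trip that ends where it started yields a distance of 0, no matter how far the car was actually driven.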
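The idea behind "possible trips" can be sketched as follows: a rented or reserved car disappears from the list of available cars, so a trip shows up as a gap between two consecutive sightings of that car in the 30-second dumps. This is a hypothetical illustration of that idea, not the logic of `calc_trips.py`; the `Sighting` and `Trip` types and the `find_trips` function are made up for this example. Any gap of at least 70 seconds becomes a candidate trip, mirroring the 70-second filter described in the Analysing section.

```python
from typing import List, NamedTuple

POLL_INTERVAL_S = 30  # getData.sh fetches a dump every 30 seconds
MIN_TRIP_S = 70       # shorter "trips" are discarded


class Sighting(NamedTuple):
    """One appearance of a car in a JSON dump (hypothetical record type)."""
    timestamp: int    # Unix time of the dump, in seconds
    latitude: float
    longitude: float
    fuel: float


class Trip(NamedTuple):
    start: Sighting   # last sighting before the car disappeared
    end: Sighting     # first sighting after it reappeared

    @property
    def duration_minutes(self) -> float:
        # Only accurate to about one polling interval; includes any reservation time.
        return (self.end.timestamp - self.start.timestamp) / 60

    @property
    def fuel_spent(self) -> float:
        return self.start.fuel - self.end.fuel


def find_trips(sightings: List[Sighting]) -> List[Trip]:
    """Turn one car's sightings into candidate trips."""
    ordered = sorted(sightings)  # sorts by timestamp first
    trips = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.timestamp - prev.timestamp
        # Consecutive dumps are ~POLL_INTERVAL_S apart while the car is listed.
        # Gaps below MIN_TRIP_S (such as a single missed dump, 60 s) are ignored.
        if gap >= MIN_TRIP_S:
            trips.append(Trip(prev, cur))
    return trips
```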
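The caveats in the notes can be turned into a rough filter when working with the calculated trips. This minimal sketch labels a trip using only its `distance_km`, `duration_minutes` and `fuel_spent` values. The function `classify_trip`, its labels and all thresholds (50 m, two days, a fuel tolerance of 2 units) are hypothetical choices to tune for your own data, and the unit of `fuel_spent` is an assumption left open here.

```python
# Hypothetical thresholds; tune them for your data.
NEAR_ZERO_KM = 0.05              # "(almost) 0" beeline distance
MULTI_DAY_MINUTES = 2 * 24 * 60  # longer "trips" are likely out-of-service periods
FUEL_NOISE = 2.0                 # small negative fuel_spent values count as noise


def classify_trip(distance_km: float, duration_minutes: float, fuel_spent: float) -> str:
    """Return a rough label for a calculated trip, following the README notes."""
    if duration_minutes > MULTI_DAY_MINUTES:
        return "probably out of service"
    # Small negative values (e.g. parked on a slope) are treated as "no fuel spent".
    no_fuel_spent = -FUEL_NOISE <= fuel_spent <= 0
    if distance_km < NEAR_ZERO_KM and no_fuel_spent:
        # Short round trip, temporary removal, or expired reservation:
        # these cases cannot be told apart.
        return "ambiguous"
    return "trip"
```

Trips labelled "ambiguous" or "probably out of service" are good candidates to exclude before summing up prices or driving times.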