Big refactoring of the tuner link
- Now user friendly
- All dependencies installed via pip (except for the profilers)
- Automatic GPU vendor detection
- Optional time budget to set the desired duration of the tuning
- Biggest change: independent steps (ensemble of kernels) are now tuned together using multiple optuna studies at the same time
- Single run of the standalone benchmark
- Profiling of multiple steps within the same run
- One optuna study per step, each suggesting a new configuration for its step
- This heavily reduces the time needed for tuning