With the tremendous growth in data generated around the world and increasing demands to process this data efficiently, a large number of data processing systems have come into existence. These systems are usually distributed and organized into separate layers: a client layer, a compute layer, a scheduler, and a storage layer, all of which communicate and exchange data with one another. With CPUs, networks, and storage devices becoming highly efficient, data movement has emerged as a significant bottleneck in these systems. Moving large amounts of data around a distributed data processing system degrades overall query performance and consumes precious bandwidth and CPU cycles that could otherwise be spent on processing. We aim to tackle this problem using programmable storage: we embed data processing libraries inside the storage layer of object storage systems and offload parts of a query, such as filters and projections, from the compute layer to the storage layer in order to reduce data movement across the entire system. We are also working on extending our prototype to offload more operations, such as aggregates and joins, and on studying how to slice a query plan into two parts: one that is offloaded to the storage layer and one that executes on the compute layer.

With very high-speed networks in data centers, traditional data transport frameworks have become a major bottleneck, since they still rely on TCP/IP and require expensive (de)serialization to move columnar data (the de facto format in modern OLAP-style data processing systems). We propose using data transport frameworks that leverage hardware-accelerated transport protocols such as RDMA, since they avoid (de)serialization and copies and save CPU cycles.

In a parallel ongoing effort, we are testing a hyper-dimensional query language, which aims to be user-friendly by reducing syntactic complexity, on high-energy physics data, in order to ease the effort of writing queries over highly nested and jagged datasets.