In this talk, we explore the data-flow programming approach to massively parallel computing on FPGA accelerators, in which an algorithm is described as a data-flow graph and programmed in MaxJ from Maxeler Technologies. Such a directed graph consists of a small set of nodes connected by arcs. All nodes are fully pipelined, and data moves along the arcs through the nodes. We have shown that complex algorithms such as the Wilson Dirac operator from Lattice QCD can be implemented in this way. Our implementation collects all nearest-neighbour terms on the four-dimensional lattice so that all arithmetic operations are performed simultaneously.
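The nearest-neighbour gathering mentioned above can be illustrated with a minimal Python sketch. It is not MaxJ and not the implementation presented in the talk: it assumes a periodic four-dimensional lattice, uses a plain scalar field in place of spinors, and omits the gauge links and spin matrices of the actual Wilson Dirac operator. The names `neighbours` and `hopping_sum` are hypothetical, chosen for illustration only.

```python
from itertools import product


def neighbours(site, dims):
    """Return the nearest neighbours of `site` on a periodic lattice.

    For a 4D lattice this yields 8 sites: one forward and one backward
    neighbour in each of the four directions.
    """
    result = []
    for mu in range(len(dims)):
        for step in (+1, -1):
            nb = list(site)
            nb[mu] = (nb[mu] + step) % dims[mu]  # periodic wrap-around
            result.append(tuple(nb))
    return result


def hopping_sum(field, dims):
    """Sum the field over all nearest neighbours of every site.

    This is a scalar stand-in for a hopping term: a real Wilson Dirac
    operator would also multiply each term by a gauge link and a spin
    projector, which are left out here.
    """
    out = {}
    for site in product(*(range(n) for n in dims)):
        # All neighbour terms of one site are collected before the sum,
        # mirroring the idea of having every operand available at once.
        out[site] = sum(field[nb] for nb in neighbours(site, dims))
    return out


if __name__ == "__main__":
    dims = (4, 4, 4, 4)
    field = {s: 1.0 for s in product(*(range(n) for n in dims))}
    result = hopping_sum(field, dims)
    print(result[(0, 0, 0, 0)])
```

In this software sketch the sites are visited one after another; on a data-flow engine the corresponding neighbour terms would instead stream through a pipelined graph, so the loop here only shows which operands belong together, not how the hardware schedules them.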