We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while still preserving a balanced distribution of the data. Based on this distribution, our algorithm evenly divides all computations among the $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $\mathcal{O}\left(\frac{N\log N}{P}+\alpha \log P+\beta \log^{2} P\right)$ for $\mathcal{H}$-matrices under the weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. Numerically, our algorithm is applied to both two- and three-dimensional problems of various sizes on various numbers of processes, and good parallel efficiency is still observed on thousands of processes.