To achieve decentralized, scalable, and adaptive control of large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework that integrates adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration, enabling robust autonomous and formation control. The framework uses meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and augments PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations while exploring uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, a 61% lower stability root mean square error (RMSE), 88% fewer collisions, and success rates of 85.6%–92.3% in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations illustrate cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. The main contributions are meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, which together advance real-world multi-UAV swarm autonomy and coordination.
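For readers who want a concrete sense of the stability metric, the following is a minimal sketch of one common way to compute a formation RMSE. It assumes the error is the Euclidean distance between each UAV's actual position and its desired formation slot, averaged over all UAVs and time steps. The text above does not define the metric in detail, so the function name `formation_rmse` and the input layout are illustrative assumptions, not the definition used in the experiments.

```python
import math


def formation_rmse(actual, desired):
    """Root mean square position error over a trajectory.

    Assumed layout (hypothetical): both arguments are lists over time
    steps, each holding one (x, y, z) tuple per UAV, in the same order.
    """
    squared_error_sum = 0.0
    sample_count = 0
    for step_actual, step_desired in zip(actual, desired):
        for pos, slot in zip(step_actual, step_desired):
            # Squared Euclidean distance between a UAV and its slot.
            squared_error_sum += sum((a - b) ** 2 for a, b in zip(pos, slot))
            sample_count += 1
    if sample_count == 0:
        raise ValueError("no position samples provided")
    return math.sqrt(squared_error_sum / sample_count)


# Toy example: two UAVs, one time step, the second UAV is 2 m too low.
actual = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
desired = [[(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]]
rmse = formation_rmse(actual, desired)
```

Under this assumed definition, a relative reduction such as the one reported above would be computed as `1 - rmse_proposed / rmse_baseline` for matched scenarios.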