Uber Rider has very large datasets, whereas Uber Fleet has much smaller data volumes available for experimentation. Suppose you run an A/B test for Uber Fleet and find that the metric's distribution is not normal. How would you analyze the results, and what criteria would you use to decide the winning variant?
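
One possible line of attack, sketched here rather than offered as a definitive answer: with a small, non-normal sample, a permutation test and a bootstrap confidence interval on the difference in means avoid the normality assumption behind a t-test. The function names (`permutation_test`, `bootstrap_ci`), the metric values, and the 0.05 threshold below are all illustrative assumptions, not part of the question.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns (observed_diff, p_value). No normality assumption is made:
    the null distribution is built by reshuffling group labels.
    """
    rng = random.Random(seed)
    observed = _mean(treatment) - _mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[n_control:]) - _mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def bootstrap_ci(control, treatment, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(_mean(t) - _mean(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical, right-skewed metric (e.g. weekly trips per fleet account).
control = [2, 3, 3, 4, 5, 5, 6, 8, 12, 40]
treatment = [3, 4, 5, 5, 6, 7, 9, 11, 15, 55]

diff, p = permutation_test(control, treatment)
low, high = bootstrap_ci(control, treatment)
print(f"observed difference: {diff:.2f}, permutation p-value: {p:.3f}")
print(f"95% bootstrap CI for the difference: [{low:.2f}, {high:.2f}]")
```

Under this sketch, a variant might be called the winner only if the p-value falls below a significance level chosen before the test (0.05 is assumed here), the confidence interval excludes zero, and the effect is large enough to matter for the business. A rank-based test such as Mann-Whitney U is a common alternative, though it tests a shift in distribution rather than a difference in means.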