2020, Volume 29 Issue 4

  • Peng Wang, Rong Du, Qiying Hu

    In this paper, we examine how a merchant should choose between discount promotion (offering a discount through an online third-party promotion platform) and coupon promotion (issuing on-package coupons directly to consumers). We develop a two-period model in which the merchant optimizes the promotion decision in the first period and does not promote in the second period. We identify two consumer segments: informed consumers, who are aware of the merchant’s offering at the beginning of the first period and know the true quality of the product, and uninformed consumers, who are not aware of the merchant’s offering at the beginning of the first period and underestimate the product quality. Moreover, the merchant can reach only informed consumers when adopting coupon promotion or when choosing not to promote, whereas it can reach both informed and uninformed consumers when offering discount promotion. In this setting, we find that the merchant should offer discount promotion when uninformed consumers’ quality estimate is high and/or when the proportion of informed consumers is small; otherwise, the merchant should adopt coupon promotion when the first-period effect of coupons is strong and choose not to promote when it is weak.
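
    As a toy illustration of the market-access structure just described (and only of that structure), the sketch below encodes which consumer segments each option can reach in the first period; the function names and arguments are hypothetical, and this is not the paper's formal two-period model.

```python
def reachable_market(option, n_informed, n_uninformed):
    """Toy first-period reach under each option (not the paper's model).

    Discount promotion via the third-party platform reaches both segments;
    coupon promotion and no promotion reach informed consumers only.
    Prices, quality beliefs, and second-period sales are deliberately omitted.
    """
    if option == "discount":
        return n_informed + n_uninformed
    return n_informed  # "coupon" or "none"


def choose_promotion(profits):
    """Pick the option ('discount', 'coupon', or 'none') with the highest
    total two-period profit, where `profits` maps option -> profit computed
    elsewhere from the model primitives (uninformed consumers' quality
    estimate, share of informed consumers, first-period coupon effect)."""
    return max(profits, key=profits.get)
```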

  • Kejia Chen, Debiao Li, Xiao Wang

    This paper systematically studies two-machine flow-shop scheduling problems with no-wait and deterministic unavailable-interval constraints. To minimize the makespan, three integer programming models are formulated, one each for the two-machine flow shop with a no-wait constraint, with a resumable unavailable interval, and with both a no-wait constraint and a non-resumable unavailable interval. Conditions under which the no-wait problem is solved optimally by permutation schedules, the resumable unavailable-interval problem by the Johnson algorithm, and the no-wait, non-resumable unavailable-interval problem by the Gilmore and Gomory Algorithm (GGA) are presented, respectively. The tight worst-case performance bounds of the Johnson and GGA algorithms for these problems are also proved to be 2. Several instances are generated to demonstrate the proposed theorems. In the experiments, GGA obtains the optimal solution for the two-machine flow shop with the no-wait constraint. Although it cannot reach the optimal solution for the problem with a resumable unavailable interval, the optimality gap is 0.18% on average when the number of jobs is 100. Moreover, under some special conditions, it yields the optimal solution for the problem with both the no-wait constraint and a non-resumable unavailable interval. Therefore, GGA is an efficient heuristic for these problems.
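
    Since the abstract builds on Johnson's algorithm and permutation schedules, a minimal sketch of the classical Johnson rule for the unconstrained two-machine flow shop (F2 || Cmax) is given below for context; the job representation is illustrative, and the sketch does not model the no-wait or unavailable-interval constraints studied in the paper.

```python
def johnson_two_machine(jobs):
    """Johnson's rule for the classical two-machine flow shop (F2 || Cmax).

    jobs: list of (p1, p2) processing times on machines 1 and 2.
    Returns a permutation (job indices) minimizing the makespan of the
    unconstrained problem.
    """
    front, back = [], []
    for idx, (p1, p2) in sorted(enumerate(jobs), key=lambda x: min(x[1])):
        if p1 <= p2:
            front.append(idx)    # shorter on machine 1: schedule early
        else:
            back.insert(0, idx)  # shorter on machine 2: schedule late
    return front + back


def makespan(jobs, order):
    """Makespan of a permutation schedule on two machines (no extra constraints)."""
    c1 = c2 = 0
    for idx in order:
        p1, p2 = jobs[idx]
        c1 += p1
        c2 = max(c2, c1) + p2
    return c2
```

    For example, johnson_two_machine([(3, 6), (5, 2), (1, 2)]) returns the order [2, 0, 1], whose makespan is 12.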

  • Bukhoree Sahoh, Anant Choksuriwong

    Emergency events are unexpected and dangerous situations that the authorities must manage and respond to as quickly as possible. The main objectives of emergency management are to ensure human safety and security, and Social Big Data (SBD), created directly from eyewitness reports, offers an important information source for these tasks. However, manually extracting hidden meaning from SBD is both time-consuming and labor-intensive, which are major drawbacks for a process that needs accurate information in real time. The solution is an automatic approach to knowledge discovery, and we propose a semantic description technique based on triple-store indexing for named entity recognition and relation extraction. Our technique can discover hidden SBD information more effectively than traditional approaches and can be used for intelligent emergency management.
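
    As a rough illustration of the triple-store-indexing idea (not the authors' implementation), the sketch below keeps extracted facts as subject-predicate-object triples indexed by each element; the example entities and relations are hypothetical outputs of named entity recognition and relation extraction over an eyewitness post.

```python
from collections import defaultdict


class TripleIndex:
    """Minimal in-memory subject-predicate-object store (illustrative only)."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.by_subject[subject].add(triple)
        self.by_predicate[predicate].add(triple)
        self.by_object[obj].add(triple)

    def about(self, entity):
        """All stored facts mentioning an entity as subject or object."""
        return self.by_subject[entity] | self.by_object[entity]


# e.g. triples extracted from a post such as "Flooding reported near the station":
index = TripleIndex()
index.add("flood", "locatedNear", "station")
index.add("flood", "reportedAt", "2020-10-05T14:30")
print(index.about("flood"))
```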

  • Haochuan Cui, An Zeng, Ying Fan, Zengru Di

    The most fundamental way to measure the impact of a scientific publication is to count the number of citations it has received. Though citation count and its variants are widely adopted, they have been pointed out to be poor proxies for a paper’s quality because citations can be made for many different reasons. It is thus crucial to quantify the true relevance of the cited papers to the citing paper. There are already some efforts in the literature devoted to addressing this issue, yet a well-accepted method is still lacking, possibly due to the absence of standard ground-truth data for comparing different methods. In this paper, we propose a simple method using a local diffusion process on citation networks to identify the key references of each scientific publication. The effectiveness of the method is validated on a subset of the American Physical Society data in which the key references are mentioned in the abstracts of papers. We further define an effective citation metric for quantifying the actual impact of each paper and its evolution. The effective citation metric additionally reveals the citation preferences of research at the journal and country levels.
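
    The sketch below is a toy local diffusion on the citation neighbourhood of one focal paper, meant only to convey the general idea rather than the authors' exact process; `references` and `citers_of` are assumed dictionaries mapping a paper id to the papers it cites and to the papers citing it, respectively.

```python
from collections import defaultdict


def key_reference_scores(focal, references, citers_of):
    """Toy local diffusion ranking the references of `focal`.

    A unit of resource on `focal` is spread evenly over its references;
    each reference then spreads its share over the papers that cite it,
    and the resource flowing back through references shared with `focal`
    is accumulated as a relevance score.
    """
    refs = set(references[focal])
    score = defaultdict(float)
    share = 1.0 / len(refs)
    for r in refs:
        citing = citers_of.get(r, ())          # papers that cite reference r
        for c in citing:
            common = set(references.get(c, ())) & refs
            if common:
                score_back = share / (len(citing) * len(common))
                for r2 in common:
                    score[r2] += score_back
    return sorted(score.items(), key=lambda kv: -kv[1])
```

    In this toy version, references that are repeatedly co-cited alongside the focal paper's other references accumulate higher scores and would be flagged as key references.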

  • Canh Hao Nguyen

    The biological domain has been blessed with more and more data from biotechnologies as well as data integration tools. In the renaissance of machine learning and artificial intelligence, there is great promise for data-driven biological knowledge discovery. However, it is not straightforward, due to the complexity of the domain knowledge hidden in the data. At any level, be it atoms, molecules, cells or organisms, there are rich interdependencies among biological components. Machine learning approaches in this domain therefore usually involve analyzing interdependency structures encoded in graphs and related formalisms. In this report, we review our work on developing new machine learning methods for these applications with improved performance compared with state-of-the-art methods. We show how the networks among biological components can be used to predict properties.
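
    One widely used way to exploit such network structure is guilt-by-association label propagation; the sketch below illustrates that generic technique on an interaction network and is not claimed to be one of the specific methods reviewed in this report.

```python
import numpy as np


def propagate_labels(adj, seed_scores, alpha=0.8, n_iter=50):
    """Label propagation on a biological interaction network.

    adj: symmetric adjacency matrix (numpy array, e.g. protein-protein interactions).
    seed_scores: numpy array with 1.0 for nodes known to have the property, 0.0 otherwise.
    Returns a score per node; high scores suggest the property is shared
    by well-connected annotated neighbours ("guilt by association").
    """
    deg = adj.sum(axis=1).astype(float)
    deg[deg == 0] = 1.0                       # avoid division by zero for isolated nodes
    w = adj / np.sqrt(np.outer(deg, deg))     # symmetric normalisation
    f = seed_scores.astype(float)
    for _ in range(n_iter):
        f = alpha * (w @ f) + (1 - alpha) * seed_scores
    return f
```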

  • Nuo Xu, Xijin Tang

    Event evolution analysis, which provides an effective way to capture the main context of a story from explosively growing news texts, has become a critical basis for many real applications, such as crisis and emergency management and decision making. In particular, the development of societal risk events, which may cause harm to society or individuals, is of great concern to both the government and the public. In order to capture the evolution and trends of societal risk events, this paper presents an improved algorithm based on the method of information maps. It contains an event-level cluster generation algorithm and an evaluation algorithm. The main work includes: 1) Word embedding representation is adopted, and event-level clusters are chosen as nodes of the event evolution chains, which can comprehensively present the underlying structure of events. Meanwhile, clusters consisting of risk-labeled events make it possible to illustrate how events evolve over time with transitions of risks. 2) One real-world case, the event of “Chinese Red Cross”, is studied and a series of experiments are conducted. 3) An evaluation algorithm is proposed on the basis of indicators of map construction, without a massive human-annotated dataset. Our approach to event evolution analysis automatically generates a visual evolution of societal risk events, displaying a clear and structured picture of event development.
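
    A rough sketch of the cluster-and-link step is shown below, assuming each day's news items already have document embeddings (for instance averaged word vectors); the KMeans clustering, the cosine-similarity threshold, and the chaining rule are illustrative stand-ins, not the improved algorithm proposed in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_evolution_chains(daily_embeddings, n_clusters=5, link_threshold=0.7):
    """Cluster each day's documents and link clusters across consecutive days.

    daily_embeddings: list over days; element i is an (n_i, d) array of
    document embeddings for day i. Returns (day, cluster_id, prev_cluster_id)
    tuples, where prev_cluster_id is the most similar cluster of the previous
    day if its cosine similarity exceeds the threshold, else None.
    """
    centroids = []
    for emb in daily_embeddings:
        k = min(n_clusters, len(emb))
        km = KMeans(n_clusters=k, n_init=10).fit(emb)
        centroids.append(km.cluster_centers_)

    chains = []
    for day, cents in enumerate(centroids):
        for cid, c in enumerate(cents):
            prev = None
            if day > 0:
                prev_cents = centroids[day - 1]
                sims = prev_cents @ c / (
                    np.linalg.norm(prev_cents, axis=1) * np.linalg.norm(c) + 1e-12)
                if sims.max() >= link_threshold:
                    prev = int(sims.argmax())
            chains.append((day, cid, prev))
    return chains
```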

  • Rong Hou, Yongbo Xiao, Yan Zhu, Hongyan Zhao

    This study aims to contribute to disease detection by analyzing a medical examination dataset with 123,968 samples. Based on association rule mining and related medical knowledge, 6 models were constructed to predict hyperuricemia prevalence and to investigate its risk factors. Comparing the different models, the prediction performances of Lasso logistic regression, traditional logistic regression, and random forest are excellent, and their results can be interpreted. The PCA logistic regression model also works well, but it is not readily interpretable. KNN’s prediction performance is relatively poor, although data dimensionality reduction can significantly improve its AUC. SVC has the worst performance, and its efficiency in processing the high-dimensional, large dataset is extremely low. The risk factors of hyperuricemia mainly belong to 4 categories: obesity-related factors, renal function factors, liver function factors, and factors related to myeloproliferative diseases. Random forest, Lasso regression, and logistic regression all treat serum creatinine, BMI, triglyceride, fatty liver, and age as key predictive variables. The models also show that serum urea, serum alanine aminotransferase, negative urobilinogen, red blood cell count, white blood cell count, and pH are significantly correlated with the risk.
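
    A minimal sketch of this kind of model comparison with scikit-learn is given below; the feature columns, the single train/test split, and the hyperparameters are placeholders and do not reproduce the study's preprocessing or evaluation protocol.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y):
    """Compare candidate hyperuricemia classifiers by held-out AUC.

    X: examination features (e.g., BMI, serum creatinine, triglyceride, age);
    y: 1 if hyperuricemia, 0 otherwise.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    models = {
        "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "lasso_logistic": make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    }
    aucs = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return aucs
```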