Research Papers Classification System
Revision as of 17:03, 24 November 2020
Please Do NOT Edit This Summary
Presented by
Jill Wang, Junyi Yang, Yu Min Wu, Chun Kit (Calvin) Li
Introduction
This paper introduces a research paper classification system that combines term frequency-inverse document frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and K-means clustering. To process big data, the system relies primarily on the Hadoop Distributed File System (HDFS). The system can handle large-scale research paper classification problems efficiently and accurately.
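To make the TF-IDF component concrete, the following is a minimal sketch in plain Python. It assumes the textbook weighting tf(t, d) * log(N / df(t)) and whitespace tokenization; the paper's exact variant, its preprocessing, and its Hadoop-based implementation may differ. The function name `tf_idf` and the sample abstracts are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = raw term count * log(N / document frequency).
    This is the textbook form, not necessarily the paper's variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical toy "abstracts" used only for illustration.
abstracts = [
    "hadoop stores big data",
    "lda finds topics in data",
    "kmeans clusters papers by topics",
]
weights = tf_idf(abstracts)
```

Terms that occur in fewer documents (such as "hadoop" here) receive a higher weight than terms shared across documents (such as "data"), which is the property that makes TF-IDF useful for picking out representative keywords before clustering.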
General Framework
Data Preprocessing
Crawling of Abstract Data
Managing Paper Data
Topic Modeling Using LDA
Term Frequency Inverse Document Frequency (TF-IDF) Calculation
Term Frequency (TF)
Document Frequency (DF)
Inverse Document Frequency (IDF)