f11Stat841proposal: Difference between revisions
Revision as of 21:40, 5 October 2011
==Project 1 : Classification of Disease Status==
By: Lai,ChunWei and Greg Pitt
For our classification project, we are proposing an application in the medical diagnosis field: For each patient or lab animal, there will be results from a large number of genetic and/or chemical tests. We should be able to predict the disease state of the patient/animal, based on the presence or absence of certain biomarkers and/or chemical markers.
Our project work will include the reduction of dimensionality and the development of one or more classification functions, with the objectives of minimizing the error rate and reducing the number of markers required to make good predictions. Our results could be used at the patient level, to help make accurate diagnoses, and at the population health level, to make epidemiological surveys of the prevalence of certain medical conditions. In both cases, the results should enable the healthcare system to make better decisions regarding the deployment of scarce healthcare resources.
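The methodology has not been chosen yet, so the following is only a minimal sketch of how the two objectives (low error rate, few markers) could be traded off. It assumes binary presence/absence markers, a nearest-centroid classifier, leave-one-out error estimation, and greedy forward selection; all function names and the toy data are hypothetical and are not taken from the SSC dataset. It also assumes every class keeps at least one sample when one row is left out.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids over the chosen markers."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(r[f] for r in rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, features):
    """Leave-one-out error rate of the nearest-centroid rule."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) != y[i]:
            errors += 1
    return errors / len(X)


def forward_select(X, y, max_markers):
    """Greedily add the marker that most reduces the leave-one-out error."""
    chosen, best_err = [], 1.0
    remaining = list(range(len(X[0])))
    while remaining and len(chosen) < max_markers:
        err, f = min((loo_error(X, y, chosen + [f]), f) for f in remaining)
        if err >= best_err:
            break  # no remaining marker improves the error estimate
        chosen.append(f)
        remaining.remove(f)
        best_err = err
    return chosen, best_err


# Toy data (made up): marker 0 tracks disease status, markers 1 and 2 are noise.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = ["diseased"] * 4 + ["healthy"] * 4
chosen, err = forward_select(X, y, max_markers=2)
print(chosen, err)
```

On this toy data the search stops after the first marker, because adding either noise marker cannot lower the estimated error any further.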
Our methodology will be chosen soon, after we have seen a few more examples in class. If time permits, we will also attempt a novel classification procedure of our own design.
Currently we have access to a dataset from the SSC data mining section, and we hope to be able to get access to some similar, but larger, datasets before the end of the term.
The software tools that we use will probably include Matlab, Python, and R.
We would like to obtain publishable results if possible, but this is not a primary objective.
Proposal 2: The Golden Retrieber
By Cameron Davidson-Pilon and Jenn Smith
Our goal for this project is to determine statistical results from the population of Twitter users that have a specific celebrity in their display picture. Our algorithm will scan through Twitter's display pictures and attempt to determine whether a display picture features Canada's most famous icon: Justin Bieber. We hope that most images contain his trademark swoosh hairstyle, as much of our classification will rely on such handsome features.
After we determine, with some probability of error, that a user has a Bieber Display Pic (BDP), we can then do a statistical analysis of the sample population's tweets, hashtags, etc.
Applications of this algorithm include studying the Twitter behaviour of Bieber fans. It could be used in an app for companies that want to target such demographics.
We will be using Matlab and Python.
Project 3 : Classifying Melanoma Images
By: Robert Amelard
The diagnosis of melanoma is currently very subjective. Two popular diagnostic methods are the 7-point checklist and the ABCD rubric. Both are based on very subjective criteria, such as the "irregularity" of a skin lesion. My project will attempt to classify an input image containing a skin lesion into the class "benign" or "malignant" based on features that are regarded as high risk in these rubrics. This will help doctors come to more justified diagnoses of patients.
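As an illustration of how a subjective criterion such as border "irregularity" might be turned into a number, the sketch below computes a compactness index, perimeter² / (4π · area), for a lesion outline. This is one possible quantification and not necessarily the feature this project will use. It assumes the lesion has already been segmented and its border is available as an ordered list of (x, y) vertices; the function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a closed polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the outline back to the first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def compactness_index(points):
    """Perimeter^2 / (4*pi*area): 1 for a circle, larger for ragged outlines."""
    return polygon_perimeter(points) ** 2 / (4 * math.pi * polygon_area(points))


# Made-up outlines: a smooth square versus a spiky five-pointed star.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
star = [
    (r * math.cos(math.pi * k / 5), r * math.sin(math.pi * k / 5))
    for k in range(10)
    for r in [1.0 if k % 2 == 0 else 0.4]
]
print(compactness_index(square), compactness_index(star))
```

A feature like this could be fed, together with other rubric-inspired features, into whichever classifier is eventually chosen.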
Project 4 : Classifying Trademarks
By: Chen Wang; YuanHong Yu; Jia Zhou
Our group decided to use statistical classification methods to distinguish various types of trademarks within an industry, and thus attempt to determine which color is most commonly used by manufacturers globally. We would like to scan the selected trademarks first; after obtaining all the desired trademarks, we can then do further statistical analysis. Our major goals are to help customers easily identify a specific industry just by looking at the color of the trademark, and to help new entrants to the market gain a better knowledge of their competitors. The possible software and tools we would like to use include R and Matlab.
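A rough sketch of the color-counting step, under stated assumptions: each scanned trademark is available as a list of RGB pixel tuples, each pixel is assigned to the nearest color in a small reference palette, and white is treated as background. The palette, the function names, and the sample pixels are all hypothetical; the sketch is written in Python for illustration, although the proposal lists R and Matlab.

```python
from collections import Counter

# Hypothetical reference palette, chosen only for illustration.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def nearest_color(pixel):
    """Name of the palette color closest to the pixel in squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[name])),
    )


def dominant_color(pixels, ignore=("white",)):
    """Most frequent palette color in a logo, skipping assumed background colors."""
    counts = Counter(nearest_color(p) for p in pixels)
    for name in ignore:
        counts.pop(name, None)
    return counts.most_common(1)[0][0] if counts else None


def most_popular_color(logos):
    """Most common dominant color across a collection of logos."""
    tally = Counter(dominant_color(pixels) for pixels in logos)
    tally.pop(None, None)  # drop logos that were entirely background
    return tally.most_common(1)[0][0] if tally else None


# Made-up pixel data standing in for scanned trademarks.
logo_a = [(250, 10, 10)] * 6 + [(255, 255, 255)] * 10 + [(10, 10, 240)] * 3
logo_b = [(240, 20, 30)] * 8 + [(255, 255, 255)] * 4
logo_c = [(5, 5, 230)] * 9 + [(255, 255, 255)] * 2
print(most_popular_color([logo_a, logo_b, logo_c]))
```

Running the same tally separately for each industry would give a per-industry color profile that further statistical analysis could build on.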