BigData Hadoop IEEE TITLES 2015-2016
Data Mining with Big Data (Hadoop + MongoDB)
| No. | Code | Title | Year |
| 1 | XT-1 | An Incremental and Distributed Inference Method for Large-Scale Ontologies Based on MapReduce Paradigm | 2015 |
| 2 | XT-2 | Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters | 2015 |
| 3 | XT-3 | Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields | 2015 |
| 4 | XT-4 | Real-Time Big Data Analytical Architecture for Remote Sensing Application | 2015 |
| 5 | XT-5 | DyScale: a MapReduce Job Scheduler for Heterogeneous Multicore Processors | 2015 |
| 6 | XT-6 | CaCo: An Efficient Cauchy Coding Approach for Cloud Storage Systems | 2015 |
| 7 | XT-7 | Cost-Effective Resource Provisioning for MapReduce in a Cloud | 2015 |
| 8 | XT-8 | PRISM: Fine-Grained Resource-Aware Scheduling for MapReduce | 2015 |
| 9 | XT-9 | RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop’s Configuration | 2015 |
| 10 | XT-10 | HFSP: Bringing Size-Based Scheduling To Hadoop | 2015 |
| 11 | XT-11 | FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce | |
| 12 | XT-12 | Processing Cassandra Datasets with Hadoop-Streaming Based Approaches | |
| 13 | XT-13 | Efficient Motif Discovery for Large-Scale Time Series in Healthcare | |
| 14 | XT-14 | Hadoop Performance Modeling for Job Estimation and Resource Provisioning | |
| 15 | XT-15 | Virtual Shuffling for Efficient Data Movement in MapReduce | |
Data Mining with Big Data
ABSTRACT:
Big Data concern large-volume, complex, growing data
sets with multiple, autonomous sources. With the fast development of
networking, data storage, and the data collection capacity, Big Data are now
rapidly expanding in all science and engineering domains, including physical,
biological and biomedical sciences. This paper presents a HACE theorem that
characterizes the features of the Big Data revolution, and proposes a Big Data
processing model, from the data mining perspective. This data-driven model involves
demand-driven aggregation of information sources, mining and analysis, user
interest modeling, and security and privacy considerations. We analyze the
challenging issues in the data-driven model and also in the Big Data
revolution.
EXISTING SYSTEM:
- Big Data applications have emerged in which data collection has grown
  tremendously and is beyond the ability of commonly used software tools to
  capture, manage, and process within a “tolerable elapsed time.” The most
  fundamental challenge for Big Data applications is to explore the large
  volumes of data and extract useful information or knowledge for future
  actions. In many situations, the knowledge extraction process has to be very
  efficient and close to real time because storing all observed data is nearly
  infeasible.
- The unprecedented data volumes require an effective data analysis and
  prediction platform to achieve fast response and real-time classification
  for such Big Data.
DISADVANTAGES OF EXISTING SYSTEM:
- The challenges at Tier I focus on data accessing and arithmetic computing
  procedures. Because Big Data are often stored at different locations and
  data volumes may continuously grow, an effective computing platform will
  have to take distributed large-scale data storage into consideration for
  computing.
- The challenges at Tier II center around semantics and domain knowledge for
  different Big Data applications. Such information can provide additional
  benefits to the mining process, as well as add technical barriers to the Big
  Data access (Tier I) and mining algorithms (Tier III).
- At Tier III, the data mining challenges concentrate on algorithm designs in
  tackling the difficulties raised by the Big Data volumes, distributed data
  distributions, and by complex and dynamic data characteristics.
PROPOSED SYSTEM:
- We propose a HACE theorem to model Big Data characteristics. The
  characteristics of HACE make it an extreme challenge to discover useful
  knowledge from Big Data.
- The HACE theorem suggests that the key characteristics of Big Data are
  1) huge with heterogeneous and diverse data sources, 2) autonomous with
  distributed and decentralized control, and 3) complex and evolving in data
  and knowledge associations.
- To support Big Data mining, high-performance computing platforms are
  required, which impose systematic designs to unleash the full power of
  Big Data.
ADVANTAGES OF PROPOSED SYSTEM:
- Provides the most relevant and most accurate social sensing feedback to
  better understand our society in real time.
SYSTEM ARCHITECTURE:
SYSTEM CONFIGURATION:
HARDWARE CONFIGURATION:
- Processor: Pentium IV
- Speed: 1.1 GHz
- RAM: 512 MB (min)
- Hard Disk: 20 GB
- Keyboard: Standard Keyboard
- Mouse: Two- or Three-Button Mouse
- Monitor: LCD/LED Monitor
SOFTWARE CONFIGURATION:
- Operating System: Windows XP/7
- Programming Language: Java/J2EE
- Software Version: JDK 1.7 or above
- Database: MySQL
REFERENCE:
Xindong Wu, Fellow, IEEE, Xingquan Zhu, Senior Member, IEEE, Gong-Qing Wu, and
Wei Ding, Senior Member, IEEE, “Data Mining with Big Data,” IEEE Transactions
on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
Discovering Emerging Topics in Social Streams via Link-Anomaly Detection
ABSTRACT:
Detection of emerging topics is now receiving renewed interest, motivated by
the rapid growth of social networks. Conventional term-frequency-based
approaches may not be appropriate in this context, because the information
exchanged in social-network posts includes not only text but also images,
URLs, and videos. We focus on the emergence of topics signaled by social
aspects of these networks. Specifically, we focus on mentions of users, that
is, links between users that are generated dynamically (intentionally or
unintentionally) through replies, mentions, and retweets. We propose a
probability model of the mentioning behavior of a social network user, and
propose to detect the emergence of a new topic from the anomalies measured
through the model. Aggregating anomaly scores from hundreds of users, we show
that we can detect emerging topics based only on the reply/mention
relationships in social-network posts. We demonstrate our technique on several
real data sets we gathered from Twitter. The experiments show that the proposed
mention-anomaly-based approaches can detect new topics at least as early as
text-anomaly-based approaches, and in some cases much earlier when the topic is
poorly identified by the textual contents of posts.
EXISTING SYSTEM:
- A new (emerging) topic is something people feel like discussing, commenting
  on, or forwarding to their friends. Conventional approaches for topic
  detection have mainly been concerned with the frequencies of (textual)
  words.
DISADVANTAGES OF EXISTING SYSTEM:
- A term-frequency-based approach could suffer from the ambiguity caused by
  synonyms or homonyms. It may also require complicated preprocessing (e.g.,
  segmentation) depending on the target language. Moreover, it cannot be
  applied when the contents of the messages are mostly nontextual information.
  On the other hand, the “words” formed by mentions are unique, require little
  preprocessing to obtain (the information is often separated from the
  contents), and are available regardless of the nature of the contents.
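To illustrate why mentions need little preprocessing, here is a minimal Java
sketch that pulls the mentioned user names out of a post. It assumes
Twitter-style handles ("@" followed by letters, digits, or underscores); the
class name MentionExtractor, the method extractMentions, and the sample post
are hypothetical and are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionExtractor {
    // Assumption: a mention is "@" followed by one or more word characters
    // (letters, digits, underscore), as in Twitter-style handles.
    private static final Pattern MENTION = Pattern.compile("@(\\w+)");

    // Returns the mentioned user names in order of appearance, lower-cased
    // so that "@NASA" and "@nasa" count as the same mentionee.
    static List<String> extractMentions(String post) {
        List<String> mentions = new ArrayList<>();
        Matcher m = MENTION.matcher(post);
        while (m.find()) {
            mentions.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return mentions;
    }

    public static void main(String[] args) {
        String post = "RT @NASA: new images, thanks @BBC_News! http://example.com/pic.jpg";
        System.out.println(extractMentions(post)); // prints [nasa, bbc_news]
    }
}
```

No tokenization, stemming, or language-specific segmentation is involved, and
the image URL in the post is simply ignored. A real system would more likely
read mentions from the structured metadata that the platform delivers with
each post; the regular expression above would also match the domain part of
an e-mail address.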
PROPOSED SYSTEM:
- In this paper, we have proposed a new approach to detect the emergence of
  topics in a social network stream.
- The basic idea of our approach is to focus on the social aspect of the posts
  reflected in the mentioning behavior of users instead of the textual
  contents.
- We have proposed a probability model that captures both the number of
  mentions per post and the frequency of each mentionee.
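The idea of scoring a post by how unusual its mentions are can be sketched in
Java as follows. This is a simplified illustration and not the authors' exact
model: it assumes the number of mentions per post follows a geometric
distribution with a smoothed plug-in parameter, that each mentionee is drawn
independently with a smoothed relative frequency, and that the anomaly score
is the negative log-likelihood of the post. The class name
MentionAnomalyScorer and the smoothing constants ALPHA, BETA, and GAMMA are
hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MentionAnomalyScorer {
    // Assumed smoothing constants; the values are illustrative only.
    private static final double ALPHA = 1.0;
    private static final double BETA = 1.0;
    private static final double GAMMA = 1.0;

    private final Map<String, Integer> mentioneeCounts = new HashMap<>();
    private int posts = 0;          // training posts seen for this user
    private int totalMentions = 0;  // mentions summed over those posts

    // Adds one past post of the user (its list of mentionees) to the history.
    void observe(List<String> mentionees) {
        posts++;
        for (String v : mentionees) {
            totalMentions++;
            Integer c = mentioneeCounts.get(v);
            mentioneeCounts.put(v, c == null ? 1 : c + 1);
        }
    }

    // Anomaly score = -log P(k) - sum over mentionees of log P(v).
    // Higher means the post departs more from the user's usual behavior.
    double score(List<String> mentionees) {
        // Geometric model for the mention count: P(k) = theta * (1 - theta)^k.
        double theta = (posts + ALPHA) / (posts + totalMentions + ALPHA + BETA);
        int k = mentionees.size();
        double logP = Math.log(theta) + k * Math.log(1.0 - theta);
        for (String v : mentionees) {
            Integer c = mentioneeCounts.get(v);
            // Users never mentioned before receive the small mass GAMMA.
            double p = (c == null ? GAMMA : c) / (totalMentions + GAMMA);
            logP += Math.log(p);
        }
        return -logP;
    }

    public static void main(String[] args) {
        MentionAnomalyScorer scorer = new MentionAnomalyScorer();
        scorer.observe(Arrays.asList("alice"));
        scorer.observe(Arrays.asList("alice", "bob"));
        scorer.observe(Collections.<String>emptyList());
        System.out.println("usual post:   " + scorer.score(Arrays.asList("alice")));
        System.out.println("unusual post: " + scorer.score(Arrays.asList("carol", "dave", "erin")));
    }
}
```

A post that mentions several users the author has never mentioned before gets
a higher score than one that mentions a familiar user. Following the abstract,
such per-user scores would then be aggregated over many users to decide
whether a new topic is emerging; that aggregation step is not shown here.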
ADVANTAGES OF PROPOSED SYSTEM:
- Because the proposed method does not rely on the textual contents of social
  network posts, it is robust to rephrasing and can be applied to cases where
  topics are concerned with information other than text, such as images,
  video, and audio.
- The proposed link-anomaly-based methods performed even better than the
  keyword-based methods on the “NASA” and “BBC” data sets.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System: Pentium IV 2.4 GHz
- Hard Disk: 40 GB
- Floppy Drive: 1.44 MB
- Monitor: 15-inch VGA Colour
- Mouse: Logitech
- RAM: 512 MB
SOFTWARE REQUIREMENTS:
- Operating System: Windows XP/7
- Coding Language: Java/J2EE
- IDE: NetBeans 7.4
- Database: MySQL
REFERENCE:
Toshimitsu Takahashi, Ryota Tomioka, and Kenji Yamanishi, Member, IEEE,
“Discovering Emerging Topics in Social Streams via Link-Anomaly Detection,”
IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1,
January 2014.