Research initiation internships (stages d'initiation à la recherche) / M. Vazirgiannis

18 October 2008
Title: Economic applications of social and P2P networks: distributed contextual advertising
Topic: distributed computing, algorithms
Laboratory, institution and university: LIX, Polytechnique, in collaboration with the DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens, Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
Over the last decade the online advertising market has grown at a historic pace, with an annual growth rate above 20%. Despite that growth, an oligopoly of a few advertising companies, mainly search engine firms, shares the market. The most popular models for distributing advertisements on the internet, such as Google AdSense, rely on a powerful central authority that controls how advertisements are distributed. Alongside the expansion of online markets, new internet communication platforms have emerged, most prominently peer-to-peer (P2P) systems and social networks, whose main characteristic is that the network forms and its entities communicate in a decentralized way.

References:
- C. Dellarocas. Reputation Mechanisms. In Handbook on Information Systems and Economics, T. Hendershott (ed.), Elsevier Publishing, forthcoming, 2006.
- Z. Despotovic and K. Aberer. Possibilities for Managing Trust in P2P Networks. Swiss Federal Institute of Technology (EPFL) Technical Report IC/2004/84, Lausanne, Switzerland, 2004.
- N. R. Devanur, C. H. Papadimitriou, A. Saberi, and V. V. Vazirani. Market equilibrium via a primal-dual-type algorithm. In FOCS, 2002.
Books:
- Algorithmic Game Theory, http://www.cambridge.org/journals/nisan/downloads/Nisan_Nonprintable.pdf

Internship objectives (10 to 20 lines):
The objective is to provide novel market-based methods for exploiting economic applications of social and P2P networks. We propose to study distributed contextual advertising, where a group of peers (web sites) buy and sell advertising links with each other. Every site can serve both as an ad publisher, advertising other sites, and as an advertiser. The participants (peers) are treated as economic agents, and the network is implemented by a distributed communication platform with a social network structure. The allocation of advertisements has to emerge from end-to-end communication among the participants; there is no central authority responsible for assigning advertisements. Proposed methodologies are trust in P2P networks, reputation mechanisms in economic transactions with moral hazard, truthful mechanisms, and algorithms for market equilibria.

Expected skills: Java, MATLAB, familiarity with mathematics, game theory, distributed systems
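
As a rough illustration of the reputation mechanisms named among the proposed methodologies, the sketch below shows one way a peer could keep purely local reputation scores for its trading partners, with no central authority involved. The `Peer` class, the notion of an "honoured" transaction and the smoothed success-rate formula are assumptions made for this example; the proposal itself does not prescribe any of them.

```python
from collections import defaultdict


class Peer:
    """A site that both publishes and buys ad links (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        # Per-partner counts [honoured, total] of transactions seen locally.
        self.history = defaultdict(lambda: [0, 0])

    def record(self, partner, honoured):
        """Store the outcome of one ad-link transaction with `partner`."""
        counts = self.history[partner]
        counts[0] += 1 if honoured else 0
        counts[1] += 1

    def reputation(self, partner):
        """Smoothed success rate (assumed formula): (honoured + 1) / (total + 2)."""
        honoured, total = self.history[partner]
        return (honoured + 1) / (total + 2)

    def choose_publisher(self, candidates):
        """Pick the candidate with the highest local reputation."""
        return max(candidates, key=self.reputation)
```

An unknown partner starts at a neutral score of 0.5, so a peer with no history is neither trusted nor excluded. A real mechanism would also have to deal with moral hazard and false feedback, which this sketch ignores.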

Title: Nonlinear Dimensionality Reduction
Topic: Algorithms
Laboratory, institution and university: LIX, Polytechnique, in collaboration with the DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens, Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods such as principal component analysis and classical metric multidimensional scaling are limited by their reliance on linear models. Until recently, very few methods were able to reduce data dimensionality in a nonlinear way. Since the late nineties, however, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. Advances that account for this rapid growth include the use of graphs to represent the manifold topology and the use of new metrics such as the geodesic distance. In addition, new optimization schemes based on kernel techniques and spectral decomposition have led to spectral embedding, which encompasses many of the recently developed methods.

Internship objectives (10 to 20 lines):
The purpose of this essay is manifold. The first aim is to summarize clear facts and ideas about well-known methods as well as recent developments in nonlinear dimensionality reduction. With this goal in mind, all methods will be described from a unifying point of view, in order to highlight their respective strengths and shortcomings and to facilitate a thorough comparison. For each method, the description will start from intuitive ideas, develop the necessary mathematical details, and end by outlining the algorithmic implementation. The methods will be compared with each other with the help of different illustrative examples. Finally, the most prominent methods will be compared on a real-life problem (i.e. text mining/classification), with the aim of reducing the dimensionality of numerical databases.

Expected skills:
- Sound background in linear algebra
- Good programming skills (MATLAB, C or C++)
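
To make the graph-based geodesic distance mentioned in the presentation of this proposal more concrete, here is a minimal sketch that approximates distances along a manifold by shortest paths in a k-nearest-neighbour graph. The function names, the choice of Floyd-Warshall for the shortest paths and the half-circle test data are assumptions made for illustration; they are not taken from the proposal, and the cubic running time is only acceptable for small examples.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dense weight matrix."""
    n = len(points)
    graph = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = graph[j][i] = d
    return graph


def geodesic_distances(points, k):
    """All-pairs shortest paths in the k-NN graph (Floyd-Warshall)."""
    dist = knn_graph(points, k)
    n = len(points)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                through = dist[i][m] + dist[m][j]
                if through < dist[i][j]:
                    dist[i][j] = through
    return dist


# 50 points on a half circle of radius 1 (a 1-D manifold embedded in 2-D).
points = [(math.cos(math.pi * i / 49), math.sin(math.pi * i / 49))
          for i in range(50)]
geo = geodesic_distances(points, k=2)
straight = math.dist(points[0], points[-1])  # Euclidean chord between the ends
along = geo[0][-1]                           # path that follows the arc
```

On this data the Euclidean distance between the two end points is the diameter of the circle, while the graph distance follows the arc and is therefore longer. That gap is what a linear method cannot see, and it is what embedding methods built on geodesic distances try to preserve.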

Title: Predictive modelling of large evolving graphs: the case of the WWW
Topic: databases, machine learning
Laboratory, institution and university: LIX, Polytechnique, in collaboration with the DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens, Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
The Web is a highly dynamic structure that changes continuously as web pages and hyperlinks are created, deleted, or modified. Because of its immense size, dynamism and economic interest, significant research and industrial effort has been devoted over the last decade to effective web management and search. Specific research areas include text mining, social network analysis, computational linguistics, business and marketing intelligence, graph theory and data visualization. The main activity on the Web is searching for content that matches users' keyword-based queries. Ranking the results is a cornerstone process that enables users to retrieve relevant and important information effectively.
PageRank is the dominant algorithm for ranking web search results and has received significant attention in the related research. Computing PageRank on graphs of search engine scale (on the order of 10^9 nodes), however, requires tremendous computing resources since, in principle, it involves repeated matrix multiplications at that graph size. Moreover, the ranking algorithm should be applied to recent web graph snapshots in order to guarantee fresh and accurate ranking results. Although several techniques for accelerating PageRank computations or estimating unknown parts of the web structure have been proposed, the motivation for predicting web page rankings remains valid and justified. Given high-quality page rank predictions, search engines can be partially relieved of the huge effort of continuous crawling, indexing and page rank computation for the already known part of the web graph.
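
The repeated multiplication described above can be sketched as a power iteration on a toy graph. This is a minimal illustration only: the damping factor of 0.85, the uniform handling of pages without out-links, the stopping tolerance and the dictionary input format are assumptions of this example, not details given in the proposal.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for out in links.values() for q in out})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            for q in out:
                new[q] += damping * rank[p] / len(out)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank
```

Each pass touches every link once, so the cost of one iteration grows with the number of edges. At web scale this loop is exactly the part that becomes expensive, which motivates predicting rankings instead of recomputing them for every snapshot.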

Internship objectives (10 to 20 lines):
We propose a research project that aims to design a framework for predicting the ranking position of a Web page based on its previous rankings. Given a set of successive past top-k rankings, we study the evolution of Web pages' rankings in terms of ranking trend sequences. We learn predictors from these sequences and use them to predict the future rankings of pages in query-based top-k rankings. Different methods (such as Markov models, regression, and EM-based clustering) have been exploited to learn the optimal predictors for each case. In addition, multiple page features will be used to learn predictors of higher precision. An interesting issue to be investigated is the temporal robustness of the predictions.
The prediction quality is quantified as the similarity between the predicted and the actual rankings, and is also compared with alternative baseline predictors. The framework will undergo extensive experiments on real-world datasets for global and query-based top-k rankings.
The expected outcomes, assuming successful and robust rank predictions, include: (a) more effective search engine resource management in terms of crawling, index updates, re-computation of rankings, etc., and (b) an effective ad pricing policy for pages that are predicted to increase or decrease their rank. Another potential outcome is historical top-k queries, i.e. querying past snapshots of the web graph.
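
One possible reading of "ranking trend sequences" with a Markov model is sketched below. It assumes that a page's rank history is encoded as a sequence of up, down and same moves, and that a first-order chain over those symbols is fitted by counting transitions. The symbol alphabet, the function names and the first-order assumption are choices made for this example; the proposal leaves the exact encoding and model order open.

```python
from collections import Counter, defaultdict


def trends(ranks):
    """Turn a rank history into symbols: 'U' (moved up), 'D' (down), 'S' (same).

    Rank 1 is the best position, so a smaller number means the page moved up.
    """
    out = []
    for prev, cur in zip(ranks, ranks[1:]):
        out.append("U" if cur < prev else "D" if cur > prev else "S")
    return out


def fit_markov(histories):
    """Count first-order transitions between consecutive trend symbols."""
    counts = defaultdict(Counter)
    for ranks in histories:
        seq = trends(ranks)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def predict_next_trend(counts, ranks):
    """Most frequent successor of the last observed trend, or None if unseen."""
    seq = trends(ranks)
    if not seq or not counts.get(seq[-1]):
        return None
    return counts[seq[-1]].most_common(1)[0][0]
```

The predictor returns `None` when the last observed trend never occurred in the training histories, which leaves room for a fallback such as one of the baseline predictors mentioned above.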
Context: Assume web pages p_i, each characterized by a set of features F = {f_j} with values F(p_i), and time-ordered rankings r(p_i, F(p_i), t_k), where t_k is a time instance (assume t_{k+1} > t_k). The objective is to learn the weights of a hidden ranking function W = sum_j(w_j * f_j) and subsequently to learn predictive models in order to predict r(p_i, F(p_i), t_{k+o}), where o in {1,2,3…}.
Candidate prediction approaches:
- Markov models
- Multiclass learning
- Unsupervised approaches

Expected skills:
- Good understanding of linear algebra
- Analytical abilities
- Good programming skills (web programming, Java, SQL)
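
The context stated for this proposal, a hidden linear ranking function W = sum_j(w_j * f_j), can be illustrated with a small weight-learning sketch. It assumes that a single observed ranking is available as feature vectors ordered from best to worst, and it uses a pairwise perceptron update, which is a technique chosen here for brevity and not one named in the proposal. The function names, learning rate and epoch limit are likewise hypothetical.

```python
def score(weights, features):
    """W = sum_j w_j * f_j for one page."""
    return sum(w * f for w, f in zip(weights, features))


def fit_weights(ranked_features, epochs=100, lr=0.1):
    """ranked_features: feature vectors ordered from best to worst rank."""
    dim = len(ranked_features[0])
    weights = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for i, better in enumerate(ranked_features):
            for worse in ranked_features[i + 1:]:
                # A pair is violated when the better page does not score higher.
                if score(weights, better) <= score(weights, worse):
                    weights = [w + lr * (b - c)
                               for w, b, c in zip(weights, better, worse)]
                    mistakes += 1
        if mistakes == 0:
            break
    return weights
```

The learned weights are only determined up to the ordering they induce, so they need not match the hidden weights numerically. What matters for the prediction task is that scoring the pages with them reproduces the observed ranking.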
