Research initiation internships (stages d'initiation à la recherche) / M. Vazirgiannis
Date: October 2008 (2008-10-18)

Title: Economic applications of social and P2P networks - Distributed contextual advertising
Theme: distributed computing, algorithmics
Laboratory, institution and university: LIX, Polytechnique, in collaboration with DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens - Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
In the last decade the growth of the online advertising market has been historic, with a yearly growth rate of more than 20%. Despite that growth, an oligopoly of a few advertising companies, mainly search engine firms, shares the online advertising market. The most popular internet models for distributing advertisements, such as Google AdSense, are based on a powerful central authority that controls how advertisements are distributed. Alongside the expansion of online markets, new internet communication platforms have emerged, most prominently peer-to-peer (P2P) systems and social networks, whose main characteristic is the decentralized nature of network formation and of communication between entities.

References:
C. Dellarocas. Reputation Mechanisms. In T. Hendershott (ed.), Handbook on Information Systems and Economics, Elsevier Publishing, forthcoming, 2006.
Z. Despotovic and K. Aberer. Possibilities for Managing Trust in P2P Networks. Swiss Federal Institute of Technology (EPFL) Technical Report IC/2004/84, Lausanne, Switzerland, 2004.
N. R. Devanur, C. H. Papadimitriou, A. Saberi, and V. V. Vazirani. Market equilibrium via a primal-dual-type algorithm. In FOCS, 2002.
Books: Algorithmic Game Theory, http://www.cambridge.org/journals/nisan/downloads/Nisan_Nonprintable.pdf

Internship objectives (10 to 20 lines):
The objective is to provide novel market-based methods for exploiting economic applications of social and P2P networks. We propose to study distributed contextual advertising, where a group of peers/web sites buy and sell advertising links with each other. All sites can serve both as ad publishers, advertising other sites, and as advertisers. The participants (peers) are considered economic agents, and the general network concept is implemented by a distributed communication platform with a social network structure. The advertisement allocations have to emerge from end-to-end communication among the participants; there is no central authority responsible for assigning advertisements. Proposed methodologies are trust in P2P networks, reputation mechanisms in economic transactions with moral hazard, truthful mechanisms, and algorithms for market equilibria.

Expected skills: Java, MATLAB, familiarity with mathematics, game theory, distributed systems

Title: Nonlinear Dimensionality Reduction Algorithms
Laboratory, institution and university: LIX, Polytechnique, in collaboration with DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens - Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. However, since the late nineties, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. New advances that account for this rapid growth include the use of graphs to represent the manifold topology and the use of new metrics like the geodesic distance. In addition, new optimization schemes, based on kernel techniques and spectral decomposition, have led to spectral embedding, which encompasses many of the recently developed methods.

Internship objectives (10 to 20 lines):
The purpose of this essay is manifold. The first aim is to summarize clear facts and ideas about well-known methods as well as recent developments in nonlinear dimensionality reduction. With this goal in mind, all methods will be described from a unifying point of view, in order to highlight their respective strengths and shortcomings. For each method, the description will start from intuitive ideas, develop the necessary mathematical details, and end by outlining the algorithmic implementation. The methods will be compared with each other with the help of different illustrative examples, and the unifying point of view will facilitate a thorough comparison. Finally, the most prominent methods will be compared on a real-life problem (i.e. text mining/classification), aiming to reduce the dimensionality of numerical databases.

Expected skills:
- Sound background in linear algebra
- Good programming skills (MATLAB / C or C++)

Title: Predictive modelling of large evolving graphs - the case of the WWW
Theme: databases, machine learning
Laboratory, institution and university: LIX, Polytechnique, in collaboration with DB-NET lab, Athens U. of Economics & Business (Athens, Greece) (http://www.db-net.aueb.gr)
City and country: Athens - Greece
Team or project within the lab: LIX
Internship supervisor (name and email): Michalis Vazirgiannis, http://www.db-net.aueb.gr/michalis, [email protected] & [email protected]
Laboratory director (name and email): J.M. Steyaert, [email protected]

General presentation of the field (5 to 10 lines):
The Web is a highly dynamic structure that changes continuously as web pages and hyperlinks are created, deleted, or modified. Due to its immense size, dynamism and economic interest, significant research and industrial efforts have been devoted in the last decade to effective web management and search. Specific research areas include text mining, social network analysis, computational linguistics, business and marketing intelligence, graph theory and data visualization. The main activity on the Web is searching for content that matches users' keyword-based queries. Ranking of the results is a cornerstone process enabling users to effectively retrieve relevant and important information.
PageRank is the dominant algorithm used for ranking web search results and has received significant attention in the related research. However, computing PageRank on graphs of search engine scale (on the order of 10^9 nodes) is a task that requires tremendous computing resources since, in principle, it involves repeated multiplications with a matrix of that size. Moreover, the ranking algorithm should be applied to recent web graph snapshots in order to guarantee fresh and accurate ranking results. Although several techniques have been proposed for accelerating PageRank computations or estimating unknown parts of the web structure, the motivation for predicting web page rankings remains valid and justified.
Assuming high-quality page rank predictions, search engines can be partially relieved of the huge effort of continuous crawling, indexing and page rank computation for the already known part of the web graph.

Internship objectives (10 to 20 lines):
We propose a research project that aims at designing a framework for predicting the ranking position of a Web page based on previous rankings. Assuming a set of successive past top-k rankings, we study the evolution of Web pages' rankings in terms of ranking trend sequences. We learn predictors from these sequences and use them to predict future rankings of pages in query-based top-k rankings. Different methods (such as Markov models, regression, EM-based clustering) have been exploited to learn the optimal predictors for each case. In addition, multiple page features will be used to learn predictors of higher precision. An interesting issue to be researched is the temporal robustness of the predictions. The prediction quality is quantified as the similarity between the predicted and the actual rankings, and is also compared to alternative baseline predictors. The framework will undergo extensive experiments on real-world datasets for global and query-based top-k rankings.
The expected outcomes, assuming successful and robust rank predictions, include:
a. enabling more effective search engine resource management in terms of crawling, index update, recomputation of ranking, etc.;
b. an effective ad pricing policy for pages that are predicted to increase/decrease their rank.
Another potential outcome is historical top-k queries, i.e. querying past snapshots of the web graph.

Context: Assume web pages p_i, each characterized by a set of features F = {f_j} with values F(p_i), and time-ordered rankings r(p_i, F(p_i), t_k), where t_k is a time instance (assume t_{k+1} > t_k). The objective is to learn the weights of a hidden ranking function W = sum_j (w_j * f_j) and subsequently to learn predictive models in order to predict r(p_i, F(p_i), t_{k+o}), where o in {1, 2, 3, ...}.

Candidate prediction approaches:
- Markov models
- Multiclass learning
- Unsupervised approaches

Expected skills:
- Good understanding of linear algebra
- Analytical abilities
- Good programming skills (web programming, Java, SQL)
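
As a rough illustration of the Markov-model approach to ranking trend sequences mentioned in this proposal, the following Python sketch learns first-order transition counts between rank trends and predicts the most likely next trend. It is not part of the original proposal: the three-symbol trend alphabet ("up", "down", "same"), the function names and the toy rank histories are all assumptions made for illustration, and a real study would use the past top-k rankings and page features described above.

```python
from collections import Counter, defaultdict


def to_trends(ranks):
    """Turn a page's rank sequence into trend symbols (hypothetical encoding).

    'up' means the rank number decreased, i.e. the page moved toward the top.
    """
    trends = []
    for prev, cur in zip(ranks, ranks[1:]):
        if cur < prev:
            trends.append("up")
        elif cur > prev:
            trends.append("down")
        else:
            trends.append("same")
    return trends


def fit_markov(trend_sequences):
    """Count first-order transitions trend -> next trend over all pages."""
    counts = defaultdict(Counter)
    for seq in trend_sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, last_trend, default="same"):
    """Return the most frequent successor of last_trend, or default if unseen."""
    followers = counts.get(last_trend)
    if not followers:
        return default
    return followers.most_common(1)[0][0]


# Toy rank histories (assumed data): position of each page at t_1 .. t_5.
history = {
    "page_a": [5, 4, 3, 2, 1],
    "page_b": [1, 2, 3, 4, 5],
    "page_c": [3, 3, 2, 2, 1],
}
model = fit_markov([to_trends(ranks) for ranks in history.values()])

for page, ranks in history.items():
    last = to_trends(ranks)[-1]
    print(page, "last trend:", last, "predicted next:", predict_next(model, last))
```

Predicting a trend rather than an exact position keeps the state space small; predicting r(p_i, F(p_i), t_{k+o}) itself would need a richer state definition or one of the other candidate approaches.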