CabalFind

From HaskellWiki
Latest revision as of 22:21, 23 April 2021

This topic was inspired by the CabalGet page. I initially placed it at the end of that page; it has since been moved to a separate page.

Now that there is "cabal-get", has anybody ever thought of "cabal-list" or "cabal-find"?

There is a way to get a list of Cabal files known to search engines. This gives a zero-cost tool for announcing and discovering Cabal packages: place your Cabal package (most frequently a darcs repo) somewhere Google Bot or Yahoo Slurp can index it, and eventually information about your package will show up in search results.

Google has a feature to narrow search results to files with given suffixes (extensions).

Consider this Google query:

http://www.google.com/search?q=name+version+filetype:cabal&hl=en&lr=&c2coff=1&filter=0 (page 1)

http://www.google.com/search?q=name+version+filetype:cabal&hl=en&lr=&c2coff=1&start=10&sa=N&filter=0 (page 2)

http://www.google.com/search?q=name+version+filetype:cabal&hl=en&lr=&c2coff=1&start=20&sa=N&filter=0 (page 3)

I guess that "start" needs to be incremented by 10 for each subsequent page. The last page will not have the "Next" link. Alternatively, use the Google API.
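The paging scheme guessed at above can be sketched as a small function. This is only an illustration: the name googlePageUrl is made up here, and the query parameters are copied verbatim from the three URLs quoted above, on the assumption that "start" really does grow by 10 per page.

```haskell
-- Sketch: build the Google query URL for a given result page (0-based).
-- googlePageUrl is a hypothetical name; the parameters are taken from
-- the URLs quoted in the text, not from any Google documentation.
googlePageUrl :: Int -> String
googlePageUrl page
  | page <= 0 = base ++ "&filter=0"
  | otherwise = base ++ "&start=" ++ show (10 * page) ++ "&sa=N&filter=0"
  where
    base = "http://www.google.com/search?q=name+version+filetype:cabal&hl=en&lr=&c2coff=1"

-- Print the URLs of the first three result pages.
main :: IO ()
main = mapM_ (putStrLn . googlePageUrl) [0, 1, 2]
```

The three lines printed should match the page 1, page 2 and page 3 URLs listed above.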

Name: and Version: are mandatory fields per the Cabal specification (are there any others?). This query currently returns 28 results, and all that is needed for one's package to appear there is to make the package directory (darcs repo) visible to Google Bot.

It will be necessary to parse the returned HTML for <a> tags containing filenames ending with ".cabal".
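A minimal sketch of that extraction step, assuming the links appear as double-quoted href attributes. It is a naive textual scan rather than a real HTML parser, and the names hrefs and cabalLinks are invented for this example; they are not part of CabalFind.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf, isSuffixOf, tails)

-- | Collect the values of all href="..." attributes in an HTML string.
-- Naive scan: handles only double-quoted values, matches "href" in any case.
hrefs :: String -> [String]
hrefs html =
  [ takeWhile (/= '"') (drop (length marker) t)
  | t <- tails html
  , marker `isPrefixOf` map toLower (take (length marker) t)
  ]
  where
    marker = "href=\""

-- | Keep only the links whose target ends with ".cabal".
cabalLinks :: String -> [String]
cabalLinks = filter (".cabal" `isSuffixOf`) . hrefs

main :: IO ()
main = mapM_ putStrLn (cabalLinks sample)
  where
    -- Made-up page fragment with one .cabal link and one other link.
    sample =
      "<a href=\"http://example.org/foo/foo.cabal\">foo</a>\
      \<a href=\"http://example.org/index.html\">index</a>"
```

A link whose URL carries a query string after ".cabal" would be missed by this suffix test, so a real tool would probably need a proper URI parser.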

Interestingly, Yahoo has a similar feature to filter by file extension:

http://search.yahoo.com/search?ei=UTF-8&p=name+version+originurlextension%3Acabal&xargs=0&pstart=1&fr=sfp&dups=1

also results in a list of URLs pointing to cabal files.

It makes sense to look for a generalization of search engine interfaces.

An experimental utility based on these ideas is now a work in progress...

And here it is:

darcs get http://www.golubovsky.org/repos/cabalfind/

An example of a program (assuming that the package CabalFind is installed):

-- cbftest.hs

module Main where

import Control.Monad
import CabalFind.SearchEngine
import CabalFind.GoogleSearch

main :: IO ()
main = do
   -- Query Google for every .cabal file it has indexed.
   rsp <- querySearchEngine Google
   putStrLn $ "Google search for all packages, results: " ++ show (length rsp)
   mapM_ print rsp
   -- Repeat the query, restricted to the named packages.
   let pkglist = ["hsffig", "crypto", "http", "newbinary"]
   rsp <- querySearchEngine (GoogleFiltered pkglist)
   putStrLn $ "Google search for packages " ++ show pkglist ++ ", results: " ++ show (length rsp)
   mapM_ print rsp

The program may be built as follows:

ghc --make -package CabalFind -o cbftest cbftest.hs 

A generalized interface to search engines is defined in [1]. It is assumed that a search engine returns some HTML page containing URIs of .cabal files, labelled in some way. The implementation of a concrete search engine interface (see [2] for example) defines some "methods" to handle URIs returned by the search engine:

  • which URIs point to the .cabal files (target URIs)
  • which URI may be used to continue search (get the next page)
  • how to rewrite URIs (some search engines, e. g. Yahoo, return target URIs encapsulated in other URIs, see the Yahoo search engine interface implementation, http://www.golubovsky.org/repos/cabalfind/CabalFind/GoogleSearch.hs).
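The three "methods" listed above could be modelled as a type class along the following lines. This is a hypothetical sketch, not the actual CabalFind.SearchEngine interface: the class name Engine, its method names, the ToyEngine instance and the "/redirect?u=" wrapping format are all invented for illustration, and plain Strings stand in for the URI type used by the library.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- A link found on a result page: (URI, label).
type Link = (String, String)

-- Hypothetical class; the names are illustrative, not CabalFind's own.
class Engine e where
  isTarget   :: e -> Link -> Bool      -- does the link point to a .cabal file?
  isNextPage :: e -> Link -> Bool      -- does the link continue the search?
  rewrite    :: e -> String -> String  -- unwrap an encapsulated target URI
  rewrite _ = id                       -- default: URIs are used as returned

-- A toy engine whose result pages wrap targets as "/redirect?u=<target>".
data ToyEngine = ToyEngine

instance Engine ToyEngine where
  isTarget e (uri, _) = ".cabal" `isSuffixOf` rewrite e uri
  isNextPage _ (_, label) = label == "Next"
  rewrite _ uri
    | prefix `isPrefixOf` uri = drop (length prefix) uri
    | otherwise               = uri
    where prefix = "/redirect?u="

-- | Target links of one result page, with their URIs rewritten.
targets :: Engine e => e -> [Link] -> [Link]
targets e links =
  [ (rewrite e uri, label) | (uri, label) <- links, isTarget e (uri, label) ]

main :: IO ()
main = mapM_ print (targets ToyEngine page)
  where
    -- Made-up links from a single result page.
    page =
      [ ("/redirect?u=http://example.org/foo.cabal", "foo.cabal")
      , ("http://example.org/about.html", "About")
      , ("/search?start=10", "Next")
      ]
```

Keeping the three decisions in a class means a driver function only needs the class constraint, which is the shape the querySearchEngine signature suggests.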

The CabalFind library provides a function,

querySearchEngine :: SearchEngine s => s        -- search engine request
                  -> IO [(URI,String)]          -- returned list of (URI, label) pairs

which takes an interface to a search engine (e. g. Google, Yahoo, GoogleFiltered) and returns a list of URI-label pairs.

The function querySearchEngine does not check the validity of the returned URIs: it only collects responses from search engines and invokes the methods specific to each concrete search engine to filter the results.
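The collect-and-filter behaviour described above might look roughly like the loop below. It is a sketch under stated assumptions, not the library's implementation: collect, fakePages and the predicate arguments are hypothetical names, the page fetcher is passed in as a parameter so the loop can be tried without network access, and Strings again replace the URI type.

```haskell
import Data.List (isSuffixOf)

type Link = (String, String)   -- (URI, label)

-- | Follow "next page" links from a starting URI, collecting target links.
-- Visited URIs are remembered so a page linking back to an earlier one
-- cannot cause an endless loop. No returned URI is checked for validity.
collect :: Monad m
        => (String -> m [Link])   -- fetch the links found on one result page
        -> (Link -> Bool)         -- is this a target (.cabal) link?
        -> (Link -> Bool)         -- is this the "next page" link?
        -> String                 -- URI of the first result page
        -> m [Link]
collect fetch isTarget isNext = go []
  where
    go seen uri
      | uri `elem` seen = return []
      | otherwise = do
          links <- fetch uri
          let found = filter isTarget links
          rest <- case filter isNext links of
                    ((next, _) : _) -> go (uri : seen) next
                    []              -> return []
          return (found ++ rest)

-- Made-up result pages standing in for a search engine.
fakePages :: [(String, [Link])]
fakePages =
  [ ("page1", [("http://example.org/a.cabal", "a"), ("page2", "Next")])
  , ("page2", [("http://example.org/b.cabal", "b"), ("http://example.org/", "home")])
  ]

main :: IO ()
main = do
  let fetch uri = return (maybe [] id (lookup uri fakePages))
  rsp <- collect fetch ((".cabal" `isSuffixOf`) . fst) ((== "Next") . snd) "page1"
  mapM_ print rsp
```

The loop stops at the first page that has no "Next" link, which mirrors the observation about Google's last result page.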


As of 12/29/2005, the GHC 6.4 compatible version of CabalFind is available at the same location:

darcs get http://www.golubovsky.org/repos/cabalfind/

For an older 6.2.2 compatible version please extract the tag `worked_with_ghc_6.2.2'.


User:DimitryGolubovsky