Revision as of 18:58, 3 October 2006
This topic was inspired by the CabalGet page. I initially placed it at the end of that page; it has now been moved to a separate page.
Now that there is "cabal-get", has anybody ever thought of "cabal-list" or "cabal-find"?
There is a way to get a list of Cabal files known to search engines. This makes for a zero-cost tool for announcing and discovering Cabal packages: place your Cabal package (most frequently a darcs repo) somewhere Google Bot or Yahoo Slurp can index it, and eventually information about your package will show up in search results.
Google has a feature to narrow search results to files with given suffixes (extensions).
Consider this Google query:
and I guess "start" needs to be incremented by 10 for each subsequent page. The last page will not have the "Next" link. Alternatively, use the Google API.
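The paging idea can be sketched as follows. This is only an illustration under assumptions: the base URL is a made-up placeholder, the page size of 10 and the "start" parameter name are taken from the guess above, and the helper names pageOffsets and withStart are hypothetical.

```haskell
-- Hypothetical helpers illustrating paging through search results.
-- Assumption: 10 results per page, offset passed in a "start" parameter.

-- | Offsets for the first n result pages.
pageOffsets :: Int -> [Int]
pageOffsets n = take n [0, 10 ..]

-- | Append a "start" offset to a URL that already carries a query string.
withStart :: String -> Int -> String
withStart baseUrl offset = baseUrl ++ "&start=" ++ show offset

main :: IO ()
main =
  -- The base URL below is a placeholder, not a real search query.
  mapM_ (putStrLn . withStart "http://search.example/search?q=test")
        (pageOffsets 3)
```

A real client would stop as soon as a result page no longer contains a "Next" link instead of fixing the number of pages in advance.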
Name: and Version: are mandatory fields per the Cabal specification (are there any others?). This query currently returns 28 results, and all that is needed for one's package to appear there is to make the package directory (darcs repo) visible to Googlebot.
It will be necessary to parse the returned HTML for <a> tags containing filenames ending with ".cabal".
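A minimal sketch of that link extraction, using only the base library. It is a naive string scan rather than a real HTML parser, and it assumes double-quoted, lower-case href attributes; the function names hrefs and cabalLinks and the sample HTML are made up for illustration.

```haskell
import Data.List (isPrefixOf, isSuffixOf, tails)

-- | Collect the values of all href="..." attributes in an HTML string.
-- Naive scan: assumes double-quoted values and a lower-case attribute name.
hrefs :: String -> [String]
hrefs html =
  [ takeWhile (/= '"') (drop (length marker) t)
  | t <- tails html
  , marker `isPrefixOf` t
  ]
  where
    marker = "href=\""

-- | Keep only the links whose target ends with ".cabal".
cabalLinks :: String -> [String]
cabalLinks = filter (".cabal" `isSuffixOf`) . hrefs

main :: IO ()
main = mapM_ putStrLn (cabalLinks sample)
  where
    -- Made-up sample page with one .cabal link and one unrelated link.
    sample =
      "<a href=\"http://example.org/foo/foo.cabal\">foo</a>"
        ++ "<a href=\"http://example.org/about.html\">about</a>"
```

For real search result pages a proper HTML parser would be more robust, since attribute quoting and case vary between engines.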
Interestingly, Yahoo has a similar feature to filter by file extension:
also results in a list of URLs pointing to cabal files.
It makes sense to look for a generalization of search engine interfaces.
An experimental utility based on these ideas is now a work in progress...
And here it is:
An example of a program (assuming that the package CabalFind is installed):
-- cbftest.hs
module Main where

import Control.Monad
import CabalFind.SearchEngine
import CabalFind.GoogleSearch

main = do
  rsp <- querySearchEngine Google
  putStrLn $ "Google search for all packages, results: " ++ (show $ length rsp)
  mapM (putStrLn . show) rsp
  let pkglist = ["hsffig", "crypto", "http", "newbinary"]
  rsp <- querySearchEngine (GoogleFiltered pkglist)
  putStrLn $ "Google search for packages " ++ (show pkglist) ++ ", results: " ++ (show $ length rsp)
  mapM (putStrLn . show) rsp
The program may be built as follows:
ghc --make -package CabalFind -o cbftest cbftest.hs
A generalized interface to search engines is defined in . It is assumed that a search engine returns some HTML page containing URIs of .cabal files, labelled in some way. The implementation of a concrete search engine interface (see  for example) defines some "methods" to handle URIs returned by the search engine:
- which URIs point to the .cabal files (target URIs)
- which URI may be used to continue search (get the next page)
- how to rewrite URIs (some search engines, e. g. Yahoo, return target URIs encapsulated in other URIs, see the Yahoo search engine interface implementation, http://www.golubovsky.org/repos/cabalfind/CabalFind/GoogleSearch.hs).
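The three "methods" listed above could be modelled roughly as below. This is a hypothetical sketch, not the actual CabalFind.SearchEngine interface: the class EngineSketch, its method names, the ToyEngine instance and all URLs are invented for illustration, and plain strings stand in for a proper URI type.

```haskell
import Data.List (isPrefixOf, isSuffixOf)
import Data.Maybe (listToMaybe)

-- Hypothetical class; the real CabalFind interface may look different.
class EngineSketch e where
  -- | Does this URI point to a .cabal file (a target URI)?
  isTarget :: e -> String -> Bool
  -- | Does this URI lead to the next page of results?
  isNextPage :: e -> String -> Bool
  -- | Rewrite a URI, e.g. unwrap a target hidden inside a redirect.
  rewrite :: e -> String -> String

-- | A toy engine that wraps result links in a redirect prefix.
data ToyEngine = ToyEngine

redirectPrefix :: String
redirectPrefix = "http://engine.example/r?u="

instance EngineSketch ToyEngine where
  isTarget _ uri = ".cabal" `isSuffixOf` uri
  isNextPage _ uri = "http://engine.example/search?" `isPrefixOf` uri
  rewrite _ uri
    | redirectPrefix `isPrefixOf` uri = drop (length redirectPrefix) uri
    | otherwise = uri

-- | Split the URIs found on one result page into the target URIs
-- and an optional link to the next page.
processPage :: EngineSketch e => e -> [String] -> ([String], Maybe String)
processPage e uris = (targets, next)
  where
    rewritten = map (rewrite e) uris
    targets = filter (isTarget e) rewritten
    next = listToMaybe (filter (isNextPage e) rewritten)

main :: IO ()
main = print (processPage ToyEngine page)
  where
    -- Made-up URIs as they might appear on one result page.
    page =
      [ "http://engine.example/r?u=http://example.org/foo/foo.cabal"
      , "http://engine.example/about.html"
      , "http://engine.example/search?q=x&start=10"
      ]
```

Keeping the per-engine decisions in a class means the code that walks through result pages can stay the same for every engine; only the instance changes.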
The CabalFind library provides a function,
querySearchEngine :: SearchEngine s
                  => s                   -- search engine request
                  -> IO [(URI,String)]   -- returned list of (URI, label) pairs
which takes an interface to a search engine (e. g. Google, Yahoo, GoogleFiltered) and returns a list of URI-label pairs.
The function querySearchEngine does not check the validity of the returned URIs: it only collects responses from search engines and invokes the methods specific to each concrete search engine to filter the results properly.
As of 12/29/2005, the GHC 6.4 compatible version of CabalFind is available at the same location:
For an older 6.2.2 compatible version please extract the tag `worked_with_ghc_6.2.2'.