Target
HaskellでGoogle Complete Suggestion APIの結果(XML)をパージングして、候補クエリの配列(の形をした文字列)を取得する。
Environment
- OS
- Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
- GHC
部分ごとの調査
APIアクセス
Google Complete Suggestion APIは以下のURLにGETすれば取得出来る。
[search words]の部分にはスペース区切りの単語の羅列をurlエンコードして設定する。
http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=[search words]
HaskellでGETするにはNetwork.HTTPモジュールが便利。
以下のようにしてAPIの結果を文字列で取得できる。
import Control.Monad.Trans (liftIO)
import Network.HTTP
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
putStrLn resultxml
ここで利用している関数の型は以下。
Prelude> import Network.HTTP
Prelude Network.HTTP> :t Network.HTTP.simpleHTTP
Network.HTTP.simpleHTTP
:: HStream ty =>
Request ty -> IO (Network.Stream.Result (Response ty))
Prelude Network.HTTP> :t Network.HTTP.getRequest
Network.HTTP.getRequest :: String -> Request_String
Prelude Network.HTTP> :t getResponseBody
getResponseBody :: Network.Stream.Result (Response ty) -> IO ty
Prelude Network.HTTP> :t urlEncode
urlEncode :: String -> String
取得した結果のXMLは以下の様な感じ。
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal sandbox" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal install" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ã¢ãããã¼ã" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal windows" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ä¾åé¢ä¿" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal 使ã æ¹" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ã¢ã³ã¤ã³ã¹ãã¼ã«" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal proxy" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal github" />
</CompleteSuggestion>
</toplevel>
文字化けしてしまっているので、
Just using Codec.Binary.UTF8.String
を使って直す。
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
putStrLn $ S.decodeString resultxml
ちなみに、そのまんまだけど、”decodeString” 関数の型は以下。
Prelude> import Codec.Binary.UTF8.String as S
Prelude S> :t S.decodeString
S.decodeString :: String -> String
文字化け解消後の結果は以下。
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal sandbox" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal install" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal アップデート" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal windows" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal 依存関係" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal 使い方" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal アンインストール" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal proxy" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal github" />
</CompleteSuggestion>
</toplevel>
XMLのパース
簡単のために、APIの取得結果の一部分を切り取った以下のXMLに対してパージングしてみる。
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
</toplevel>
ここでは、Text.XML.Lightモジュールを利用してパージングする。
手始めに、”CompleteSuggestion”要素を取り出してみる。
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ findChildren (unqual "CompleteSuggestion") root
ここで利用している関数の型は以下。
Prelude> import Text.XML.Light
Prelude Text.XML.Light> :t parseXMLDoc
parseXMLDoc
:: Text.XML.Light.Lexer.XmlSource s => s -> Maybe Element
Prelude Text.XML.Light> parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>"
Just (
Element {
elName = QName {
qName = "toplevel",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "CompleteSuggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "suggestion", qURI = Nothing, qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
],
elLine = Just 1
}
)
],
elLine = Just 1
}
)
Prelude Text.XML.Light> :t unqual
unqual :: String -> QName
Prelude Text.XML.Light> unqual "CompleteSuggestion"
QName {qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing}
Prelude Text.XML.Light> :t findChildren
findChildren :: QName -> Element -> [Element]
そして結果は以下。
[
Element {
elName = QName {
qName = "CompleteSuggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "suggestion",
qURI = Nothing, qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
],
elLine = Just 1
}
]
“CompleteSuggestion”要素だけ取得できているから、
次は”suggestion”要素を取得してみる。
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0)
ここで使った”findChild”関数の型は以下。
Prelude Text.XML.Light> :t findChild
findChild :: QName -> Element -> Maybe Element
取得結果は以下。
Just (
Element {
elName = QName {
qName = "suggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
“suggestion”要素だけ取り出すことができている。
次は、”suggestion”要素の属性を取り出す。
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
Nothing -> []
Just elem -> elAttribs elem
“elAttribs”の型はこんな感じ。
Prelude Text.XML.Light> :t elAttribs
elAttribs :: Element -> [Attr]
これで属性の配列が取得できる。
[
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
]
目的の値は”attrVal”のところにあるから、
最後にその取得方法を見てみる。
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
Nothing -> ""
Just elem ->
let
[Attr key val] = elAttribs elem
in
val
これで取得できるようになった。
"haskell cabal"
完全版
実際には、”CompleteSuggestion”要素は複数あるから、
上のように最初の一つを取ってくるだけじゃなくて
全部の”CompleteSuggestion”要素に同じ変換処理が必要になってくる。
そのため、それぞれの”attrVal”を取ってくるために変換処理を関数化する。
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S
import Data.List
import Text.XML.Light
getText :: Element -> String -> String
getText parent childName =
case findChild (unqual childName) parent of
Nothing -> ""
Just child ->
let
[Attr key val] = elAttribs child
in
val
parseCompleteSuggestionTag :: Element -> String
parseCompleteSuggestionTag element = getText element "suggestion"
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
case parseXMLDoc resultxml of
Nothing -> putStrLn "parse error"
Just root ->
putStrLn ("[\"" ++ (Data.List.intercalate "\",\"" $ map S.decodeString $ map (\a -> parseCompleteSuggestionTag a) (findChildren (unqual "CompleteSuggestion") root)) ++ "\"]")
これで完成。
実行結果はこんな感じ。
[
"haskell cabal",
"haskell cabal sandbox",
"haskell cabal install",
"haskell cabal アップデート",
"haskell cabal windows",
"haskell cabal 依存関係",
"haskell cabal 使い方",
"haskell cabal アンインストール",
"haskell cabal proxy",
"haskell cabal github"
]
HaskellのウェブフレームワークYesodとこんな感じのXMLパージングスクリプト使って
簡単なウェブアプリを作成してみた。
関連語検索ツールブログのタイトルをニーズがありかつライバルが少ない単語の組み合わせで構成するための便利ツール。